{ "cells": [ { "cell_type": "markdown", "id": "dfcb3bf7-3b0a-4e2b-b842-3b0d00ee93b8", "metadata": {}, "source": [ "# Quick Start Guide\n", "\n", "This guide demonstrates the three high-performance bulk upsert operations provided by `sqlalchemy_upsert_kit` using real examples with an in-memory SQLite database." ] }, { "cell_type": "markdown", "id": "179fbe05-c383-4da4-a50c-a12cc142a087", "metadata": {}, "source": [ "## Understanding Upsert Operations\n", "\n", "Upsert operations combine \"insert\" and \"update\" logic to handle data synchronization efficiently. The three strategies offered by this library each serve different use cases:\n", "\n", "- `insert_or_ignore`: Inserts only new records, ignores conflicts with existing data\n", "- `insert_or_replace`: Completely replaces existing records with new data\n", "- `insert_or_merge`: Selectively updates specified columns while preserving others\n", "\n", "All operations use temporary table staging and SQL JOIN operations to achieve ~20x performance improvements over traditional row-by-row approaches." ] }, { "cell_type": "markdown", "id": "0f96c0cb-2c10-4a20-9893-9cee4d85a435", "metadata": {}, "source": [ "## Setting Up the Database\n", "\n", "Let's start by creating an in-memory SQLite database and setting up our test table:" ] }, { "cell_type": "code", "execution_count": 1, "id": "d9695176-c592-41f3-abd2-ad2fe332c047", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "āœ… Database and table created successfully\n" ] } ], "source": [ "from datetime import datetime, timezone, timedelta\n", "\n", "import sqlalchemy as sa\n", "from sqlalchemy import MetaData, Table, Column, Integer, String, DateTime\n", "\n", "from sqlalchemy_upsert_kit.sqlite.insert_or_ignore import insert_or_ignore\n", "from sqlalchemy_upsert_kit.sqlite.insert_or_replace import insert_or_replace \n", "from sqlalchemy_upsert_kit.sqlite.insert_or_merge import insert_or_merge\n", "\n", "# Create in-memory SQLite database\n", "engine = sa.create_engine(\"sqlite:///:memory:\")\n", "\n", "# Define our test table structure\n", "metadata = MetaData()\n", "records_table = Table(\n", " 'records',\n", " metadata,\n", " Column('id', Integer, primary_key=True),\n", " Column('desc', String, nullable=True),\n", " Column('create_at', DateTime, nullable=False),\n", " Column('update_at', DateTime, nullable=False)\n", ")\n", "\n", "# Create the table\n", "metadata.create_all(engine)\n", "print(\"āœ… Database and table created successfully\")" ] }, { "cell_type": "markdown", "id": "b3696ef8-5737-4dd0-8122-b96ea7c03adf", "metadata": {}, "source": [ "## Test Data Setup\n", "\n", "We'll use a classic test scenario with:\n", "- **4 existing records** (IDs 1-4) already in the database\n", "- **3 input records** for upsert operations\n", "- **2 conflict records** (IDs 3-4) that exist in both existing and input data\n", "- **1 new record** (ID 5) that doesn't exist in the database" ] }, { "cell_type": "code", "execution_count": 2, "id": "bb420888-1305-45fd-8e29-1aa7bf1132ae", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Existing Records in Database:\n", "+----+------+---------------------+---------------------+\n", "| id | desc | create_at | update_at |\n", "+----+------+---------------------+---------------------+\n", "| 1 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 2 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 3 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 4 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "+----+------+---------------------+---------------------+\n", "\n", "Input Data for Upsert Operations:\n", "+----+------+---------------------------+---------------------------+\n", "| id | desc | create_at | update_at |\n", "+----+------+---------------------------+---------------------------+\n", "| 3 | v2 | 2025-07-02 16:25:01+00:00 | 2025-07-02 16:26:01+00:00 |\n", "| 4 | v2 | 2025-07-02 16:25:01+00:00 | 2025-07-02 16:26:01+00:00 |\n", "| 5 | v2 | 2025-07-02 16:26:01+00:00 | 2025-07-02 16:26:01+00:00 |\n", "+----+------+---------------------------+---------------------------+\n", "\n", "šŸ“Š Data Summary:\n", " • Existing records: 4 (IDs 1-4)\n", " • Input records: 3 (IDs 3-5)\n", " • Conflict records: 2 (IDs 3-4 exist in both)\n", " • New records: 1 (ID 5 is new)\n" ] } ], "source": [ "# Helper function to display data in table format\n", "import typing as T\n", "from sqlalchemy_upsert_kit.tests.utils import pt_from_many_dict\n", "\n", "def display_records(records: list[dict[str, T.Any]], title: str=\"Records\"):\n", " print(f\"\\n{title}:\")\n", " print(pt_from_many_dict(records))\n", "\n", "# Set up timestamps\n", "create_time = datetime.now(timezone.utc).replace(microsecond=0)\n", "update_time = create_time + timedelta(minutes=1)\n", "\n", "# Prepare existing data (4 records with IDs 1-4)\n", "existing_data = [\n", " {'id': 1, 'desc': 'v1', 'create_at': create_time, 'update_at': create_time},\n", " {'id': 2, 'desc': 'v1', 'create_at': create_time, 'update_at': create_time},\n", " {'id': 3, 'desc': 'v1', 'create_at': create_time, 'update_at': create_time},\n", " {'id': 4, 'desc': 'v1', 'create_at': create_time, 'update_at': create_time}\n", "]\n", "\n", "# Insert existing data\n", "with engine.connect() as conn:\n", " conn.execute(records_table.insert(), existing_data)\n", " conn.commit()\n", "\n", "# Prepare input data (3 records: 2 conflicts + 1 new)\n", "input_data = [\n", " {'id': 3, 'desc': 'v2', 'create_at': create_time, 'update_at': update_time}, # Conflict\n", " {'id': 4, 'desc': 'v2', 'create_at': create_time, 'update_at': update_time}, # Conflict \n", " {'id': 5, 'desc': 'v2', 'create_at': update_time, 'update_at': update_time} # New\n", "]\n", "\n", "# Display initial state\n", "with engine.connect() as conn:\n", " existing_records = conn.execute(sa.select(records_table).order_by(records_table.c.id)).mappings().fetchall()\n", "\n", "display_records([dict(r) for r in existing_records], \"Existing Records in Database\")\n", "display_records(input_data, \"Input Data for Upsert Operations\")\n", "\n", "print(f\"\\nšŸ“Š Data Summary:\")\n", "print(f\" • Existing records: 4 (IDs 1-4)\")\n", "print(f\" • Input records: 3 (IDs 3-5)\") \n", "print(f\" • Conflict records: 2 (IDs 3-4 exist in both)\")\n", "print(f\" • New records: 1 (ID 5 is new)\")" ] }, { "cell_type": "code", "execution_count": 7, "id": "e7cdec2a-0d13-45ad-a7aa-25947339ad9d", "metadata": {}, "outputs": [], "source": [ "def reset_database():\n", " # Reset database to original state for clean merge demonstration\n", " with engine.connect() as conn:\n", " conn.execute(records_table.delete())\n", " conn.execute(records_table.insert(), existing_data)\n", " conn.commit()\n", " \n", " print(\"Database reset to original state for merge demonstration\")" ] }, { "cell_type": "markdown", "id": "29fb0fd4-c349-45a8-8e08-c1b988c78fd0", "metadata": {}, "source": [ "## Operation 1: Insert or Ignore\n", "\n", "**Strategy**: Insert only new records, ignore conflicts with existing data.\n", "\n", "**Expected Behavior**:\n", "- IDs 3-4: Ignored (already exist)\n", "- ID 5: Inserted (new record)\n", "- Final result: 2 rows ignored, 1 row inserted" ] }, { "cell_type": "code", "execution_count": 9, "id": "738f7cf2-3a08-4f60-8565-3040de1ede49", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "============================================================\n", "šŸ” OPERATION 1: INSERT OR IGNORE\n", "============================================================\n", "Database reset to original state for merge demonstration\n", "Result: 2 rows ignored, 1 rows inserted\n", "\n", "Database After INSERT OR IGNORE:\n", "+----+------+---------------------+---------------------+\n", "| id | desc | create_at | update_at |\n", "+----+------+---------------------+---------------------+\n", "| 1 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 2 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 3 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 4 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 5 | v2 | 2025-07-02 16:26:01 | 2025-07-02 16:26:01 |\n", "+----+------+---------------------+---------------------+\n", "\n", "āœ… Verification:\n", " • Records 3-4: Preserved original data (desc='v1', original timestamps)\n", " • Record 5: Successfully inserted with new data\n", " • Total records: 5 (was 4, added 1)\n" ] } ], "source": [ "print(\"\\n\" + \"=\" * 60)\n", "print(\"šŸ” OPERATION 1: INSERT OR IGNORE\")\n", "print(\"=\" * 60)\n", "\n", "# Reset database to original state for clean merge demonstration\n", "reset_database()\n", "\n", "# Perform insert_or_ignore operation\n", "ignored_rows, inserted_rows = insert_or_ignore(\n", " engine=engine,\n", " table=records_table,\n", " values=input_data\n", ")\n", "\n", "print(f\"Result: {ignored_rows} rows ignored, {inserted_rows} rows inserted\")\n", "\n", "# Display final state\n", "with engine.connect() as conn:\n", " final_records = conn.execute(sa.select(records_table).order_by(records_table.c.id)).mappings().fetchall()\n", "\n", "display_records(final_records, \"Database After INSERT OR IGNORE\")\n", "\n", "print(f\"\\nāœ… Verification:\")\n", "print(f\" • Records 3-4: Preserved original data (desc='v1', original timestamps)\")\n", "print(f\" • Record 5: Successfully inserted with new data\")\n", "print(f\" • Total records: {len(final_records)} (was 4, added 1)\")" ] }, { "cell_type": "markdown", "id": "fbd302c0-c731-41ec-b7d0-c1d143d0d42f", "metadata": {}, "source": [ "## Operation 2: Insert or Replace\n", "\n", "**Strategy**: Completely replace existing records with new data, insert new records.\n", "\n", "**Expected Behavior**:\n", "- IDs 3-4: Completely replaced with new data\n", "- ID 5: Already exists from previous operation, will be replaced\n", "- Final result: 3 rows replaced, 0 rows inserted (all input records already exist)\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "3301a155-2b97-4523-b2e4-7af8c20f1b9d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "============================================================\n", "šŸ”„ OPERATION 2: INSERT OR REPLACE\n", "============================================================\n", "Database reset to original state for merge demonstration\n", "Result: 2 rows replaced, 1 rows inserted\n", "\n", "Database After INSERT OR REPLACE:\n", "+----+------+---------------------+---------------------+\n", "| id | desc | create_at | update_at |\n", "+----+------+---------------------+---------------------+\n", "| 1 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 2 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 3 | v2 | 2025-07-02 16:25:01 | 2025-07-02 16:26:01 |\n", "| 4 | v2 | 2025-07-02 16:25:01 | 2025-07-02 16:26:01 |\n", "| 5 | v2 | 2025-07-02 16:26:01 | 2025-07-02 16:26:01 |\n", "+----+------+---------------------+---------------------+\n", "\n", "āœ… Verification:\n", " • Records 1-2: Unchanged (not in input data)\n", " • Records 3-5: Completely replaced with new data (desc='v2', new timestamps)\n", " • Total records: 5 (same count, but data replaced)\n" ] } ], "source": [ "print(\"\\n\" + \"=\" * 60)\n", "print(\"šŸ”„ OPERATION 2: INSERT OR REPLACE\")\n", "print(\"=\" * 60)\n", "\n", "# Reset database to original state for clean merge demonstration\n", "reset_database()\n", "\n", "# Perform insert_or_replace operation\n", "replaced_rows, inserted_rows = insert_or_replace(\n", " engine=engine,\n", " table=records_table,\n", " values=input_data\n", ")\n", "\n", "print(f\"Result: {replaced_rows} rows replaced, {inserted_rows} rows inserted\")\n", "\n", "# Display final state\n", "with engine.connect() as conn:\n", " final_records = conn.execute(sa.select(records_table).order_by(records_table.c.id)).mappings().fetchall()\n", "\n", "display_records(final_records, \"Database After INSERT OR REPLACE\")\n", "\n", "print(f\"\\nāœ… Verification:\")\n", "print(f\" • Records 1-2: Unchanged (not in input data)\")\n", "print(f\" • Records 3-5: Completely replaced with new data (desc='v2', new timestamps)\")\n", "print(f\" • Total records: {len(final_records)} (same count, but data replaced)\")" ] }, { "cell_type": "markdown", "id": "0e038dc7-1d48-44d8-8b98-5104ba2f558f", "metadata": {}, "source": [ "## Operation 3: Insert or Merge\n", "\n", "**Strategy**: Selectively update specified columns while preserving others, insert new records.\n", "\n", "**Expected Behavior**:\n", "- IDs 3-5: Update only `update_at` column, preserve original `desc` values\n", "- Final result: 3 rows updated, 0 rows inserted (all input records already exist)" ] }, { "cell_type": "code", "execution_count": 11, "id": "61d75685-9e4a-4493-95a6-2924c5e8f41a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "============================================================\n", "šŸ”€ OPERATION 3: INSERT OR MERGE\n", "============================================================\n", "Database reset to original state for merge demonstration\n", "Result: 2 rows updated, 1 rows inserted\n", "\n", "Database After INSERT OR MERGE:\n", "+----+------+---------------------+---------------------+\n", "| id | desc | create_at | update_at |\n", "+----+------+---------------------+---------------------+\n", "| 1 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 2 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:25:01 |\n", "| 3 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:26:01 |\n", "| 4 | v1 | 2025-07-02 16:25:01 | 2025-07-02 16:26:01 |\n", "| 5 | v2 | 2025-07-02 16:26:01 | 2025-07-02 16:26:01 |\n", "+----+------+---------------------+---------------------+\n", "\n", "āœ… Verification:\n", " • Records 1-2: Unchanged (not in input data)\n", " • Records 3-4: Selective update - desc='v1' preserved, update_at updated\n", " • Record 5: New record inserted with all input data\n", " • Total records: 5 (was 4, added 1)\n" ] } ], "source": [ "print(\"\\n\" + \"=\" * 60) \n", "print(\"šŸ”€ OPERATION 3: INSERT OR MERGE\")\n", "print(\"=\" * 60)\n", "\n", "# Reset database to original state for clean merge demonstration\n", "reset_database()\n", "\n", "# Perform insert_or_merge operation - only update 'update_at' column\n", "updated_rows, inserted_rows = insert_or_merge(\n", " engine=engine,\n", " table=records_table,\n", " values=input_data,\n", " columns=[\"update_at\"] # Only update this column, preserve others\n", ")\n", "\n", "print(f\"Result: {updated_rows} rows updated, {inserted_rows} rows inserted\")\n", "\n", "# Display final state\n", "with engine.connect() as conn:\n", " final_records = conn.execute(sa.select(records_table).order_by(records_table.c.id)).mappings().fetchall()\n", "\n", "display_records(final_records, \"Database After INSERT OR MERGE\")\n", "\n", "print(f\"\\nāœ… Verification:\")\n", "print(f\" • Records 1-2: Unchanged (not in input data)\")\n", "print(f\" • Records 3-4: Selective update - desc='v1' preserved, update_at updated\")\n", "print(f\" • Record 5: New record inserted with all input data\")\n", "print(f\" • Total records: {len(final_records)} (was 4, added 1)\")" ] }, { "cell_type": "markdown", "id": "b0b86e04-50c7-4fc0-a749-dcfa5049f2ac", "metadata": {}, "source": [ "## Comparison Summary\n", "\n", "Here's how each operation handles the same input data:\n", "\n", "| Operation | Conflict Records (3-4) | New Records (5) | Total Result |\n", "|-----------|------------------------|-----------------|--------------|\n", "| **Insert or Ignore** | Ignored (preserved) | Inserted | 2 ignored, 1 inserted |\n", "| **Insert or Replace** | Completely replaced | Inserted | 2 replaced, 1 inserted |\n", "| **Insert or Merge** | Selectively updated | Inserted | 2 updated, 1 inserted |\n", "\n", "*Note: In the replace example, all input records already existed, so 0 new insertions occurred.\n", "\n", "## Transaction Management\n", "\n", "All upsert operations are **atomically executed within a single transaction**, ensuring data consistency and integrity. The library supports two transaction modes:\n", "\n", "### Auto-Managed Transactions (Default)\n", "```python\n", "# Library automatically manages the transaction\n", "result = insert_or_ignore(engine, table, data)\n", "# Transaction is committed automatically on success, rolled back on failure\n", "```\n", "\n", "### User-Managed Transactions\n", "```python\n", "# You control the transaction scope\n", "with engine.connect() as conn:\n", " with conn.begin() as trans:\n", " # you can do database operation here, just make sure don't commit\n", " result1 = insert_or_ignore(engine, table, data1, conn=conn, trans=trans)\n", " # you can do database operation here, just make sure don't commit\n", " result2 = insert_or_replace(engine, table, data2, conn=conn, trans=trans)\n", " # you can do database operation here, just make sure don't commit\n", " # all operations are part of the same transaction\n", " # Commit/rollback handled by context manager\n", "```\n", "\n", "For detailed information about transaction modes, error handling, and advanced usage patterns, please refer to the :ref:`transaction-mode` documentation section.\n", "\n", "## Next Steps\n", "\n", "Continue reading the documentation to learn about advanced features, error handling, and best practices for production usage." ] }, { "cell_type": "code", "execution_count": null, "id": "dcce73d2-433a-4308-86bb-db2ff58a11ba", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 5 }