[Python] Integrating Databases with Pandas (read_sql / to_sql)

In data analysis workflows, reading data from a database (DB) directly into a Pandas DataFrame, and saving processed DataFrames back to a database, are common and important tasks. Pandas integrates well with SQL and can perform both operations seamlessly with its standard features.

This article explains how to read data from and write data to databases (SELECT / INSERT) using the standard library sqlite3 and SQLAlchemy.


Basic Operations Using SQLite

First, I will introduce an example of reading and writing to an in-memory database using the sqlite3 module, which is included in Python by default. Here, we assume a scenario of managing store inventory data.

1. Reading from Database to DataFrame (read_sql)

By using the pd.read_sql function, you can retrieve the results of a SQL query directly as a DataFrame.

  • sql: The SQL statement you want to execute (e.g., SELECT statement).
  • con: The database connection object.
  • index_col: The column name to use as the DataFrame index.
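In addition, the params argument lets you bind values into the query safely instead of building SQL strings by hand. A minimal sketch against a throwaway in-memory database (the table and values here are illustrative):

```python
import sqlite3
import pandas as pd

# Illustrative in-memory table; sqlite3 uses "?" as its placeholder style
with sqlite3.connect(":memory:") as conn:
    conn.execute("CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, price INTEGER, stock INTEGER)")
    conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)",
                     [(101, 1500, 50), (102, 2800, 30), (103, 900, 100)])

    # Bind the threshold via params rather than string formatting
    df = pd.read_sql(
        "SELECT item_id, price FROM inventory WHERE stock >= ?",
        conn,
        params=(50,),
        index_col="item_id",
    )
    print(df)
```

Only the rows with stock of 50 or more (item_id 101 and 103) are returned, already indexed by item_id.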

2. Writing from DataFrame to Database (to_sql)

The df.to_sql method allows you to INSERT the contents of a DataFrame into a table (or create a table).

  • name: The target table name.
  • con: The database connection object.
  • if_exists: Behavior if the table already exists.
    • 'fail': Raises an error (Default).
    • 'replace': Deletes the table and recreates it.
    • 'append': Adds rows to the existing table.
  • index: Whether to write the DataFrame index as a column.
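The three if_exists modes can be sketched in isolation before the full example (the table name items and the values are illustrative; an in-memory SQLite database is used so nothing persists):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"price": [100, 200]}, index=pd.Index([1, 2], name="item_id"))

df.to_sql("items", conn, index=True)                       # table absent -> created with 2 rows
df.to_sql("items", conn, if_exists="append", index=True)   # rows added -> 4 rows
df.to_sql("items", conn, if_exists="replace", index=True)  # table dropped and recreated -> 2 rows

try:
    df.to_sql("items", conn, if_exists="fail", index=True)  # default: error if table exists
except ValueError as e:
    print(f"fail mode raised: {e}")

n = pd.read_sql("SELECT COUNT(*) AS n FROM items", conn)["n"][0]
print(n)  # 2
conn.close()
```

Note that 'fail' is the default, so an explicit if_exists is needed whenever the target table may already exist.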

Below is the complete implementation code combining these operations.

import sqlite3
import pandas as pd

def manage_inventory_db():
    """
    Function demonstrating reading and writing Pandas data to an SQLite database
    """
    
    # Connect to in-memory SQLite database
    # (To use a real file, specify "file.db" instead of ":memory:")
    with sqlite3.connect(":memory:") as conn:
        
        # --- Preparation: Create test table and data ---
        cur = conn.cursor()
        cur.execute("CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, price INTEGER, stock INTEGER)")
        
        # Insert initial data
        initial_data = [
            (101, 1500, 50),
            (102, 2800, 30),
            (103, 900, 100)
        ]
        cur.executemany("INSERT INTO inventory VALUES (?, ?, ?)", initial_data)
        conn.commit()
        print("--- Initial Database State ---")
        # Direct output via SQL for verification
        for row in cur.execute("SELECT * FROM inventory"):
            print(row)
        print()


        # --- 1. Reading from Database to DataFrame (read_sql) ---
        print("=== Reading with read_sql ===")
        
        # Execute SQL query and store result in DataFrame
        # Set item_id column as DataFrame index
        df_inventory = pd.read_sql(
            "SELECT item_id, price, stock FROM inventory",
            conn,
            index_col="item_id"
        )
        
        print(df_inventory)
        print()


        # --- 2. Writing from DataFrame to Database (to_sql) ---
        print("=== Appending with to_sql ===")
        
        # Create new item data
        new_items_data = {
            "price": [3200, 450],
            "stock": [15, 200]
        }
        # Create DataFrame explicitly specifying index (item_id)
        df_new_items = pd.DataFrame(new_items_data, index=[104, 105])
        df_new_items.index.name = "item_id"  # Match index name with DB column name
        
        print("--- Data to Append ---")
        print(df_new_items)
        
        # Write to database
        # Add to existing table with if_exists='append'
        df_new_items.to_sql(
            "inventory",
            conn,
            if_exists="append",
            index=True,          # Write index (item_id) as well
            index_label="item_id"
        )
        
        # --- Check results ---
        print("\n--- Database State After Append ---")
        df_updated = pd.read_sql("SELECT * FROM inventory", conn, index_col="item_id")
        print(df_updated)

if __name__ == "__main__":
    manage_inventory_db()

Execution Result

--- Initial Database State ---
(101, 1500, 50)
(102, 2800, 30)
(103, 900, 100)

=== Reading with read_sql ===
         price  stock
item_id              
101       1500     50
102       2800     30
103        900    100

=== Appending with to_sql ===
--- Data to Append ---
         price  stock
item_id              
104       3200     15
105        450    200

--- Database State After Append ---
         price  stock
item_id              
101       1500     50
102       2800     30
103        900    100
104       3200     15
105        450    200

General-Purpose Connection Using SQLAlchemy

In practical work with an RDBMS such as MySQL, PostgreSQL, or Oracle, connections are commonly managed through an Engine from the SQLAlchemy library. Pandas' read_sql and to_sql accept this Engine object (or a Connection created from it) as the con argument.

import pandas as pd
from sqlalchemy import create_engine

def connect_external_database():
    """
    Example connection using SQLAlchemy (Code example only)
    * Requires actual DB server environment to run
    """
    
    # Definition of connection string
    # Format: dialect+driver://username:password@host:port/database
    db_connection_str = 'mysql+pymysql://myuser:mypassword@localhost:3306/mydatabase'
    
    # Create engine
    db_engine = create_engine(db_connection_str)

    try:
        # engine.begin() opens a transaction that commits automatically on success
        # (with SQLAlchemy 2.0, a bare connect() would need an explicit commit
        # for the write to persist)
        with db_engine.begin() as connection:
            
            # Write data
            # By setting index=False, the DataFrame index column is not saved to the DB
            df_sample = pd.DataFrame({"col1": [1, 2], "col2": ["A", "B"]})
            df_sample.to_sql(
                "target_table_name",
                con=connection,
                if_exists="append",
                index=False
            )
            
            print("Data written successfully.")

    except Exception as e:
        print(f"Error occurred during database operation: {e}")

# Note: connect_external_database() is not called here because it requires
# a live MySQL server and the pymysql driver

Key Points

  • Connection String: A URL-formatted string passed to create_engine specifying the database type and driver.
  • Versatility: Apart from the SQL statements themselves, the Pandas code (such as to_sql) rarely needs to change when the destination switches from SQLite to MySQL. This makes it easy to adapt when the analysis environment and the production environment use different databases.
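As a runnable illustration of this versatility, the snippet below uses a SQLite-backed Engine so no server is required; for MySQL, only the connection string passed to create_engine would change, while the Pandas calls stay the same (target_table_name and the sample data are illustrative):

```python
import pandas as pd
from sqlalchemy import create_engine

# Swap this string for e.g. "mysql+pymysql://user:pass@host/db" to target MySQL;
# the to_sql / read_sql calls below are unchanged
engine = create_engine("sqlite:///:memory:")

df = pd.DataFrame({"col1": [1, 2], "col2": ["A", "B"]})
df.to_sql("target_table_name", con=engine, if_exists="append", index=False)

# Read the rows back through the same engine
df_back = pd.read_sql("SELECT * FROM target_table_name", con=engine)
print(df_back)
```

Passing the Engine itself (rather than a manually opened connection) lets Pandas manage the connection and transaction for each call.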