In data analysis workflows, reading data from a database (DB) directly into a Pandas DataFrame, and saving a processed DataFrame back to a database, are common and important tasks. Pandas integrates well with SQL and can perform these operations seamlessly using its standard features.
This article explains how to read and write (SELECT / INSERT) data to databases using the standard library sqlite3 and SQLAlchemy.
Basic Operations Using SQLite
First, I will introduce an example of reading and writing to an in-memory database using the sqlite3 module, which is included in Python by default. Here, we assume a scenario of managing store inventory data.
1. Reading from Database to DataFrame (read_sql)
By using the pd.read_sql function, you can retrieve the results of a SQL query directly as a DataFrame.
- sql: The SQL statement you want to execute (e.g., SELECT statement).
- con: The database connection object.
- index_col: The column name to use as the DataFrame index.
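As a minimal sketch of these parameters (the `users` table and its contents here are illustrative placeholders, not part of the inventory example below):

```python
import sqlite3
import pandas as pd

# Hypothetical setup: an in-memory table named "users"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# sql, con, and index_col correspond to the parameters listed above
df = pd.read_sql("SELECT id, name FROM users", conn, index_col="id")
print(df)
conn.close()
```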
2. Writing from DataFrame to Database (to_sql)
The df.to_sql method allows you to INSERT the contents of a DataFrame into a table (or create a table).
- name: The target table name.
- con: The database connection object.
- if_exists: Behavior if the table already exists.
  - 'fail': Raises an error (default).
  - 'replace': Drops the table and recreates it.
  - 'append': Adds rows to the existing table.
- index: Whether to write the DataFrame index as a column.
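The effect of each if_exists value can be sketched with a throwaway table (the name `demo_items` is illustrative):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"price": [100, 200]}, index=pd.Index([1, 2], name="item_id"))

# First write: the table does not exist yet, so 'fail' simply creates it
df.to_sql("demo_items", conn, if_exists="fail", index=True)

# 'append' inserts the same two rows again
df.to_sql("demo_items", conn, if_exists="append", index=True)
n_after_append = pd.read_sql("SELECT COUNT(*) AS n FROM demo_items", conn).iloc[0, 0]
print(n_after_append)  # -> 4

# 'replace' drops the table and recreates it from the DataFrame
df.to_sql("demo_items", conn, if_exists="replace", index=True)
n_after_replace = pd.read_sql("SELECT COUNT(*) AS n FROM demo_items", conn).iloc[0, 0]
print(n_after_replace)  # -> 2
conn.close()
```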
Below is the complete implementation code combining these operations.
```python
import sqlite3
import pandas as pd

def manage_inventory_db():
    """
    Function demonstrating reading and writing Pandas data to an SQLite database
    """
    # Connect to in-memory SQLite database
    # (To use a real file, specify "file.db" instead of ":memory:")
    with sqlite3.connect(":memory:") as conn:
        # --- Preparation: Create test table and data ---
        cur = conn.cursor()
        cur.execute(
            "CREATE TABLE inventory "
            "(item_id INTEGER PRIMARY KEY, price INTEGER, stock INTEGER)"
        )

        # Insert initial data
        initial_data = [
            (101, 1500, 50),
            (102, 2800, 30),
            (103, 900, 100),
        ]
        cur.executemany("INSERT INTO inventory VALUES (?, ?, ?)", initial_data)
        conn.commit()

        print("--- Initial Database State ---")
        # Direct output via SQL for verification
        for row in cur.execute("SELECT * FROM inventory"):
            print(row)
        print("\n")

        # --- 1. Reading from Database to DataFrame (read_sql) ---
        print("=== Reading with read_sql ===")
        # Execute SQL query and store the result in a DataFrame,
        # using the item_id column as the DataFrame index
        df_inventory = pd.read_sql(
            "SELECT item_id, price, stock FROM inventory",
            conn,
            index_col="item_id",
        )
        print(df_inventory)
        print("\n")

        # --- 2. Writing from DataFrame to Database (to_sql) ---
        print("=== Appending with to_sql ===")
        # Create new item data
        new_items_data = {
            "price": [3200, 450],
            "stock": [15, 200],
        }
        # Create DataFrame explicitly specifying the index (item_id)
        df_new_items = pd.DataFrame(new_items_data, index=[104, 105])
        df_new_items.index.name = "item_id"  # Match index name with DB column name

        print("--- Data to Append ---")
        print(df_new_items)

        # Write to database, adding to the existing table with if_exists='append'
        df_new_items.to_sql(
            "inventory",
            conn,
            if_exists="append",
            index=True,  # Write the index (item_id) as well
            index_label="item_id",
        )

        # --- Check results ---
        print("\n--- Database State After Append ---")
        df_updated = pd.read_sql("SELECT * FROM inventory", conn, index_col="item_id")
        print(df_updated)

if __name__ == "__main__":
    manage_inventory_db()
```
Execution Result
```
--- Initial Database State ---
(101, 1500, 50)
(102, 2800, 30)
(103, 900, 100)

=== Reading with read_sql ===
         price  stock
item_id
101       1500     50
102       2800     30
103        900    100

=== Appending with to_sql ===
--- Data to Append ---
         price  stock
item_id
104       3200     15
105        450    200

--- Database State After Append ---
         price  stock
item_id
101       1500     50
102       2800     30
103        900    100
104       3200     15
105        450    200
```
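For tables too large to handle in one round trip, both functions also accept a chunksize parameter, which the article's examples do not use. A minimal sketch (the `measurements` table and the sizes are illustrative):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
big_df = pd.DataFrame({"value": range(1000)})

# Write in batches of 250 rows per INSERT round trip
big_df.to_sql("measurements", conn, if_exists="replace", index=False, chunksize=250)

# Read back lazily: with chunksize, read_sql returns an iterator of DataFrames
chunks = pd.read_sql("SELECT value FROM measurements", conn, chunksize=400)
sizes = [len(chunk) for chunk in chunks]
print(sizes)  # -> [400, 400, 200]
conn.close()
```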
General-Purpose Connection Using SQLAlchemy
In practical work using RDBMS such as MySQL, PostgreSQL, or Oracle, it is common to manage connections using the Engine from the SQLAlchemy library. Pandas’ read_sql and to_sql can accept this Engine object as the connection argument.
```python
import pandas as pd
from sqlalchemy import create_engine

def connect_external_database():
    """
    Example connection using SQLAlchemy (code example only)
    * Requires an actual DB server environment to run
    """
    # Connection string
    # Format: dialect+driver://username:password@host:port/database
    db_connection_str = "mysql+pymysql://myuser:mypassword@localhost:3306/mydatabase"

    # Create engine
    db_engine = create_engine(db_connection_str)

    try:
        # engine.begin() opens a connection inside a transaction that is
        # committed automatically on success. (With SQLAlchemy 2.x, writes
        # made on a plain connect() connection are not committed implicitly.)
        with db_engine.begin() as connection:
            # Write data
            # With index=False, the DataFrame index is not saved to the DB
            df_sample = pd.DataFrame({"col1": [1, 2], "col2": ["A", "B"]})
            df_sample.to_sql(
                "target_table_name",
                con=connection,
                if_exists="append",
                index=False,
            )
            print("Data written successfully.")
    except Exception as e:
        print(f"Error occurred during database operation: {e}")

# Note: this code will not run without a database server to connect to
```
Key Points
- Connection String: A URL-formatted string passed to create_engine, specifying the database type and driver.
- Versatility: Apart from the SQL statements themselves, the Pandas code (such as to_sql) rarely needs to change when the destination switches from SQLite to MySQL. This allows flexible adaptation even when the analysis environment and the production environment use different databases.
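The versatility point can be demonstrated without a database server by swapping only the connection string for SQLite's in-memory dialect; the Pandas calls themselves are unchanged (the table name `target_table_name` is illustrative):

```python
import pandas as pd
from sqlalchemy import create_engine

# Only this string would change between backends, e.g.
# "mysql+pymysql://myuser:mypassword@localhost:3306/mydatabase" for MySQL
engine = create_engine("sqlite:///:memory:")

df_sample = pd.DataFrame({"col1": [1, 2], "col2": ["A", "B"]})

# Identical Pandas code regardless of the backend behind the engine
with engine.begin() as connection:
    df_sample.to_sql("target_table_name", con=connection,
                     if_exists="append", index=False)

df_back = pd.read_sql("SELECT * FROM target_table_name", engine)
print(df_back)
```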
