The Pandas DataFrame is a 2-dimensional data structure with rows and columns, similar to an Excel spreadsheet. In data analysis, DataFrames are not only loaded from CSVs or databases but are also frequently generated dynamically from Python lists or dictionaries.
This article explains the basic procedures for creating DataFrames from lists and dictionaries, and how to convert the data types (dtype) of the created data using the astype method.
Generating DataFrames from Lists and Dictionaries
The most common way to create a DataFrame is by using the pd.DataFrame constructor. There are two main patterns depending on how you pass the data:
- From a 2D List: Defining data row by row.
- From a Dictionary: Defining data column by column.
Below is the implementation code using store sales data as an example.
import pandas as pd
import numpy as np

def create_and_convert_dataframe():
    """
    Function to demonstrate DataFrame generation from lists and dictionaries,
    and data type conversion using astype.
    """
    print("=== 1. Generation from List (2D Array) ===")
    # Data by row: [Store_ID, Store_Name, Quantity]
    sales_data_list = [
        [101, "Tokyo_Main", 150],
        [102, "Osaka_Branch", 80],
        [103, "Nagoya_Shop", 120]
    ]
    # Define column names with the columns argument
    # Define row labels with the index argument (sequential numbers from 0 if omitted)
    df_from_list = pd.DataFrame(
        sales_data_list,
        columns=["Store_ID", "Store_Name", "Quantity"],
        index=["row_1", "row_2", "row_3"]
    )
    print("--- DataFrame created from List ---")
    print(df_from_list)
    print("\n")

    print("=== 2. Generation from Dictionary (Key=Column Name) ===")
    # Data by column: {"Column Name": [Data List]}
    sales_data_dict = {
        "Store_ID": [101, 102, 103],
        "Store_Name": ["Tokyo_Main", "Osaka_Branch", "Nagoya_Shop"],
        "Quantity": [150, 80, 120]
    }
    # Dictionary keys automatically become column names
    df_from_dict = pd.DataFrame(
        sales_data_dict,
        index=["row_1", "row_2", "row_3"]
    )
    print("--- DataFrame created from Dictionary ---")
    print(df_from_dict)
    print("\n")

    print("=== 3. Check and Convert Data Types (astype) ===")
    # Check current data types
    print("--- dtypes before conversion ---")
    print(df_from_dict.dtypes)
    # Execute type conversion
    # Store_ID: Integer (int) -> Floating point (float)
    # Store_Name: Object -> Pandas specific string type (string)
    df_converted = df_from_dict.astype({
        "Store_ID": np.float64,
        "Store_Name": pd.StringDtype()
    })
    print("\n--- dtypes after conversion ---")
    print(df_converted.dtypes)
    print("\n--- DataFrame after conversion ---")
    print(df_converted)

if __name__ == "__main__":
    create_and_convert_dataframe()
Execution Result
=== 1. Generation from List (2D Array) ===
--- DataFrame created from List ---
       Store_ID    Store_Name  Quantity
row_1       101    Tokyo_Main       150
row_2       102  Osaka_Branch        80
row_3       103   Nagoya_Shop       120


=== 2. Generation from Dictionary (Key=Column Name) ===
--- DataFrame created from Dictionary ---
       Store_ID    Store_Name  Quantity
row_1       101    Tokyo_Main       150
row_2       102  Osaka_Branch        80
row_3       103   Nagoya_Shop       120


=== 3. Check and Convert Data Types (astype) ===
--- dtypes before conversion ---
Store_ID       int64
Store_Name    object
Quantity       int64
dtype: object

--- dtypes after conversion ---
Store_ID      float64
Store_Name     string
Quantity        int64
dtype: object

--- DataFrame after conversion ---
       Store_ID    Store_Name  Quantity
row_1     101.0    Tokyo_Main       150
row_2     102.0  Osaka_Branch        80
row_3     103.0   Nagoya_Shop       120
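The list-based and dictionary-based frames in the result above print identically. A quick way to confirm that the two construction routes really produce the same object is DataFrame.equals, which returns True only when the shape, labels, values and column dtypes all match. The sketch below rebuilds both frames from the article's store data; the variable names are illustrative.

```python
import pandas as pd

labels = ["row_1", "row_2", "row_3"]

# Row-oriented input: each inner list is one store
df_rows = pd.DataFrame(
    [
        [101, "Tokyo_Main", 150],
        [102, "Osaka_Branch", 80],
        [103, "Nagoya_Shop", 120],
    ],
    columns=["Store_ID", "Store_Name", "Quantity"],
    index=labels,
)

# Column-oriented input: each key/value pair is one column
df_cols = pd.DataFrame(
    {
        "Store_ID": [101, 102, 103],
        "Store_Name": ["Tokyo_Main", "Osaka_Branch", "Nagoya_Shop"],
        "Quantity": [150, 80, 120],
    },
    index=labels,
)

# equals() compares shape, labels, values and column dtypes
print(df_rows.equals(df_cols))  # True
```

Which form to use is therefore a question of how the source data is shaped, not of what the resulting DataFrame looks like.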
Explanation of Key Points
Distinguishing Between Lists and Dictionaries
- From Lists: Suitable when data is grouped by rows (e.g., records read from a CSV file one row at a time). Column names are not part of the data, so pass them with the columns argument; if you omit it, the columns are labeled 0, 1, 2, and so on.
- From Dictionaries: Suitable when data is grouped by columns (e.g., JSON data that maps each field name to a list of values). The dictionary keys become the column names directly, which keeps the code easy to read.
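To make the difference concrete, this small sketch builds a frame from a list without the columns argument, shows the default 0, 1, 2 labels, and then attaches names afterwards by assigning to the columns attribute. The dictionary version needs no extra step because its keys already carry the names. The two-store data is a trimmed copy of the article's example.

```python
import pandas as pd

rows = [[101, "Tokyo_Main", 150], [102, "Osaka_Branch", 80]]

# Without columns= (and index=), pandas labels both axes 0, 1, 2, ...
df_default = pd.DataFrame(rows)
print(df_default.columns.tolist())  # [0, 1, 2]
print(df_default.index.tolist())    # [0, 1]

# Column names can still be attached after construction
df_default.columns = ["Store_ID", "Store_Name", "Quantity"]
print(df_default.columns.tolist())  # ['Store_ID', 'Store_Name', 'Quantity']

# With a dictionary, the keys already supply the column names
df_dict = pd.DataFrame({"Store_ID": [101, 102], "Quantity": [150, 80]})
print(df_dict.columns.tolist())     # ['Store_ID', 'Quantity']
```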
Controlling Data Types (astype)
In Pandas, string data is typically stored with the general-purpose object type. Pandas 1.0 introduced StringDtype (displayed as string), a dedicated data type for text.
- np.float64: NumPy's 64-bit floating-point type. Use it when integer values should be treated as decimals.
- pd.StringDtype(): A Pandas-specific string type. Unlike the object type, which can hold any Python object, a string column holds only strings and missing values (displayed as <NA>).
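To see how the two target types in the list above behave, the sketch below converts a small frame in which one store name is missing (a None added purely for illustration). The integer column becomes floats, the missing name becomes pd.NA in the string column, and a mixed object Series is turned into pure text when cast to the string type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Store_ID": [101, 102, 103],
    "Store_Name": ["Tokyo_Main", None, "Nagoya_Shop"],  # None: hypothetical missing value
})

converted = df.astype({"Store_ID": np.float64, "Store_Name": pd.StringDtype()})

print(converted["Store_ID"].tolist())           # [101.0, 102.0, 103.0]
# In a string column, the missing entry is represented by pd.NA
print(converted["Store_Name"].isna().tolist())  # [False, True, False]
print(converted["Store_Name"].iloc[1] is pd.NA) # True

# An object Series can mix an int and a str...
mixed = pd.Series([101, "Tokyo_Main"])
print(mixed.dtype)                              # object
# ...whereas casting to the string type stores every value as text
as_text = mixed.astype(pd.StringDtype())
print(as_text.tolist())                         # ['101', 'Tokyo_Main']
```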
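By passing a dictionary to the astype method, you can convert several columns to different types in a single call. Setting types explicitly makes each column's intent clear, surfaces bad data early (astype raises an error when a value cannot be converted), and can reduce memory usage, for example by storing small integers as int32 or int16 instead of int64.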
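As a rough illustration of the memory point, the sketch below downcasts the two integer columns of the store data in one astype call and compares the bytes reported by memory_usage. The target types int16 and int32 are arbitrary choices for this example; downcasting is only safe when every value fits the smaller type's range (int16 holds -32768 to 32767).

```python
import pandas as pd

df = pd.DataFrame({
    "Store_ID": [101, 102, 103],
    "Quantity": [150, 80, 120],
})

# One astype call, one target type per column
df_small = df.astype({"Store_ID": "int16", "Quantity": "int32"})

# index=False counts only the column data, not the row index
print(int(df.memory_usage(index=False).sum()))        # 48 (2 columns x 3 rows x 8 bytes)
print(int(df_small.memory_usage(index=False).sum()))  # 18 (3 x 2 bytes + 3 x 4 bytes)
```

On three rows the saving is trivial, but the per-value size ratio carries over unchanged to columns with millions of rows.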
