[Python] Creating Pandas DataFrames (Lists/Dictionaries) and Type Conversion (astype)

The Pandas DataFrame is a 2-dimensional data structure with rows and columns, similar to an Excel spreadsheet. In data analysis, DataFrames are not only loaded from CSVs or databases but are also frequently generated dynamically from Python lists or dictionaries.

This article explains the basic procedures for creating DataFrames from lists and dictionaries, and how to convert the data types (dtype) of the created data using the astype method.


Generating DataFrames from Lists and Dictionaries

The most common way to create a DataFrame is by using the pd.DataFrame constructor. There are two main patterns depending on how you pass the data:

  1. From a 2D List: Defining data row by row.
  2. From a Dictionary: Defining data column by column.
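As a quick sketch (the variable names `rows`, `cols`, `df_rows`, and `df_cols` are illustrative), the two patterns can describe exactly the same table. `DataFrame.equals` checks that two frames match in shape, labels, values, and dtypes:

```python
import pandas as pd

# Row-oriented: each inner list is one row
rows = [
    [101, "Tokyo_Main", 150],
    [102, "Osaka_Branch", 80],
]
df_rows = pd.DataFrame(rows, columns=["Store_ID", "Store_Name", "Quantity"])

# Column-oriented: each key is a column name, each value holds that column's data
cols = {
    "Store_ID": [101, 102],
    "Store_Name": ["Tokyo_Main", "Osaka_Branch"],
    "Quantity": [150, 80],
}
df_cols = pd.DataFrame(cols)

# Both constructors produce the same DataFrame
print(df_rows.equals(df_cols))  # True
```

Which pattern to use mostly depends on how your source data is already organized.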

The following code demonstrates both patterns, using store sales data as an example.

import pandas as pd
import numpy as np

def create_and_convert_dataframe():
    """
    Function to demonstrate DataFrame generation from lists and dictionaries,
    and data type conversion using astype.
    """

    print("=== 1. Generation from List (2D Array) ===")
    # Data by row: [Store_ID, Store_Name, Quantity]
    sales_data_list = [
        [101, "Tokyo_Main", 150],
        [102, "Osaka_Branch", 80],
        [103, "Nagoya_Shop", 120]
    ]
    
    # Define column names with the columns argument
    # Define row labels with the index argument (sequential numbers from 0 if omitted)
    df_from_list = pd.DataFrame(
        sales_data_list,
        columns=["Store_ID", "Store_Name", "Quantity"],
        index=["row_1", "row_2", "row_3"]
    )
    
    print("--- DataFrame created from List ---")
    print(df_from_list)
    print("\n")


    print("=== 2. Generation from Dictionary (Key=Column Name) ===")
    # Data by column: {"Column Name": [Data List]}
    sales_data_dict = {
        "Store_ID": [101, 102, 103],
        "Store_Name": ["Tokyo_Main", "Osaka_Branch", "Nagoya_Shop"],
        "Quantity": [150, 80, 120]
    }
    
    # Dictionary keys automatically become column names
    df_from_dict = pd.DataFrame(
        sales_data_dict,
        index=["row_1", "row_2", "row_3"]
    )
    
    print("--- DataFrame created from Dictionary ---")
    print(df_from_dict)
    print("\n")


    print("=== 3. Check and Convert Data Types (astype) ===")
    # Check current data types
    print("--- dtypes before conversion ---")
    print(df_from_dict.dtypes)
    
    # Execute type conversion
    # Store_ID: Integer (int) -> Floating point (float)
    # Store_Name: object -> Pandas-specific string type (string)
    df_converted = df_from_dict.astype({
        "Store_ID": np.float64,
        "Store_Name": pd.StringDtype()
    })
    
    print("\n--- dtypes after conversion ---")
    print(df_converted.dtypes)
    
    print("\n--- DataFrame after conversion ---")
    print(df_converted)

if __name__ == "__main__":
    create_and_convert_dataframe()

Execution Result

=== 1. Generation from List (2D Array) ===
--- DataFrame created from List ---
       Store_ID    Store_Name  Quantity
row_1       101    Tokyo_Main       150
row_2       102  Osaka_Branch        80
row_3       103   Nagoya_Shop       120


=== 2. Generation from Dictionary (Key=Column Name) ===
--- DataFrame created from Dictionary ---
       Store_ID    Store_Name  Quantity
row_1       101    Tokyo_Main       150
row_2       102  Osaka_Branch        80
row_3       103   Nagoya_Shop       120


=== 3. Check and Convert Data Types (astype) ===
--- dtypes before conversion ---
Store_ID       int64
Store_Name    object
Quantity       int64
dtype: object

--- dtypes after conversion ---
Store_ID      float64
Store_Name     string
Quantity        int64
dtype: object

--- DataFrame after conversion ---
       Store_ID    Store_Name  Quantity
row_1     101.0    Tokyo_Main       150
row_2     102.0  Osaka_Branch        80
row_3     103.0   Nagoya_Shop       120

Explanation of Key Points

Distinguishing Between Lists and Dictionaries

  • From Lists: Suitable when data is grouped by rows (e.g., reading a CSV file row by row). Column names must be specified separately with the columns argument; if it is omitted, the columns are simply labeled 0, 1, 2, ...
  • From Dictionaries: Suitable when data is grouped by columns (e.g., JSON data organized by field). The dictionary keys become the column names directly, which keeps the code readable.
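Here is a small sketch of those practical differences, using made-up values: a 2D list without `columns` gets integer column labels, the `columns` argument combined with a dictionary selects and orders columns by key, and dictionary values of unequal length are rejected with a `ValueError`.

```python
import pandas as pd

rows = [[101, "Tokyo_Main", 150], [102, "Osaka_Branch", 80]]

# Without the columns argument, column labels default to 0, 1, 2, ...
df_no_names = pd.DataFrame(rows)
print(df_no_names.columns.tolist())  # [0, 1, 2]

data = {
    "Store_ID": [101, 102],
    "Store_Name": ["Tokyo_Main", "Osaka_Branch"],
    "Quantity": [150, 80],
}

# With a dictionary, columns= picks which keys to use and in what order
df_subset = pd.DataFrame(data, columns=["Quantity", "Store_ID"])
print(df_subset.columns.tolist())  # ['Quantity', 'Store_ID']

# Every column list in the dictionary must have the same length
try:
    pd.DataFrame({"Store_ID": [101, 102], "Quantity": [150]})
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```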

Controlling Data Types (astype)

In Pandas 1.x and 2.x, string data is stored as the object type by default. Pandas 1.0 also introduced StringDtype (displayed as string), a dedicated type for text data.

  • np.float64: NumPy's 64-bit floating-point type. Converting an integer column to float64 lets it hold decimal values (and NaN for missing data).
  • pd.StringDtype(): A Pandas-specific string type. Unlike object, which can hold any Python object, a string column holds only strings or the missing-value marker pd.NA, so the column's contents are explicit.
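The sketch below, with illustrative values, shows how these two target types behave with astype. It also shows two common pitfalls: converting float back to int truncates toward zero rather than rounding, and text that is not a number cannot be cast to an integer type. The object columns are created with `dtype=object` explicitly so the starting type matches the article's examples.

```python
import numpy as np
import pandas as pd

# np.float64: integer values become floating-point values
qty = pd.Series([150, 80, 120])
qty_float = qty.astype(np.float64)
print(qty_float.tolist())  # [150.0, 80.0, 120.0]

# Going back to an integer type truncates toward zero; it does not round
truncated = pd.Series([101.7, -2.9]).astype(np.int64).tolist()
print(truncated)  # [101, -2]

# pd.StringDtype(): a missing entry (None) becomes pd.NA
names = pd.Series(["Tokyo_Main", None, "Nagoya_Shop"], dtype=object)
names_str = names.astype(pd.StringDtype())
print(names_str[1] is pd.NA)  # True

# Text that does not parse as a number raises ValueError
try:
    pd.Series(["150", "n/a"], dtype=object).astype(np.int64)
    bad_text_rejected = False
except ValueError:
    bad_text_rejected = True
print(bad_text_rejected)  # True
```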

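By passing a dictionary of {column name: type} to the astype method, you can convert several columns to different types in a single call. astype returns a new DataFrame and leaves the original unchanged. Setting types explicitly helps prevent type-related errors in later analysis, and choosing smaller types where the data allows can reduce memory usage.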
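A minimal sketch of the dictionary form, under the assumption that all values fit in a 16-bit integer (int16 covers -32768 to 32767, so it is chosen here purely for illustration). It compares per-column memory with `memory_usage(index=False)` and shows that a key that is not an existing column name raises `KeyError`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Store_ID": [101, 102, 103],
    "Quantity": [150, 80, 120],
})

# One call, one target type per column; df itself is not modified
small = df.astype({"Store_ID": np.int16, "Quantity": np.int16})
print([str(t) for t in df.dtypes])     # ['int64', 'int64']
print([str(t) for t in small.dtypes])  # ['int16', 'int16']

# Bytes used by each column's data (index excluded): 3 x 8 bytes vs 3 x 2 bytes
print(df.memory_usage(index=False).tolist())     # [24, 24]
print(small.memory_usage(index=False).tolist())  # [6, 6]

# Dictionary keys must be existing column names; a typo raises KeyError
try:
    df.astype({"Store_Id": np.float64})
    typo_rejected = False
except KeyError:
    typo_rejected = True
print(typo_rejected)  # True
```

Only pick a smaller type after confirming every value fits in its range.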


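About the author

This is a blog where I write about what I have studied, put into practice, and am currently working on.
I used to write mainly about asset management,
but lately I have become interested in programming, so that is almost all I write about now.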
