[Python] Creating Pandas Series: Generation from Lists and Specifying Types with dtype

In the Python data analysis library Pandas, the basic units for handling data are “Series” and “DataFrame.”

  • DataFrame: A 2-dimensional tabular data structure with rows and columns.
  • Series: A 1-dimensional array data structure with an index (labels).

A DataFrame can be thought of as multiple Series combined as columns. Therefore, understanding how to generate a Series and control its data type (dtype) is the first step to mastering Pandas.
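
As a rough illustration of that relationship, the sketch below builds a DataFrame from two Series and then pulls one column back out. The names math_scores, english_scores, and scores_df, along with the sample values, are made up for this example.

```python
import pandas as pd

# Two hypothetical Series sharing the same index labels
labels = ["Student_A", "Student_B", "Student_C"]
math_scores = pd.Series([85, 92, 78], index=labels)
english_scores = pd.Series([70, 88, 95], index=labels)

# Combining the Series as columns yields a DataFrame
scores_df = pd.DataFrame({"math": math_scores, "english": english_scores})
print(scores_df.shape)  # (3, 2)

# Selecting a single column gives back a Series
math_column = scores_df["math"]
print(type(math_column).__name__)  # Series
```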

This article explains the basic method of creating a Series from a list and how to explicitly specify data types (integers, floating-point numbers, dates) during generation.

Basic Syntax for Series Generation

A Series is created with the pd.Series() constructor. The most basic usage is to pass a Python list. The index argument lets you assign a label to each data point; it must contain exactly as many labels as the list has values.
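
A minimal sketch of the two call styles, with and without the index argument. The variable names and the temperature values are hypothetical and chosen only for illustration.

```python
import pandas as pd

temperatures = [21.5, 23.0, 19.8]  # hypothetical sample values

# Without index: labels default to 0, 1, 2
default_series = pd.Series(temperatures)
print(list(default_series.index))  # [0, 1, 2]

# With index: each value gets the label at the same position
labeled_series = pd.Series(temperatures, index=["Mon", "Tue", "Wed"])
print(labeled_series["Tue"])  # 23.0
```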

Specifying Data Types (dtype)

Pandas usually infers an appropriate data type from the input data automatically. To force a specific type (such as float or datetime), for example to save memory or to prepare for later calculations, pass the dtype argument.

  • Integers: A list of Python integers is inferred as int64 by default.
  • Floating-point: Specify dtype=np.float64 or dtype='float'.
  • Date/Time: By specifying dtype='datetime64[ns]' for string date data, it is converted into a format capable of date calculations.
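
The bullet points above can be checked with a short sketch like the following. The np.int8 case is an addition for illustration (the article itself only covers float and datetime) and is included to show the memory effect of choosing a smaller type; it assumes all values fit in the range -128 to 127.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3]

inferred = pd.Series(values)                     # dtype inferred as int64
as_float = pd.Series(values, dtype="float")      # "float" is an alias for float64
as_small_int = pd.Series(values, dtype=np.int8)  # 1 byte per value instead of 8
as_dates = pd.Series(["2026-04-01", "2026-04-02"], dtype="datetime64[ns]")

print(inferred.dtype, as_float.dtype, as_small_int.dtype, as_dates.dtype)

# nbytes reports the size of the underlying values
print(inferred.nbytes, as_small_int.nbytes)  # 24 3
```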

Implementation Code Example

The code below covers three patterns for generating a Series from a list: basic generation, specifying a floating-point type, and specifying a date type.

import pandas as pd
import numpy as np

def demonstrate_series_creation():
    """
    Function to verify the behavior of Pandas Series generation and dtype specification
    """
    
    print("=== 1. Basic Series Generation (Integer & Index Specification) ===")
    # Preparation of data list and index list
    score_data = [85, 92, 78]
    student_labels = ["Student_A", "Student_B", "Student_C"]
    
    # Create the Series
    # If the index argument is omitted, sequential integer labels starting from 0 are assigned automatically
    series_int = pd.Series(score_data, index=student_labels)
    
    print("--- Generated Series ---")
    print(series_int)
    print(f"Data Type (dtype): {series_int.dtype}\n")


    print("=== 2. Generating with NumPy Type Specification (Floating-point) ===")
    # Pass np.float64 as the dtype argument
    # The integer input values are converted to floating-point numbers
    series_float = pd.Series(score_data, index=student_labels, dtype=np.float64)
    
    print("--- Generated Series ---")
    print(series_float)
    print(f"Data Type (dtype): {series_float.dtype}\n")


    print("=== 3. Generating as Date Type (datetime64) ===")
    # List of strings representing dates
    timestamp_strs = [
        "2026-04-01 09:00:00",
        "2026-04-01 10:00:00",
        "2026-04-01 11:00:00"
    ]
    
    # Create the Series with dtype="datetime64[ns]"
    # The strings are parsed into datetime values instead of being kept as strings
    series_date = pd.Series(timestamp_strs, dtype="datetime64[ns]")
    
    print("--- Generated Series ---")
    print(series_date)
    print(f"Data Type (dtype): {series_date.dtype}")

if __name__ == "__main__":
    demonstrate_series_creation()

Execution Result

=== 1. Basic Series Generation (Integer & Index Specification) ===
--- Generated Series ---
Student_A    85
Student_B    92
Student_C    78
dtype: int64
Data Type (dtype): int64

=== 2. Generating with NumPy Type Specification (Floating-point) ===
--- Generated Series ---
Student_A    85.0
Student_B    92.0
Student_C    78.0
dtype: float64
Data Type (dtype): float64

=== 3. Generating as Date Type (datetime64) ===
--- Generated Series ---
0   2026-04-01 09:00:00
1   2026-04-01 10:00:00
2   2026-04-01 11:00:00
dtype: datetime64[ns]
Data Type (dtype): datetime64[ns]

Explanation of Key Points

  • Role of Index: By setting an index with labels like Student_A, you can access data by label, much like looking up a key in a dictionary (for example, series_int["Student_A"]).
  • int64 and float64: Even with a list of integers, specifying dtype lets you store the data as floating-point values from the start. This is useful when the data might contain missing values (NaN is a floating-point value and cannot be stored in an int64 Series) or when later calculations involve decimals.
  • datetime64[ns]: When handling time-series data, you cannot perform time calculations while the values remain plain strings (object type). Specifying this type when creating the Series enables time-difference calculations and filtering by period.
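
The three points above can be tried out with a sketch along these lines. It reuses the sample data from the article; the names with_missing, elapsed, and late_morning are hypothetical and introduced only for this example.

```python
import pandas as pd

# Label-based access, similar to a dictionary lookup
scores = pd.Series([85, 92, 78], index=["Student_A", "Student_B", "Student_C"])
print(scores["Student_B"])  # 92

# A missing value (None) among integers is stored as NaN,
# so pandas infers float64 instead of int64
with_missing = pd.Series([85, None, 78])
print(with_missing.dtype)  # float64

# datetime64[ns] values support time arithmetic and filtering by period
times = pd.Series(
    ["2026-04-01 09:00:00", "2026-04-01 10:00:00", "2026-04-01 11:00:00"],
    dtype="datetime64[ns]",
)
elapsed = times - times.iloc[0]  # time elapsed since the first entry
late_morning = times[times >= pd.Timestamp("2026-04-01 10:00:00")]
print(elapsed.iloc[2])    # 0 days 02:00:00
print(len(late_morning))  # 2
```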

In data processing with Pandas, creating a Series with the correct data type is important for preventing errors and improving processing speed.

