[Python] Introduction to Pandas: Basic Structure of Series and DataFrame and Data Extraction with iloc

Pandas is the de facto standard library for data analysis in Python. It lets you handle table-like data, similar to an Excel sheet, flexibly and at high speed within your programs. It is an essential tool for data preprocessing, aggregation, and preparation for visualization.

This article explains how to create the two core data structures of Pandas, “Series” and “DataFrame,” and how to extract data using position-based indexing with iloc.

Major Data Structures and Basic Operations in Pandas

The basic concepts you need to understand when working with Pandas are as follows:

  • Series: A 1-dimensional array-like data structure. It can be thought of as a list with an index (labels).
  • DataFrame: A 2-dimensional tabular data structure. It has rows and columns and can be interpreted as a bundle of multiple Series.
  • Index / Columns: The row labels (Index) and column labels (Columns) of a DataFrame.
  • iloc: An indexer that extracts data by integer position, that is, by row and column numbers (the name stands for "integer location").

Below is the implementation code covering these concepts.

# Please execute `pip install pandas` in advance
import pandas as pd

def demonstrate_pandas_basics():
    """
    Function to demonstrate basic data structures and referencing in Pandas
    """
    print("=== 1. Creating a Series ===")
    # Create a Series from a list
    # Explicitly specify labels with the index argument
    sales_list = [150, 200, 120]
    dates = ["2023-01-01", "2023-01-02", "2023-01-03"]
    
    series_data = pd.Series(sales_list, index=dates, name="Sales")
    print("--- Created Series ---")
    print(series_data)
    print(f"Type: {type(series_data)}\n")

    print("=== 2. Creating a DataFrame ===")
    # Create a DataFrame from a dictionary
    # Keys become column names (columns), values become data
    shop_data = {
        "Product_A": [100, 120, 150, 130],
        "Product_B": [90, 80, 110, 95],
        "Product_C": [200, 210, 205, 190]
    }
    # Create with specified row labels (index)
    shop_index = ["Store_Tokyo", "Store_Osaka", "Store_Nagoya", "Store_Fukuoka"]
    
    df = pd.DataFrame(shop_data, index=shop_index)
    
    print("--- Created DataFrame ---")
    print(df)
    print("\n--- Checking Components ---")
    print(f"Columns (Column Names): {df.columns.values}")
    print(f"Index (Row Names): {df.index.values}\n")

    print("=== 3. Data Extraction using iloc ===")
    # iloc is accessed via [row_position, col_position]; positions are 0-based
    
    # Example: Get all data for the row at position 1 (the second row, Store_Osaka)
    row_1 = df.iloc[1]
    print(f"--- Data for 1st row (Store_Osaka) ---\n{row_1}\n")

    # Example: Get the value at row position 2 (Store_Nagoya), column position 0 (Product_A)
    val_2_0 = df.iloc[2, 0]
    print("--- Value at 2nd row, 0th column (Store_Nagoya, Product_A) ---")
    print(f"Value: {val_2_0}")

    # Example: Range specification using slicing
    # Rows 0:2 selects positions 0 and 1 (the end position is excluded); columns 1: runs from position 1 to the end
    subset = df.iloc[0:2, 1:]
    print("\n--- Extraction by range (Rows 0:2, Cols 1:) ---")
    print(subset)

if __name__ == "__main__":
    demonstrate_pandas_basics()

Execution Result

=== 1. Creating a Series ===
--- Created Series ---
2023-01-01    150
2023-01-02    200
2023-01-03    120
Name: Sales, dtype: int64
Type: <class 'pandas.core.series.Series'>

=== 2. Creating a DataFrame ===
--- Created DataFrame ---
               Product_A  Product_B  Product_C
Store_Tokyo          100         90        200
Store_Osaka          120         80        210
Store_Nagoya         150        110        205
Store_Fukuoka        130         95        190

--- Checking Components ---
Columns (Column Names): ['Product_A' 'Product_B' 'Product_C']
Index (Row Names): ['Store_Tokyo' 'Store_Osaka' 'Store_Nagoya' 'Store_Fukuoka']

=== 3. Data Extraction using iloc ===
--- Data for 1st row (Store_Osaka) ---
Product_A    120
Product_B     80
Product_C    210
Name: Store_Osaka, dtype: int64

--- Value at 2nd row, 0th column (Store_Nagoya, Product_A) ---
Value: 150

--- Extraction by range (Rows 0:2, Cols 1:) ---
             Product_B  Product_C
Store_Tokyo         90        200
Store_Osaka         80        210

Explanation: Relationship between Series and DataFrame

The data structures of Pandas are easier to understand when organized as follows:

  • Series: A single column of data. Every value is paired with an index label.
  • DataFrame: Multiple Series concatenated horizontally, sharing a common index (similar to a spreadsheet or a SQL table).
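
The relationship described above can be sketched in a few lines. This is a minimal illustration, not the only way to build a DataFrame: the two-store data is a hypothetical cut-down version of the table used earlier, and the variable names (stores, product_a, product_b, column) are chosen here purely for the example.

```python
import pandas as pd

# Hypothetical data, reusing the store/product naming from the example above
stores = ["Store_Tokyo", "Store_Osaka"]
product_a = pd.Series([100, 120], index=stores, name="Product_A")
product_b = pd.Series([90, 80], index=stores, name="Product_B")

# Placing Series side by side (axis=1) yields a DataFrame:
# each Series name becomes a column name, and the shared index becomes the row labels
df = pd.concat([product_a, product_b], axis=1)
print(df)

# Going the other way, selecting a single column gives back a Series with the same index
column = df["Product_A"]
print(type(column))
print(column.equals(product_a))
```

Because both Series share the same index, the rows line up without any extra work. If the indexes differed, pd.concat would align on the labels and fill the gaps with missing values.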

Integer Index Reference with iloc

In data analysis, there are many situations where you want to extract “data at a specific position.” The iloc property is used for this purpose.

  • It is used in the format df.iloc[row_index, col_index].
  • Like standard Python lists, positions are specified using 0-based integers.
  • By using slicing (:), you can select ranges of rows or columns to create a partial DataFrame (subset). As with Python lists, the end position is excluded, so 0:2 selects positions 0 and 1.
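
The points above can be tried out with a short sketch. It rebuilds the same table as the earlier example so that it runs on its own. The variable names are illustrative, and the negative-position and list-of-positions forms go slightly beyond what the main listing shows.

```python
import pandas as pd

# Same table as in the earlier example
df = pd.DataFrame(
    {
        "Product_A": [100, 120, 150, 130],
        "Product_B": [90, 80, 110, 95],
        "Product_C": [200, 210, 205, 190],
    },
    index=["Store_Tokyo", "Store_Osaka", "Store_Nagoya", "Store_Fukuoka"],
)

# Slices exclude the end position: 0:2 selects positions 0 and 1 only
first_two = df.iloc[0:2]

# Negative positions count from the end, as with Python lists
last_row = df.iloc[-1]

# A list of positions selects specific rows and columns in the given order
picked = df.iloc[[0, 3], [2, 0]]

# A bare ":" keeps every row; this returns the column at position 1 as a Series
second_col = df.iloc[:, 1]

print(first_two, last_row, picked, second_col, sep="\n\n")
```

Note that iloc looks only at positions and ignores the labels entirely, so these calls keep working even if the stores or products are renamed.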

Once you have mastered these basic operations, you will be able to inspect specific parts of any data you load into Pandas.


