How to Extract Data from Specific Levels of a MultiIndex in Python

Managing Hierarchical Data with MultiIndex

In pandas, a “MultiIndex” lets you label rows with two or more index levels, which is useful for organizing hierarchical data. For example, you can manage data with a parent-child relationship, such as “Region” and “Branch,” in a single DataFrame.

Building a DataFrame with a MultiIndex

First, let’s create an index with a two-level structure (Region and Branch) and prepare a DataFrame containing sales and customer count data.

import pandas as pd

# Define index levels (Region, Branch)
index_structure = [
    ("East", "Store_A"),
    ("East", "Store_B"),
    ("West", "Store_C"),
    ("West", "Store_D"),
    ("South", "Store_E"),
    ("South", "Store_F")
]

# Create MultiIndex
m_index = pd.MultiIndex.from_tuples(index_structure, names=["Region", "Branch"])

# Define statistical data (Sales, Customers)
store_stats = {
    "Sales_Amount": [850000, 920000, 780000, 880000, 650000, 710000],
    "Customer_Count": [450, 510, 390, 480, 320, 350]
}

# Create DataFrame with the index
df = pd.DataFrame(store_stats, index=m_index)

print("--- Created MultiIndex DataFrame ---")
print(df)
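
Before extracting anything, it can help to confirm that the index really has the two levels you expect. The following is a minimal sketch that rebuilds the same DataFrame and inspects its index. The variable names level_count, level_names, and regions are introduced here for illustration only.

```python
import pandas as pd

# Same index and data as the example above
m_index = pd.MultiIndex.from_tuples(
    [("East", "Store_A"), ("East", "Store_B"),
     ("West", "Store_C"), ("West", "Store_D"),
     ("South", "Store_E"), ("South", "Store_F")],
    names=["Region", "Branch"],
)
df = pd.DataFrame(
    {"Sales_Amount": [850000, 920000, 780000, 880000, 650000, 710000],
     "Customer_Count": [450, 510, 390, 480, 320, 350]},
    index=m_index,
)

level_count = df.index.nlevels        # number of index levels
level_names = list(df.index.names)    # name assigned to each level
# Distinct first-level labels, in order of first appearance
regions = df.index.get_level_values("Region").unique().tolist()

print(level_count, level_names, regions)
```

The level names shown here are the ones you later pass to loc or to level-based methods.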

How to Extract Data Using loc

For DataFrames with a MultiIndex, use the loc indexer to select rows by the labels of one or more index levels.

Extracting a Specific Level (First Level)

Passing a first-level (top-level) label to loc returns every row that belongs to that label.

# Extract all data for the "East" region
east_region_data = df.loc["East"]

print("\n--- Extraction result for East region ---")
print(east_region_data)

Extracting a Specific Row (Specifying All Levels)

To get a single row, pass loc a tuple containing one label for each index level.

# Extract data for "Store_A" in the "East" region
target_store = df.loc[("East", "Store_A")]

print("\n--- Individual data for East region Store_A ---")
print(target_store)

Execution Results

Running the three snippets above (creating the DataFrame, extracting the East region, and extracting Store_A) prints the following output.

--- Created MultiIndex DataFrame ---
                Sales_Amount  Customer_Count
Region Branch                               
East   Store_A        850000             450
       Store_B        920000             510
West   Store_C        780000             390
       Store_D        880000             480
South  Store_E        650000             320
       Store_F        710000             350

--- Extraction result for East region ---
         Sales_Amount  Customer_Count
Branch                               
Store_A        850000             450
Store_B        920000             510

--- Individual data for East region Store_A ---
Sales_Amount      850000
Customer_Count       450
Name: (East, Store_A), dtype: int64
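
The loc examples above start from the first level. If you only know a label from the second level (Branch), one option is DataFrame.xs with its level parameter. This is a sketch of an alternative to loc, not something the examples above rely on. The names store_c and store_c_full are hypothetical.

```python
import pandas as pd

# Same index and data as the example above
m_index = pd.MultiIndex.from_tuples(
    [("East", "Store_A"), ("East", "Store_B"),
     ("West", "Store_C"), ("West", "Store_D"),
     ("South", "Store_E"), ("South", "Store_F")],
    names=["Region", "Branch"],
)
df = pd.DataFrame(
    {"Sales_Amount": [850000, 920000, 780000, 880000, 650000, 710000],
     "Customer_Count": [450, 510, 390, 480, 320, 350]},
    index=m_index,
)

# Select by a second-level label; the Branch level is dropped,
# so the result is indexed by Region only
store_c = df.xs("Store_C", level="Branch")

# drop_level=False keeps both index levels in the result
store_c_full = df.xs("Store_C", level="Branch", drop_level=False)

print(store_c)
print(store_c_full)
```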

Important Notes on Data Extraction

If you pass only a top-level label to loc, the result is a DataFrame with that level dropped from the index: df.loc["East"] is indexed by Branch alone. If you pass a tuple that specifies every level, the result is a Series holding that row's values, with the column names as its index. Knowing which type comes back for each kind of key helps you avoid surprises in later operations.
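
The difference in return types can be checked directly. The sketch below also shows one way to keep the result as a DataFrame with both index levels, which is to wrap the key in a list. The variable names are chosen for illustration only.

```python
import pandas as pd

# Same index and data as the example above
m_index = pd.MultiIndex.from_tuples(
    [("East", "Store_A"), ("East", "Store_B"),
     ("West", "Store_C"), ("West", "Store_D"),
     ("South", "Store_E"), ("South", "Store_F")],
    names=["Region", "Branch"],
)
df = pd.DataFrame(
    {"Sales_Amount": [850000, 920000, 780000, 880000, 650000, 710000],
     "Customer_Count": [450, 510, 390, 480, 320, 350]},
    index=m_index,
)

by_region = df.loc["East"]                   # DataFrame, Region level dropped
one_row = df.loc[("East", "Store_A")]        # Series for a single row
kept = df.loc[["East"]]                      # list key: both levels kept
one_row_df = df.loc[[("East", "Store_A")]]   # list of tuples: one-row DataFrame

print(type(by_region).__name__, by_region.index.nlevels)
print(type(one_row).__name__)
print(type(kept).__name__, kept.index.nlevels)
print(type(one_row_df).__name__, one_row_df.shape)
```

Wrapping the key in a list is useful when later code expects a DataFrame with the original index structure regardless of how many rows match.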