Managing Hierarchical Data with MultiIndex
In pandas, a MultiIndex lets you label rows (or columns) with more than one index level, which makes it a good fit for hierarchically structured data. For example, a parent-child relationship such as "Region" and "Branch" can be managed in a single DataFrame.
Building a DataFrame with a MultiIndex
First, let’s create an index with a two-level structure (Region and Branch) and prepare a DataFrame containing sales and customer count data.
import pandas as pd
# Define index levels (Region, Branch)
index_structure = [
    ("East", "Store_A"),
    ("East", "Store_B"),
    ("West", "Store_C"),
    ("West", "Store_D"),
    ("South", "Store_E"),
    ("South", "Store_F"),
]
# Create MultiIndex
m_index = pd.MultiIndex.from_tuples(index_structure, names=["Region", "Branch"])
# Define statistical data (Sales, Customers)
store_stats = {
    "Sales_Amount": [850000, 920000, 780000, 880000, 650000, 710000],
    "Customer_Count": [450, 510, 390, 480, 320, 350],
}
# Create DataFrame with the index
df = pd.DataFrame(store_stats, index=m_index)
print("--- Created MultiIndex DataFrame ---")
print(df)
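Once the DataFrame exists, it can help to confirm that the index really has the intended levels. The following is a minimal sketch that rebuilds a shortened version of the example (three rows only, an assumption made for brevity) and inspects it with `nlevels`, `names`, and `get_level_values`. The variable names are illustrative.

```python
import pandas as pd

# Shortened version of the example data (three rows, for brevity)
m_index = pd.MultiIndex.from_tuples(
    [("East", "Store_A"), ("East", "Store_B"), ("West", "Store_C")],
    names=["Region", "Branch"],
)
df = pd.DataFrame({"Sales_Amount": [850000, 920000, 780000]}, index=m_index)

# Number of index levels and their names
level_count = df.index.nlevels
level_names = list(df.index.names)

# Labels stored in each level, one entry per row
regions = df.index.get_level_values("Region").tolist()
branches = df.index.get_level_values("Branch").tolist()

print(level_count, level_names)
print(regions)
print(branches)
```

Checking the level names up front is useful because later selections refer to levels by position or by these names.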
How to Extract Data Using loc
For DataFrames with a MultiIndex, the loc indexer selects rows by label: either every row under a given first-level label, or one specific row identified by all of its levels.
Extracting a Specific Level (First Level)
You can get all data belonging to a category by specifying the label of the first level (top level).
# Extract all data for the "East" region
east_region_data = df.loc["East"]
print("\n--- Extraction result for East region ---")
print(east_region_data)
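Selecting with a single scalar label drops the Region level from the result. If that level should stay in the index, one option is to pass the label inside a list. A minimal sketch, assuming the same kind of data as above (rebuilt in shortened form so the snippet runs on its own; `east_dropped` and `east_kept` are illustrative names):

```python
import pandas as pd

m_index = pd.MultiIndex.from_tuples(
    [("East", "Store_A"), ("East", "Store_B"), ("West", "Store_C")],
    names=["Region", "Branch"],
)
df = pd.DataFrame({"Sales_Amount": [850000, 920000, 780000]}, index=m_index)

# Scalar label: the Region level is removed, only Branch remains
east_dropped = df.loc["East"]

# List with one label: both levels are kept in the result
east_kept = df.loc[["East"]]

print(east_dropped.index.name)
print(list(east_kept.index.names))
```

Keeping both levels can be convenient when the result will later be concatenated with selections from other regions.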
Extracting a Specific Row (Specifying All Levels)
To get one specific row, pass a tuple containing one label for each index level, in level order.
# Extract data for "Store_A" in the "East" region
target_store = df.loc[("East", "Store_A")]
print("\n--- Individual data for East region Store_A ---")
print(target_store)
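The same tuple can be combined with a column label to read a single value instead of the whole row. A minimal sketch, again with a shortened copy of the data so it runs on its own (`store_a_sales` is an illustrative name):

```python
import pandas as pd

m_index = pd.MultiIndex.from_tuples(
    [("East", "Store_A"), ("East", "Store_B"), ("West", "Store_C")],
    names=["Region", "Branch"],
)
df = pd.DataFrame(
    {
        "Sales_Amount": [850000, 920000, 780000],
        "Customer_Count": [450, 510, 390],
    },
    index=m_index,
)

# Row key as a tuple, then the column label: returns a single value
store_a_sales = df.loc[("East", "Store_A"), "Sales_Amount"]
print(store_a_sales)
```

Writing the row key as an explicit tuple keeps it visually separate from the column label, which avoids confusion about which part refers to rows and which to columns.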
Execution Results
Running the code above prints the following output.
--- Created MultiIndex DataFrame ---
                Sales_Amount  Customer_Count
Region Branch
East   Store_A        850000             450
       Store_B        920000             510
West   Store_C        780000             390
       Store_D        880000             480
South  Store_E        650000             320
       Store_F        710000             350

--- Extraction result for East region ---
         Sales_Amount  Customer_Count
Branch
Store_A        850000             450
Store_B        920000             510

--- Individual data for East region Store_A ---
Sales_Amount      850000
Customer_Count       450
Name: (East, Store_A), dtype: int64
Important Notes on Data Extraction
If you pass only a top-level label to loc, the result is a DataFrame whose index no longer contains that level; in the example, only Branch remains. If you pass a tuple that specifies every level, the result is a Series holding the values of that one row, with the column names as its index and the tuple as its name. Knowing how the return type changes with the depth of the key helps you avoid surprises in later operations.
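The difference in return types can be checked directly. The sketch below rebuilds a shortened copy of the data (an assumption made so it runs on its own) and also shows one way to keep a single row as a DataFrame, by wrapping the tuple in a list. The names `partial`, `full`, and `full_as_frame` are illustrative.

```python
import pandas as pd

m_index = pd.MultiIndex.from_tuples(
    [("East", "Store_A"), ("East", "Store_B"), ("West", "Store_C")],
    names=["Region", "Branch"],
)
df = pd.DataFrame(
    {
        "Sales_Amount": [850000, 920000, 780000],
        "Customer_Count": [450, 510, 390],
    },
    index=m_index,
)

# Top level only: a DataFrame indexed by the remaining level
partial = df.loc["East"]

# All levels as a tuple: a Series for that single row
full = df.loc[("East", "Store_A")]

# Tuple wrapped in a list: a one-row DataFrame with both levels kept
full_as_frame = df.loc[[("East", "Store_A")]]

print(type(partial).__name__)        # DataFrame
print(type(full).__name__)           # Series
print(type(full_as_frame).__name__)  # DataFrame
```

The list form is worth considering when downstream code expects a DataFrame regardless of how many rows match.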
