Summarizing Data with pivot_table Method
The pivot_table method in Pandas allows you to organize large amounts of data and create cross-tabulations. Similar to the pivot table feature in Excel, it is a powerful tool for calculating averages or totals by specifying rows and columns.
Main Parameters and Their Meanings
The following table summarizes the parameters frequently used with pivot_table.
| Parameter | Meaning |
| index | Column names to use as row indexes. You can specify multiple columns. |
| columns | Column names to use as column indexes. You can specify multiple columns. |
| values | Column names of the numerical data to be aggregated. |
| fill_value | Value to replace missing data (NaN). |
| aggfunc | Function used for aggregation (e.g., mean, sum). The default is mean. |
Example of Creating a Pivot Table
In this example, we create code to manage shipping fees based on “Product type” and “Delivery area” across multiple warehouses.
import pandas as pd
import numpy as np
# Define logistics data
# Warehouse, Product, Area, Shipping_Fee
logistics_data = {
"Warehouse": ["East", "East", "East", "West", "West", "West", "North", "North", "North"],
"Product": ["Food", "Daily", "Daily", "Food", "Daily", "Food", "Food", "Daily", "Daily"],
"Area": ["Tokyo", "Tokyo", "Osaka", "Osaka", "Tokyo", "Osaka", "Tokyo", "Osaka", "Tokyo"],
"Shipping_Fee": [850, 600, 1200, 950, 700, 1100, 900, 1150, 650]
}
df = pd.DataFrame(logistics_data)
# Perform cross-tabulation using pivot_table
# Place Warehouse as rows, Product as columns, and calculate mean shipping fee
# Fill missing combinations with 0
pivot_df = df.pivot_table(
index="Warehouse",
columns="Product",
values="Shipping_Fee",
fill_value=0,
aggfunc=np.mean
)
print("--- Original Logistics Data ---")
print(df)
print("\n--- Average Shipping Fee by Warehouse and Product (Pivot Table) ---")
print(pivot_df)
Execution Results
Running the code above produces an aggregated result with organized rows and columns.
--- Original Logistics Data ---
Warehouse Product Area Shipping_Fee
0 East Food Tokyo 850
1 East Daily Tokyo 600
2 East Daily Osaka 1200
3 West Food Osaka 950
4 West Daily Tokyo 700
5 West Food Osaka 1100
6 North Food Tokyo 900
7 North Daily Osaka 1150
8 North Daily Tokyo 650
--- Average Shipping Fee by Warehouse and Product (Pivot Table) ---
Product Daily Food
Warehouse
East 900.0 850.0
North 900.0 900.0
West 700.0 1025.0
Code Explanation and Additional Information
- How aggregation works: By specifying
aggfunc=np.mean, the code calculates the average value when multiple data points exist for the same “Warehouse” and “Product”. - Handling missing values: By setting
fill_value=0, any combination that results in NaN (Not a Number)—such as when a specific product does not exist in a specific warehouse—is automatically replaced with 0. - Flexible aggregation: You can change
aggfuncto other functions likenp.sum(total),len(count), ornp.max(maximum value) depending on your needs.
Using pivot_table helps you quickly extract trends necessary for business decisions from complex raw data.
