[Python] How to Cross-Tabulate and Aggregate Data with Pandas pivot_table

目次

Summarizing Data with pivot_table Method

The pivot_table method in Pandas allows you to organize large amounts of data and create cross-tabulations. Similar to the pivot table feature in Excel, it is a powerful tool for calculating averages or totals by specifying rows and columns.

Main Parameters and Their Meanings

The following table summarizes the parameters frequently used with pivot_table.

ParameterMeaning
indexColumn names to use as row indexes. You can specify multiple columns.
columnsColumn names to use as column indexes. You can specify multiple columns.
valuesColumn names of the numerical data to be aggregated.
fill_valueValue to replace missing data (NaN).
aggfuncFunction used for aggregation (e.g., mean, sum). The default is mean.

Example of Creating a Pivot Table

In this example, we create code to manage shipping fees based on “Product type” and “Delivery area” across multiple warehouses.

import pandas as pd
import numpy as np

# Define logistics data
# Warehouse, Product, Area, Shipping_Fee
logistics_data = {
    "Warehouse": ["East", "East", "East", "West", "West", "West", "North", "North", "North"],
    "Product": ["Food", "Daily", "Daily", "Food", "Daily", "Food", "Food", "Daily", "Daily"],
    "Area": ["Tokyo", "Tokyo", "Osaka", "Osaka", "Tokyo", "Osaka", "Tokyo", "Osaka", "Tokyo"],
    "Shipping_Fee": [850, 600, 1200, 950, 700, 1100, 900, 1150, 650]
}

df = pd.DataFrame(logistics_data)

# Perform cross-tabulation using pivot_table
# Place Warehouse as rows, Product as columns, and calculate mean shipping fee
# Fill missing combinations with 0
pivot_df = df.pivot_table(
    index="Warehouse",
    columns="Product",
    values="Shipping_Fee",
    fill_value=0,
    aggfunc=np.mean
)

print("--- Original Logistics Data ---")
print(df)
print("\n--- Average Shipping Fee by Warehouse and Product (Pivot Table) ---")
print(pivot_df)

Execution Results

Running the code above produces an aggregated result with organized rows and columns.

--- Original Logistics Data ---
  Warehouse Product   Area  Shipping_Fee
0      East    Food  Tokyo           850
1      East   Daily  Tokyo           600
2      East   Daily  Osaka          1200
3      West    Food  Osaka           950
4      West   Daily  Tokyo           700
5      West    Food  Osaka          1100
6     North    Food  Tokyo           900
7     North   Daily  Osaka          1150
8     North   Daily  Tokyo           650

--- Average Shipping Fee by Warehouse and Product (Pivot Table) ---
Product    Daily    Food
Warehouse               
East       900.0   850.0
North      900.0   900.0
West       700.0  1025.0

Code Explanation and Additional Information

  • How aggregation works: By specifying aggfunc=np.mean, the code calculates the average value when multiple data points exist for the same “Warehouse” and “Product”.
  • Handling missing values: By setting fill_value=0, any combination that results in NaN (Not a Number)—such as when a specific product does not exist in a specific warehouse—is automatically replaced with 0.
  • Flexible aggregation: You can change aggfunc to other functions like np.sum (total), len (count), or np.max (maximum value) depending on your needs.

Using pivot_table helps you quickly extract trends necessary for business decisions from complex raw data.

よかったらシェアしてね!
  • URLをコピーしました!
  • URLをコピーしました!

この記事を書いた人

私が勉強したこと、実践したこと、してることを書いているブログです。
主に資産運用について書いていたのですが、
最近はプログラミングに興味があるので、今はそればっかりです。

目次