[Python] Calculating Basic Statistics with Pandas (Mean, Max, Min, etc.)

January 4, 2026

In the initial stages of data analysis, it is crucial to understand the overall picture of your data by checking basic statistics (descriptive statistics) such as the “mean,” “median,” and “standard deviation.” Pandas DataFrames provide a wealth of methods to calculate these figures.

This article explains how to calculate individual statistics and how to check major statistics all at once using the describe() method.

1. List of Major Statistics Calculation Methods

The main methods that can be called on a DataFrame or Series are listed below. On a DataFrame, they are calculated per column by default.

Method	Meaning	Remarks
count()	Number of elements	Excludes missing values (NaN).
mean()	Mean value	Average.
median()	Median value	The middle value when data is sorted.
mode()	Mode	The most frequently occurring value (multiple may exist).
max()	Maximum value	The largest value.
min()	Minimum value	The smallest value.
std()	Standard deviation	Measure of data dispersion (sample standard deviation, ddof=1 by default).
var()	Variance	Square of the standard deviation (sample variance, ddof=1 by default).
sample()	Random sampling	Not a statistic, but used for sampling data.
describe()	Summary statistics	Calculates count, mean, std, min, quartiles, and max all at once.
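
As a minimal sketch of how the defaults in the table can be changed, the example below uses a small hypothetical score table (the column names and values are made up for illustration). It assumes you want row-wise results via axis=1 and a population standard deviation via ddof=0.

```python
import pandas as pd

# Hypothetical scores for three students (illustrative data only)
scores = pd.DataFrame({
    "math": [70, 80, 90],
    "english": [60, 80, 100],
})

# Default (axis=0): one value per column
col_means = scores.mean()

# axis=1: one value per row
row_means = scores.mean(axis=1)

# std() uses ddof=1 (sample) by default; ddof=0 gives the population value
sample_std = scores["math"].std()
population_std = scores["math"].std(ddof=0)

print(col_means)
print(row_means)
print(sample_std, population_std)
```

The sample standard deviation is always at least as large as the population one, and the gap matters most for small datasets like this.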

2. Implementation Sample Code

Here, we calculate various statistics for a small real estate dataset with three columns: Price (in units of 10,000 yen), Area (m2), and Age (years since construction).

import pandas as pd

def calculate_statistics():
    """
    Calculate basic statistics with a Pandas DataFrame.
    """

    # 1. Create sample data: real estate listings
    # Price: price (x10,000 yen), Area: floor area (m2), Age: building age (years)
    property_data = {
        "Price": [3500, 4200, 2800, 5500, 4200],
        "Area": [45.5, 60.0, 38.2, 85.0, 55.0],
        "Age": [15, 5, 25, 2, 12]
    }

    df = pd.DataFrame(property_data)

    print("--- Original dataset ---")
    print(df)
    print("\n")


    # 2. Calculate individual statistics
    print("=== Individual statistics ===")

    # Mean
    # The mean of each column is returned as a Series
    mean_val = df.mean()
    print(f"[Mean]\n{mean_val}\n")

    # Median
    median_val = df.median()
    print(f"[Median]\n{median_val}\n")

    # Mode
    # More than one mode can exist, so Series.mode() returns a Series
    # (DataFrame.mode() returns a DataFrame)
    mode_val = df["Price"].mode()
    print(f"[Mode of Price]: {mode_val.iloc[0]} (x10,000 yen)\n")

    # Standard deviation (sample standard deviation, ddof=1)
    std_val = df.std()
    print(f"[Standard deviation]\n{std_val}\n")


    # 3. Get summary statistics all at once (describe)
    print("=== Summary statistics (describe) ===")
    # count, mean, std, min, 25%, 50%, 75%, and max are calculated in one call
    description = df.describe()
    print(description)
    print("\n")


    # 4. Random sampling (sample)
    print("=== Random sampling ===")
    # Extract 2 rows at random
    # Fixing random_state makes the result reproducible
    sampled_df = df.sample(n=2, random_state=1)
    print(sampled_df)

if __name__ == "__main__":
    calculate_statistics()

3. Execution Result

--- Original dataset ---
   Price  Area  Age
0   3500  45.5   15
1   4200  60.0    5
2   2800  38.2   25
3   5500  85.0    2
4   4200  55.0   12


=== Individual statistics ===
[Mean]
Price    4040.00
Area       56.74
Age        11.80
dtype: float64

[Median]
Price    4200.0
Area       55.0
Age        12.0
dtype: float64

[Mode of Price]: 4200 (x10,000 yen)

[Standard deviation]
Price    1001.498877
Area       17.904971
Age         9.038805
dtype: float64

=== Summary statistics (describe) ===
             Price       Area        Age
count     5.000000   5.000000   5.000000
mean   4040.000000  56.740000  11.800000
std    1001.498877  17.904971   9.038805
min    2800.000000  38.200000   2.000000
25%    3500.000000  45.500000   5.000000
50%    4200.000000  55.000000  12.000000
75%    4200.000000  60.000000  15.000000
max    5500.000000  85.000000  25.000000


=== Random sampling ===
   Price  Area  Age
2   2800  38.2   25
1   4200  60.0    5
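
The sample above reads only the first element of the mode result, which is enough when a single value is most frequent. As a rough sketch of what happens when values tie, the example below uses hypothetical data (not the article's dataset) to show the shape of the result for a Series and for a DataFrame.

```python
import pandas as pd

# Hypothetical prices where two values tie for most frequent
prices = pd.Series([3500, 4200, 4200, 2800, 3500])

# Series.mode() returns a Series holding every tied value, in sorted order
modes = prices.mode()
print(modes.tolist())

# DataFrame.mode() returns a DataFrame; a column with fewer modes
# than the others is padded with NaN
df_modes = pd.DataFrame({"a": [1, 1, 2], "b": [1, 2, 3]}).mode()
print(df_modes)
```

If only one representative value is needed, taking the first element is a reasonable convention, but it is worth checking the length of the result first so that ties are not silently ignored.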

4. Explanation: Convenience of the describe() Method

The describe() method summarizes every numeric column in a single call, which makes it a quick way to grasp how the data is distributed. Each row of its output means the following:

count: Number of data points (excluding missing values).
mean: Average value.
std: Standard deviation.
min / max: Minimum and Maximum values.
25%, 50%, 75%: Quartiles (50% is the same as the median).
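
To illustrate how these rows relate to the individual methods, here is a small sketch that reuses the Price and Age values from the sample data. The choice of the 10th and 90th percentiles is an assumption made purely for illustration; percentiles is an optional parameter of describe().

```python
import pandas as pd

# Same Price and Age values as the sample data in this article
df = pd.DataFrame({
    "Price": [3500, 4200, 2800, 5500, 4200],
    "Age": [15, 5, 25, 2, 12],
})

summary = df.describe()

# The "50%" row corresponds to median(), and "max" to max()
print(summary.loc["50%", "Price"], df["Price"].median())
print(summary.loc["max", "Age"], df["Age"].max())

# Request other percentiles instead of the default quartiles
custom = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom.index.tolist())
```

Because the result of describe() is itself a DataFrame, individual values can be picked out with .loc and reused in later calculations.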

Running df.describe() right after loading a DataFrame, to check the general distribution and to spot outliers (for example, extreme max or min values), is standard practice in data analysis.
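
One possible way to turn that visual check into code is sketched below. It assumes the common 1.5 x IQR rule of thumb, which this article does not prescribe, and uses hypothetical prices with one deliberately extreme value.

```python
import pandas as pd

# Hypothetical prices with one extreme value (illustrative data only)
prices = pd.Series([3500, 4200, 2800, 5500, 4200, 98000], name="Price")

stats = prices.describe()

# Interquartile range taken from the describe() output
iqr = stats["75%"] - stats["25%"]

# Assumed rule of thumb: flag values more than 1.5 * IQR beyond the quartiles
lower = stats["25%"] - 1.5 * iqr
upper = stats["75%"] + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]

print(outliers)
```

The threshold is a convention rather than a fixed rule, so flagged values are candidates to review, not values to drop automatically.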
