In real-world data analysis, datasets are rarely complete. Handling missing values (NaN / None) caused by measurement errors or system failures is essential. Pandas provides features to efficiently detect these missing values and process them appropriately (remove or fill).
This article explains how to check for the presence of missing values using isnull() and how to delete missing data using dropna().
Basic Methods for Handling Missing Values
- isnull() / isna(): Returns True if the element is a missing value, and False otherwise.
- any(): Returns True if there is at least one True in the column or row. This is used in combination with isnull().
- dropna(): Removes rows (or columns) that contain missing values.
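As a minimal sketch of how these methods fit together (the small DataFrame and its column names A and B are made up for illustration), the following shows the boolean mask returned by isnull(), checks that isna() gives the same mask, and applies any() both per column and per row:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame used only for illustration
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})

mask = df.isnull()                 # DataFrame of True/False, same shape as df
same = mask.equals(df.isna())      # isna() produces an identical mask

cols_with_nan = mask.any()         # per column (axis=0 is the default)
rows_with_nan = mask.any(axis=1)   # per row

print(mask)
print(same)
print(cols_with_nan)
print(rows_with_nan)
```

Passing axis=1 to any() is what switches the check from "does this column contain a missing value?" to "does this row contain a missing value?".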
Implementation Sample Code
Here, we will use “sensor log data” from a factory production line as the subject. We assume a situation where temperature or vibration data is not recorded (missing) at certain times due to communication errors.
import pandas as pd
import numpy as np


def handle_missing_values():
    """
    Function to demonstrate detection and removal of missing values (NaN) in a DataFrame
    """
    # 1. Create Sample Data
    # Intentionally create missing values by including np.nan and None
    sensor_data = {
        "Time": ["09:00", "09:10", "09:20", "09:30", "09:40"],
        "Temperature": [120.5, np.nan, 119.8, 121.2, np.nan],  # 2 missing values
        "Vibration": [0.05, 0.06, 0.04, None, 0.05],  # 1 missing value
    }
    df = pd.DataFrame(sensor_data)
    print("--- Original Sensor Data (With Missing Values) ---")
    print(df)
    print("\n")

    # 2. Detecting Missing Values (isnull + any)
    print("=== Detecting Missing Values ===")
    # Check the entire DataFrame
    # Returns True/False for "Does this column contain missing values?"
    has_null = df.isnull().any()
    print("--- Presence of missing values in each column ---")
    print(has_null)
    # Check only a specific column
    is_temp_null = pd.isnull(df["Temperature"]).any()
    print(f"\nIs there missing data in the Temperature column: {is_temp_null}")

    # 3. Removing Missing Values (Series Operation)
    print("\n=== Removing Missing Values (Series) ===")
    # Extract the Temperature column and delete missing data
    temp_series = df["Temperature"]
    clean_temp_series = temp_series.dropna()
    print("--- Temperature Column After Removal ---")
    print(clean_temp_series)
    print(f"Original count: {len(temp_series)} -> After removal: {len(clean_temp_series)}")

    # 4. Removing Missing Values (DataFrame Operation)
    print("\n=== Removing Missing Values (DataFrame) ===")
    # Deletes all rows containing "at least one" missing value
    # (Default behavior: how='any', axis=0)
    df_clean = df.dropna()
    print("--- DataFrame with rows containing missing values removed ---")
    print(df_clean)
    # Note: Indices will be discontinuous, so reset if necessary
    # df_clean = df_clean.reset_index(drop=True)


if __name__ == "__main__":
    handle_missing_values()
Execution Result
--- Original Sensor Data (With Missing Values) ---
    Time  Temperature  Vibration
0  09:00        120.5       0.05
1  09:10          NaN       0.06
2  09:20        119.8       0.04
3  09:30        121.2        NaN
4  09:40          NaN       0.05


=== Detecting Missing Values ===
--- Presence of missing values in each column ---
Time           False
Temperature     True
Vibration       True
dtype: bool

Is there missing data in the Temperature column: True

=== Removing Missing Values (Series) ===
--- Temperature Column After Removal ---
0    120.5
2    119.8
3    121.2
Name: Temperature, dtype: float64
Original count: 5 -> After removal: 3

=== Removing Missing Values (DataFrame) ===
--- DataFrame with rows containing missing values removed ---
    Time  Temperature  Vibration
0  09:00        120.5       0.05
2  09:20        119.8       0.04
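The final DataFrame in this result keeps the original row labels 0 and 2. The sample code mentions reset_index(drop=True) in a comment for this situation; a minimal sketch of its effect, using a shortened version of the sensor data (the three-row frame here is an illustrative assumption), might look like this:

```python
import numpy as np
import pandas as pd

# Shortened, illustrative version of the sensor data
df = pd.DataFrame({
    "Time": ["09:00", "09:10", "09:20"],
    "Temperature": [120.5, np.nan, 119.8],
})

df_clean = df.dropna()                      # surviving rows keep labels 0 and 2
df_reset = df_clean.reset_index(drop=True)  # relabel rows consecutively from 0

print(df_clean.index.tolist())
print(df_reset.index.tolist())
```

With drop=True the old labels are discarded rather than being added back as a new column.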
Explanation
- isnull(): This method has exactly the same function as isna(). You can use either, but it is desirable to be consistent within your project.
- Behavior of dropna(): When executed on a DataFrame, the default behavior is to delete rows where NaN exists in any column.
  - how='all': Specify this when you want to delete rows only if all columns are NaN.
  - subset=['Temperature']: Specify this when you want to delete rows only if there are missing values in specific columns.
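The difference between these dropna() options can be sketched on a small frame. The data below is a shortened variant of the sensor log without the Time column, and its last row is entirely NaN; that row is an assumption added here so that how='all' has something to remove:

```python
import numpy as np
import pandas as pd

# Illustrative data: row 3 is missing in every column (added for this sketch)
df = pd.DataFrame({
    "Temperature": [120.5, np.nan, 119.8, np.nan],
    "Vibration": [0.05, 0.06, np.nan, np.nan],
})

drop_any = df.dropna()                         # default how='any': any NaN removes the row
drop_all = df.dropna(how="all")                # remove only rows where every column is NaN
drop_temp = df.dropna(subset=["Temperature"])  # look at the Temperature column only

print(drop_any.index.tolist())
print(drop_all.index.tolist())
print(drop_temp.index.tolist())
```

In the article's original data the Time column is never missing, so how='all' would not remove any row there; subset is usually the more practical choice when only certain columns matter for the analysis.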
As a first step in data cleaning, it is necessary to grasp the status of missing data and decide whether to “delete” or “fill with a specific value (such as the mean)” according to the purpose of the analysis.
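For the "fill" option, which this article does not cover in detail, one possible approach is fillna() combined with the column mean. The following is only a sketch under the assumption that mean imputation suits the analysis; it reuses the Temperature values from the sample data:

```python
import numpy as np
import pandas as pd

# Temperature values from the sample sensor data
temperature = pd.Series([120.5, np.nan, 119.8, 121.2, np.nan], name="Temperature")

mean_temp = temperature.mean()           # NaN values are skipped by default
filled = temperature.fillna(mean_temp)   # replace each NaN with the mean

print(filled)
print(filled.isnull().any())
```

Unlike dropna(), this keeps all five rows, at the cost of introducing values that were never actually measured.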
