[Python] Detecting and Removing Missing Values in Pandas DataFrames (isnull / dropna)

In real-world data analysis, datasets are rarely complete. Missing values (NaN / None) caused by measurement errors or system failures have to be handled before any analysis can be trusted. Pandas provides methods to detect these missing values efficiently and to process them by removing or filling them.

This article explains how to check for the presence of missing values using isnull() and how to delete missing data using dropna().


Basic Methods for Handling Missing Values

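  • isnull() / isna(): Return a boolean object of the same shape, with True where an element is missing and False otherwise.
  • any(): Returns True if at least one value along an axis is True. By default it checks each column (axis=0); pass axis=1 to check each row. It is typically combined with isnull().
  • dropna(): Returns a new object with the rows (or columns) that contain missing values removed. The original object is not modified.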
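As a minimal sketch of how these methods combine, the snippet below builds a small, made-up DataFrame (the column names and values are illustrative assumptions, not real measurements) and checks for missing values per column, per row, and as a count.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "Temperature": [120.5, np.nan, 119.8],
    "Vibration": [0.05, 0.06, None],  # None becomes NaN in a float column
})

mask = df.isnull()                # element-wise True/False, same shape as df
cols_with_nan = mask.any()        # axis=0 (default): one result per column
rows_with_nan = mask.any(axis=1)  # axis=1: one result per row
nan_counts = mask.sum()           # True counts as 1, so this counts NaN per column

print(cols_with_nan)
print(rows_with_nan)
print(nan_counts)
```

Summing the boolean mask is often more informative than any(), because it shows how many values are missing rather than only whether any are.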

Implementation Sample Code

Here, we will use “sensor log data” from a factory production line as the subject. We assume a situation where temperature or vibration data is not recorded (missing) at certain times due to communication errors.

import pandas as pd
import numpy as np

def handle_missing_values():
    """
    Function to demonstrate detection and removal of missing values (NaN) in a DataFrame
    """
    
    # 1. Create Sample Data
    # Intentionally include np.nan and None; both are stored as NaN in float columns
    sensor_data = {
        "Time": ["09:00", "09:10", "09:20", "09:30", "09:40"],
        "Temperature": [120.5, np.nan, 119.8, 121.2, np.nan],  # 2 missing values
        "Vibration": [0.05, 0.06, 0.04, None, 0.05]            # 1 missing value
    }
    
    df = pd.DataFrame(sensor_data)
    
    print("--- Original Sensor Data (With Missing Values) ---")
    print(df)
    print("\n")


    # 2. Detecting Missing Values (isnull + any)
    print("=== Detecting Missing Values ===")
    
    # Check the entire DataFrame
    # Returns True/False for "Does this column contain missing values?"
    has_null = df.isnull().any()
    
    print("--- Presence of missing values in each column ---")
    print(has_null)
    
    # Check only a specific column
    is_temp_null = pd.isnull(df["Temperature"]).any()
    print(f"\nIs there missing data in the Temperature column: {is_temp_null}")


    # 3. Removing Missing Values (Series Operation)
    print("\n=== Removing Missing Values (Series) ===")
    
    # Extract the Temperature column and delete missing data
    temp_series = df["Temperature"]
    clean_temp_series = temp_series.dropna()
    
    print("--- Temperature Column After Removal ---")
    print(clean_temp_series)
    print(f"Original count: {len(temp_series)} -> After removal: {len(clean_temp_series)}")


    # 4. Removing Missing Values (DataFrame Operation)
    print("\n=== Removing Missing Values (DataFrame) ===")
    
    # Deletes all rows containing "at least one" missing value
    # (Default behavior: how='any', axis=0)
    df_clean = df.dropna()
    
    print("--- DataFrame with rows containing missing values removed ---")
    print(df_clean)

    # Note: the surviving rows keep their original index labels (0 and 2 here), so reset them if necessary
    # df_clean = df_clean.reset_index(drop=True)

if __name__ == "__main__":
    handle_missing_values()

Execution Result

--- Original Sensor Data (With Missing Values) ---
    Time  Temperature  Vibration
0  09:00        120.5       0.05
1  09:10          NaN       0.06
2  09:20        119.8       0.04
3  09:30        121.2        NaN
4  09:40          NaN       0.05


=== Detecting Missing Values ===
--- Presence of missing values in each column ---
Time           False
Temperature     True
Vibration       True
dtype: bool

Is there missing data in the Temperature column: True

=== Removing Missing Values (Series) ===
--- Temperature Column After Removal ---
0    120.5
2    119.8
3    121.2
Name: Temperature, dtype: float64
Original count: 5 -> After removal: 3

=== Removing Missing Values (DataFrame) ===
--- DataFrame with rows containing missing values removed ---
    Time  Temperature  Vibration
0  09:00        120.5       0.05
2  09:20        119.8       0.04

Explanation

  • isnull(): This method is an alias of isna() and behaves identically. Either works, but use one consistently within a project.
  • Behavior of dropna(): When called on a DataFrame, the default (how='any', axis=0) removes every row that has a missing value in at least one column.
    • how='all': Removes a row only when every column in it is missing.
    • subset=['Temperature']: Considers only the listed columns when deciding which rows to remove.
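The difference between these options is easiest to see side by side. The following sketch uses a small, hypothetical sensor table (the values are made up for illustration) and applies each variant of dropna() to it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "Time": ["09:00", "09:10", "09:20", "09:30"],
    "Temperature": [120.5, np.nan, 119.8, np.nan],
    "Vibration": [0.05, 0.06, np.nan, np.nan],
})

# Default (how='any'): drop a row if any column is missing
drop_any = df.dropna()

# how='all': drop a row only if every column is missing.
# "Time" is never missing here, so no row is removed.
drop_all = df.dropna(how="all")

# subset: look only at "Temperature" when deciding
drop_temp = df.dropna(subset=["Temperature"])

# how and subset combined: drop a row only if both sensor columns are missing
drop_both_sensors = df.dropna(how="all", subset=["Temperature", "Vibration"])

print(drop_any.index.tolist())
print(drop_all.index.tolist())
print(drop_temp.index.tolist())
print(drop_both_sensors.index.tolist())
```

Note that how='all' on its own rarely removes anything when a key column such as a timestamp is always present, which is why it is combined with subset in the last call.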

As a first step in data cleaning, check where and how much data is missing, then decide whether to delete the affected rows or fill them with a specific value (such as the mean), depending on the purpose of the analysis.
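For comparison with deletion, here is a minimal sketch of the "fill" option using fillna() with the column mean. The data is hypothetical, and whether the mean is a suitable replacement depends on the analysis; this only illustrates the mechanics.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "Temperature": [120.0, np.nan, 118.0, 122.0],
})

# mean() skips NaN by default, so it averages the three recorded values
mean_temp = df["Temperature"].mean()

# fillna() returns a new Series; assign() returns a new DataFrame,
# so the original df is left unchanged
df_filled = df.assign(Temperature=df["Temperature"].fillna(mean_temp))

print(mean_temp)
print(df_filled)
```

Unlike dropna(), this keeps every row, which matters when the other columns in that row still hold valid measurements.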


