[Python] Efficiently Removing Blank Lines from Text Data

December 8, 2025

Text data read from external files or obtained via web scraping often contains unnecessary blank lines. As a preliminary step in data processing, you may want to remove these empty lines (or lines containing only whitespace) to clean up the data.

In Python, you can implement this cleansing process concisely by combining string manipulation methods with list comprehensions. Here, I will introduce a code example for formatting text data that contains irregular blank lines.

Logic for Removing Blank Lines
Implementation Example: Formatting Address Data
Code Explanation

Logic for Removing Blank Lines

The basic approach to removing blank lines is as follows:

1. Line Splitting: Convert the entire text into a list of lines.
2. Judgment and Extraction: For each line, check whether anything is left after stripping leading and trailing whitespace, and keep only the lines that pass this check.
3. Rejoining: Join the extracted lines back together with newline characters.
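The three steps above can be sketched as a small reusable function. This is a minimal sketch; the name remove_blank_lines and the sample string are chosen here for illustration and are not part of the original example.

```python
def remove_blank_lines(text: str) -> str:
    """Return text with empty and whitespace-only lines removed."""
    lines = text.splitlines()                        # 1. split into lines
    kept = [line for line in lines if line.strip()]  # 2. keep non-blank lines
    return "\n".join(kept)                           # 3. rejoin with newlines


# Made-up sample containing an empty line, a spaces-only line, and a tab-only line
sample = "alpha\n\n   \nbeta\n\t\ngamma\n"
print(remove_blank_lines(sample))
```

Note that the result has no trailing newline, because join() only places separators between the kept lines.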

Implementation Example: Formatting Address Data

The following code is an example of compacting and formatting address data that contains irregular blank lines, such as those caused by manual entry.

def main():
    # Text data containing irregular blank lines or lines with only spaces
    raw_address_data = """
Tokyo, Shinjuku
    
Osaka, Umeda
   
   
Nagoya, Sakae

Fukuoka, Hakata
"""

    # 1. Split the string into lines
    # splitlines() handles newline codes (\n, \r\n) appropriately
    lines = raw_address_data.splitlines()

    # 2. Create a new list excluding blank lines
    # line.strip() returns the line without leading and trailing whitespace.
    # A line is kept only if that result is non-empty (an empty string is falsy).
    clean_lines = [line for line in lines if line.strip()]

    # 3. Join the formatted lines with newline codes
    formatted_text = "\n".join(clean_lines)

    # Output result
    print("--- Data Before Formatting (For Check) ---")
    print(f"'{raw_address_data}'")
    print("\n--- Data After Formatting ---")
    print(formatted_text)

if __name__ == "__main__":
    main()

Code Explanation

Judgment using `line.strip()`

Called without arguments, the strip() method returns a copy of the string with all whitespace characters (spaces, tabs, newlines, etc.) removed from the beginning and end. The original string is not modified.

Line with text: " Tokyo " -> "Tokyo" (non-empty, so it evaluates as True)
Empty line or line with only spaces: " " -> "" (empty, so it evaluates as False)

By utilizing this property and writing if line.strip():, you can filter out lines that contain no substantial data.
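To see this behavior directly, the short sketch below prints the stripped form and its truth value for a few sample strings. The helper name is_blank and the sample values are illustrative assumptions; the full-width space (U+3000) is included because it often appears in Japanese text and is also treated as whitespace by strip().

```python
def is_blank(line: str) -> bool:
    """Hypothetical helper: True if the line is empty or whitespace-only."""
    return not line.strip()


# Illustrative samples: text, empty string, spaces, a tab, a full-width space
samples = ["  Tokyo  ", "", "   ", "\t", "\u3000"]
for s in samples:
    print(repr(s), "->", repr(s.strip()), "blank:", is_blank(s))
```

Checking line.isspace() alone would not be enough here, since "".isspace() is False for the empty string, whereas the strip() test covers both empty and whitespace-only lines.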

Advantage of `splitlines()`

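With split('\n'), a text that ends with a newline produces an extra empty string at the end of the list, and Windows-style line endings (\r\n) leave a stray \r at the end of each line. splitlines() avoids both problems: it recognizes \n, \r\n, and \r (among other line boundaries) and does not return a trailing empty string for a final newline. This makes it more suitable for line-by-line processing.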
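The difference is easy to check on a small string. The sketch below uses a made-up two-line sample with Windows-style line endings and compares the two methods.

```python
# Made-up sample: two lines, each terminated by \r\n
text = "Tokyo\r\nOsaka\r\n"

by_split = text.split("\n")
by_splitlines = text.splitlines()

print(by_split)       # ['Tokyo\r', 'Osaka\r', '']
print(by_splitlines)  # ['Tokyo', 'Osaka']
```

In the blank-line filter itself, the trailing empty string would be discarded by the strip() check anyway, but the leftover \r characters would remain in the kept lines.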

This process is useful for many text processing tasks, such as pre-processing for CSV file loading or normalizing user input text.
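For larger inputs such as files, the same check can be applied while iterating line by line instead of loading the whole text first. The following is a rough sketch under that assumption: iter_non_blank is a hypothetical helper, and io.StringIO with made-up rows stands in for a real file object.

```python
import io
from typing import Iterable, Iterator


def iter_non_blank(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-blank line without its trailing line ending."""
    for line in lines:
        if line.strip():
            yield line.rstrip("\r\n")


# io.StringIO stands in for an open text file; the rows are made-up sample data
fake_file = io.StringIO("name,city\n\nSato,Tokyo\n   \nSuzuki,Osaka\n")
rows = list(iter_non_blank(fake_file))
print(rows)
```

Because the helper is a generator, only one line is held in memory at a time, and the cleaned lines can be passed on to later processing as they are produced.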
