[Python] How to Extract Lines Containing Specific Strings from Multi-line Text

In text data processing, such as handling log files or CSV data, you often need to extract only the lines that contain specific keywords.

In Python, you can implement this in just a few lines by combining the splitlines() string method with a list comprehension. This article walks through a code example that filters a system log and extracts only the lines containing a specific error marker.

Table of Contents

  1. String Filtering Process
  2. Implementation Example: Extracting Error Logs
  3. Code Explanation

String Filtering Process

The procedure to extract lines containing a specific string is as follows:

  1. Split into lines: Break the target text into a list of lines at each line break.
  2. Filter: Use a list comprehension to keep only the lines that match the condition (here, lines containing the specific string).
  3. Rejoin: Join the extracted lines back into a single string, separated by newline characters.
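The three steps above can be sketched as one small helper. This is a minimal illustration rather than a definitive implementation: the function name extract_lines and the sample text are assumptions made up for this sketch.

```python
def extract_lines(text, keyword):
    """Return only the lines of `text` that contain `keyword`, joined by newlines.

    `extract_lines` is a hypothetical helper name used for illustration.
    """
    lines = text.splitlines()                              # 1. split into lines
    matched = [line for line in lines if keyword in line]  # 2. filter
    return "\n".join(matched)                              # 3. rejoin


# Hypothetical sample text
sample = "alpha one\nbeta two\nalpha three"
print(extract_lines(sample, "alpha"))
```

If no line matches, the helper returns an empty string, because joining an empty list yields "".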

Implementation Example: Extracting Error Logs

The following code is an example of extracting only lines containing the string [ERROR] from text data mimicking a system log.

def main():
    # Sample data mimicking system logs
    log_data = """[INFO] 2023-10-01 10:00:00 System booted successfully.
[INFO] 2023-10-01 10:05:23 User 'admin' logged in.
[ERROR] 2023-10-01 10:15:00 Database connection failed.
[WARNING] 2023-10-01 10:20:00 High memory usage detected.
[ERROR] 2023-10-01 10:22:30 Critical exception in module X.
[INFO] 2023-10-01 10:30:00 Scheduled maintenance started."""

    # 1. Split string line by line
    # splitlines() automatically handles newline codes like \n or \r\n
    lines = log_data.splitlines()

    # 2. Extract only lines containing "[ERROR]" using list comprehension
    # Check if the target string is contained in the line
    target_keyword = "[ERROR]"
    error_lines = [line for line in lines if target_keyword in line]

    # 3. Join extracted lines with newlines to create a new string
    filtered_log = "\n".join(error_lines)

    # Output result
    print("--- Extracted Error Log ---")
    print(filtered_log)

if __name__ == "__main__":
    main()

Code Explanation

  • splitlines() Method: This splits the string at line boundaries and returns the lines as a list, with the line-ending characters removed. For text that uses only LF, split('\n') also works, but splitlines() is more robust: it handles both CRLF (Windows) and LF (Unix-like systems), whereas split('\n') leaves a trailing '\r' on each line of CRLF text and produces an extra empty element when the text ends with a newline.
  • List Comprehension: This refers to the [line for line in lines if target_keyword in line] part. Compared to the traditional approach of appending to an empty list inside a for loop, it is more concise and usually somewhat faster, and for a simple filter like this one it is easy to read at a glance.
  • in Operator: The expression target_keyword in line evaluates to True if target_keyword appears anywhere in the string line, and to False otherwise. It is the basic operator for simple substring (partial match) searches. Note that the comparison is case-sensitive.
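The three points above can be checked with a short experiment. This is only a sketch: the sample text and the variable names are assumptions chosen for illustration.

```python
# Hypothetical text with Windows-style (CRLF) endings and a final newline
text = "first\r\nsecond\r\nthird\n"

# split('\n') keeps the '\r' from CRLF endings and adds a trailing empty string
by_split = text.split("\n")
# splitlines() removes the line endings and adds no trailing empty entry
by_splitlines = text.splitlines()

print(by_split)       # ['first\r', 'second\r', 'third', '']
print(by_splitlines)  # ['first', 'second', 'third']

# The list comprehension builds the same list as an explicit loop with append()
keyword = "ir"
from_comprehension = [line for line in by_splitlines if keyword in line]
from_loop = []
for line in by_splitlines:
    if keyword in line:
        from_loop.append(line)
print(from_comprehension == from_loop)  # True

# `in` is a case-sensitive substring test
log_line = "[ERROR] Database connection failed."
print("[ERROR]" in log_line)          # True
print("[error]" in log_line)          # False
print("[error]" in log_line.lower())  # True
```

Lowercasing the line before the test, as in the last statement, is one simple way to make the match case-insensitive when the keyword itself is written in lowercase.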

This method is applicable not only to log analysis but also to a wide range of uses, such as simple CSV data filtering or extracting configuration items that meet specific conditions.
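As a rough illustration of the CSV case, the same three steps can keep the header row and filter only the data rows. The CSV content and column names below are invented for this sketch.

```python
# Hypothetical CSV data
csv_data = """name,department,status
Sato,Sales,active
Suzuki,Engineering,inactive
Tanaka,Engineering,active"""

lines = csv_data.splitlines()
header, rows = lines[0], lines[1:]  # keep the header separate from the data rows

# Keep only the rows that contain the keyword anywhere in the line
keyword = "Engineering"
matched_rows = [row for row in rows if keyword in row]

# Put the header back in front of the matching rows
filtered_csv = "\n".join([header] + matched_rows)
print(filtered_csv)
```

Because this is a plain substring match on the whole line, it does not know about columns: searching for "active" here would also match the "inactive" row. When the match must apply to one specific column, or when fields can contain quoted commas, the standard csv module is likely a better fit.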
