In text data processing, such as handling log files or CSV data, you often need to extract only the lines that contain specific keywords.
In Python, you can implement this process very simply and quickly by combining string splitting methods with list comprehensions. In this article, I will introduce a code example that filters and extracts only the lines containing specific errors from a system log.
Table of Contents
- String Filtering Process
- Implementation Example: Extracting Error Logs
- Code Explanation
String Filtering Process
The procedure to extract lines containing a specific string is as follows:
- Split into lines: Break the target text into a list of lines at the newline characters.
- Filter: Use a list comprehension to keep only the lines that match the condition (here, lines containing the specific string).
- Rejoin: Join the extracted lines back into a single string, separated by newline characters.
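The three steps above can be sketched as one small helper. The function name filter_lines and the sample text are illustrative assumptions, not part of the original example.

```python
def filter_lines(text: str, keyword: str) -> str:
    """Return only the lines of `text` that contain `keyword` (hypothetical helper)."""
    lines = text.splitlines()                              # 1. split into lines
    matched = [line for line in lines if keyword in line]  # 2. filter
    return "\n".join(matched)                              # 3. rejoin


# Invented sample text for illustration
sample = "alpha\nbeta keyword\ngamma\ndelta keyword"
result = filter_lines(sample, "keyword")
print(result)
```

Wrapping the steps in a function keeps the keyword and the input text as parameters, so the same logic can be reused for other inputs.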
Implementation Example: Extracting Error Logs
The following code is an example of extracting only lines containing the string [ERROR] from text data mimicking a system log.
def main():
    # Sample data mimicking system logs
    log_data = """[INFO] 2023-10-01 10:00:00 System booted successfully.
[INFO] 2023-10-01 10:05:23 User 'admin' logged in.
[ERROR] 2023-10-01 10:15:00 Database connection failed.
[WARNING] 2023-10-01 10:20:00 High memory usage detected.
[ERROR] 2023-10-01 10:22:30 Critical exception in module X.
[INFO] 2023-10-01 10:30:00 Scheduled maintenance started."""

    # 1. Split string line by line
    # splitlines() automatically handles newline codes like \n or \r\n
    lines = log_data.splitlines()

    # 2. Extract only lines containing "[ERROR]" using list comprehension
    # Check if the target string is contained in the line
    target_keyword = "[ERROR]"
    error_lines = [line for line in lines if target_keyword in line]

    # 3. Join extracted lines with newlines to create a new string
    filtered_log = "\n".join(error_lines)

    # Output result
    print("--- Extracted Error Log ---")
    print(filtered_log)


if __name__ == "__main__":
    main()
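When the log lives in a large file rather than in a string, reading the whole text first may be wasteful. The following is a minimal sketch of the same filter applied line by line to a file-like object. The name iter_matching_lines is hypothetical, and io.StringIO stands in for a real opened file.

```python
import io
from typing import Iterable, Iterator


def iter_matching_lines(lines: Iterable[str], keyword: str) -> Iterator[str]:
    """Yield each line containing `keyword`, with its line ending removed."""
    for line in lines:
        if keyword in line:
            yield line.rstrip("\r\n")


# io.StringIO is used here in place of an opened log file (assumption for the sketch).
fake_file = io.StringIO(
    "[INFO] System booted successfully.\n"
    "[ERROR] Database connection failed.\n"
    "[ERROR] Critical exception in module X.\n"
)
errors = list(iter_matching_lines(fake_file, "[ERROR]"))
print("\n".join(errors))
```

With a real file, the object returned by open() can be passed in the same way, since iterating over it also yields one line at a time.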
Code Explanation
- splitlines() method: Splits the string at line breaks and returns the lines as a list. Although split('\n') also works, splitlines() is more robust because it handles the different line endings used across operating systems (such as CRLF on Windows and LF on Unix-like systems) and does not leave a trailing empty string when the text ends with a newline.
- List comprehension: This is the [line for line in lines if target_keyword in line] part. Compared with the traditional approach of appending to an empty list inside a for loop, it is more concise and usually somewhat faster.
- in operator: Writing if target_keyword in line tests whether target_keyword occurs anywhere within the string line and evaluates to True or False. It is the fundamental operator for simple partial match (substring) searches.
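To make the points above concrete, the short sketch below compares split('\n') with splitlines() on text that mixes line endings, and then shows the explicit loop that the list comprehension replaces. The sample text and the keyword are invented for illustration.

```python
text = "first\r\nsecond\nthird\n"

# split('\n') keeps the '\r' from the CRLF ending and adds a trailing empty string
via_split = text.split("\n")
# splitlines() removes both kinds of line ending and adds no trailing entry
via_splitlines = text.splitlines()

print(via_split)       # ['first\r', 'second', 'third', '']
print(via_splitlines)  # ['first', 'second', 'third']

keyword = "ir"

# Explicit loop: append each matching line to an initially empty list
with_loop = []
for line in via_splitlines:
    if keyword in line:
        with_loop.append(line)

# Equivalent list comprehension
with_comprehension = [line for line in via_splitlines if keyword in line]

print(with_loop == with_comprehension)  # True
```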
This method is applicable not only to log analysis but also to a wide range of uses, such as simple CSV data filtering or extracting configuration items that meet specific conditions.
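As a rough illustration of those other uses, the sketch below applies the same split, filter, and rejoin pattern to configuration-style text and to simple CSV-style rows. All of the data, key names, and column layout are invented for this example.

```python
# Configuration-style text (invented sample)
config_text = """# sample settings
db.host=localhost
db.port=5432
cache.enabled=true
db.timeout=30
"""

# Keep only the settings whose key starts with "db."
db_settings = [
    line for line in config_text.splitlines() if line.startswith("db.")
]
print("\n".join(db_settings))

# CSV-style text (invented sample): keep rows whose second column is "ERROR"
csv_text = "id,level,message\n1,INFO,started\n2,ERROR,disk full\n3,ERROR,timeout"
header, *rows = csv_text.splitlines()
error_rows = [row for row in rows if row.split(",")[1] == "ERROR"]
print("\n".join([header, *error_rows]))
```

Splitting on commas is only suitable for simple data without quoted fields or embedded commas. For anything more complex, the standard library's csv module is the safer choice.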
