[Python] Leveraging Regex Flags: A Complete Guide to Multiline and DOTALL Modes

December 11, 2025

When handling text data containing newlines in Python’s re module, specifying flags is essential.

In particular, if you do not correctly understand the behavior of re.MULTILINE (for line-by-line matching) and re.DOTALL (for matching across newlines), you may not get the intended results.

In this article, I will explain the impact of these flags on special characters (^, $, .) and how to use them effectively, using chat logs as an example.

Correspondence Table of Special Characters and Flags
Implementation Example: Chat Log Analysis
Source Code
Execution Result
Explanation

1. Correspondence Table of Special Characters and Flags

Special Character	Default Behavior	re.MULTILINE	re.DOTALL
`.` (Dot)	Matches any single character except a newline	No change	Matches any single character, including a newline
`^` (Caret)	Start of the entire string only	Start of the string and of each line	No change
`$` (Dollar)	End of the entire string (or just before a final trailing newline)	End of each line and of the string	No change
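
The rows of this table can be checked with a short experiment. The sketch below runs the same pattern with and without each flag on a made-up two-line string; `text` and the result variable names are illustrative and are not part of the chat log used later.

```python
import re

# Illustrative two-line input (an assumption for this demo)
text = "first\nsecond"

# `.` stops at the newline by default and crosses it under re.DOTALL
dot_default = re.findall(r"f.+", text)
dot_dotall = re.findall(r"f.+", text, flags=re.DOTALL)

# `^` matches only at the start of the string by default,
# and at the start of every line under re.MULTILINE
caret_default = re.findall(r"^\w+", text)
caret_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# `$` matches at the end of the string by default (this input has no
# trailing newline), and at the end of every line under re.MULTILINE
dollar_default = re.findall(r"\w+$", text)
dollar_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(dot_default, dot_dotall)            # ['first'] ['first\nsecond']
print(caret_default, caret_multiline)     # ['first'] ['first', 'second']
print(dollar_default, dollar_multiline)   # ['second'] ['first', 'second']
```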

2. Implementation Example: Chat Log Analysis

We will assume a chat log where user messages may span multiple lines, and we will perform two different types of extractions.

Scenario

From the log format below, we want to extract:

Only the header line of each message (Username and Timestamp).
The entire speech block, including the header and the message body.

Source Code

import re

# Analysis target: Chat app log data
# Contains username and timestamp lines, followed by messages (potentially multi-line)
chat_log = """[UserA] 10:00
Hello everyone.
Check this out.

[UserB] 10:05
Good morning!
I will check it later.

[UserC] 10:10
Thanks."""

# Pattern A: Extract only lines starting with [User...]
# ^ : Start of line, .+ : 1 or more characters
pattern_line = r"^\[User.*\].+$"

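# Pattern B: Extract the entire block from [User...] up to the next empty line (or the end of the log)
# ^ : Start of line, .+? : Lazy match, (?=\n\n|\Z) : Stop before an empty line or at the end of the string
# Note: $ cannot mark the end of the block here, because under re.MULTILINE it matches
# at the end of every line, so the lazy .+? would stop at the end of the header line.
pattern_block = r"^\[User.*?\].+?(?=\n\n|\Z)"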

print("--- 1. Using re.MULTILINE only ---")
# Match ^ and $ to the start/end of "each line"
# Result: Only header lines are extracted (message bodies are ignored)
headers = re.findall(pattern_line, chat_log, flags=re.MULTILINE)
for h in headers:
    print(f"Header found: {h}")

print("\n--- 2. Combining re.MULTILINE | re.DOTALL ---")
# MULTILINE : Lets ^ match at the start of every line, so each header line can begin a match
# DOTALL    : Lets . match newlines, so the message body can span multiple lines
# Result: The entire speech block for each user is extracted
blocks = re.findall(pattern_block, chat_log, flags=re.MULTILINE | re.DOTALL)

for i, block in enumerate(blocks, 1):
    print(f"--- Block {i} ---\n{block}")

Execution Result

--- 1. Using re.MULTILINE only ---
Header found: [UserA] 10:00
Header found: [UserB] 10:05
Header found: [UserC] 10:10

--- 2. Combining re.MULTILINE | re.DOTALL ---
--- Block 1 ---
[UserA] 10:00
Hello everyone.
Check this out.
--- Block 2 ---
[UserB] 10:05
Good morning!
I will check it later.
--- Block 3 ---
[UserC] 10:10
Thanks.
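
If the header and the body are needed separately rather than as one string, named groups can split each block while it is matched. The following is one possible sketch and not part of the script above: the group names `user`, `time`, and `body` are made up here, and it assumes that blocks are separated by exactly one empty line and that timestamps have the form HH:MM.

```python
import re

chat_log = """[UserA] 10:00
Hello everyone.
Check this out.

[UserB] 10:05
Good morning!
I will check it later.

[UserC] 10:10
Thanks."""

# Header fields are captured on the first line; the body is everything
# up to the next empty line or the end of the string.
block_re = re.compile(
    r"^\[(?P<user>[^\]]+)\] (?P<time>\d{2}:\d{2})\n(?P<body>.*?)(?=\n\n|\Z)",
    re.MULTILINE | re.DOTALL,
)

# One dict per message, with the keys user, time, and body
messages = [m.groupdict() for m in block_re.finditer(chat_log)]

for msg in messages:
    print(msg["user"], msg["time"], repr(msg["body"]))
```

Using `[^\]]+` for the username instead of `.*?` keeps that part of the match inside the brackets even when re.DOTALL is active.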

3. Explanation

1. Processing each line individually (`re.MULTILINE`)

By default, ^ only matches the very beginning of the entire string (before [UserA]).

By specifying re.MULTILINE, ^ is interpreted as the “start of each line,” allowing it to detect the start of the lines for [UserB] and [UserC] as well. This is effective when you want to list only the headers of a log.
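
As a minimal sketch of this header-listing use case, the following applies the same header pattern with and without re.MULTILINE. The shortened log and the group names `user` and `time` are assumptions made for this example.

```python
import re

# Shortened, hypothetical log in the same format as the article's example
log = "[UserA] 10:00\nHello everyone.\n\n[UserB] 10:05\nGood morning!"

header_pattern = r"^\[(?P<user>\w+)\] (?P<time>\d{2}:\d{2})$"

# Without the flag, ^ and $ are tied to the start and end of the whole
# string, so no single header line can satisfy both anchors.
without_flag = [(m["user"], m["time"]) for m in re.finditer(header_pattern, log)]

# With re.MULTILINE, both anchors apply to every line.
with_flag = [
    (m["user"], m["time"])
    for m in re.finditer(header_pattern, log, flags=re.MULTILINE)
]

print(without_flag)  # []
print(with_flag)     # [('UserA', '10:00'), ('UserB', '10:05')]
```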

2. Grouping ranges including newlines (`re.DOTALL`)

By default, . matches any character except a newline, so a pattern such as .+ ends at the end of the current line even if the message body continues on the next one.

By specifying re.DOTALL, . will also match the newline character (\n), allowing you to capture text blocks that span multiple lines at once.
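
The difference is easy to see on a single message. This sketch uses a hypothetical one-message string and captures everything after the username, first with the default behavior and then with re.DOTALL.

```python
import re

# Hypothetical single message with a two-line body
message = "[UserA] 10:00\nHello everyone.\nCheck this out."

pattern = r"\[UserA\](.*)"

# Default: `.` cannot cross "\n", so the capture ends with the header line.
default_match = re.match(pattern, message)

# re.DOTALL: `.` also consumes "\n", so the capture runs to the end of the string.
dotall_match = re.match(pattern, message, flags=re.DOTALL)

print(repr(default_match.group(1)))  # ' 10:00'
print(repr(dotall_match.group(1)))   # ' 10:00\nHello everyone.\nCheck this out.'
```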

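The two flags can be combined with the bitwise OR operator |, as in flags=re.MULTILINE | re.DOTALL. This enables searches that start at the beginning of a line and continue across multiple lines.

Note that re.MULTILINE still lets $ match at the end of every line, so a lazy .+? followed by $ stops at the end of the first line. To capture a whole block, end the pattern with something that marks the block boundary, such as the lookahead (?=\n\n|\Z), which means "an empty line or the end of the string."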
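
The interaction between the two flags can be sketched on a shortened, hypothetical log. The example compares a block pattern that ends in `$` with one that ends in a lookahead, and also shows the inline form `(?ms)`, which sets the same two flags inside the pattern itself. The variable names are illustrative.

```python
import re

# Shortened, hypothetical log: two blocks separated by an empty line
log = "[UserA] 10:00\nHello.\nBye.\n\n[UserB] 10:05\nHi!"
flags = re.MULTILINE | re.DOTALL

# Lazy body ending in `$`: under re.MULTILINE, `$` already matches at the
# end of the header line, so the lazy `.+?` never reaches the body.
lazy_dollar = re.findall(r"^\[User\w+\].+?$", log, flags=flags)

# Lazy body ending in a lookahead: stops before an empty line or at the
# very end of the string, so each match covers one whole block.
blocks = re.findall(r"^\[User\w+\].+?(?=\n\n|\Z)", log, flags=flags)

# Same pattern with inline flags: (?ms) is equivalent to MULTILINE | DOTALL.
inline_blocks = re.findall(r"(?ms)^\[User\w+\].+?(?=\n\n|\Z)", log)

print(lazy_dollar)  # ['[UserA] 10:00', '[UserB] 10:05']
print(blocks)       # ['[UserA] 10:00\nHello.\nBye.', '[UserB] 10:05\nHi!']
```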
