In regular expressions, quantifiers such as * (0 or more) and + (1 or more) are greedy by default: they match the longest possible string that still lets the overall pattern succeed.
To capture only the shortest span up to the next delimiter, switch to non-greedy (also called lazy or reluctant) matching.
This article explains how to prevent unintended long matches and extract only the necessary parts accurately.
Table of Contents
- The Problem to Solve
- Implementation Example: Extracting Configuration Values
- Source Code
- Execution Result
- Explanation
The Problem to Solve
Consider a scenario where you want to extract values enclosed in quotes " from a configuration file or code.
If you simply write ".*", the match runs from the first quote on the line to the very last one, swallowing the delimiters and other values in between.
Implementation Example: Extracting Configuration Values
Below is code that extracts values individually from a string containing multiple parameters. We will compare the behavior of greedy and lazy patterns.
Source Code
import re
# Text to be analyzed (simulating configuration values)
# Target: We want to extract individual values enclosed in double quotes ("10s", "3", "True")
config_text = 'timeout="10s"; retries="3"; verbose="True";'
# 1. Greedy Match Pattern
# .* : Matches any character as long as possible
# Result: Matches from the first " to the last " as a single string
greedy_pattern = r'".*"'
# 2. Lazy (Shortest) Match Pattern
# .*? : Matches any character, but stops as soon as possible
# Result: Matches from " to the next immediate " individually
lazy_pattern = r'".*?"'
# Execution and Result Display
print("--- Greedy Match ---")
greedy_matches = re.findall(greedy_pattern, config_text)
for m in greedy_matches:
    print(f"Match: {m}")
print("\n--- Lazy Match ---")
lazy_matches = re.findall(lazy_pattern, config_text)
for m in lazy_matches:
    print(f"Match: {m}")
Execution Result
--- Greedy Match ---
Match: "10s"; retries="3"; verbose="True"
--- Lazy Match ---
Match: "10s"
Match: "3"
Match: "True"
Explanation
Greedy Behavior
With the pattern r'".*"', the regular expression engine finds the first ", and .* then consumes everything up to the end of the line. The engine backtracks one character at a time until the closing " in the pattern can match, which happens at the last " on the line. As a result, the entire string, including separators like ; retries=, is extracted as a single match.
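As a small check of the behavior described above, the following sketch compares the span of the greedy match with the positions of the first and last double quote in the text. The variable names first_quote and last_quote are introduced here for illustration only.

```python
import re

config_text = 'timeout="10s"; retries="3"; verbose="True";'

# Greedy pattern: matches from the first quote to the last quote on the line
greedy_match = re.search(r'".*"', config_text)

# Positions of the first and last double quote (illustrative helper names)
first_quote = config_text.index('"')
last_quote = config_text.rindex('"')

print("Greedy span:        ", greedy_match.span())
print("First to last quote:", (first_quote, last_quote + 1))
```

Both lines print the same pair of indices, which shows that the greedy match ends at the last quote rather than at the nearest one.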
Switching to Lazy Matching
You can change the behavior to “Lazy Match” (or non-greedy) by appending a ? immediately after the quantifier (*, +, ?, {m,n}).
- * (Greedy) → *? (Lazy)
- + (Greedy) → +? (Lazy)
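To see the effect of the trailing ? on each quantifier, here is a minimal sketch that applies the greedy and lazy forms to the same sample string. The sample string "aaaa" and the names pairs and results are assumptions made for this illustration.

```python
import re

text = "aaaa"  # sample input chosen for illustration

# (greedy form, lazy form) for each quantifier
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,3}", r"a{2,3}?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start of the string, so only the
    # quantifier decides how many characters are consumed
    greedy_text = re.match(greedy, text).group()
    lazy_text = re.match(lazy, text).group()
    results[greedy] = (greedy_text, lazy_text)
    print(f"{greedy!r:12} -> {greedy_text!r:8} | {lazy!r:12} -> {lazy_text!r}")
```

In each pair the greedy form takes as many characters as its quantifier allows, while the lazy form takes as few as it allows (which can be an empty string for *? and ??).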
In the configuration example, r'".*?"' means “start at a " and stop at the very next ",” so each value is matched and extracted separately.
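If only the values themselves are needed, without the surrounding quotes, one possible variation is to wrap the lazy part in a capture group. The sketch below is an extension of the article's example, not part of it; the names values and settings, and the \w+ pattern for the parameter name, are assumptions.

```python
import re

config_text = 'timeout="10s"; retries="3"; verbose="True";'

# With one capture group, re.findall returns only the captured text
values = re.findall(r'"(.*?)"', config_text)
print(values)

# With two capture groups, re.findall returns (name, value) tuples,
# which can be passed directly to dict()
settings = dict(re.findall(r'(\w+)="(.*?)"', config_text))
print(settings)
```

The values remain strings, so converting "3" or "True" to other types would be a separate step.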
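Lazy matching is useful whenever a structure has a clear closing delimiter, such as extracting simple, non-nested HTML tags (<div>...</div>). With nested tags, a lazy match stops at the first closing tag it finds, so a dedicated HTML parser is more reliable in that case.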
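The same comparison can be sketched for HTML tags. The sample strings html and nested_html are made up for this illustration, and the second one shows where a lazy match stops when tags are nested.

```python
import re

html = "<div>first</div><div>second</div>"  # illustrative sample input

# Greedy: runs from the first <div> to the last </div>
greedy_tags = re.findall(r"<div>.*</div>", html)
# Lazy: stops at the nearest </div> each time
lazy_tags = re.findall(r"<div>.*?</div>", html)

print(greedy_tags)
print(lazy_tags)

# Nested tags: the lazy match ends at the first </div>,
# leaving the outer element unbalanced
nested_html = "<div>a<div>b</div></div>"
nested_tags = re.findall(r"<div>.*?</div>", nested_html)
print(nested_tags)
```

For the flat input, the lazy pattern returns each element separately. For the nested input, it returns a fragment that ends at the inner closing tag.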
