When parsing CSV data or breaking down sentences into words, you often need to split a single long string into multiple parts according to specific rules.
Python’s string type provides the standard split() method for this purpose.
In this article, I will explain the basic usage of split() and how its behavior changes when splitting by whitespace, depending on whether you pass an argument.
Table of Contents
- Splitting by specifying a delimiter
- Splitting by whitespace (Difference based on arguments)
- Limiting the number of splits (maxsplit)
- Summary
1. Splitting by specifying a delimiter
If you pass a delimiter as the first argument of the split() method, the string is divided at every occurrence of that delimiter and the pieces are returned as a list. The delimiter itself is not included in the result, and it can be longer than one character.
Syntax:
list_variable = string_variable.split(delimiter)
Specific Example: Comma-Separated Data (CSV)
# Data separated by commas
csv_line = "apple,banana,orange,grape"
# Specify "," as the delimiter
fruit_list = csv_line.split(",")
print(f"Original : {csv_line}")
print(f"Split List: {fruit_list}")
Execution Result:
Original : apple,banana,orange,grape
Split List: ['apple', 'banana', 'orange', 'grape']
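Two details of delimiter-based splitting are easy to miss: the delimiter may be a multi-character string, and adjacent or trailing delimiters produce empty strings rather than being skipped. The following is a minimal sketch; the sample strings log_line and record are made up for illustration and do not come from any real data set.

```python
# Hypothetical sample data, used only to illustrate the behavior
log_line = "2024-01-15 :: ERROR :: disk full"

# The delimiter can be more than one character long
parts = log_line.split(" :: ")
print(parts)   # ['2024-01-15', 'ERROR', 'disk full']

# Adjacent or trailing delimiters yield empty strings
record = "alice,,30,"
fields = record.split(",")
print(fields)  # ['alice', '', '30', '']
```

Keeping the empty strings is useful for column-based data, because each field stays at its original position in the list.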
2. Splitting by whitespace (Difference based on arguments)
When dividing sentences into words, you often want to split by “whitespace (space).” However, the behavior differs significantly depending on whether you pass an argument to split().
No argument split(): Handles whitespace smartly (Recommended)
If you omit the argument, it treats all whitespace characters (spaces, tabs, newlines) as delimiters. Furthermore, consecutive whitespace is treated as a single delimiter, and extra leading or trailing whitespace is ignored. This is usually the preferred method.
text = "  Python   is   fun.  "
# No argument (Smart processing of whitespace)
words = text.split()
print(f"No argument: {words}")
Execution Result:
No argument: ['Python', 'is', 'fun.']
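The same rule applies to tabs and newlines, not only to spaces. As a small sketch (the strings mixed and blank are illustrative assumptions), the no-argument form handles mixed whitespace and returns an empty list when there is nothing but whitespace:

```python
# Illustrative input mixing a tab, a newline, spaces, and a Windows line ending
mixed = "Python\tis\n  fun.\r\n"
words = mixed.split()
print(words)        # ['Python', 'is', 'fun.']

# A string containing only whitespace produces an empty list
blank = "   \t\n"
print(blank.split())  # []
```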
With argument split(" "): Strictly splits by single space
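If you explicitly pass a single space " " as the delimiter, the method splits at every individual space. Consecutive spaces therefore produce empty strings '' between them, and leading or trailing spaces produce empty strings at the ends of the list. Tabs and newlines are not treated as delimiters in this mode.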
text = "  Python   is   fun.  "
# Explicitly specify " "
words_strict = text.split(" ")
print(f"With argument: {words_strict}")
Execution Result:
With argument: ['', '', 'Python', '', '', 'is', '', '', 'fun.', '', '']
As shown above, this results in a list full of empty strings, making it unsuitable for simple word splitting.
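If you do need an explicit delimiter but want to discard the empty strings, one common approach is to filter the result afterwards. This is only a sketch of that idea; the path string below is a hypothetical example.

```python
text = "  Python   is   fun.  "

# Drop the empty strings produced by consecutive spaces
non_empty = [piece for piece in text.split(" ") if piece]
print(non_empty)   # ['Python', 'is', 'fun.']

# The same filter works for any delimiter (hypothetical path string)
path = "/usr//local/bin/"
segments = [s for s in path.split("/") if s]
print(segments)    # ['usr', 'local', 'bin']
```

For plain spaces this gives the same list as text.split(), so the filter is mainly worthwhile when the delimiter is something other than whitespace.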
3. Limiting the number of splits (maxsplit)
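By specifying a number (maxsplit) as the second argument, you can limit how many times the split occurs. The string is split at most maxsplit times, counting from the left, so the list has at most maxsplit + 1 elements. The remaining part is not split and is stored as a single element at the end of the list.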
data = "Key:Value:Detail:Info"
# Split only the first time (useful for separating key and value)
result = data.split(":", 1)
print(result)
Execution Result:
['Key', 'Value:Detail:Info']
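Because a split with maxsplit=1 yields at most two elements, it pairs well with tuple unpacking. maxsplit can also be passed as a keyword argument, which lets you keep the no-argument whitespace behavior. The sketch below uses made-up header and command strings as assumptions.

```python
# Hypothetical header line; the value itself contains no further colons to worry about
header = "Content-Type: text/html; charset=utf-8"
name, value = header.split(":", 1)
value = value.strip()          # remove the space left after the colon
print(name)    # Content-Type
print(value)   # text/html; charset=utf-8

# Keyword form: split on whitespace, but only once
command = "git commit -m message"
first, rest = command.split(maxsplit=1)
print(first)   # git
print(rest)    # commit -m message
```

Note that unpacking into two names raises a ValueError if the delimiter does not appear at all, so check the length of the list first when the input is not guaranteed to contain it.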
Summary
- str.split(sep): Splits the string by the specified sep and returns a list.
- str.split() (No argument): Ignores consecutive whitespace and splits by word. This is best for text analysis.
- str.split(" "): Splits strictly by each space, so consecutive spaces result in empty strings.
- maxsplit: Use this when you want to limit the number of splits.
