Splitting Strings into Lists in Python: Usage of split() and Whitespace Handling

When parsing CSV data or breaking down sentences into words, you often need to split a single long string into multiple parts according to specific rules.

Python’s string type provides the standard split() method for this purpose.

In this article, I explain the basic usage of split() and an important difference in its behavior when "splitting by whitespace."


Table of Contents

  1. Splitting by specifying a delimiter
  2. Splitting by whitespace (Difference based on arguments)
  3. Limiting the number of splits (maxsplit)
  4. Summary

1. Splitting by specifying a delimiter

If you pass a "delimiter" as the first argument of split(), the string is cut at every place where that delimiter appears, and the pieces are returned as a list. The delimiter can be a single character or a longer string.

Syntax:

list_variable = string_variable.split(delimiter)

Example: Comma-Separated Data (CSV)

# Data separated by commas
csv_line = "apple,banana,orange,grape"

# Specify "," as the delimiter
fruit_list = csv_line.split(",")

print(f"Original  : {csv_line}")
print(f"Split List: {fruit_list}")

Execution Result:

Original  : apple,banana,orange,grape
Split List: ['apple', 'banana', 'orange', 'grape']
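With an explicit delimiter, every single occurrence counts. The short sketch below (the sample strings are made up for illustration) shows a few cases worth knowing: a multi-character delimiter, two delimiters in a row, a delimiter that never appears, and converting the resulting strings to numbers.

```python
# A delimiter can be longer than one character; ", " is matched as a whole
colors = "red, green, blue".split(", ")
print(colors)    # ['red', 'green', 'blue']

# Two delimiters in a row leave an empty string between them
fields = "a,,b".split(",")
print(fields)    # ['a', '', 'b']

# If the delimiter never appears, the result is a one-element list
single = "apple".split(",")
print(single)    # ['apple']

# split() always returns strings; convert them yourself if you need numbers
scores = [int(x) for x in "80,95,72".split(",")]
print(scores)    # [80, 95, 72]
```

Note that split(",") knows nothing about CSV quoting rules, so a quoted field that itself contains a comma would still be cut in two. For real CSV files, Python's standard csv module is the safer choice.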

2. Splitting by whitespace (Difference based on arguments)

When dividing sentences into words, you usually want to split on whitespace. However, the behavior differs significantly depending on whether you pass an argument to split().

split() with no argument: Handles whitespace smartly (Recommended)

If you omit the argument, any whitespace character (spaces, tabs, newlines, and so on) acts as a delimiter. In addition, a run of consecutive whitespace counts as a single delimiter, and leading or trailing whitespace is ignored, so no empty strings appear in the result. This is usually the method you want.

text = "  Python   is   fun.  "

# No argument (Smart processing of whitespace)
words = text.split()

print(f"No argument: {words}")

Execution Result:

No argument: ['Python', 'is', 'fun.']
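Because split() with no argument looks for any kind of whitespace, it also copes with tabs and line breaks. A small sketch with made-up sample strings:

```python
# Tabs, newlines and runs of spaces all count as a single delimiter
mixed = "one\ttwo\nthree    four".split()
print(mixed)          # ['one', 'two', 'three', 'four']

# A string containing only whitespace yields an empty list
blank = "   \t\n".split()
print(blank)          # []

# Compare: the empty string gives [] without an argument, but [''] with " "
print("".split())     # []
print("".split(" "))  # ['']
```

This makes split() convenient for text read from a file, where words may be separated by a mix of spaces, tabs and newlines.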

split(" ") with an argument: Splits strictly at every single space

If you explicitly pass a single space " ", the string is split at every individual space. As a result, consecutive spaces produce empty strings '' between them, and leading or trailing spaces produce empty strings at the ends of the list.

text = "  Python   is   fun.  "

# Explicitly specify " "
words_strict = text.split(" ")

print(f"With argument: {words_strict}")

Execution Result:

With argument: ['', '', 'Python', '', '', 'is', '', '', 'fun.', '', '']

As shown above, this results in a list full of empty strings, making it unsuitable for simple word splitting.
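If you already have a list from split(" "), you can drop the empty strings afterwards, but there is a second difference to keep in mind: " " matches only the space character, not tabs or newlines. The sketch below (sample strings are illustrative) shows both points.

```python
text = "  Python   is   fun.  "

# Dropping the empty strings from split(" ") gives the same words as split() here
filtered = [w for w in text.split(" ") if w]
print(filtered)                  # ['Python', 'is', 'fun.']
print(filtered == text.split())  # True

# split(" ") only looks for the space character, so a tab is not a delimiter
tabbed = "Python\tis fun."
print(tabbed.split(" "))         # ['Python\tis', 'fun.']
print(tabbed.split())            # ['Python', 'is', 'fun.']
```

split(" ") can still be useful when the number of separators matters, since the empty strings preserve where each space was; for plain word splitting, split() is simpler and more robust.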


3. Limiting the number of splits (maxsplit)

By passing a number as the second argument (maxsplit), you can limit how many splits are performed. Splitting stops after maxsplit splits, and the rest of the string is kept unsplit as the last element of the list, so the list has at most maxsplit + 1 elements.

data = "Key:Value:Detail:Info"

# Split only at the first colon (useful for separating a key from its value)
result = data.split(":", 1)

print(result)

Execution Result:

['Key', 'Value:Detail:Info']
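maxsplit can also be written as a keyword argument, and it works with whitespace splitting too. The sketch below uses made-up strings such as "query=a=b" purely for illustration.

```python
data = "Key:Value:Detail:Info"

# Passing maxsplit by keyword makes the intent clearer
two_splits = data.split(":", maxsplit=2)
print(two_splits)    # ['Key', 'Value', 'Detail:Info']

# maxsplit=0 means "do not split at all"
no_splits = data.split(":", maxsplit=0)
print(no_splits)     # ['Key:Value:Detail:Info']

# Unpack a "name=value" pair; the value itself may contain '='
name, value = "query=a=b".split("=", maxsplit=1)
print(name, value)   # query a=b

# With no delimiter, maxsplit splits off the first word and keeps the rest
command, rest = "git  commit -m  message".split(maxsplit=1)
print(command)       # git
print(rest)          # commit -m  message
```

When unpacking into two variables like this, keep in mind that a string without the delimiter yields a one-element list, and the unpacking then raises ValueError.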

4. Summary

  • str.split(sep): Splits the string at every occurrence of sep and returns a list.
  • str.split() (no argument): Splits on any whitespace, treats consecutive whitespace as one delimiter, and ignores leading/trailing whitespace. This is the best choice for splitting text into words.
  • str.split(" "): Splits at every single space, so consecutive (or leading/trailing) spaces produce empty strings.
  • maxsplit: Use this to limit the number of splits; the unsplit remainder becomes the last element of the list.

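About the author

This is a blog where I write about what I have studied, what I have put into practice, and what I am working on now.
It was originally mostly about investing,
but lately I have become interested in programming, so that is almost all I write about now.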
