The standard String.Split method splits only on fixed characters or strings, such as commas or spaces. Regex.Split, by contrast, splits on a regular-expression pattern, so you can convert a string into an array using flexible rules such as “split wherever a number appears” or “split on a comma, a semicolon, or whitespace.”
Table of Contents
- Implementation Handling Multiple Delimiters
- Sample Code
- Execution Result
- Explanation and Technical Points
- Difference from String.Split
- Keeping the Delimiters (Using Parentheses)
- Removing Empty Strings
Implementation Handling Multiple Delimiters
The following sample code breaks a string of tags (keywords) that a user typed freely into an array.
The user has mixed commas, semicolons, vertical bars (pipes), and spaces as delimiters. A single regular expression handles all of them at once.
Sample Code
using System;
using System.Text.RegularExpressions;
using System.Linq; // Used for removing empty strings

public class Program
{
    public static void Main()
    {
        // Tag string entered by user
        // Delimiters are inconsistent (comma, semicolon, pipe, space) and contain extra spaces
        string inputTags = "C# , Java ; Python | Ruby GoLang";
        Console.WriteLine($"Original String: \"{inputTags}\"\n");

        // ---------------------------------------------------------
        // Split using Regex Pattern
        // ---------------------------------------------------------
        // Meaning of the pattern:
        // \s*   : Whitespace appearing 0 or more times (leading space)
        // [,;|] : Any one character of comma, semicolon, or pipe
        // \s*   : Whitespace appearing 0 or more times (trailing space)
        // |     : OR
        // \s+   : Consecutive whitespace (for space-delimited parts without other symbols)
        string pattern = @"\s*[,;|]\s*|\s+";
        string[] result = Regex.Split(inputTags, pattern);

        Console.WriteLine("--- Split Result ---");
        // The split result might contain empty strings, so filter them out with Where
        foreach (var tag in result.Where(s => !string.IsNullOrEmpty(s)))
        {
            Console.WriteLine($"Tag: {tag}");
        }
    }
}
Execution Result
Original String: "C# , Java ; Python | Ruby GoLang"
--- Split Result ---
Tag: C#
Tag: Java
Tag: Python
Tag: Ruby
Tag: GoLang
Explanation and Technical Points
1. Difference from String.Split
The biggest advantage of Regex.Split is the ability to split by “patterns.”
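A plain String.Split(',') leaves the whitespace around each comma in data like "C# , Java", so the elements come back as "C# " and " Java" unless you trim them afterward (or pass StringSplitOptions.TrimEntries on .NET 5 and later). The regex \s*,\s* instead treats the comma together with its surrounding whitespace as a single delimiter and removes all of it, so no separate trimming step is needed for the whitespace around delimiters. Whitespace at the very start or end of the input is not adjacent to a delimiter and is still left in place.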
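To make the difference concrete, here is a minimal sketch that runs both approaches on the same input. The class name SplitComparison and the helpers SplitPlain and SplitRegex are hypothetical names chosen for this illustration; only String.Split and Regex.Split are real framework calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // Hypothetical helper: splits on the comma character only
    public static string[] SplitPlain(string s) => s.Split(',');

    // Hypothetical helper: treats the comma plus surrounding whitespace as one delimiter
    public static string[] SplitRegex(string s) => Regex.Split(s, @"\s*,\s*");

    public static void Main()
    {
        string input = "C# , Java , Python";

        // Joining with "|" makes leftover spaces visible
        Console.WriteLine(string.Join("|", SplitPlain(input))); // C# | Java | Python
        Console.WriteLine(string.Join("|", SplitRegex(input))); // C#|Java|Python
    }
}
```

The first line of output still carries the spaces around each element, while the second contains only the tag text.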
2. Keeping the Delimiters (Using Parentheses)
Usually, the parts matched by Regex.Split (the delimiters) are removed from the result. However, if you enclose the regex pattern in parentheses () (making it a capture group), the delimiter itself will be included in the resulting array.
Example: Breaking a formula into numbers and operators
string formula = "100+200-50";
// By wrapping the pattern in parentheses, as in ([+\-]), the operators are kept in the array
string[] tokens = Regex.Split(formula, @"([+\-])");
// Result: "100", "+", "200", "-", "50"
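If you need parentheses only for grouping and do not want the delimiters in the result, a non-capturing group (?:...) can be used instead. The following sketch contrasts the two forms; the class name DelimiterCapture and the helpers Keep and Drop are hypothetical names for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterCapture
{
    // Capturing group: the matched operators are included in the result
    public static string[] Keep(string s) => Regex.Split(s, @"([+\-])");

    // Non-capturing group: the matched operators are discarded
    public static string[] Drop(string s) => Regex.Split(s, @"(?:[+\-])");

    public static void Main()
    {
        string formula = "100+200-50";
        Console.WriteLine(string.Join(" ", Keep(formula))); // 100 + 200 - 50
        Console.WriteLine(string.Join(" ", Drop(formula))); // 100 200 50
    }
}
```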
3. Removing Empty Strings
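Depending on how the split pattern is written, empty strings may appear in the result: at the beginning or end when the input starts or ends with a delimiter, and in the middle when two delimiters match back to back. Regex.Split has no equivalent of StringSplitOptions.RemoveEmptyEntries, so a common approach in practical data processing is to filter the result with LINQ’s .Where(s => !string.IsNullOrEmpty(s)) and keep only the valid entries.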
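The sketch below shows where those empty strings come from, using an input that starts and ends with a comma and has two commas in a row. The class name EmptyEntryDemo and the helpers SplitRaw and SplitClean are hypothetical names for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyEntryDemo
{
    // Raw split: empty strings are kept wherever delimiters touch the edges or each other
    public static string[] SplitRaw(string s) => Regex.Split(s, ",");

    // Filtered split: empty strings are dropped with LINQ
    public static string[] SplitClean(string s) =>
        Regex.Split(s, ",").Where(x => !string.IsNullOrEmpty(x)).ToArray();

    public static void Main()
    {
        string input = ",a,,b,";

        // Raw result is "", "a", "", "b", ""
        Console.WriteLine(SplitRaw(input).Length);   // 5

        // Filtered result is "a", "b"
        Console.WriteLine(SplitClean(input).Length); // 2
    }
}
```

If entries made up only of whitespace should also be dropped, string.IsNullOrWhiteSpace can replace string.IsNullOrEmpty in the filter.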
