[C#] Splitting Strings by a Specified Regex Pattern (Regex.Split)

The standard String.Split method can only split text on fixed characters or strings, such as a comma or a space. Regex.Split, on the other hand, splits a string into an array according to a regular-expression pattern, which lets you use flexible rules such as “split wherever a number appears” or “split on a comma, a semicolon, OR whitespace.”


Table of Contents

  • Implementation Handling Multiple Delimiters
  • Sample Code
  • Execution Result
  • Explanation and Technical Points
    1. Difference from String.Split
    2. Keeping the Delimiters (Using Parentheses)
    3. Removing Empty Strings

Implementation Handling Multiple Delimiters

In the sample code below, I break a free-form string of “tags (keywords)” entered by a user into an array.

The user has mixed “commas,” “semicolons,” “vertical bars (pipes),” and “spaces” as delimiters. With a single regular expression, we can handle all of them at once.

Sample Code

using System;
using System.Text.RegularExpressions;
using System.Linq; // Used for removing empty strings

public class Program
{
    public static void Main()
    {
        // Tag string entered by user
        // Delimiters are inconsistent (comma, semicolon, pipe, space) and contain extra spaces
        string inputTags = "C# , Java ;  Python | Ruby   GoLang";

        Console.WriteLine($"Original String: \"{inputTags}\"\n");

        // ---------------------------------------------------------
        // Split using Regex Pattern
        // ---------------------------------------------------------
        // Meaning of the pattern:
        // \s*    : Whitespace appearing 0 or more times (leading space)
        // [,;|]  : Any one character of comma, semicolon, or pipe
        // \s*    : Whitespace appearing 0 or more times (trailing space)
        // |      : OR
        // \s+    : One or more whitespace characters (for tags separated only by spaces)
        // The comma/semicolon/pipe alternative is listed first so that " , " is consumed
        // as a single delimiter instead of leaving an empty entry between " " and ", ".
        string pattern = @"\s*[,;|]\s*|\s+";

        string[] result = Regex.Split(inputTags, pattern);

        Console.WriteLine("--- Split Result ---");
        
        // The split result might contain empty strings, so filter them out with Where
        foreach (var tag in result.Where(s => !string.IsNullOrEmpty(s)))
        {
            Console.WriteLine($"Tag: {tag}");
        }
    }
}

Execution Result

Original String: "C# , Java ;  Python | Ruby   GoLang"

--- Split Result ---
Tag: C#
Tag: Java
Tag: Python
Tag: Ruby
Tag: GoLang

Explanation and Technical Points

1. Difference from String.Split

The biggest advantage of Regex.Split is the ability to split by “patterns.”

On its own, String.Split(',') splits only at the comma, so the whitespace around it stays in the pieces: "C# , Java" becomes "C# " and " Java". The regex \s*,\s*, by contrast, treats the comma together with its surrounding whitespace as a single delimiter and removes it all at once, so no separate trimming step is needed.
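To see the difference side by side, here is a small sketch (C# 9 or later, written with top-level statements) that splits the same "C# , Java" string both ways, and also splits on runs of digits to show a rule that String.Split cannot express. The input strings are illustrative values of my own.

```csharp
using System;
using System.Text.RegularExpressions;

string data = "C# , Java";

// String.Split cuts only at the comma, so the surrounding spaces stay in each piece
string[] bySplit = data.Split(',');

// Regex.Split treats "optional spaces + comma + optional spaces" as one delimiter
string[] byRegex = Regex.Split(data, @"\s*,\s*");

// A pattern-based rule: split wherever one or more digits appear
string[] byDigits = Regex.Split("abc123def45ghi", @"\d+");

// Square brackets make leading/trailing spaces visible in the output
Console.WriteLine(string.Join(" | ", Array.ConvertAll(bySplit, s => $"[{s}]"))); // [C# ] | [ Java]
Console.WriteLine(string.Join(" | ", Array.ConvertAll(byRegex, s => $"[{s}]"))); // [C#] | [Java]
Console.WriteLine(string.Join(" | ", byDigits));                                  // abc | def | ghi
```

On .NET 5 or later, String.Split(',', StringSplitOptions.TrimEntries) can also trim the pieces, but it still cannot express a rule such as “one or more digits.”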

2. Keeping the Delimiters (Using Parentheses)

Normally, the text matched by the pattern (the delimiter) is dropped from the result. However, if you wrap the pattern in parentheses () to make it a capturing group, the captured delimiter text is also included in the resulting array, placed between the pieces it separated.

Example: Breaking a formula into numbers and operators

using System.Text.RegularExpressions;

string formula = "100+200-50";

// Wrapping the pattern in parentheses, ([+\-]), keeps the operators in the array
string[] tokens = Regex.Split(formula, @"([+\-])");

// Result: "100", "+", "200", "-", "50"
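Keeping the operators is what makes these tokens usable afterwards. As a rough sketch, here is a hypothetical EvaluateAddSub helper (my own name, not part of .NET) that walks the tokens left to right. It assumes the formula contains only non-negative integers joined by + and -, with no spaces. For contrast, it also shows that a non-capturing group (?:...) does not put the delimiters into the array.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: evaluates "a+b-c..." from the tokens Regex.Split returns.
// Assumes non-negative integers, only + and -, and no whitespace in the formula.
static int EvaluateAddSub(string formula)
{
    // The capturing group puts "+" and "-" at the odd indexes of the array
    string[] tokens = Regex.Split(formula, @"([+\-])");

    int total = int.Parse(tokens[0]);
    for (int i = 1; i < tokens.Length; i += 2)
    {
        int operand = int.Parse(tokens[i + 1]);
        total = tokens[i] == "+" ? total + operand : total - operand;
    }
    return total;
}

// Same character class in a non-capturing group: the operators are dropped
string[] numbersOnly = Regex.Split("100+200-50", @"(?:[+\-])");

Console.WriteLine(EvaluateAddSub("100+200-50"));   // 250
Console.WriteLine(string.Join(", ", numbersOnly)); // 100, 200, 50
```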

3. Removing Empty Strings

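Regex.Split returns an “empty string” whenever a match occurs at the very beginning or end of the input, or when two matches are adjacent (for example, two consecutive delimiters). Because Regex.Split has no equivalent of StringSplitOptions.RemoveEmptyEntries, in practical data processing it is standard practice to filter the result with LINQ’s .Where(s => !string.IsNullOrEmpty(s)) so that only valid data remains.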
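The following sketch makes those empty strings visible. The input " , C#,,Java " is a made-up example with a leading delimiter, two adjacent commas, and a trailing space, split with the same pattern as the sample code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string pattern = @"\s*[,;|]\s*|\s+";
string input = " , C#,,Java ";

// A match at the start, two adjacent matches, and a match at the end each yield ""
string[] raw = Regex.Split(input, pattern);
Console.WriteLine(string.Join(" | ", raw.Select(s => $"[{s}]"))); // [] | [C#] | [] | [Java] | []

// Keep only non-empty entries, as in the sample code
string[] tags = raw.Where(s => !string.IsNullOrEmpty(s)).ToArray();
Console.WriteLine(string.Join(" | ", tags));                       // C# | Java
```

If entries that contain only whitespace should also be discarded, string.IsNullOrWhiteSpace can be used in the Where condition instead.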


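About the Author

This is a blog where I write about what I have studied, what I have put into practice, and what I am working on now.
It used to be mainly about investing and asset management,
but lately I have become interested in programming, so that is almost all I write about now.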
