[C#] How to Safely Handle HTML Special Characters via Encoding and Decoding


Overview

When you display user input on a web page, special characters such as < and > must be converted into a harmless form (character entity references such as &lt; and &gt;). This conversion is called escaping, or HTML encoding. It prevents Cross-Site Scripting (XSS) attacks and stops the browser from interpreting the text as HTML tags, which could otherwise break the page layout. This article also explains how to decode encoded strings back to their original form.
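To make the idea of escaping concrete, here is a minimal sketch of what an HTML encoder does conceptually: it walks the string and replaces each of the five HTML-significant characters with a character reference. NaiveHtmlEscape is a hypothetical helper written for illustration only; in real code, use WebUtility.HtmlEncode, which also handles some non-ASCII characters. For that reason the demo compares the two only on plain ASCII input.

```csharp
using System;
using System.Net;
using System.Text;

// Illustration only: a hypothetical hand-written escaper that shows what "escaping" means.
// Real code should call WebUtility.HtmlEncode instead.
static class EscapeSketch
{
    public static string NaiveHtmlEscape(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Replace each HTML-significant character with its character reference
            switch (c)
            {
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                case '&': sb.Append("&amp;"); break;
                case '"': sb.Append("&quot;"); break;
                case '\'': sb.Append("&#39;"); break;
                default: sb.Append(c); break; // everything else is copied unchanged
            }
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        string input = "<b>Tom & Jerry's \"show\"</b>";

        string mine = EscapeSketch.NaiveHtmlEscape(input);
        Console.WriteLine(mine);
        // &lt;b&gt;Tom &amp; Jerry&#39;s &quot;show&quot;&lt;/b&gt;

        // For plain ASCII input, the sketch and the library should agree
        Console.WriteLine(mine == WebUtility.HtmlEncode(input)); // True
    }
}
```

Once a browser receives the escaped text, it renders the characters literally instead of treating them as markup.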

Specifications (Input/Output)

  • Input: A string containing HTML tags or special characters.
  • Output: An encoded string (e.g., < converted to &lt;) and a decoded string restored to its original form.
  • Prerequisite: Only the standard .NET class library (the System.Net namespace) is used.

Basic Usage

Use the static methods HtmlEncode and HtmlDecode of the System.Net.WebUtility class.

using System.Net;

string rawText = "<div>Test</div>";

// Encoding: <div>Test</div> -> &lt;div&gt;Test&lt;/div&gt;
string safeText = WebUtility.HtmlEncode(rawText);

// Decoding: &lt;div&gt;Test&lt;/div&gt; -> <div>Test</div>
string originalText = WebUtility.HtmlDecode(safeText);
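HtmlDecode is not limited to undoing HtmlEncode. The short program below is a sketch assuming a .NET Core or .NET 5+ console app; it shows that HtmlDecode also understands named, decimal, and hexadecimal character references, and that it leaves a bare & that is not part of a valid reference untouched.

```csharp
using System;
using System.Net;
using System.Text;

class Program
{
    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // so Japanese text prints correctly

        // Named, decimal, and hexadecimal references for "<" all decode to the same character
        Console.WriteLine(WebUtility.HtmlDecode("&lt; &#60; &#x3C;")); // < < <

        // Numeric references for Japanese text decode back to readable characters
        Console.WriteLine(WebUtility.HtmlDecode("&#x3053;&#x3093;&#x306B;&#x3061;&#x306F;")); // こんにちは

        // A bare "&" that does not form a valid reference is left as-is
        Console.WriteLine(WebUtility.HtmlDecode("AT&T")); // AT&T
    }
}
```

This means text escaped by other encoders, which may emit numeric references instead of named ones, can also be restored with the same method.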

Full Code Example

The following console program simulates a user comment that contains a malicious script tag, encodes it so that it can be displayed safely, and then decodes it again.

using System;
using System.Net;

class Program
{
    static void Main()
    {
        // 1. User input (containing malicious code and special characters)
        string userInput = "<script>alert('XSS Attack!');</script> \"Rock & Roll\"";

        Console.WriteLine("--- 1. Original String ---");
        Console.WriteLine(userInput);

        // 2. HTML Encoding (Escaping)
        // Converts <, >, &, " and ' into HTML character references
        string encodedContent = WebUtility.HtmlEncode(userInput);

        Console.WriteLine("\n--- 2. Encoded String (Safe) ---");
        Console.WriteLine(encodedContent);
        // Output: &lt;script&gt;alert(&#39;XSS Attack!&#39;);&lt;/script&gt; &quot;Rock &amp; Roll&quot;

        // 3. HTML Decoding (Restoration)
        // e.g., to restore text that was stored in encoded form before showing it in an editor
        string decodedContent = WebUtility.HtmlDecode(encodedContent);

        Console.WriteLine("\n--- 3. Decoded String (Restored) ---");
        Console.WriteLine(decodedContent);
        
        // Verification
        if (userInput == decodedContent)
        {
            Console.WriteLine("\nResult: Successfully restored.");
        }
    }
}

Customization Points

  • URL Encoding: If you want to make a string safe for a URL parameter instead of an HTML body, use WebUtility.UrlEncode. The rules are different (e.g., spaces become +).
  • Embedding in JavaScript: If you are placing values inside a <script> block, WebUtility.HtmlEncode is not enough, because JavaScript string literals have their own special characters (quotes, backslashes, line breaks). Use JavaScript-specific escaping such as HttpUtility.JavaScriptStringEncode (System.Web namespace).
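To see how the rules differ, the sketch below passes one made-up string through the three encoders mentioned above. It assumes .NET Core 2.0 or later (or .NET 5+), where System.Web.HttpUtility is available without extra references; on .NET Framework, a console project would need a reference to System.Web.dll.

```csharp
using System;
using System.Net;
using System.Web; // HttpUtility (assumed available: .NET Core 2.0+ / .NET 5+)

class Program
{
    static void Main()
    {
        string value = "Rock & Roll <3";

        // HTML body text: & and < become character references
        Console.WriteLine(WebUtility.HtmlEncode(value)); // Rock &amp; Roll &lt;3

        // URL query parameter: spaces become '+', other reserved characters become %XX
        Console.WriteLine(WebUtility.UrlEncode(value)); // Rock+%26+Roll+%3C3

        // JavaScript string literal: quotes and backslashes get backslash escapes,
        // and HTML-sensitive characters such as < and & become \uXXXX escapes
        string js = HttpUtility.JavaScriptStringEncode(value);
        Console.WriteLine("var title = \"" + js + "\";");
    }
}
```

Each result is only safe in its own context: the URL-encoded string belongs in a query parameter, and the JavaScript-encoded string belongs between the quotes of a JS string literal.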

Important Notes

  • WebUtility vs HttpUtility: Older ASP.NET applications used System.Web.HttpUtility. However, System.Net.WebUtility is the standard for .NET Core and console applications.
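  • Double Encoding: If you encode an already encoded string, the entities themselves get escaped (e.g., &lt; becomes &amp;lt;), and the browser then displays the literal text "&lt;" instead of "<". Encode exactly once, at the point of output, and check whether your data is already escaped.
  • HTML Attributes: HtmlEncode is mainly for text content inside elements. If you embed values in HTML attributes (like href="..."), always wrap the value in quotes. WebUtility.HtmlEncode converts " to &quot; and ' to &#39;, so an encoded value cannot close a quoted attribute early. Note that encoding does not validate the URL itself, so a value such as javascript:... in href must be checked separately.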
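The two pitfalls above can be checked with a short console program. It is a sketch: BuildInputTag is a hypothetical helper that writes a user value into a double-quoted value attribute, and the attack string is a made-up attempt to break out of the attribute and add an event handler.

```csharp
using System;
using System.Net;

static class EncodingPitfalls
{
    // Hypothetical helper: builds <input value="..."> with an encoded, double-quoted attribute value
    public static string BuildInputTag(string userValue)
    {
        // The value is always wrapped in quotes; HtmlEncode turns " into &quot; and ' into &#39;,
        // so the value cannot close the attribute early
        return "<input type=\"text\" value=\"" + WebUtility.HtmlEncode(userValue) + "\">";
    }
}

class Program
{
    static void Main()
    {
        // Double encoding: the second pass escapes the '&' of each entity
        string once = WebUtility.HtmlEncode("<b>");  // &lt;b&gt;
        string twice = WebUtility.HtmlEncode(once);  // &amp;lt;b&amp;gt;
        Console.WriteLine(once);
        Console.WriteLine(twice);

        // A value that tries to end the attribute and inject an event handler
        string attack = "\" onfocus=\"alert(1)";
        Console.WriteLine(EncodingPitfalls.BuildInputTag(attack));
        // <input type="text" value="&quot; onfocus=&quot;alert(1)">
    }
}
```

Because the injected quotes arrive as &quot;, the browser treats the whole string as the attribute's value rather than as a new onfocus attribute.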

Advanced Application

More Precise Control with HtmlEncoder (.NET Core / 5+)

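If you care about performance or need finer control over which characters are escaped, use HtmlEncoder from System.Text.Encodings.Web. A typical case is keeping Japanese text readable: HtmlEncoder.Default escapes Japanese characters into numeric references such as &#x3053;.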

using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Allow all Unicode ranges so that Japanese characters are not escaped
var options = new TextEncoderSettings(UnicodeRanges.All);
var encoder = HtmlEncoder.Create(options);

string text = "<p>Hello (こんにちは)</p>";
string result = encoder.Encode(text);
Console.WriteLine(result);
// Result: &lt;p&gt;Hello (こんにちは)&lt;/p&gt;
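For comparison, the sketch below encodes the same Japanese string with HtmlEncoder.Default and with an encoder built through the HtmlEncoder.Create(params UnicodeRange[]) overload, a shorter way to express the same settings as the TextEncoderSettings version. It assumes a .NET Core 3.0+ or .NET 5+ console app, where System.Text.Encodings.Web ships with the runtime. The exact default output is not shown because only its &#x...; form matters here.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Unicode;

class Program
{
    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // so Japanese text prints correctly

        string text = "<p>こんにちは</p>";

        // Default encoder: only Basic Latin passes through,
        // so each Japanese character becomes a &#x...; numeric reference
        Console.WriteLine(HtmlEncoder.Default.Encode(text));

        // Allow every Unicode range: Japanese stays readable, < and > are still escaped
        HtmlEncoder relaxed = HtmlEncoder.Create(UnicodeRanges.All);
        Console.WriteLine(relaxed.Encode(text)); // &lt;p&gt;こんにちは&lt;/p&gt;
    }
}
```

Both outputs are equally safe to place in HTML, and both can be decoded with WebUtility.HtmlDecode; the relaxed settings only make the page source easier to read.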

Conclusion

HTML encoding is a fundamental and essential defense against XSS vulnerabilities. Always HTML-encode user input at the point where it is written into HTML, whether in a web application or an HTML email. For everyday use, System.Net.WebUtility.HtmlEncode is the most convenient tool.


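About the Author

This blog is where I write about what I have studied, what I have put into practice, and what I am working on now.
I used to write mainly about asset management,
but I have recently become interested in programming, so that is almost all I write about now.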
