[C#] Safely Handling HTML Special Characters with Encoding and Decoding

Overview

HTML encoding is the process of converting special characters like < and > into a harmless format known as character entities (e.g., &lt; and &gt;). This is a critical security measure used to prevent Cross-Site Scripting (XSS) attacks and to stop browsers from misinterpreting text as HTML tags, which can break your page layout. This article also covers how to decode these strings back to their original form.


Specifications (Input/Output)

  • Input: A string containing HTML tags or special symbols.
  • Output: An encoded string (sanitized for HTML) and a decoded string (restored to the original text).
  • Prerequisite: Uses the standard .NET library (System.Net).

Basic Usage

Use the static methods provided by the System.Net.WebUtility class.

using System.Net;

string rawText = "<div>Test</div>";

// Encoding: <div>Test</div> -> &lt;div&gt;Test&lt;/div&gt;
string safeText = WebUtility.HtmlEncode(rawText);

// Decoding: &lt;div&gt; -> <div>
string originalText = WebUtility.HtmlDecode(safeText);
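
As a small sketch of what the decoder accepts, the snippet below round-trips the string from the basic usage and then decodes the same character written as a named, a decimal, and a hexadecimal entity. It assumes C# 9 or later so that top-level statements compile; the variable names roundTrip and mixed are illustrative only.

```csharp
using System;
using System.Net;

// Round trip: encoding followed by decoding gives back the input.
string rawText = "<div>Test</div>";
string roundTrip = WebUtility.HtmlDecode(WebUtility.HtmlEncode(rawText));
Console.WriteLine(roundTrip == rawText); // True

// HtmlDecode understands named, decimal, and hexadecimal entities.
string mixed = WebUtility.HtmlDecode("&lt; &#60; &#x3C;");
Console.WriteLine(mixed); // < < <
```

Decoding is therefore not limited to strings produced by HtmlEncode, which matters when the encoded text comes from another system.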

Full Code Example

This example simulates a scenario where a user submits a comment. We ensure that even if the comment contains malicious script tags, it is handled and displayed safely.

using System;
using System.Net;

class Program
{
    static void Main()
    {
        // 1. User input (simulating a string with malicious code and special characters)
        string userInput = "<script>alert('XSS Attack!');</script> \"Rock & Roll\"";

        Console.WriteLine("--- 1. Original String ---");
        Console.WriteLine(userInput);

        // 2. HTML Encoding (Sanitization)
        // Converts characters like <, >, &, and " into HTML entities
        string encodedContent = WebUtility.HtmlEncode(userInput);

        Console.WriteLine("\n--- 2. Encoded String (Safe for HTML) ---");
        Console.WriteLine(encodedContent);
        // Result: &lt;script&gt;alert(&#39;XSS Attack!&#39;);&lt;/script&gt; &quot;Rock &amp; Roll&quot;

        // 3. HTML Decoding (Restoration)
        // Useful when you need to bring the data back to its original state for editing
        string decodedContent = WebUtility.HtmlDecode(encodedContent);

        Console.WriteLine("\n--- 3. Decoded String (Restored) ---");
        Console.WriteLine(decodedContent);
        
        // Restoration Check
        if (userInput == decodedContent)
        {
            Console.WriteLine("\nResult: String was perfectly restored.");
        }
    }
}
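
In a real page the encoded value usually ends up inside markup that you control. The following is a minimal sketch of that step, assuming a hypothetical RenderComment helper (not part of any library) and C# 9 top-level statements. Only the user-supplied parts are encoded; the surrounding tags are trusted.

```csharp
using System;
using System.Net;

// Hypothetical helper: wraps an untrusted comment in trusted markup.
// Only the user-supplied parts are encoded; the <li> and <strong> tags are ours.
static string RenderComment(string author, string body)
{
    return "<li><strong>" + WebUtility.HtmlEncode(author) + "</strong>: "
         + WebUtility.HtmlEncode(body) + "</li>";
}

string html = RenderComment("Mallory", "<script>alert('XSS Attack!');</script>");
Console.WriteLine(html);
// <li><strong>Mallory</strong>: &lt;script&gt;alert(&#39;XSS Attack!&#39;);&lt;/script&gt;</li>
```

Encoding at the point of output, rather than when the comment is stored, keeps the stored data raw and avoids having to guess later whether a value has already been encoded.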

Customization Points

  • URL Encoding: To make a string safe for a URL parameter, use WebUtility.UrlEncode instead. It encodes spaces as +. If you need spaces written as %20, use Uri.EscapeDataString.
  • JavaScript Strings: If you are embedding a value directly into a <script> block, HtmlEncode is not sufficient, because JavaScript string literals follow different escaping rules than HTML text. Consider using System.Web.HttpUtility.JavaScriptStringEncode for JavaScript-specific escaping.
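
The two alternatives above can be sketched as follows. This assumes modern .NET (Core 3.0 or later), where System.Web.HttpUtility is available without an extra package; on .NET Framework it needs a reference to System.Web. The variable names are illustrative.

```csharp
using System;
using System.Net;
using System.Web;

string value = "Rock & Roll";

// WebUtility.UrlEncode writes spaces as '+', the form-encoding convention.
string formStyle = WebUtility.UrlEncode(value);
Console.WriteLine(formStyle); // Rock+%26+Roll

// Uri.EscapeDataString writes spaces as %20 instead.
string percentStyle = Uri.EscapeDataString(value);
Console.WriteLine(percentStyle); // Rock%20%26%20Roll

// JavaScriptStringEncode escapes quotes and angle brackets so the value
// cannot close the string literal or the surrounding <script> element.
string jsLiteral = HttpUtility.JavaScriptStringEncode("</script>'", addDoubleQuotes: true);
Console.WriteLine(jsLiteral);
```

Which URL form to pick depends on the receiver: + for spaces is the form-submission convention, while %20 is the safer choice for path segments.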

Important Notes

  • WebUtility vs. HttpUtility: Older ASP.NET applications used System.Web.HttpUtility, which on .NET Framework requires a reference to System.Web. For modern .NET (Core/5+) and console applications, System.Net.WebUtility is the usual choice and does not require extra dependencies.
  • Double Encoding: Avoid encoding a string that is already encoded. This results in broken text like &amp;lt;. Always keep track of whether your data is currently “raw” or “safe.”
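  • HTML Attributes: HtmlEncode is designed for text between tags, so be careful when placing values inside HTML attributes (like href="..."). WebUtility converts double quotes to &quot; and single quotes to &#39;, which is generally safe as long as the attribute value is quoted, but always verify based on the specific attribute context. For example, encoding does not neutralize a javascript: URL placed in href.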
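
The double-encoding and attribute notes above can be checked with a short sketch like the one below. It assumes C# 9 top-level statements, and the sample values are made up for illustration.

```csharp
using System;
using System.Net;

string raw = "<b>";

// Encoding once gives the safe form; encoding again escapes the '&' itself.
string once = WebUtility.HtmlEncode(raw);
string twice = WebUtility.HtmlEncode(once);
Console.WriteLine(once);  // &lt;b&gt;
Console.WriteLine(twice); // &amp;lt;b&amp;gt;

// A browser decodes once, so the double-encoded text shows up as "&lt;b&gt;".
Console.WriteLine(WebUtility.HtmlDecode(twice) == once); // True

// Attribute context: keep the value inside quotes so the encoded
// quote characters cannot terminate the attribute early.
string title = "5\" floppy 'disk'";
string tag = "<span title=\"" + WebUtility.HtmlEncode(title) + "\">item</span>";
Console.WriteLine(tag);
// <span title="5&quot; floppy &#39;disk&#39;">item</span>
```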

Advanced Application

Precise Control with HtmlEncoder (.NET Core / 5+)

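For high-performance scenarios, or to prevent Japanese characters from being escaped, use HtmlEncoder from System.Text.Encodings.Web. HtmlEncoder.Default allows only the Basic Latin range and escapes every other character as a numeric entity, so Japanese text needs an encoder created with a wider set of allowed ranges.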

using System.Text.Encodings.Web;
using System.Text.Unicode;

// Create settings that allow all Unicode ranges (preventing Japanese from being escaped)
var options = new TextEncoderSettings(UnicodeRanges.All);
var encoder = HtmlEncoder.Create(options);

string text = "<p>Hello (こんにちは)</p>";
string result = encoder.Encode(text); 
// Result: &lt;p&gt;Hello (こんにちは)&lt;/p&gt;
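
As a hedged variation on the snippet above, the sketch below compares HtmlEncoder.Default with an encoder that allow-lists only the ranges Japanese text typically needs, rather than UnicodeRanges.All. The choice of ranges is an assumption for illustration; adjust it to the characters your content actually uses.

```csharp
using System;
using System.Net;
using System.Text.Encodings.Web;
using System.Text.Unicode;

string text = "<p>こんにちは</p>";

// HtmlEncoder.Default allows only Basic Latin, so every kana becomes a numeric entity.
string byDefault = HtmlEncoder.Default.Encode(text);
Console.WriteLine(byDefault);

// Allow-listing just the needed ranges keeps Japanese readable
// without opening up every Unicode block.
HtmlEncoder japaneseEncoder = HtmlEncoder.Create(
    UnicodeRanges.BasicLatin,
    UnicodeRanges.Hiragana,
    UnicodeRanges.Katakana,
    UnicodeRanges.CjkUnifiedIdeographs);
string readable = japaneseEncoder.Encode(text);
Console.WriteLine(readable); // &lt;p&gt;こんにちは&lt;/p&gt;

// WebUtility.HtmlEncode leaves these characters as they are, too.
Console.WriteLine(WebUtility.HtmlEncode(text) == readable); // True
```

A narrower allow-list is the more conservative design: anything outside the listed ranges is still escaped, at the cost of having to extend the list when new scripts appear in your content.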

Conclusion

HTML encoding is your first line of defense against XSS vulnerabilities. Whenever you display user-provided content in a web application or an HTML email, ensure it is properly encoded. For most standard C# applications, System.Net.WebUtility.HtmlEncode is the most effective and straightforward tool for the job.


