[Python] Restoring Unicode Escape Strings: Using the codecs Module

In API responses or server logs, text is sometimes recorded in the \uXXXX format (Unicode Escape). This article explains how to decode such data back into its original string format using Python’s codecs module.

As a worked example, we will parse a price message received from a server.


Implementation Example: Analyzing Server Responses

In this scenario, we convert escape sequences contained in raw data (strings) received from a server into a human-readable format.

Source Code

import codecs

# 1. Received raw data (Unicode escaped string)
# Scenario: A message "Price: <Euro Symbol> 50" is received in escaped format.
# \u20AC represents the Euro sign (€)
raw_response = "Price: \\u20AC 50"

print(f"Received Data: {raw_response}")
print("-" * 30)

# 2. Decoding using codecs.decode
# Step A: First, convert the string to bytes (encode)
# Step B: Restore using codecs.decode specifying 'unicode-escape'
# Note: the 'unicode-escape' codec decodes bytes and returns a str.
# It reads those bytes as Latin-1, so this step assumes ASCII-only input.

# Convert to bytes (ASCII-only text produces the same bytes in UTF-8 and Latin-1)
encoded_bytes = raw_response.encode('utf-8')

# Execute decode
decoded_message = codecs.decode(encoded_bytes, 'unicode-escape')

print(f"Parsed Result: {decoded_message}")
print(f"Result Type  : {type(decoded_message)}")

# 3. (Reference) Using the result in logic
if "€" in decoded_message:
    print(">> Euro currency symbol detected.")

Execution Result

Received Data: Price: \u20AC 50
------------------------------
Parsed Result: Price: € 50
Result Type  : <class 'str'>
>> Euro currency symbol detected.

Explanation

codecs.decode(obj, encoding)

By using the standard library codecs module, you can explicitly specify the encoding for conversion.

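  • Input: The unicode-escape codec decodes from bytes, so a string (str) is first converted with .encode() before it is passed to codecs.decode. The codec reads those bytes as Latin-1, so the input should contain only ASCII characters; non-ASCII characters encoded as UTF-8 come out garbled.
  • encoding="unicode-escape": Specifying this interprets sequences like \uXXXX as their corresponding Unicode characters. Other backslash escapes such as \n and \t are interpreted as well.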
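The Latin-1 caveat above matters once the raw text already contains non-ASCII characters. The following is a minimal sketch of one way to handle that case, not part of the original example: the helper name decode_unicode_escapes is hypothetical, and the approach (encoding with Latin-1 plus the backslashreplace error handler) is a commonly used workaround rather than anything the codecs module prescribes.

```python
import codecs


def decode_unicode_escapes(text: str) -> str:
    # Hypothetical helper: decode backslash-u escapes while keeping
    # non-ASCII characters that are already present in the text.
    # Latin-1 maps code points 0-255 to single bytes, which matches how
    # the unicode-escape decoder reads its input. Characters above 255
    # cannot be encoded, so backslashreplace rewrites them as escapes
    # that the decoder then restores.
    as_bytes = text.encode("latin-1", errors="backslashreplace")
    return codecs.decode(as_bytes, "unicode-escape")


mixed = "Café: \\u20AC 50"

# Encoding with UTF-8 splits "é" into two bytes that are read as Latin-1.
naive = codecs.decode(mixed.encode("utf-8"), "unicode-escape")
safe = decode_unicode_escapes(mixed)

print(naive)  # CafÃ©: € 50
print(safe)   # Café: € 50
```

Literal backslashes in the text are still treated as the start of an escape sequence, so this sketch is only suitable for input that is known to be escaped.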

Use Cases

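This method is useful in system integration where non-ASCII characters are passed in an escaped state, for example when \uXXXX sequences appear in log lines or in text that is not valid JSON, or when reading Java property files, which use the same \uXXXX notation. For well-formed JSON, json.loads already decodes these escapes, so codecs.decode is mainly a fallback for fragments a JSON parser cannot handle.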
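For comparison, here is a short sketch of the JSON case. The payload and its field names (message, emoji) are made up for illustration. It shows that the standard json module decodes \uXXXX escapes on its own, and that it also joins a surrogate pair into a single character, which the unicode-escape codec does not do.

```python
import codecs
import json

# Hypothetical payload; the field names are illustrative only.
json_text = '{"message": "Price: \\u20AC 50", "emoji": "\\ud83d\\ude00"}'

data = json.loads(json_text)
print(data["message"])     # Price: € 50
print(len(data["emoji"]))  # 1 (the surrogate pair became one character)

# The unicode-escape codec leaves the two surrogate halves separate.
pair = codecs.decode(b"\\ud83d\\ude00", "unicode-escape")
print(len(pair))           # 2
```

When the input is valid JSON, json.loads is usually the simpler choice; the codecs approach is more relevant for escaped text outside a JSON document.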

