In API responses or server logs, text is sometimes recorded in the \uXXXX format (Unicode Escape). This article explains how to decode such data back into its original string format using Python’s codecs module.
We will implement a system that parses a “Status Code” string received from a server.
Implementation Example: Analyzing Server Responses
In this scenario, we convert escape sequences contained in raw data (strings) received from a server into a human-readable format.
Source Code
import codecs
# 1. Received raw data (Unicode escaped string)
# Scenario: A message "Price: <Euro Symbol> 50" is received in escaped format.
# \u20AC represents the Euro sign (€)
raw_response = "Price: \\u20AC 50"
print(f"Received Data: {raw_response}")
print("-" * 30)
# 2. Decoding using codecs.decode
# Step A: First, convert the string to bytes (encode)
# Step B: Restore using codecs.decode specifying 'unicode-escape'
# Note: codecs.decode(obj, encoding) accepts bytes and returns a string
# Convert to bytes
encoded_bytes = raw_response.encode('utf-8')
# Execute decode
decoded_message = codecs.decode(encoded_bytes, 'unicode-escape')
print(f"Parsed Result: {decoded_message}")
print(f"Result Type : {type(decoded_message)}")
# 3. (Reference) Using the result in logic
if "€" in decoded_message:
print(">> Euro currency symbol detected.")
Execution Result
Received Data: Price: \u20AC 50
------------------------------
Parsed Result: Price: € 50
Result Type : <class 'str'>
>> Euro currency symbol detected.
Explanation
codecs.decode(obj, encoding)
By using the standard library codecs module, you can explicitly specify the encoding for conversion.
- Input: Must be bytes. Therefore, if your variable is a string (
str), you must first convert it to bytes using.encode()before passing it tocodecs.decode. - encoding=”unicode-escape”: Specifying this interprets sequences like
\uXXXXas their corresponding Unicode characters.
Use Cases
This method is frequently used in system integration where non-ASCII characters are passed in an escaped state, such as when handling JSON data parsing errors or reading Java property files.
