[Python] How to Convert Strings to Unicode Escape Format

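Sometimes you need to convert text containing non-ASCII characters (such as Japanese or emoji) into an ASCII-only string made of "Unicode escape sequences" such as \uXXXX. This is often used when sending data to legacy systems that only support ASCII, or to avoid character encoding problems in configuration files. Note that Python writes \uXXXX only for code points from U+0100 to U+FFFF; code points from U+0080 to U+00FF are written as \xXX, and characters beyond U+FFFF (including most emoji) as \UXXXXXXXX.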

In Python, there are two main ways to achieve this: using the built-in ascii() function and using the string method .encode("unicode-escape").

1. Using the ascii() Function

The ascii() function returns the same string representation as repr(), but with every non-ASCII character escaped, so the result contains only ASCII characters. Depending on the code point, the escape takes the form \xXX, \uXXXX, or \UXXXXXXXX.

Implementation Example

# String to convert (Japanese Kanji for "Mt. Fuji")
original_text = "富士山"

# Convert using the ascii() function
# The return value is a string (str), but it is wrapped in quotes (')
escaped_text = ascii(original_text)

print(f"Original : {original_text}")
print(f"Converted: {escaped_text}")
print(f"Type     : {type(escaped_text)}")

Execution Result

Original : 富士山
Converted: '\u5bcc\u58eb\u5c71'
Type     : <class 'str'>

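Note: For a str, the result of ascii() includes the surrounding quote characters. These are normally single quotes ('), but Python switches to double quotes (") when the string contains a single quote and no double quotes.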
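To make the quoting behavior concrete, here is a minimal sketch. The helper name strip_repr_quotes is hypothetical (it is not part of Python), and slicing off the first and last character is only a quick trick for inspection: the inner text still follows repr() rules, so it is not guaranteed to match the output of the unicode-escape codec for every input.

```python
def strip_repr_quotes(text: str) -> str:
    """Hypothetical helper: ascii() output without the surrounding quotes."""
    return ascii(text)[1:-1]


print(ascii("富士山"))              # '\u5bcc\u58eb\u5c71'  (single quotes)
print(ascii("it's"))                # "it's"                (double quotes)
print(strip_repr_quotes("富士山"))  # \u5bcc\u58eb\u5c71
```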

2. Using .encode("unicode-escape")

For more practical data processing where you want a pure escaped string without the surrounding quotes, use the .encode() method.

Implementation Example

original_text = "富士山"

# 1. Encode with the unicode-escape codec
# This converts it into a bytes object
escaped_bytes = original_text.encode("unicode-escape")

print(f"Bytes : {escaped_bytes}")

# 2. Decode to a string (str) if necessary
# The escaped bytes are pure ASCII, so decoding as "ascii" is sufficient
escaped_str = escaped_bytes.decode("ascii")

print(f"String: {escaped_str}")

Execution Result

Bytes : b'\\u5bcc\\u58eb\\u5c71'
String: \u5bcc\u58eb\u5c71
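The Kanji above all fall in the range that produces \uXXXX, but the codec picks a different escape form for other characters, and it also escapes backslashes and control characters. The sketch below shows this, and then adds a hypothetical helper, to_u4_escapes (my own name, not a Python API), for the case where a receiving system accepts only the four-digit \uXXXX form. It is an alternative technique based on UTF-16 code units, not something the unicode-escape codec does, and unlike the codec it leaves ASCII characters (including backslashes) untouched.

```python
samples = ["é", "富", "😀", "\n", "\\"]
escaped = [s.encode("unicode-escape").decode("ascii") for s in samples]
for e in escaped:
    print(e)
# \xe9
# \u5bcc
# \U0001f600
# \n
# \\


def to_u4_escapes(text: str) -> str:
    """Hypothetical helper: write every non-ASCII character as \\uXXXX,
    using UTF-16 surrogate pairs for characters beyond U+FFFF."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII passes through unchanged
            continue
        units = ch.encode("utf-16-be")  # 2 bytes per UTF-16 code unit
        for i in range(0, len(units), 2):
            code_unit = int.from_bytes(units[i:i + 2], "big")
            out.append(f"\\u{code_unit:04x}")
    return "".join(out)


print(to_u4_escapes("é富😀"))  # \u00e9\u5bcc\ud83d\ude00
```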

Explanation

Usage Differences

  • ascii(obj): Best for debugging or logging. Use this when you just want to quickly check the code points inside a string; it accepts any object, not only strings.
  • .encode("unicode-escape"): Best when the escaped text itself is the data (saving to files, communication protocols). The result has no surrounding quotes and can be converted back with .decode("unicode-escape").
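As a small illustration of why ascii() suits debugging and logging, the sketch below applies it to a dictionary. The record contents are made up for the example. The !a conversion in an f-string applies ascii() to the value, which is convenient inside log messages.

```python
record = {"name": "富士山", "height_m": 3776}  # sample data for illustration

as_ascii = ascii(record)
print(as_ascii)
# {'name': '\u5bcc\u58eb\u5c71', 'height_m': 3776}

# The !a conversion calls ascii() on the value
print(f"loaded: {record!a}")

# .encode() is a str method, so it is not available on a dict
print(hasattr(record, "encode"))  # False
```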

Reverse Conversion (Decoding)

If you want to convert a Unicode escaped string (e.g., \u5bcc\u58eb\u5c71) back to its original text, you can decode it using unicode-escape as follows:

# Define as a raw string to treat backslashes literally
s = r"\u5bcc\u58eb\u5c71" 

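# Encode to ASCII bytes, then decode using unicode-escape
# (the input must be ASCII-only; any unescaped non-ASCII character raises UnicodeEncodeError here)
print(s.encode("ascii").decode("unicode-escape"))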
# Output: 富士山
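One pitfall with decoding: the unicode-escape codec treats any byte that is not part of an escape as Latin-1, so a string that mixes escapes with raw non-ASCII characters comes back garbled if it is first encoded as UTF-8. The following is a minimal sketch of a round trip that avoids this. The names escape and unescape are hypothetical helpers, and using the backslashreplace error handler is one possible workaround, not the only one.

```python
def escape(text: str) -> str:
    """Hypothetical helper: str -> ASCII-only escaped str."""
    return text.encode("unicode-escape").decode("ascii")


def unescape(escaped: str) -> str:
    """Hypothetical helper: escaped str -> original str.

    backslashreplace turns any raw non-ASCII character into an escape
    first, so the unicode-escape decoder only ever sees ASCII bytes.
    """
    return escaped.encode("ascii", "backslashreplace").decode("unicode-escape")


# Round trip over several kinds of characters
for s in ["富士山", "café", "😀", "line1\nline2", "C:\\temp"]:
    assert unescape(escape(s)) == s

# A string mixing a raw non-ASCII character with an escape sequence
mixed = "é" + r"\u5bcc"
naive = mixed.encode("utf-8").decode("unicode-escape")
print(naive)            # Ã©富  (the é is garbled)
print(unescape(mixed))  # é富
```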