Text to Unicode
Transform text into Unicode escape sequences and decode them.
What This Tool Does
- Text to Unicode converts text into Unicode escape sequences (\uXXXX notation) and decodes escape sequences back into text, for use in code, config files, and data export.
- Generate \uXXXX escapes for JSON, JavaScript, and other formats that use backslash escapes. HTML and XML use numeric character references (&#xHHHH;) instead, and URLs use percent-encoding.
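The text-to-escape direction can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: it assumes every character is escaped (as in the "hello" example under Examples) using lowercase hex digits, and `to_unicode_escapes` is a hypothetical name.

```python
def to_unicode_escapes(text: str) -> str:
    """Escape every character as \\uXXXX, one escape per UTF-16 code unit."""
    # UTF-16 big-endian uses 2 bytes per code unit; a character above U+FFFF
    # takes two code units (a surrogate pair), so it produces two escapes.
    data = text.encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"\\u{unit:04x}" for unit in units)


print(to_unicode_escapes("hello"))  # \u0068\u0065\u006c\u006c\u006f
print(to_unicode_escapes("é👍"))    # \u00e9\ud83d\udc4d
```

Working from UTF-16 code units rather than code points is what makes the output match JSON and pre-ES6 JavaScript, where supplementary characters must be written as two escapes.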
Usage
- Choose text-to-escape (text → Unicode escapes) or escape-to-text (Unicode escapes → text) mode.
- Paste the input and run the conversion.
- Review the output for correctness.
- Copy the escaped output into JSON, JavaScript, or config files that accept \uXXXX escapes.
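For the escape-to-text mode, one possible approach (a sketch, not the tool's code) is to turn each \uXXXX into its UTF-16 code unit and then let a strict UTF-16 decode join surrogate pairs. `from_unicode_escapes` is a hypothetical helper, and it assumes the input contains no escaped backslashes (such as \\u0041) that should stay literal.

```python
import re

# Exactly four hex digits after \u (either case).
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def from_unicode_escapes(escaped: str) -> str:
    """Decode \\uXXXX escapes; surrogate pairs become one character."""
    # Step 1: replace each escape with the UTF-16 code unit it names.
    # Surrogates are kept as lone code points for now.
    units = ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
    # Step 2: re-encode with "surrogatepass" and decode strictly, so each
    # high+low pair merges into one code point and an unpaired surrogate
    # raises UnicodeDecodeError instead of producing garbage.
    return units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


print(from_unicode_escapes("caf\\u00e9 \\uD83D\\uDC4D"))  # café 👍
```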
Examples
- Generate escaped content for JSON fixtures: "hello" → "\u0068\u0065\u006c\u006c\u006f"; non-ASCII characters work the same way, e.g. "é" → "\u00e9".
- Decode legacy escaped strings from API logs to inspect the actual text.
- Create Unicode escape sequences for emoji and international characters in JavaScript.
- Look up a character's code point to build the matching HTML/XML character reference (é = \u00E9 = &#xE9;), since HTML and XML do not read \uXXXX.
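For JSON fixtures and API logs, Python's standard `json` module (used here only as an example of a conforming JSON encoder and decoder) produces and reads the same \uXXXX escapes. Unlike the tool's "hello" example, `json.dumps` escapes only non-ASCII characters.

```python
import json

# ensure_ascii=True (the default) escapes every non-ASCII character as \uXXXX;
# characters above U+FFFF become a surrogate pair.
fixture = json.dumps({"greeting": "café 👍"})
print(fixture)  # {"greeting": "caf\u00e9 \ud83d\udc4d"}

# Decoding a logged payload turns the escapes back into readable text.
logged = '{"msg": "\\u00a1Hola! \\u4f60\\u597d"}'
print(json.loads(logged)["msg"])  # ¡Hola! 你好
```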
Limitations
- Results should be validated in your target runtime before production use.
- Extremely large input payloads may be constrained by browser memory and performance limits.
Common Mistakes
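- Wrong escape format for the context: \uXXXX (JavaScript/JSON), &#xHHHH; (HTML/XML), and percent-encoded UTF-8 bytes such as %C3%A9 (URLs) are not interchangeable. The %uXXXX form produced by JavaScript's deprecated escape() is non-standard and not part of URL encoding.
- Surrogate pair confusion: Characters above U+FFFF need two \uXXXX escapes (a high and a low surrogate) in JavaScript and JSON. A missing or unpaired half does not decode to the intended character.
- Encoding above U+FFFF without pairs: A 4-digit \uXXXX escape only covers the BMP. Most emoji are above U+FFFF and need a surrogate pair, or the ES6 \u{} syntax in JavaScript, such as \u{1F44D} (JSON has no \u{} form).
- Malformed escapes: \u004G (G is not a hex digit), \u041 (only 3 digits), and \u41\u42\u43 (2 digits each; it should be \u0041\u0042\u0043) all fail.
- Context mismatch: Inside a JSON string, a literal double quote must be written as \". An unescaped quote ends the string early and breaks JSON parsing.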
- Assuming all escapes are Unicode: \n, \t, \r are control character escapes, not Unicode escapes (though \u000A = newline).
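A minimal validator for two of the mistakes listed above, malformed escapes and unpaired surrogates, might look like this. `find_escape_problems` and its messages are hypothetical, and the sketch does not handle escaped backslashes.

```python
import re

# "\u" not followed by exactly four hex digits is malformed.
MALFORMED = re.compile(r"\\u(?![0-9A-Fa-f]{4})")
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def find_escape_problems(escaped: str) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = [f"malformed escape at index {m.start()}" for m in MALFORMED.finditer(escaped)]
    matches = list(ESCAPE.finditer(escaped))
    i = 0
    while i < len(matches):
        m = matches[i]
        unit = int(m.group(1), 16)
        if 0xD800 <= unit <= 0xDBFF:
            nxt = matches[i + 1] if i + 1 < len(matches) else None
            # A high surrogate must be followed immediately by a low surrogate escape.
            if nxt and nxt.start() == m.end() and 0xDC00 <= int(nxt.group(1), 16) <= 0xDFFF:
                i += 2
                continue
            problems.append(f"unpaired high surrogate at index {m.start()}")
        elif 0xDC00 <= unit <= 0xDFFF:
            problems.append(f"unpaired low surrogate at index {m.start()}")
        i += 1
    return problems


print(find_escape_problems("\\u41\\u42\\u43"))
# ['malformed escape at index 0', 'malformed escape at index 4', 'malformed escape at index 8']
```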
Technical Reference Guide
- Unicode escape: \uXXXX notation where XXXX = 4 hex digits. Example: A = \u0041, é = \u00E9.
- BMP (Basic Multilingual Plane): Unicode code points 0000–FFFF representable in \uXXXX format.
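- Supplementary planes: Code points above U+FFFF (up to U+10FFFF) are written as two \uXXXX escapes (a UTF-16 surrogate pair) in JSON and JavaScript. Some languages, such as Python, C, and C++, also offer an 8-digit \UXXXXXXXX form.
- JSON escaping: JSON supports \uXXXX for any character, plus the short escapes \", \\, \/, \b, \f, \n, \r, \t. Control characters U+0000–U+001F must be escaped.
- JavaScript escaping: Supports \uXXXX and, since ES6, \u{X} with 1 to 6 hex digits up to \u{10FFFF}. Also \xXX for code points 00–FF (Latin-1).
- UTF-16 surrogates: JavaScript strings are sequences of UTF-16 code units. A character above U+FFFF is stored as a high surrogate (D800–DBFF) followed by a low surrogate (DC00–DFFF), written as two \uXXXX escapes.
- HTML escaping: Uses numeric character references, &#DDDD; (decimal) or &#xHHHH; (hex), not \uXXXX notation.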
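To see why these formats are not interchangeable, this sketch writes one character in each syntax from the reference list. `escape_forms` and its dictionary keys are illustrative names; uppercase hex is a stylistic choice, since all of these forms accept either case.

```python
def escape_forms(ch: str) -> dict[str, str]:
    """Write one character in several escape syntaxes."""
    cp = ord(ch)  # Unicode code point; ch must be a single character
    data = ch.encode("utf-16-be")  # 2 bytes per UTF-16 code unit
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return {
        # JSON / JavaScript: one escape per code unit (a surrogate pair above U+FFFF)
        "json_or_js": "".join(f"\\u{u:04X}" for u in units),
        # ES6+ JavaScript only: the code point itself inside braces
        "js_es6": f"\\u{{{cp:X}}}",
        # HTML/XML numeric character references
        "html_hex": f"&#x{cp:X};",
        "html_decimal": f"&#{cp};",
    }


for ch in ("é", "👍"):
    print(escape_forms(ch))
```

Only the first form changes shape for supplementary characters; the ES6 and HTML forms always name the code point directly.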
FAQ
Which escape style is used?
The tool uses standard \uXXXX escapes. Each BMP code point (U+0000–U+FFFF) takes one escape; characters in the supplementary planes (above U+FFFF) are written as surrogate pairs of two escapes.
Can malformed escapes be decoded?
Malformed sequences (wrong digit count, invalid hex) are flagged as errors. Correct them before decoding to get reliable output.
What about emoji?
Most emoji are above U+FFFF, so JavaScript and JSON represent them as surrogate pairs: 👍 (U+1F44D) = \uD83D\uDC4D. ES6+ JavaScript also accepts \u{1F44D}.
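The surrogate pair for a supplementary code point follows a fixed formula: subtract 0x10000, then put the top 10 bits of the result in a high surrogate and the bottom 10 bits in a low surrogate. A Python sketch (the function names are illustrative):

```python
def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> DC00..DFFF
    return high, low


def combine(high: int, low: int) -> int:
    """Inverse: rebuild the code point from a high/low surrogate pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = surrogate_pair(ord("👍"))
print(f"\\u{high:04X}\\u{low:04X}")  # \uD83D\uDC4D
```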
How do I escape quotes in JSON?
Use \" for double quotes inside JSON strings. Single quotes do not need escaping inside double-quoted JSON strings.
Are HTML character entities the same as Unicode escapes?
No. HTML uses numeric character references: &#xHHHH; (hex) or &#DDDD; (decimal). Unicode escapes (\uXXXX) belong to JavaScript, JSON, and similar string-literal syntaxes.
Can I use \uXXXX in HTML directly?
No. HTML does not recognize \uXXXX notation. Use &#xXXXX; or &#DECIMAL; in HTML. \uXXXX is for JavaScript/JSON only.
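Python's `html.unescape` can approximate how an HTML parser treats each notation (it is a stand-in for illustration, not a browser): numeric references decode, while \uXXXX is left as literal text.

```python
import html

# Hex and decimal numeric character references both decode to the character...
print(html.unescape("caf&#xE9; caf&#233;"))  # café café
# ...but a JavaScript-style escape is just ordinary text to HTML.
print(html.unescape("caf\\u00e9"))  # caf\u00e9
```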
Related Tools
Explore related utilities in the Data Workshop for complementary engineering workflows.