Text to Unicode

Transform text into Unicode escape sequences and decode them.

What This Tool Does

  • Text to Unicode converts text into Unicode escape sequences (\uXXXX notation) and vice versa for code, config files, and data export.
  • Generate escaped Unicode for JSON, JavaScript, and other formats that accept \uXXXX backslash escapes. HTML and URLs use different notations (see Common Mistakes).

Usage

  1. Choose text-to-escape (text → Unicode escapes) or escape-to-text (Unicode escapes → text) mode.
  2. Provide input and run conversion.
  3. Review output for correctness.
  4. Copy escaped output for use in JSON, JavaScript, HTML, XML, or config files.
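
The two modes in step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function names `text_to_escapes` and `escapes_to_text` are hypothetical, and the sketch assumes every character is escaped as UTF-16 code units, which matches the "hello" example on this page.

```python
import re

def text_to_escapes(text: str) -> str:
    # Encode to UTF-16 (big-endian) so that characters above U+FFFF
    # become two code units, i.e. a surrogate pair.
    data = text.encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"\\u{unit:04x}" for unit in units)

def escapes_to_text(escaped: str) -> str:
    # Replace each well-formed escape with its code unit; other text passes through.
    def replace(match: re.Match) -> str:
        return chr(int(match.group(1), 16))

    with_units = re.sub(r"\\u([0-9a-fA-F]{4})", replace, escaped)
    # Recombine surrogate pairs into single code points.
    # An unpaired surrogate raises UnicodeDecodeError here.
    return with_units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")

print(text_to_escapes("hello"))
print(escapes_to_text("\\uD83D\\uDC4D"))
```

Round-tripping a string through both functions is a quick way to carry out the review in step 3.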

Examples

  • Generate escaped content for JSON fixtures: "hello" → "\u0068\u0065\u006c\u006c\u006f". Non-ASCII characters are escaped the same way, e.g. "é" → "\u00e9".
  • Decode legacy escaped strings from API logs to inspect actual text content.
  • Create Unicode escape sequences for emoji and international characters in JavaScript.
  • Escape special characters in XML/HTML attributes.
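
For the JSON fixture and log-decoding cases, a standard JSON library can serve as a cross-check on the tool's output. The sketch below uses Python's built-in `json` module; the `fixture` dictionary and the log string are made-up sample data.

```python
import json

fixture = {"greeting": "héllo", "reaction": "👍"}

# ensure_ascii=True (the default) writes every non-ASCII character as \uXXXX,
# using a surrogate pair for characters above U+FFFF.
escaped = json.dumps(fixture, ensure_ascii=True)
print(escaped)

# Parsing the escaped text restores the original characters.
assert json.loads(escaped) == fixture

# Decoding an escaped string copied from a log: wrap it in quotes and parse it.
logged = '"\\u0048\\u0069"'
print(json.loads(logged))
```

Unlike the "hello" example above, `json.dumps` leaves ASCII characters unescaped; both forms decode to the same text.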

Limitations

  • Results should be validated in your target runtime before production use.
  • Extremely large input payloads may be constrained by browser memory and performance limits.

Common Mistakes

  • Wrong escape format in different contexts: \uXXXX (JavaScript/JSON), &#xXXXX; (HTML), and URL percent-encoding are not interchangeable. Standard URLs percent-encode UTF-8 bytes as %XX; %uXXXX is a non-standard legacy form.
  • Surrogate pair confusion: Characters above U+FFFF need two \u sequences in JavaScript (a high surrogate followed by a low surrogate). A missing or reordered half breaks decoding.
  • Encoding code points above U+FFFF without pairs: Four-digit \u covers only the BMP. Most emoji lie above U+FFFF and require surrogate pairs or the ES6 \u{...} syntax.
  • Malformed escapes: \u41G (two hex digits, then the non-hex character G), \u041 (only three digits), and \u41\u42\u43 (two digits per escape instead of four) all fail.
  • Context mismatch: Escaped JSON string must use \" for quotes inside. Forgetting escaping breaks JSON parse.
  • Assuming all escapes are Unicode: \n, \t, \r are control character escapes, not Unicode escapes (though \u000A = newline).
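
A simple pre-check can catch the malformed escapes listed above before decoding. The following is a minimal sketch, assuming the strict four-digit \uXXXX form; `find_malformed_escapes` is a hypothetical helper name, and the check deliberately ignores the case of an escaped backslash (\\u), which a full parser would have to handle.

```python
import re
from typing import List

# A valid escape is a backslash, "u", and exactly four hex digits.
VALID_ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")
ESCAPE_START = re.compile(r"\\u")

def find_malformed_escapes(text: str) -> List[int]:
    # Return the offset of every \u that is not followed by four hex digits.
    return [
        m.start()
        for m in ESCAPE_START.finditer(text)
        if not VALID_ESCAPE.match(text, m.start())
    ]

for sample in ["\\u0041", "\\u41G", "\\u041", "\\u41\\u42\\u43"]:
    print(sample, find_malformed_escapes(sample))
```

An empty list means every \u in the input has the right shape; it does not confirm that surrogates are correctly paired.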

Technical Reference Guide

  • Unicode escape: \uXXXX notation where XXXX = 4 hex digits. Example: A = \u0041, é = \u00E9.
  • BMP (Basic Multilingual Plane): Unicode code points 0000–FFFF representable in \uXXXX format.
  • Supplementary planes: Code points above U+FFFF use a pair of \uXXXX escapes (a UTF-16 surrogate pair, as in JavaScript and JSON), \u{XXXXX} in ES6+ JavaScript, or \UXXXXXXXX in languages such as Python and C.
  • JSON escaping: JSON supports \uXXXX for Unicode characters. Also supports escape sequences: \", \\, \/, \b, \f, \n, \r, \t.
  • JavaScript escaping: Supports \uXXXX and \u{XXXXX} (ES6+, 1 to 6 hex digits). Also \xXX for code points 00–FF (Latin-1).
  • UTF-16 surrogates: JavaScript internally uses UTF-16. Characters > FFFF require two \u sequences (surrogate pairs).
  • HTML escaping: Uses &#XXXXX; (decimal) or &#xXXXX; (hex) for character references, not \uXXXX notation.
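
The surrogate pair for a supplementary code point follows from the UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A short sketch of that calculation, with hypothetical function names:

```python
from typing import Tuple

def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    # Only code points U+10000..U+10FFFF are encoded as surrogate pairs.
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    # Reverse the split; both halves must fall in their reserved ranges.
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F44D)  # U+1F44D is the thumbs-up emoji
print(f"\\u{high:04X}\\u{low:04X}")
```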

FAQ

  • Which escape style is used?

    The tool uses standard \uXXXX escape formatting for BMP code points (0000–FFFF). Supplementary planes may use surrogate pairs.

  • Can malformed escapes be decoded?

    Malformed sequences (wrong digit count, invalid hex) are flagged as errors. Correct them before reliable decoding.

  • What about emoji?

    Most emoji have code points above U+FFFF, so JavaScript and JSON represent them as surrogate pairs: 👍 (U+1F44D) = \uD83D\uDC4D.

  • How do I escape quotes in JSON?

    Use \" for double quotes inside JSON strings. Single quotes do not need escaping inside double-quoted JSON strings.

  • Are HTML character entities the same as Unicode escapes?

    No. HTML uses &#xHEX; (hexadecimal) or &#DECIMAL; (decimal) numeric character references. \uXXXX escapes belong to JavaScript, JSON, and similar string syntaxes.

  • Can I use \uXXXX in HTML directly?

    No. HTML does not recognize \uXXXX notation; it would be displayed as literal text. Use &#xHEX; or &#DECIMAL; in HTML, and keep \uXXXX for JavaScript and JSON strings.
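
To illustrate the difference between HTML references and \uXXXX escapes, the sketch below writes non-ASCII characters as hexadecimal character references and decodes them with Python's standard `html` module. `to_html_hex_refs` is a hypothetical helper; it does not escape markup characters such as & or <, which `html.escape` handles separately.

```python
import html

def to_html_hex_refs(text: str) -> str:
    # Keep ASCII as is; write every other character as &#xHEX; using its
    # full code point (HTML references do not use surrogate pairs).
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)

encoded = to_html_hex_refs("café 👍")
print(encoded)

# An HTML decoder resolves numeric references...
print(html.unescape(encoded))
# ...but leaves \uXXXX untouched, because that notation means nothing in HTML.
print(html.unescape("caf\\u00e9"))
```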
