Unicode Converter — Text to Unicode Online

Convert any text to Unicode code points (U+XXXX format) or decode code points back to characters. Supports all Unicode planes including emoji.

What Is Unicode?

Unicode is the universal standard for encoding text in computers. It assigns a unique number, called a code point, to each character of the world's writing systems, modern and historical, as well as to technical symbols, mathematical notation, musical notation, and emoji. Unicode 16.0 (2024) defines almost 155,000 characters across 168 scripts, making it possible to represent text from virtually any language in a single consistent encoding.

Code points are written as U+ followed by the value in hexadecimal, padded to at least four digits (U+XXXX); values above U+FFFF use five or six digits. The basic Latin letter A is U+0041, the Greek letter alpha is U+03B1, the Chinese character for "water" is U+6C34, and the thumbs-up emoji is U+1F44D. The full range extends from U+0000 to U+10FFFF, a total of 1,114,112 code points (just over 1.1 million).
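The notation is easy to produce programmatically. Below is a minimal Python sketch; the helper name `to_u_notation` is hypothetical and not part of this tool's interface. It uses the standard `ord()` built-in and the `unicodedata` module to show each example character with its official name.

```python
import unicodedata


def to_u_notation(cp: int) -> str:
    """Format an integer code point as U+ plus at least four uppercase hex digits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode range U+0000..U+10FFFF")
    return f"U+{cp:04X}"


# The four examples from the paragraph above, with their official character names.
for ch in ["A", "α", "水", "👍"]:
    print(to_u_notation(ord(ch)), ch, unicodedata.name(ch))

# Size of the code space: U+0000 through U+10FFFF inclusive.
print(0x10FFFF + 1)  # 1114112
```

The `04X` format spec pads short values such as 0x41 to four digits but leaves five- and six-digit values like 0x1F44D untouched, matching the U+ convention.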

The Problem Unicode Solves

Before Unicode, the computing world was fragmented across dozens of incompatible character encoding standards. ASCII covered only English. ISO 8859-1 added Western European languages. Shift_JIS handled Japanese. GB2312 handled Chinese. Each standard could represent only a subset of the world's writing systems, and text encoded in one standard would display as garbled characters (mojibake) on systems using a different standard.
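Mojibake is easy to reproduce. A minimal Python sketch, assuming text saved as UTF-8 is mistakenly read back as ISO 8859-1 (Python's `"latin-1"` codec):

```python
original = "café"

# UTF-8 stores "é" (U+00E9) as the two bytes 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")

# A Latin-1 decoder treats every byte as its own character: 0xC3 -> "Ã", 0xA9 -> "©".
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# Reversing the wrong step recovers the text, because Latin-1 maps bytes 1:1.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```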

Unicode solved this by creating a single, comprehensive encoding that encompasses all characters from all writing systems. A document encoded in Unicode can seamlessly mix English, Chinese, Arabic, Hindi, emoji, and mathematical symbols without any encoding conflicts. This universality is why Unicode has become the default text encoding on the modern web and in modern operating systems.

Unicode Encodings: UTF-8, UTF-16, UTF-32

Unicode code points are abstract numbers; the actual binary representation used to store and transmit them is defined by the Unicode Transformation Formats (UTFs). UTF-8 is the dominant encoding on the web, used by over 98% of websites. It uses 1 to 4 bytes per code point and is backward-compatible with ASCII: ASCII text is also valid UTF-8, byte for byte. UTF-16, used internally by Windows, Java, and JavaScript, uses 2 bytes for code points up to U+FFFF and 4 bytes (a surrogate pair) for code points above it. UTF-32 uses a fixed 4 bytes per code point, which is simpler but less space-efficient.
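The size differences can be checked directly with Python's built-in codecs. This sketch (function names `encoded_sizes` and `utf16_surrogates` are hypothetical) measures one character in each UTF and computes the UTF-16 surrogate pair by hand using the standard formula.

```python
def encoded_sizes(ch: str) -> dict:
    """Return how many bytes a single character occupies in each UTF."""
    # The "-le" (little-endian) codecs are used because Python's plain "utf-16"
    # and "utf-32" codecs prepend a byte order mark, which would inflate the counts.
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


def utf16_surrogates(cp: int) -> tuple:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000               # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


for ch in ["A", "é", "水", "👍"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

high, low = utf16_surrogates(0x1F44D)
print(f"{high:04X} {low:04X}")  # D83D DC4D
```

For these four characters UTF-8 needs 1, 2, 3, and 4 bytes, UTF-16 needs 2, 2, 2, and 4, and UTF-32 always needs 4.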

Unicode Planes and Blocks

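Unicode organizes its code points into 17 planes (numbered 0 to 16), each containing 65,536 code points. Plane 0 (U+0000 to U+FFFF) is the Basic Multilingual Plane (BMP), containing the characters of most modern writing systems. Plane 1, the Supplementary Multilingual Plane, contains most emoji, musical symbols, and historical scripts. Plane 2, the Supplementary Ideographic Plane, and Plane 3, the Tertiary Ideographic Plane, contain additional CJK (Chinese/Japanese/Korean) ideographs. Planes 4 to 13 are currently unassigned and reserved for future use, Plane 14 holds special-purpose characters such as tag characters and variation selectors, and Planes 15 and 16 are reserved for private use. Within the planes, code points are grouped into named blocks of related characters, such as Basic Latin (U+0000 to U+007F) and Greek and Coptic (U+0370 to U+03FF).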
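Because every plane spans exactly 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. A minimal Python sketch (the names `PLANE_NAMES`, `plane_of`, and `describe` are illustrative, not from any library):

```python
# Official names of the planes that currently have an assigned purpose.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    3: "Tertiary Ideographic Plane (TIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Supplementary Private Use Area-A",
    16: "Supplementary Private Use Area-B",
}


def plane_of(cp: int) -> int:
    """Return the plane (0-16) of a code point: the bits above the low 16."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return cp >> 16


def describe(cp: int) -> str:
    p = plane_of(cp)
    return f"U+{cp:04X} is in plane {p}: {PLANE_NAMES.get(p, 'currently unassigned')}"


# Latin A, a musical note emoji, a CJK Extension B ideograph, and a tag character.
for cp in (0x0041, 0x1F3B5, 0x20000, 0xE0041):
    print(describe(cp))
```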

Working with Unicode in Programming

Modern programming languages provide native Unicode support. In JavaScript, String.fromCodePoint(0x1F600) creates the grinning face emoji, and "A".codePointAt(0) returns 65. Python 3 strings are Unicode by default: ord('A') returns 65 and chr(0x4E16) returns the Chinese character for "world." Understanding Unicode is essential for internationalization (i18n), proper text handling, emoji processing, and avoiding encoding bugs that can corrupt data.
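The two conversions this page performs can be sketched in a few lines of Python. The function names and the space-separated output format are assumptions for illustration, not this tool's actual implementation. Python strings iterate by code point, so characters above U+FFFF such as emoji are not split into surrogate halves.

```python
import re


def text_to_unicode(text: str) -> str:
    """Convert each code point of text to U+XXXX notation, separated by spaces."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def unicode_to_text(notation: str) -> str:
    """Decode a string such as 'U+0048 U+0069' back into characters."""
    # Accept "U+" or "u+" followed by 4 to 6 hex digits; other text is ignored.
    # chr() raises ValueError for values above U+10FFFF.
    hex_values = re.findall(r"[Uu]\+([0-9A-Fa-f]{4,6})", notation)
    return "".join(chr(int(h, 16)) for h in hex_values)


encoded = text_to_unicode("Hi👍")
print(encoded)                   # U+0048 U+0069 U+1F44D
print(unicode_to_text(encoded))  # Hi👍
```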

Emoji and Special Characters

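Emoji are full Unicode citizens with assigned code points, mostly in the Supplementary Multilingual Plane (U+1F000 and above), although some older ones, such as the heart ❤ (U+2764), are in the BMP. Some emoji are sequences of several code points joined by the Zero Width Joiner (U+200D): the family emoji, for example, can be encoded as man (U+1F468), ZWJ, woman (U+1F469), ZWJ, girl (U+1F467). Skin tone modifiers (U+1F3FB through U+1F3FF) follow a base emoji to change its skin tone, and gendered variants are typically ZWJ sequences ending in the female sign (U+2640) or male sign (U+2642), which adds further complexity to the emoji system.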
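Decomposing these sequences shows why one visible emoji can be several code points. A minimal Python sketch; the sequences are built with escape codes so the code points are explicit, and `code_points` is a hypothetical helper. Note that Python's `len()` counts code points, not displayed glyphs; counting user-perceived characters requires grapheme cluster segmentation, which the standard library does not provide.

```python
ZWJ = "\u200d"

# Family: man + ZWJ + woman + ZWJ + girl (one glyph on platforms that support it).
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"

# Thumbs up followed by the medium skin tone modifier (U+1F3FD).
thumbs_medium = "\U0001F44D\U0001F3FD"

# Woman running: runner + ZWJ + female sign + variation selector-16 (U+FE0F).
woman_running = "\U0001F3C3" + ZWJ + "\u2640\ufe0f"


def code_points(s: str) -> list:
    """List every code point in s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


for name, seq in [("family", family), ("thumbs", thumbs_medium), ("runner", woman_running)]:
    print(name, len(seq), code_points(seq))
```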
