Unicode

This is a near-universal converter for various Unicode encodings.

Supported formats

Encodings

UTF-8: A variable-length encoding that produces 1 to 4 bytes per character. Identical to ASCII for characters in the ASCII range.
UTF-16: A variable-length encoding that produces either 2 or 4 bytes per character.
UTF-32 / UCS-4: A fixed-length encoding that produces 4 bytes per character.
Codepoints: The codepoint of each character, separated by spaces.
Punycode: An ASCII representation of Unicode used in internet hostnames, defined by RFC 3492. Note that the xn-- prefix of internationalized domain names is not part of Punycode and must be omitted when decoding.
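
As a rough illustration of the formats listed above, the following Python sketch uses only the standard library codecs. The helper names (byte_lengths, codepoints, decode_punycode_label) are hypothetical and not part of this converter, and printing codepoints in hexadecimal is an assumption made for the example.

```python
def byte_lengths(ch: str) -> tuple:
    """Return the encoded size of one character in (UTF-8, UTF-16, UTF-32)."""
    # The "-be" variants fix the byte order, so no byte order mark is prepended.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )


def codepoints(text: str) -> str:
    """Space-separated codepoints, shown here in hexadecimal (an assumption)."""
    return " ".join(f"{ord(ch):X}" for ch in text)


def decode_punycode_label(label: str) -> str:
    """Decode one hostname label, dropping the xn-- prefix if it is present."""
    if label.lower().startswith("xn--"):
        label = label[4:]  # the prefix belongs to IDNA, not to Punycode
    return label.encode("ascii").decode("punycode")


if __name__ == "__main__":
    for ch in "aé€😀":
        print(f"U+{ord(ch):04X}", byte_lengths(ch))
    print(codepoints("aé€😀"))
    print("bücher".encode("punycode").decode("ascii"))
    print(decode_punycode_label("xn--bcher-kva"))
```

The four sample characters were chosen because they need 1, 2, 3 and 4 bytes in UTF-8 respectively, while only the last one needs 4 bytes in UTF-16.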

The Raw format prints the string itself, as literal characters. Note that most valid codepoints are either unassigned or have no glyph in the current font, so such characters may be missing or shown as a square box.

This converter is only for Unicode text. Use the binary converter for arbitrary byte sequences.

Bases

Numbers can be represented in binary, octal, decimal or hexadecimal. When encoding bytes, the binary, octal and hexadecimal representations are padded to a fixed length (8, 3 and 2 digits per byte, respectively). The decimal representation, and all representations of codepoints, are space-separated.
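
The padding rules above can be sketched in Python as follows. This is a minimal illustration, not the converter's actual code: format_bytes and format_codepoints are hypothetical names, and joining the fixed-width byte representations without a separator is an assumption drawn from the description.

```python
# Fixed-width format specs for bytes: 8 binary, 3 octal or 2 hex digits each.
PADDED_SPECS = {"binary": "08b", "octal": "03o", "hexadecimal": "02x"}

# Unpadded specs for codepoints, which vary in size and so need separators.
CODEPOINT_SPECS = {"binary": "b", "octal": "o", "decimal": "d", "hexadecimal": "x"}


def format_bytes(data: bytes, base: str) -> str:
    """Render a byte sequence in the given base."""
    if base in PADDED_SPECS:
        # Every byte has the same width, so no separator is needed (assumption).
        return "".join(format(b, PADDED_SPECS[base]) for b in data)
    if base == "decimal":
        # Decimal bytes take 1 to 3 digits, so they are space-separated.
        return " ".join(str(b) for b in data)
    raise ValueError(f"unknown base: {base}")


def format_codepoints(text: str, base: str) -> str:
    """Render each character's codepoint in the given base, space-separated."""
    spec = CODEPOINT_SPECS[base]
    return " ".join(format(ord(ch), spec) for ch in text)


if __name__ == "__main__":
    data = "é".encode("utf-8")  # the two bytes 0xC3 0xA9
    for base in ("binary", "octal", "decimal", "hexadecimal"):
        print(base, format_bytes(data, base))
    print(format_codepoints("é€", "hexadecimal"))
```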

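Base64 is a special encoding that represents byte sequences as alphanumeric characters (plus the characters /, + and =), using four characters for every three bytes. The = character is used only as padding at the end, when the input length is not a multiple of three.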
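
The four-characters-per-three-bytes ratio can be checked with Python's standard base64 module. The helper name to_base64 is hypothetical and exists only for this sketch.

```python
import base64


def to_base64(data: bytes) -> str:
    """Encode bytes with the standard Base64 alphabet and return ASCII text."""
    return base64.b64encode(data).decode("ascii")


if __name__ == "__main__":
    # Three bytes fill exactly four characters; shorter input is padded with '='.
    for sample in (b"Man", b"Ma", b"M"):
        encoded = to_base64(sample)
        print(sample, encoded, len(encoded))
```

In each case the output length is four characters, because the encoder always emits whole groups of four.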

The PGP word list encodes bytes as a sequence of words, and is useful for conveying data over an audio channel.

Note that both Base64 and PGP words encode a byte stream, and cannot be used with codepoints, which are integers rather than bytes.
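
A short Python sketch of why this restriction exists: a codepoint can exceed 255, so it cannot be treated as a single byte, and the text has to be encoded to bytes first. The function names here are hypothetical, and the choice of UTF-8 as the default encoding is an assumption for the example.

```python
import base64


def base64_of_text(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes with a Unicode encoding, then Base64 those bytes."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")


def codepoints_as_bytes(text: str) -> bytes:
    """Naively treat each codepoint as one byte; fails for codepoints above 255."""
    return bytes(ord(ch) for ch in text)


if __name__ == "__main__":
    print(base64_of_text("€"))               # Base64 of the UTF-8 bytes
    print(base64_of_text("€", "utf-16-be"))  # a different byte stream, different output
    try:
        codepoints_as_bytes("€")             # U+20AC does not fit in one byte
    except ValueError as err:
        print("not a byte stream:", err)
```

The same text yields different Base64 output depending on which encoding produced the bytes, which is why a byte encoding must be chosen before Base64 or PGP words can be applied.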