computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems

Unicode is a standard, maintained and promoted by the Unicode Consortium, for encoding the text of most of the world's writing systems. It gives each character a number, called a code point, which is then stored using an encoding such as UTF-8 or UTF-16. Both of these are variable-width encodings: different characters can take up different numbers of bytes. The goal of Unicode is to replace current and previous character encoding standards with one worldwide standard for all languages. It has already done that to a large degree; for example, it is dominant on the web in the form of the UTF-8 encoding. UTF-16 is also common, for example on Windows, although Microsoft recommends UTF-8. The standard also supports emoji and other symbols that older standards did not support.
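As a small illustration of what "variable-width" means, the following Python sketch counts how many bytes a few characters need in UTF-8 and UTF-16. The helper name `encoded_sizes` and the sample characters are chosen here for illustration only; they are not part of the Unicode standard.

```python
def encoded_sizes(ch: str) -> tuple[int, int, int]:
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8 = ch.encode("utf-8")
    # "utf-16-le" is used so that no byte order mark is added to the count.
    utf16 = ch.encode("utf-16-le")
    return ord(ch), len(utf8), len(utf16)


# Latin letter A, e with acute accent, a Chinese character, and an emoji.
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    code_point, n8, n16 = encoded_sizes(ch)
    print(f"U+{code_point:04X}: {n8} byte(s) in UTF-8, {n16} byte(s) in UTF-16")
```

In this sketch the letter A takes 1 byte in UTF-8 and 2 in UTF-16, while the emoji takes 4 bytes in both, so neither encoding uses the same width for every character.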

Older standards designed for English text, such as ASCII, could not represent all languages of the world, for example Chinese or Japanese. They also could not handle languages such as Arabic or Hebrew, which are written from right to left, at least not when mixed with other languages that are written from left to right. Unicode supports such mixing. It also defines rules for sorting (collating) text, which is not easy when languages are mixed.
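One way to see how mixed-direction text can be supported is that every Unicode character carries a direction property. The sketch below reads that property with Python's standard `unicodedata` module. The helper name `bidi_classes` and the sample string are assumptions made for this example, and the code only looks up the property; it does not reorder text for display.

```python
import unicodedata


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Return (code point, bidirectional class) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


# Latin "a", Hebrew alef, Arabic alef, written as escapes so the
# source code itself contains no right-to-left text.
mixed = "a\u05d0\u0627"
print(bidi_classes(mixed))
# [('U+0061', 'L'), ('U+05D0', 'R'), ('U+0627', 'AL')]
```

Here `L` means left-to-right, `R` means right-to-left, and `AL` means right-to-left Arabic letter. Software that displays text uses these classes to decide the order in which to draw the characters.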

Unicode provides many printable characters, such as letters, digits, diacritics (marks that attach to letters, such as accents), and punctuation marks. It also provides characters that are not printed, but instead control how text is processed. For example, a newline and a character that switches the direction of text to right-to-left are both non-printing characters.
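The kinds of characters described above can be told apart by their general category, which Python exposes through the standard `unicodedata` module. The following is a minimal sketch; the helper name `describe` and the chosen sample characters are assumptions for illustration.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, general category, name) for one character."""
    # Control characters such as newline have no name, so a default is given.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X}", unicodedata.category(ch), name


samples = [
    "A",       # letter
    "7",       # digit
    "\u0301",  # combining acute accent (a diacritic)
    "!",       # punctuation mark
    "\n",      # newline (control character)
    "\u200f",  # right-to-left mark (invisible formatting character)
]
for ch in samples:
    print(describe(ch))

# A diacritic attaches to the letter before it: "e" followed by the
# combining acute accent normalizes to the single character U+00E9.
combined = unicodedata.normalize("NFC", "e\u0301")
print(combined == "\u00e9")  # True
```

Categories that start with `L`, `N`, `M` and `P` are letters, numbers, marks and punctuation. `Cc` is a control character and `Cf` is a formatting character; neither is printed.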