Byte order mark

Unicode character (U+FEFF)

From Wikipedia, the free encyclopedia


A byte order mark (BOM) is a sequence of bytes at the start of a text stream, the encoded form of the character U+FEFF, used to indicate how Unicode text is encoded. The encoding dictates how text is serialized into a sequence of bytes. For encodings whose code units are wider than one byte, the order of those bytes also matters: if the least significant byte is placed in the initial position, this is referred to as "little-endian," whereas if the most significant byte is placed in the initial position, the method is known as "big-endian."
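The sketch below illustrates byte order. It assumes Python and uses only its standard codecs. The helper name `show_byte_orders` is hypothetical. It encodes the character U+FEFF with explicitly big-endian and little-endian codecs, which shows the same code point serialized with its bytes in opposite orders:

```python
def show_byte_orders(text: str) -> dict:
    """Return the hex serialization of `text` in each explicit byte order (hypothetical helper)."""
    codec_names = ["utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
    # bytes.hex(" ") puts a space between bytes (Python 3.8+).
    return {name: text.encode(name).hex(" ") for name in codec_names}


for name, hex_bytes in show_byte_orders("\ufeff").items():
    print(f"{name:<10} {hex_bytes}")
# utf-16-be  fe ff
# utf-16-le  ff fe
# utf-32-be  00 00 fe ff
# utf-32-le  ff fe 00 00
```

A reader that finds FE FF as the first two bytes can therefore infer big-endian UTF-16. If it finds FF FE, the text is little-endian.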

In addition to indicating the byte order, a BOM can also be used as a file signature to identify the encoding of a text file.[1] The UTF-8 file signature (commonly also referred to as a "BOM") identifies the encoding format rather than the byte order of the document. This is because UTF-8 text is a linear sequence of single bytes. It is not a sequence of 2-byte or 4-byte code units whose byte order matters, as in UTF-16 and UTF-32. The following table shows the byte order marks for various encodings.

[Table: Encoding form and its byte order mark]
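The following is a minimal sketch of using the BOM as a file signature. It covers only the UTF-8, UTF-16, and UTF-32 forms. The names `detect_bom` and `BOM_SIGNATURES` are hypothetical. The byte constants come from Python's standard `codecs` module:

```python
import codecs
from typing import Optional, Tuple

# Checked in this order: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the longer signatures must be tried first.
BOM_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length) for a leading BOM, or (None, 0) if there is none."""
    for bom, encoding in BOM_SIGNATURES:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


raw = b"\xef\xbb\xbfcaf\xc3\xa9"
encoding, bom_length = detect_bom(raw)
print(encoding, raw[bom_length:].decode(encoding))  # utf-8 café
```

BOM use is optional, so a `None` result does not mean the data is not Unicode. The check is also ambiguous in one case. UTF-16-LE text that starts with a BOM followed by U+0000 has the same first four bytes as a UTF-32-LE BOM.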

BOM use is optional. If used, it must be at the very beginning of the text. The BOM gives the producer of the text a way to describe the encoding, such as UTF-8 or UTF-16, and, in the case of UTF-16 and UTF-32, its byte order. The BOM matters mainly for text interchange, when files move between systems that use different byte orders or different encodings. It matters less for ordinary text handling within a closed environment.
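Both sides of this exchange can be sketched for UTF-16 in Python. The helpers `encode_utf16_with_bom` and `decode_utf16_with_bom` are hypothetical. The producer writes the BOM once, at the very beginning. The reader inspects only those first two bytes to choose the byte order:

```python
import codecs


def encode_utf16_with_bom(text: str, big_endian: bool = True) -> bytes:
    """Prefix UTF-16 text with a BOM that records the chosen byte order."""
    if big_endian:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def decode_utf16_with_bom(data: bytes) -> str:
    """Use the leading BOM to pick the byte order, then decode the remaining bytes."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    raise ValueError("data does not start with a UTF-16 BOM")


little = encode_utf16_with_bom("hi", big_endian=False)
print(little.hex(" "))                # ff fe 68 00 69 00
print(decode_utf16_with_bom(little))  # hi
```

Only the leading U+FEFF is treated as a BOM. A U+FEFF later in the text decodes as an ordinary character. Python's built-in `"utf-16"` codec handles a leading BOM in the same way on its own.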

As UTF-8 has become the most common text encoding, EF BB BF (three bytes, shown in hexadecimal) is the most commonly occurring BOM form, also known as the UTF-8 signature. HTML5 browsers are required to recognize the UTF-8 BOM and use it to detect the encoding of the page.[2] Software may alternatively recognize UTF-8 without a BOM. It does this by checking that bytes with the high-order bit set (values 0x80 through 0xFF) occur only in sequences that are valid UTF-8.
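The BOM-less heuristic can be approximated in a few lines. In this sketch, `looks_like_utf8` is a hypothetical name, and Python's strict UTF-8 decoder stands in for a hand-written sequence validator:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the data has high-bit bytes and they all form valid UTF-8."""
    if not any(byte >= 0x80 for byte in data):
        # Pure ASCII is valid UTF-8, but it is equally valid in many other
        # encodings, so it is not evidence for UTF-8 specifically.
        return False
    try:
        data.decode("utf-8")  # strict mode raises on any malformed sequence
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("café".encode("utf-8")))    # True
print(looks_like_utf8("café".encode("latin-1")))  # False: lone 0xE9 byte
print(looks_like_utf8(b"plain ASCII"))            # False: no high-bit bytes
```

Text in a legacy 8-bit encoding can occasionally pass this test by chance. The result therefore indicates likelihood, not proof.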

The Unicode Standard neither requires nor recommends the use of the BOM for UTF-8, but warns that it may be encountered at the start of a file.[3]
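The following sketch, assuming Python, shows one way to tolerate an optional UTF-8 BOM. The standard `"utf-8-sig"` codec skips a leading EF BB BF when decoding and otherwise behaves like plain UTF-8. The plain `"utf-8"` codec keeps the BOM as a U+FEFF character at the start of the string. The wrapper name `read_utf8_text` is hypothetical:

```python
def read_utf8_text(data: bytes) -> str:
    """Decode UTF-8 bytes, discarding a leading UTF-8 BOM if one is present."""
    return data.decode("utf-8-sig")


with_bom = b"\xef\xbb\xbfhello"
print(repr(read_utf8_text(with_bom)))  # 'hello'
print(repr(read_utf8_text(b"hello")))  # 'hello'
print(repr(with_bom.decode("utf-8")))  # '\ufeffhello'
```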

Most modern software applications recognize a BOM and may insert it when saving a text file with a UTF encoding. The presence of the UTF-8 BOM may cause problems with some software, especially legacy software not designed to handle UTF-8. In that case, the BOM may appear as the characters "ï»¿", which is how the bytes EF BB BF are displayed when interpreted as ISO-8859-1 or Windows-1252.
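The garbled prefix can be reproduced in Python. The sketch saves text as UTF-8 with a BOM, then decodes those bytes as Windows-1252 (Python codec name `"cp1252"`), as a legacy program assuming that encoding might. The helper name `as_legacy_reader_sees_it` is hypothetical:

```python
import codecs


def as_legacy_reader_sees_it(text: str, legacy_encoding: str = "cp1252") -> str:
    """Save text as UTF-8 with a BOM, then decode it as a non-UTF-8-aware program might."""
    saved = codecs.BOM_UTF8 + text.encode("utf-8")
    return saved.decode(legacy_encoding)


print(as_legacy_reader_sees_it("Hello"))             # ï»¿Hello
print(as_legacy_reader_sees_it("Hello", "latin-1"))  # ï»¿Hello
```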


References
