Byte order mark
A byte order mark (BOM) is a sequence of bytes used to indicate the Unicode encoding style of a text file. The encoding dictates how text is serialized into a sequence of bytes. If the least significant byte is placed in the initial position, this is referred to as "little-endian," whereas if the most significant byte is placed in the initial position, the method is known as "big-endian."
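To make the two byte orders concrete, here is a minimal Python sketch. It uses the standard-library codecs `utf-16-be` and `utf-16-le`; the sample character U+20AC (euro sign) and the variable names are chosen only for illustration.

```python
# Serialize one code point, U+20AC (EURO SIGN), in both byte orders.
ch = "\u20ac"

big = ch.encode("utf-16-be")     # most significant byte (0x20) first
little = ch.encode("utf-16-le")  # least significant byte (0xAC) first

print(big.hex(" "))     # 20 ac
print(little.hex(" "))  # ac 20
```

The same 16-bit value produces two different byte sequences, which is why a reader needs to know the byte order before decoding.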
In addition to indicating the byte order, a BOM can also be used as a file signature to identify the encoding of a text file.[1] The UTF-8 file signature (commonly also referred to as a "BOM") identifies the encoding format rather than the byte order of the document. UTF-8 is a linear sequence of bytes (not a sequence of 2-byte or 4-byte units where the byte order is important, as in UTF-16 and UTF-32). The following table shows the byte order marks for various encodings.
Byte Order Mark (BOM) | Encoding Form |
---|---|
EF BB BF | UTF-8 |
FE FF | UTF-16, big-endian |
FF FE | UTF-16, little-endian |
00 00 FE FF | UTF-32, big-endian |
FF FE 00 00 | UTF-32, little-endian |
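The table can be turned into a simple lookup. The following is a sketch, not a definitive detector: the function name `detect_bom` and the list `_BOMS` are made up for this example, while the `codecs.BOM_*` constants come from Python's standard library. It assumes that a file starting with FF FE 00 00 is UTF-32 rather than UTF-16 text that begins with a null character.

```python
import codecs

# Longer signatures are listed first: the UTF-32 little-endian BOM
# (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE),
# so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),
]


def detect_bom(data: bytes):
    """Return the encoding form named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

For example, `detect_bom(b"\xef\xbb\xbfhello")` returns `"UTF-8"`, and data without any BOM returns `None`.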
BOM use is optional. If used, it must be at the very beginning of the text. The BOM gives the producer of the text a way to describe the encoding, such as UTF-8 or UTF-16, and in the case of UTF-16 and UTF-32, its endianness. The BOM is important for text interchange, when files move between systems that use different byte orders or different encodings, rather than in normal text handling in a closed environment.
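As an illustration of the producer's choice, here is a short Python sketch. It relies on the behavior of Python's standard codecs: `utf-8-sig` writes the UTF-8 signature, `utf-16` writes a BOM in the machine's native byte order, and `utf-16-le` names the byte order itself and writes no BOM. The sample text and variable names are assumptions made for the example.

```python
import codecs

text = "hi"

with_bom = text.encode("utf-8-sig")  # EF BB BF, then the UTF-8 bytes
without_bom = text.encode("utf-8")   # the UTF-8 bytes only

# The generic "utf-16" codec puts a BOM first, in native byte order,
# so a reader on another system can tell which order was used.
utf16 = text.encode("utf-16")
has_native_bom = utf16.startswith(codecs.BOM_UTF16)

# A codec that already names the byte order writes no BOM at all.
explicit_le = text.encode("utf-16-le")

print(with_bom.hex(" "))     # ef bb bf 68 69
print(without_bom.hex(" "))  # 68 69
print(has_native_bom)        # True
print(explicit_le.hex(" "))  # 68 00 69 00
```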
As UTF-8 has become the most common text encoding, EF BB BF (shown here as three hexadecimal values) is the most commonly occurring BOM form, also known as the UTF-8 signature. HTML5 browsers are required to recognize the UTF-8 BOM and use it to detect the encoding of the page.[2] Software may alternatively recognize UTF-8 encoding by looking for bytes with the high order bit set (values 0x80 through 0xFF) followed by bytes that define valid UTF-8 sequences.
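The BOM-free approach can be sketched in Python as follows. This is a heuristic only, and the function name `looks_like_utf8` is made up for the example. Instead of inspecting each sequence by hand, it lets Python's strict UTF-8 decoder do the validation.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data has bytes >= 0x80 and decodes as strict UTF-8."""
    if not any(b >= 0x80 for b in data):
        # Pure ASCII is valid UTF-8, but it gives no evidence either way.
        return False
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("caf\u00e9".encode("utf-8")))    # True
print(looks_like_utf8("caf\u00e9".encode("latin-1")))  # False
print(looks_like_utf8(b"cafe"))                        # False
```

The second call fails because the single byte 0xE9 starts a multi-byte UTF-8 sequence that is never completed.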
The Unicode Standard neither requires nor recommends the use of the BOM for UTF-8, but warns that it may be encountered at the start of a file.[3]
Most modern software applications recognize a BOM and may insert it when saving a text file with a UTF encoding. The presence of the UTF-8 BOM may cause problems with some software, especially legacy software not designed to handle UTF-8, in which case it may appear at the start of the text as the characters "ï»¿".
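The following Python sketch shows both sides of this. It assumes the legacy software reads the file as Windows-1252 (`cp1252`), which is one common case and not the only one; the variable names are made up for the example. The `utf-8-sig` codec from the standard library removes a leading BOM when decoding.

```python
import codecs

raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# A reader that assumes Windows-1252 turns the three BOM bytes
# into three visible characters.
mojibake = raw.decode("cp1252")    # 'ï»¿hello'

# Plain "utf-8" keeps the BOM as the code point U+FEFF.
kept = raw.decode("utf-8")         # '\ufeffhello'

# "utf-8-sig" drops a leading BOM if there is one.
stripped = raw.decode("utf-8-sig") # 'hello'

print(len(mojibake), len(kept), len(stripped))  # 8 6 5
```

Decoding with `utf-8-sig` also works on data that has no BOM, so it is a reasonable default when the source of a file is not known.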
References
- [1] "Byte order mark - Globalization | Microsoft Learn". Retrieved 2023-09-25.
- [2] "The byte-order mark (BOM) in HTML". w3.org. Retrieved 2019-10-24.
- [3] "The Unicode Standard – Chapter 2" (PDF). p. 30.