UTF-8 Encoding

UTF-8 (8-bit Unicode Transformation Format) is a variable-width character encoding that can encode all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. The “8” means that the encoding represents each character with 8-bit units.

Since 2009, UTF-8 has been the leading encoding for the World Wide Web.

For characters with a code point of 127 (hex 0x7F) or below, the UTF-8 representation is a single byte, identical to the ASCII value.
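The overlap with ASCII can be checked directly. The snippet below is a small sketch using only Python's built-in `str.encode`; the sample string is an arbitrary choice for illustration.

```python
# Pure ASCII text produces the same bytes under both encodings.
text = "Hello"  # arbitrary sample, every character is below 128

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

assert ascii_bytes == utf8_bytes
print(list(utf8_bytes))  # [72, 101, 108, 108, 111]
```

Because of this, a file that contains only ASCII characters is already valid UTF-8.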

For any character with a code point from 128 (hex 0x80) through 2047 (hex 0x07FF), the UTF-8 representation is spread across two bytes.

For any character with a code point from 2048 (hex 0x0800) through 65535 (hex 0xFFFF), the UTF-8 representation is spread across three bytes. Code points above 65535, up to the Unicode maximum of 1114111 (hex 0x10FFFF), take four bytes.
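The byte counts described above can be sketched in a few lines of Python. The function names `utf8_length` and `utf8_encode` are hypothetical helpers written for this illustration, not part of any library; the four-byte branch follows from the "one to four bytes" statement in the introduction. The hand-written encoder is checked against Python's built-in `str.encode`.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point <= 0x7F:      # 0 to 127
        return 1
    if code_point <= 0x7FF:     # 128 to 2047
        return 2
    if code_point <= 0xFFFF:    # 2048 to 65535
        return 3
    return 4                    # 65536 to 1114111


def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by hand, following the UTF-8 bit layout."""
    n = utf8_length(code_point)
    if n == 1:
        return bytes([code_point])
    # Lead-byte prefixes: 110xxxxx (2 bytes), 1110xxxx (3), 11110xxx (4).
    prefix = {2: 0xC0, 3: 0xE0, 4: 0xF0}[n]
    out = []
    for _ in range(n - 1):
        # Each continuation byte is 10xxxxxx and carries the low 6 bits.
        out.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    out.append(prefix | code_point)  # remaining high bits go in the lead byte
    return bytes(reversed(out))


# One sample per length: A, e-acute, euro sign, and an emoji.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s), hex {encoded.hex()}")
```

In real code the built-in `str.encode("utf-8")` is the right tool; the manual version is only meant to make the range boundaries visible.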

The table below lists some of the Unicode character ranges supported by HTML5, with their code points in decimal and hexadecimal:

Character Codes                     Decimal     Hexadecimal
C0 Controls and Basic Latin         0-127       0000-007F
C1 Controls and Latin-1 Supplement  128-255     0080-00FF
Latin Extended-A                    256-383     0100-017F
Latin Extended-B                    384-591     0180-024F
Spacing Modifiers                   688-767     02B0-02FF
Diacritical Marks                   768-879     0300-036F
Greek and Coptic                    880-1023    0370-03FF
Cyrillic Basic                      1024-1279   0400-04FF
Cyrillic Supplement                 1280-1327   0500-052F
General Punctuation                 8192-8303   2000-206F
Currency Symbols                    8352-8399   20A0-20CF
Letterlike Symbols                  8448-8527   2100-214F
Arrows                              8592-8703   2190-21FF
Mathematical Operators              8704-8959   2200-22FF
Box Drawings                        9472-9599   2500-257F
Block Elements                      9600-9631   2580-259F
Geometric Shapes                    9632-9727   25A0-25FF
Miscellaneous Symbols               9728-9983   2600-26FF
Dingbats                            9984-10175  2700-27BF
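The decimal and hexadecimal values in these ranges are code points, not UTF-8 bytes. The sketch below assumes they are meant to be used as HTML numeric character references (`&#...;` and `&#x...;`), which the text above does not state explicitly. The `describe` helper and the sample code points are hypothetical choices for illustration; `html.unescape` is from the Python standard library.

```python
import html

# One sample code point from a few of the ranges listed above.
samples = {
    "Greek and Coptic": 0x03A9,   # capital omega
    "Cyrillic Basic": 0x0416,     # capital zhe
    "Currency Symbols": 0x20AC,   # euro sign
    "Arrows": 0x2192,             # rightwards arrow
    "Dingbats": 0x2714,           # heavy check mark
}


def describe(code_point: int) -> dict:
    """Show one code point as HTML references and as UTF-8 bytes."""
    ch = chr(code_point)
    return {
        "char": ch,
        "decimal_ref": f"&#{code_point};",
        "hex_ref": f"&#x{code_point:04X};",
        "utf8_hex": ch.encode("utf-8").hex(),
    }


for block, cp in samples.items():
    info = describe(cp)
    # Both reference forms decode back to the same character.
    assert html.unescape(info["decimal_ref"]) == info["char"]
    assert html.unescape(info["hex_ref"]) == info["char"]
    print(block, info)
```

For the euro sign, the code point 8364 (hex 20AC) becomes the three UTF-8 bytes `e2 82 ac`, which shows that the numbers in the table and the encoded bytes are different things.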
