Cork encoding
Latin script character encoding used by LaTeX
From Wikipedia, the free encyclopedia
The Cork (also known as T1 or EC) encoding is a character encoding used for encoding glyphs in fonts.[1] It is named after the city of Cork in Ireland, where during a TeX Users Group (TUG) conference in 1990 a new encoding was introduced for LaTeX.[1] It contains 256 characters supporting most west- and east-European languages with the Latin alphabet.[2]
Details
In 8-bit TeX engines, the font encoding has to match the encoding of the hyphenation patterns, and this is where the Cork encoding is most commonly used.[3] In LaTeX, one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.
Character set
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x | ` 0060 | ´ 00B4 | ˆ 02C6 | ˜ 02DC | ¨ 00A8 | ˝ 02DD | ˚ 02DA | ˇ 02C7 | ˘ 02D8 | ¯ 00AF | ˙ 02D9 | ¸ 00B8 | ˛ 02DB | ‚ 201A | ‹ 2039 | › 203A |
1x | “ 201C | ” 201D | „ 201E | « 00AB | » 00BB | – 2013 | — 2014 | ZWSP[a] 200B | ₀[b] 2080 | ı[c] 0131 | ȷ[c] 0237 | ﬀ FB00 | ﬁ FB01 | ﬂ FB02 | ﬃ FB03 | ﬄ FB04 |
2x | SP | ! | " | # | $ | % | & | ’ 2019 | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ‘ 2018 | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | SHY[d] |
8x | Ă 0102 | Ą 0104 | Ć 0106 | Č 010C | Ď 010E | Ě 011A | Ę 0118 | Ğ 011E | Ĺ 0139 | Ľ 013D | Ł 0141 | Ń 0143 | Ň 0147 | Ŋ 014A | Ő 0150 | Ŕ 0154 |
9x | Ř 0158 | Ś 015A | Š 0160 | Ş 015E | Ť 0164 | Ţ 0162 | Ű 0170 | Ů 016E | Ÿ 0178 | Ź 0179 | Ž 017D | Ż 017B | Ĳ 0132 | İ 0130 | đ 0111 | § 00A7 |
Ax | ă 0103 | ą 0105 | ć 0107 | č 010D | ď 010F | ě 011B | ę 0119 | ğ 011F | ĺ 013A | ľ 013E | ł 0142 | ń 0144 | ň 0148 | ŋ 014B | ő 0151 | ŕ 0155 |
Bx | ř 0159 | ś 015B | š 0161 | ş 015F | ť 0165 | ţ 0163 | ű 0171 | ů 016F | ÿ 00FF | ź 017A | ž 017E | ż 017C | ĳ 0133 | ¡ 00A1 | ¿ 00BF | £ 00A3 |
Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
Dx | Ð[e] | Ñ | Ò | Ó | Ô | Õ | Ö | Œ 0152 | Ø | Ù | Ú | Û | Ü | Ý | Þ | SS[f] 1E9E |
Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
Fx | ð | ñ | ò | ó | ô | õ | ö | œ 0153 | ø | ù | ú | û | ü | ý | þ | ß 00DF |
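The table can be turned into a byte-to-Unicode lookup. The following Python sketch is one possible way to do that, not an official codec: the names `CORK_TO_UNICODE`, `build_cork_table` and `decode_cork` are hypothetical, the code points are copied from the table, cells without a listed code are assumed to sit at the code point equal to their slot number, and 0x7F is assumed to map to the soft hyphen U+00AD as the table labels it.

```python
# Code points for the rows whose cells carry explicit Unicode values.
ROWS = {
    0x00: [0x0060, 0x00B4, 0x02C6, 0x02DC, 0x00A8, 0x02DD, 0x02DA, 0x02C7,
           0x02D8, 0x00AF, 0x02D9, 0x00B8, 0x02DB, 0x201A, 0x2039, 0x203A],
    0x10: [0x201C, 0x201D, 0x201E, 0x00AB, 0x00BB, 0x2013, 0x2014, 0x200B,
           0x2080, 0x0131, 0x0237, 0xFB00, 0xFB01, 0xFB02, 0xFB03, 0xFB04],
    0x80: [0x0102, 0x0104, 0x0106, 0x010C, 0x010E, 0x011A, 0x0118, 0x011E,
           0x0139, 0x013D, 0x0141, 0x0143, 0x0147, 0x014A, 0x0150, 0x0154],
    0x90: [0x0158, 0x015A, 0x0160, 0x015E, 0x0164, 0x0162, 0x0170, 0x016E,
           0x0178, 0x0179, 0x017D, 0x017B, 0x0132, 0x0130, 0x0111, 0x00A7],
    0xA0: [0x0103, 0x0105, 0x0107, 0x010D, 0x010F, 0x011B, 0x0119, 0x011F,
           0x013A, 0x013E, 0x0142, 0x0144, 0x0148, 0x014B, 0x0151, 0x0155],
    0xB0: [0x0159, 0x015B, 0x0161, 0x015F, 0x0165, 0x0163, 0x0171, 0x016F,
           0x00FF, 0x017A, 0x017E, 0x017C, 0x0133, 0x00A1, 0x00BF, 0x00A3],
}

# Single slots that differ from the code point equal to the slot number.
OVERRIDES = {
    0x27: 0x2019,  # right single quotation mark
    0x60: 0x2018,  # left single quotation mark
    0x7F: 0x00AD,  # assumption: hyphenation character shown as SHY
    0xD7: 0x0152,  # Œ
    0xDF: 0x1E9E,  # SS, listed as U+1E9E in the table
    0xF7: 0x0153,  # œ
    0xFF: 0x00DF,  # ß
}


def build_cork_table() -> dict[int, str]:
    """Return a mapping from Cork slot (0..255) to a Unicode character."""
    table: dict[int, str] = {}
    # Rows 2x-7x and Cx-Fx: slot number equals code point unless overridden.
    for slot in list(range(0x20, 0x7F)) + list(range(0xC0, 0x100)):
        table[slot] = chr(slot)
    # Rows with explicit code points in every cell.
    for base, row in ROWS.items():
        for offset, code_point in enumerate(row):
            table[base + offset] = chr(code_point)
    for slot, code_point in OVERRIDES.items():
        table[slot] = chr(code_point)
    return table


CORK_TO_UNICODE = build_cork_table()


def decode_cork(data: bytes) -> str:
    """Decode a byte string in which each byte is a Cork slot number."""
    return "".join(CORK_TO_UNICODE[byte] for byte in data)
```

For example, `decode_cork(bytes([0xAA, 0xF3, 0x64, 0xB9]))` returns `"łódź"`, since ł and ź come from rows Ax and Bx while ó and d keep their slot numbers as code points.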
Notes
- Hexadecimal values under the characters in the table are the Unicode character codes.
- The first 13 characters (0x00 to 0x0C) are accents and are often used as combining characters.
- 0x17 is dubbed a “compound word mark” (CWM) in the Cork encoding, and is an innovation of this standard. It is an invisible character that separates compounds in a complex word, for instance in German, in order to disallow esthetic ligatures at compound boundaries.[2] It is mapped to the Unicode “zero-width space” (ZWSP, U+200B), defined at about the same time, whose purpose is similar, if not identical.
- 0x18 is a “small o”, used to compose ‰ or ‱ (or arbitrarily smaller quantities) from the percent sign (%).[2]
- Dotless i and dotless j may be used to compose accented variants like i with macron (ī).
- 0x7F is the hyphenation character, not really a soft hyphen (SHY) as defined by Unicode.
- 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations, such as copying text from a PDF or hyphenation.
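The consequence of the shared 0xD0 slot can be seen in a round trip. The Python sketch below is a hypothetical mini-codec covering only four characters (the dictionaries `ENCODE` and `DECODE` and the function `round_trip` are illustrative names, not a library API), and it assumes a decoder that resolves 0xD0 to Eth, as the table does.

```python
# Hypothetical mini-codec: only the slots needed for the illustration.
ENCODE = {
    "\u00D0": 0xD0,  # Ð, capital Eth
    "\u0110": 0xD0,  # Đ, capital D with stroke: same slot as Eth
    "\u00F0": 0xF0,  # ð, lowercase eth
    "\u0111": 0x9E,  # đ, lowercase d with stroke
}

# A decoder must choose one character per slot; 0xD0 resolves to Eth here.
DECODE = {0xD0: "\u00D0", 0xF0: "\u00F0", 0x9E: "\u0111"}


def round_trip(text: str) -> str:
    """Encode each character to its slot, then decode the slots again."""
    return "".join(DECODE[ENCODE[ch]] for ch in text)
```

The lowercase letters survive because they have separate slots, but `round_trip("\u0110")` returns `"\u00D0"`: an encoded Đ comes back as Ð, which is the kind of loss that shows up when text is copied out of a PDF.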
Supported languages
The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:
- Esperanto and Maltese (which use IL3 instead)
- Latvian and Lithuanian (which use L7X instead)
- Welsh
Languages with slightly suboptimal support include:
- Galician, Portuguese and Spanish – the ordinal indicators ª and º are missing; they are not the same as superscript lowercase "a" and "o" (superscripts are thinner), and they are often underlined
- Croatian, Bosnian and Serbian – the slot for Đ is shared with Ð (Eth)
- Turkish – the dotted and dotless i pair up differently in upper and lower case (i/İ and ı/I) than in other languages
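The Turkish case pairing can be illustrated outside TeX. The Python sketch below is only an analogy under stated assumptions: `turkish_upper` and `turkish_lower` are hypothetical helper names, and Python's built-in `str.upper` and `str.lower` stand in for a language-independent case mapping that pairs i with I.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı


def turkish_upper(text: str) -> str:
    """Uppercase with the Turkish pairing: i -> İ and ı -> I."""
    # Map the two lowercase i letters first, then uppercase the rest.
    text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()


def turkish_lower(text: str) -> str:
    """Lowercase with the Turkish pairing: İ -> i and I -> ı."""
    # Map the two uppercase I letters first, then lowercase the rest.
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()
```

With the default mapping, `"istanbul".upper()` gives `"ISTANBUL"`, whereas `turkish_upper("istanbul")` gives `"İSTANBUL"`. A fixed, language-independent case table can serve only one of these two conventions at a time.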