Cork encoding

Details

In 8-bit TeX engines, the font encoding has to match the encoding of the hyphenation patterns, and the Cork encoding is the one most commonly used for those patterns.^[3] In LaTeX, one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.

Character set

Cork encoding
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x	` 0060	´ 00B4	ˆ 02C6	˜ 02DC	¨ 00A8	˝ 02DD	˚ 02DA	ˇ 02C7	˘ 02D8	¯ 00AF	˙ 02D9	¸ 00B8	˛ 02DB	‚ 201A	‹ 2039	› 203A
1x	“ 201C	” 201D	„ 201E	« 00AB	» 00BB	– 2013	— 2014	ZWSP^[a] 200B	₀^[b] 2080	ı^[c] 0131	ȷ^[c] 0237	ﬀ FB00	ﬁ FB01	ﬂ FB02	ﬃ FB03	ﬄ FB04
2x	SP	!	"	#	$	%	&	’ 2019	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	‘ 2018	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	SHY^[d]
8x	Ă 0102	Ą 0104	Ć 0106	Č 010C	Ď 010E	Ě 011A	Ę 0118	Ğ 011E	Ĺ 0139	Ľ 013D	Ł 0141	Ń 0143	Ň 0147	Ŋ 014A	Ő 0150	Ŕ 0154
9x	Ř 0158	Ś 015A	Š 0160	Ş 015E	Ť 0164	Ţ 0162	Ű 0170	Ů 016E	Ÿ 0178	Ź 0179	Ž 017D	Ż 017B	Ĳ 0132	İ 0130	đ 0111	§ 00A7
Ax	ă 0103	ą 0105	ć 0107	č 010D	ď 010F	ě 011B	ę 0119	ğ 011F	ĺ 013A	ľ 013E	ł 0142	ń 0144	ň 0148	ŋ 014B	ő 0151	ŕ 0155
Bx	ř 0159	ś 015B	š 0161	ş 015F	ť 0165	ţ 0163	ű 0171	ů 016F	ÿ 00FF	ź 017A	ž 017E	ż 017C	ĳ 0133	¡ 00A1	¿ 00BF	£ 00A3
Cx	À	Á	Â	Ã	Ä	Å	Æ	Ç	È	É	Ê	Ë	Ì	Í	Î	Ï
Dx	Ð^[e]	Ñ	Ò	Ó	Ô	Õ	Ö	Œ 0152	Ø	Ù	Ú	Û	Ü	Ý	Þ	SS^[f] 1E9E
Ex	à	á	â	ã	ä	å	æ	ç	è	é	ê	ë	ì	í	î	ï
Fx	ð	ñ	ò	ó	ô	õ	ö	œ 0153	ø	ù	ú	û	ü	ý	þ	ß 00DF


Notes

Hexadecimal values under the characters in the table are the Unicode character codes.
The first 12 characters are often used as combining characters.

[a]
0x17 is dubbed a “compound word mark” (CWM) in the Cork encoding, and is an innovation of this standard. It is an invisible character that separates compounds in a complex word, for instance in German, in order to disallow esthetic ligatures at compound boundaries.^[2] It is mapped to the Unicode “zero-width space” (ZWSP, U+200B), defined at about the same time, whose purpose is similar, if not identical.
[b]
0x18 is a “small o”, used to compose ‰ or ‱ (or signs for arbitrarily smaller quantities) from the percent sign (%).^[2]
[c]
Dotless i and dotless j may be used to compose accented variants like i with macron (ī).
[d]
0x7F is the hyphenation character, not really a soft hyphen (SHY) as defined by Unicode.
[e]
0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations, such as copying text from a PDF or hyphenation.
[f]
0xDF contains SS (two letters S). It allows TeX to automatically convert the German lowercase ß into the uppercase form.
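
The table and notes above can be read as a map from byte values to Unicode code points. The following is a minimal Python sketch under that reading, not a complete codec. The names `CORK_OVERRIDES` and `decode_cork` are hypothetical, only a handful of slots are copied from the table, and decoding slot 0xD0 as Ð (U+00D0) rather than Đ is an assumption that note [e] leaves open.

```python
# Partial Cork (T1) slot -> Unicode map; values copied from the table above.
# Slots that are not listed here and not covered by the ranges below are
# simply left unmapped in this sketch.
CORK_OVERRIDES = {
    0x17: "\u200B",  # compound word mark, mapped to ZWSP (note [a])
    0x19: "\u0131",  # dotless i (note [c])
    0x27: "\u2019",  # right single quotation mark, not the ASCII apostrophe
    0x60: "\u2018",  # left single quotation mark, not the ASCII grave
    0x8A: "\u0141",  # Ł
    0xAA: "\u0142",  # ł
    0xB9: "\u017A",  # ź
    0xD7: "\u0152",  # Œ (Latin-1 has × in this position)
    0xDF: "\u1E9E",  # SS slot (note [f])
    0xF7: "\u0153",  # œ (Latin-1 has ÷ in this position)
    0xFF: "\u00DF",  # ß (Latin-1 has ÿ in this position)
}


def decode_cork(data: bytes) -> str:
    """Decode Cork-encoded bytes using the partial map above."""
    out = []
    for byte in data:
        if byte in CORK_OVERRIDES:
            out.append(CORK_OVERRIDES[byte])
        elif 0x20 <= byte <= 0x7E or 0xC0 <= byte <= 0xFF:
            # Apart from the overrides, printable ASCII and rows Cx-Fx
            # coincide with Latin-1. Assumption: 0xD0 is read as Eth.
            out.append(chr(byte))
        else:
            # Slot not filled in by this sketch.
            out.append("\uFFFD")
    return "".join(out)
```

For example, `decode_cork(bytes([0x8A, 0xF3, 0x64, 0xB9]))` yields "Łódź". Checking the override table before the Latin-1 ranges keeps the four slots where rows Dx and Fx differ from Latin-1 (0xD7, 0xDF, 0xF7, 0xFF) from being decoded incorrectly.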

Supported languages

The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:

Esperanto and Maltese language (using IL3)
Latvian language and Lithuanian language (using L7X)
Welsh language

Languages with slightly suboptimal support include:

Galician language, Portuguese language and Spanish language – due to the lack of the characters ª and º, which are not simply superscript versions of lowercase "a" and "o" (superscripts are thinner, and ª and º are often underlined)
Croatian language, Bosnian language, Serbian language – due to the shared use of the slot for Đ
Turkish language – because Turkish pairs uppercase and lowercase i differently from other languages (dotless ı with I, dotted i with İ)
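
The Turkish case-pairing issue in the last item can be illustrated outside TeX. The sketch below, in Python, contrasts the default case mapping with the Turkish one; the helper names `turkish_upper` and `turkish_lower` are hypothetical, and the code shows only the casing rule, not how a TeX engine or the Cork encoding handles it.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish pairing: i -> İ and ı -> I."""
    # Handle both i letters explicitly, then let the default rules
    # take care of the remaining characters.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish pairing: I -> ı and İ -> i."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


# Default (language-neutral) pairing: i <-> I
default_upper = "i".upper()
default_lower = "I".lower()

# Turkish pairing: i <-> İ and ı <-> I
turkish_city = turkish_upper("istanbul")
turkish_word = turkish_lower("ISPARTA")
```

A single fixed pairing between uppercase and lowercase slots cannot satisfy both conventions at once, which is the source of the suboptimal support mentioned above.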
