UTF-8

From Wikipedia, the free encyclopedia

UTF-8 is a variable-length character encoding used for electronic communication. It is defined by the Unicode Standard, and its name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.[1]

UTF-8
Standard: Unicode Standard
Classification: Unicode Transformation Format, extended ASCII, variable-length encoding
Extends: ASCII
Transforms / Encodes: ISO/IEC 10646 (Unicode)
Preceded by: UTF-1

UTF-8 can encode all 1,112,064[note 1] valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. UTF-8 was designed for backward compatibility with ASCII: the first 128 characters of Unicode correspond one-to-one with ASCII and are encoded using a single byte with the same binary value as in ASCII, so valid ASCII text is also valid UTF-8-encoded Unicode.
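The one-to-four-byte range and the ASCII compatibility described above can be observed with Python's built-in `str.encode`. This is only an illustrative sketch: the sample characters and the helper name `utf8_length` are assumptions chosen for the example and do not come from the article.

```python
# Sample characters chosen for illustration (not from the article):
# one each from the 1-, 2-, 3-, and 4-byte ranges of UTF-8.
samples = ["A", "é", "€", "😀"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {encoded.hex(' ')}")

# ASCII text produces identical bytes under both encodings,
# which is why valid ASCII is also valid UTF-8.
ascii_text = "plain ASCII text"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")
```

Lower code points such as U+0041 take a single byte, while a code point outside the Basic Multilingual Plane such as U+1F600 takes four.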

UTF-8 was designed as a superior alternative to UTF-1, a proposed variable-length encoding with partial ASCII compatibility that lacked some features, including self-synchronization and fully ASCII-compatible handling of characters such as slashes. Ken Thompson and Rob Pike produced the first implementation for the Plan 9 operating system in September 1992.[2][3] This led to its adoption by X/Open as its specification for FSS-UTF,[4] which was first officially presented at USENIX in January 1993[5] and subsequently adopted by the Internet Engineering Task Force (IETF) in RFC 2277 (BCP 18)[6] for future internet standards work, replacing the single-byte character sets, such as Latin-1, used in older RFCs.
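Self-synchronization means a decoder that lands in the middle of a multi-byte character can find the next character boundary without rescanning from the start. The sketch below relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx. The function names `is_continuation` and `resync` are hypothetical, introduced only for this example.

```python
def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


def resync(data: bytes, offset: int) -> int:
    """Return the nearest index at or after `offset` that starts a character."""
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


data = "a€b".encode("utf-8")  # five bytes: 61 e2 82 ac 62
start = resync(data, 2)       # offset 2 falls inside the 3-byte "€"
print(data[start:].decode("utf-8"))
```

Because lead bytes and continuation bytes occupy disjoint ranges, at most three bytes need to be skipped to reach a boundary.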

UTF-8 results in fewer internationalization issues than any alternative text encoding,[7][8] and it has been implemented in all modern operating systems (including Microsoft Windows) and in standards such as JSON, where, as is increasingly the case, UTF-8 is the only allowed form of Unicode.

As of 2023, UTF-8 is the dominant encoding for the World Wide Web (and internet technologies), accounting for 98.0% of all web pages, 99.0% of the top 10,000 pages, and up to 100% for many languages.[9] Virtually all countries and languages have 95% or more use of UTF-8 encoding on the web.

Notes

  1. 17 planes times 2^16 code points per plane, minus 2^11 technically invalid surrogates.
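The figure of 1,112,064 valid code points can be checked with a short calculation: 17 planes of 2^16 code points each, minus the 2^11 surrogate code points. The constant names below are assumptions made for readability.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16  # 65,536 code points in each plane
SURROGATES = 2 ** 11             # 2,048 surrogates (U+D800 through U+DFFF)

# Surrogates are reserved for UTF-16 and cannot be encoded as characters.
valid_code_points = PLANES * CODE_POINTS_PER_PLANE - SURROGATES
print(f"{valid_code_points:,}")
```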

References

See also

External links
