Stroke-based sorting
Sorting method used in Chinese dictionaries
From Wikipedia, the free encyclopedia
Stroke-based sorting, also called stroke-based ordering or stroke-based order, is one of the five sorting methods frequently used in modern Chinese dictionaries. The others are radical-based sorting, pinyin-based sorting, bopomofo-based sorting and the four-corner method.[1] Besides serving as an independent sorting method, stroke-based sorting is often used to support the other methods.[2] For example, in the Xinhua Dictionary (新华字典), the Xiandai Hanyu Cidian (现代汉语词典) and the Oxford Chinese Dictionary,[3] stroke-based sorting orders homophones within pinyin-based sorting. In radical-based sorting it orders the radical list, the characters under each radical, and the list of characters that are difficult to look up by radical.
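As a rough illustration of stroke-based sorting acting as a tie-breaker for homophones, the Python sketch below orders characters first by a pinyin key and then by stroke count. The `PINYIN` and `STROKES` tables are hypothetical hand-entered data (the stroke counts are those used in this article's examples). Real dictionaries apply fuller rules for comparing pinyin syllables and tones; this only shows the idea of a secondary key.

```python
# Hypothetical hand-entered data: pinyin with a trailing tone number,
# and stroke counts as given in this article's examples.
PINYIN = {"汉": "han4", "漢": "han4", "字": "zi4"}
STROKES = {"汉": 5, "漢": 14, "字": 6}


def pinyin_then_strokes(ch):
    """Sort key: pinyin first; stroke count breaks ties between homophones."""
    return (PINYIN[ch], STROKES[ch])


chars = ["字", "漢", "汉"]
print(sorted(chars, key=pinyin_then_strokes))  # ['汉', '漢', '字']
```

Here 汉 and 漢 share the reading han4, so the stroke count (5 versus 14) decides their order.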
In stroke-based sorting, Chinese characters are ordered by various features of their strokes, such as stroke count, stroke form, stroke order, stroke combination and stroke position.[4]
Stroke-count sorting
This method arranges characters by their number of strokes, in ascending order: a character with fewer strokes comes before one with more strokes. For example, the different characters in "漢字筆畫, 汉字笔画" (Chinese character strokes) are sorted into "汉(5)字(6)画(8)笔(10)[筆(12)畫(12)]漢(14)", where stroke counts are given in parentheses. Because 筆 and 畫 both have 12 strokes, stroke-count sorting alone cannot determine their order; the square brackets mark this tie.
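Stroke-count sorting amounts to sorting with the stroke count as the key. The Python sketch below assumes a small hand-entered table holding the counts from the example above; a real implementation would take counts from a character database. Python's `sorted` is stable, so tied characters simply keep their input order, which shows why a further rule is needed to rank them.

```python
# Stroke counts from the example above (hand-entered, not looked up).
STROKE_COUNT = {"汉": 5, "字": 6, "画": 8, "笔": 10, "筆": 12, "畫": 12, "漢": 14}


def by_stroke_count(chars):
    """Sort characters by stroke count, ascending.

    The sort is stable, so characters with equal counts (筆 and 畫)
    keep whatever order they had in the input.
    """
    return sorted(chars, key=lambda ch: STROKE_COUNT[ch])


print("".join(by_stroke_count("漢字筆畫汉笔画")))  # 汉字画笔筆畫漢
print("".join(by_stroke_count("畫筆")))            # 畫筆 (tie left unresolved)
```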
Stroke-count sorting was first used in the Zihui, published in 1615, to arrange the radicals and the characters under each radical.[5] It was also used in the Kangxi Chinese Character Dictionary when that dictionary was compiled in the 1710s.[5]
Stroke-count–stroke-order sorting
This method combines stroke-count sorting and stroke-order sorting. Characters are first arranged by stroke count in ascending order; stroke-order sorting then orders the characters that have the same number of strokes. Within each stroke count, characters are arranged by their first strokes according to a fixed order of stroke groups, such as “heng (横, ㇐), shu (竖, ㇑), pie (撇, ㇓), dian (点, ㇔), zhe (折, ㇕)” or “dian (点), heng (横), shu (竖), pie (撇), zhe (折)”. If the first strokes of two characters belong to the same group, their second strokes are compared in the same way, and so on.
In the example from the previous section, 筆 and 畫 both have 12 strokes. 筆 starts with the stroke ㇓ of the pie (撇) group and 畫 starts with ㇕ of the zhe (折) group; since pie precedes zhe in the group order, 筆 comes before 畫. The different characters in "汉字笔画, 漢字筆畫" are therefore sorted into "汉(5)字(6)画(8)笔(10)筆(12㇓)畫(12㇕)漢(14)", with each character in a unique position.
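The two-level comparison can be written as a composite sort key: the stroke count, then a tuple of group ranks, which Python compares element by element (first stroke, then second, and so on). In this sketch the `DATA` table is hand-entered and lists only each character's first stroke group, which is enough to break the single tie in this example; a real table would list every stroke. The group order is the first one given above.

```python
# Stroke groups in the order "heng, shu, pie, dian, zhe".
GROUP_ORDER = ["heng", "shu", "pie", "dian", "zhe"]
GROUP_RANK = {g: i for i, g in enumerate(GROUP_ORDER)}

# Hand-entered data: (stroke count, stroke groups in writing order).
# Only the first stroke's group is listed here.
DATA = {
    "汉": (5, ["dian"]),
    "字": (6, ["dian"]),
    "画": (8, ["heng"]),
    "笔": (10, ["pie"]),
    "筆": (12, ["pie"]),
    "畫": (12, ["zhe"]),
    "漢": (14, ["dian"]),
}


def count_then_order(ch):
    """Sort key: stroke count first, then the group of each stroke in turn."""
    count, groups = DATA[ch]
    return (count, tuple(GROUP_RANK[g] for g in groups))


print("".join(sorted("漢字筆畫汉笔画", key=count_then_order)))  # 汉字画笔筆畫漢
print("".join(sorted("畫筆", key=count_then_order)))            # 筆畫
```

Unlike pure stroke-count sorting, the result no longer depends on the input order of 筆 and 畫.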
Stroke-count–stroke-order sorting was used in the Xinhua Dictionary and the Xiandai Hanyu Cidian before the national standard for stroke-based sorting was released in 1999.
GB stroke-based order
The Standard of GB13000.1 Character Set Chinese Character Order (Stroke-Based Order) (GB13000.1字符集汉字字序(笔画序)规范)[6] is a standard released by the National Language Commission of China in 1999 for sorting Chinese characters by their strokes. It is an enhanced version of the traditional stroke-count–stroke-order sorting.
According to this standard:
- Two characters are first sorted by stroke count.
- If they have the same stroke count, they are sorted by stroke order, comparing the strokes one by one by family, in the order heng, shu, pie, dian, zhe.
- If they also have the same stroke order, they are sorted by primary-secondary stroke order.
  - For example, 子 and 孑 each have three strokes, written ㇇㇚㇐ and ㇇㇚㇀ respectively. Their third strokes, ㇐ and ㇀, both belong to the heng family, so the two characters tie on stroke order. At the primary-secondary step, ㇐ is a primary stroke and sorts before the secondary stroke ㇀, so 子 sorts before 孑.
- If two characters have the same stroke count, stroke order and primary-secondary stroke order, they are sorted by their modes of stroke combination: separation comes before connection, and connection comes before intersection.
  - For example, 八, 人 and 乂 all have two strokes, written ㇓㇏. They sort in the order 八, 人, 乂, because the strokes of 八 are separated, those of 人 are connected, and those of 乂 intersect.
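The four rules above can be read as a single composite key compared level by level. The Python sketch below shows that structure only. Its data model (a `secondary` flag per stroke and one combination mode per character) is a simplifying assumption, not the standard's own encoding; the primary/secondary flags on strokes a pair shares, and the combination mode given to 子 and 孑, are placeholders that do not affect the result.

```python
from typing import List, NamedTuple

# Family order: heng, shu, pie, dian, zhe.
GROUP_RANK = {"heng": 0, "shu": 1, "pie": 2, "dian": 3, "zhe": 4}
# Combination modes in the order the standard gives them.
COMBINATION_RANK = {"separated": 0, "connected": 1, "intersecting": 2}


class Stroke(NamedTuple):
    group: str       # one of the five families
    secondary: bool  # False = primary stroke, True = secondary stroke


class CharInfo(NamedTuple):
    strokes: List[Stroke]  # strokes in writing order
    combination: str       # simplification: one combination mode per character


def gb_key(info):
    """Compare level by level: count, families, primary/secondary, combination."""
    return (
        len(info.strokes),
        tuple(GROUP_RANK[s.group] for s in info.strokes),
        tuple(s.secondary for s in info.strokes),  # False (primary) sorts first
        COMBINATION_RANK[info.combination],
    )


# Hand-entered example data; ㇏ is counted in the dian family.
ZI_SHARED = [Stroke("zhe", False), Stroke("shu", False)]
PIE_NA = [Stroke("pie", False), Stroke("dian", False)]
CHARS = {
    "乂": CharInfo(PIE_NA, "intersecting"),
    "孑": CharInfo(ZI_SHARED + [Stroke("heng", True)], "connected"),
    "人": CharInfo(PIE_NA, "connected"),
    "子": CharInfo(ZI_SHARED + [Stroke("heng", False)], "connected"),
    "八": CharInfo(PIE_NA, "separated"),
}

print("".join(sorted(CHARS, key=lambda ch: gb_key(CHARS[ch]))))  # 八人乂子孑
```

Because the key is a tuple, a later level is consulted only when every earlier level ties, which mirrors the "if they are the same, then..." wording of the rules.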
 
This standard has been adopted by the newer editions of the Xinhua Dictionary[7] and the Xiandai Hanyu Cidian.[8]
YES sorting
YES is a simplified stroke-based sorting method that requires neither stroke counting nor stroke grouping, without compromising accuracy. In brief, YES arranges Chinese characters according to their stroke orders and an "alphabet" of 30 strokes:
㇐ ㇕ ㇅ ㇎ ㇡ ㇋ ㇊ ㇍ ㇈ ㇆ ㇇ ㇌  ㇀ ㇑ ㇗ ㇞ ㇉ ㄣ ㇙ ㇄ ㇟ ㇚ ㇓ ㇜ ㇛ ㇢ ㇔ ㇏ ㇂
built on the basis of Unicode CJK strokes.[9][10]
To compare two characters, expand each into its string of strokes and compare the strings using the order of the stroke alphabet, much as two words are compared in a dictionary using the order of letters. That is, first compare the first strokes: because 汉 starts with ㇔ and 笔 starts with ㇓, which comes earlier in the alphabet, 笔 sorts before 汉. If the first strokes are identical, move on to the second stroke: 汉 expands to ㇔㇔... and 字 expands to ㇔㇑..., and ㇑ precedes ㇔, so 字 sorts before 汉.
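The letter-by-letter comparison described above can be sketched in Python by mapping each stroke to its position in the alphabet and comparing the resulting tuples. The `STROKES` table is a hypothetical hand-entered one that holds only the leading strokes used in the examples; a real implementation would expand every character into its full stroke string.

```python
# The YES stroke "alphabet" as listed above, in sort order.
YES_ALPHABET = "㇐ ㇕ ㇅ ㇎ ㇡ ㇋ ㇊ ㇍ ㇈ ㇆ ㇇ ㇌ ㇀ ㇑ ㇗ ㇞ ㇉ ㄣ ㇙ ㇄ ㇟ ㇚ ㇓ ㇜ ㇛ ㇢ ㇔ ㇏ ㇂".split()
YES_RANK = {stroke: i for i, stroke in enumerate(YES_ALPHABET)}

# Leading strokes only (hand-entered); 笔 begins with the pie stroke ㇓.
STROKES = {"汉": "㇔㇔", "字": "㇔㇑", "笔": "㇓"}


def yes_key(ch):
    """Map each stroke to its alphabet position; tuples compare stroke by stroke."""
    return tuple(YES_RANK[s] for s in STROKES[ch])


print("".join(sorted("汉字笔", key=yes_key)))  # 笔字汉
```

Python's tuple comparison also places a stroke string before any longer string it is a prefix of, which matches how dictionary word order treats a shorter word.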
The YES order of the different characters in "汉字笔画, 漢字筆畫" is "画畫筆笔字漢汉", with each character in a unique position.
YES sorting has been applied to indexing all the characters in the Xinhua Dictionary and the Xiandai Hanyu Cidian.[10]
Word-sorting
All of the examples above describe the sorting of single characters. To sort two words that consist of multiple characters:
- Choose a method for comparing two characters.
- Compare the first characters of the two words; if they differ, their order decides the order of the words.
- If they are identical, move on to the next pair of characters, and so on, until a pair of characters differs or one word runs out of characters, in which case the shorter word sorts before the longer one.
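The steps above can be sketched in Python as a word key built from per-character ranks. In this sketch the character ranking is taken from the YES example in the previous section ("画畫筆笔字漢汉"); any of the character comparison methods could be swapped in by passing a different `rank` table, a hypothetical parameter introduced here for illustration.

```python
# Character ranking from the YES example: 画畫筆笔字漢汉.
CHAR_ORDER = "画畫筆笔字漢汉"
CHAR_RANK = {ch: i for i, ch in enumerate(CHAR_ORDER)}


def word_key(word, rank=CHAR_RANK):
    """Compare words character by character using the given ranking.

    Python compares tuples element by element and puts a proper prefix
    first, which is the "shorter word sorts first" rule.
    """
    return tuple(rank[ch] for ch in word)


words = ["汉字", "笔画", "汉", "字画"]
print(sorted(words, key=word_key))  # ['笔画', '字画', '汉', '汉字']
```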
This method is used in the YES-CEDICT Chinese Dictionary, using YES for character comparison.[11]
See also
References
