Truecasing
From Wikipedia, the free encyclopedia
Truecasing, also called capitalization recovery,[1] capitalization correction,[2] or case restoration,[3] is the problem in natural language processing (NLP) of determining the proper capitalization of words where such information is unavailable. This commonly comes up due to the standard practice (in English and many other languages) of automatically capitalizing the first word of a sentence. It can also arise in badly cased or noncased text (for example, all-lowercase or all-uppercase text messages).
This article needs additional citations for verification. (October 2010) |
Truecasing is unnecessary in languages whose scripts do not have a distinction between uppercase and lowercase letters. This includes all languages not written in the Latin, Greek, Cyrillic or Armenian alphabets, such as Korean, Japanese, Chinese, Thai, Hebrew, Arabic, Hindi, and Georgian.