Big5

Computer Chinese Character Set Standard
Big5, also known as the Big Five Code or Five Big Codes, is the computer Chinese character set standard most commonly used in communities that write Traditional Chinese. It contains 13,060 Chinese characters.
Chinese name: Big Five Code (大五码)
Foreign name: Big5
Alias: Five Big Codes (五大码)

Brief introduction

Chinese character encodings fall into two types: internal codes and interchange codes. Big5 is an internal code; well-known Chinese interchange codes include CCCII and CNS 11643.
Although Big5 is widely used in Taiwan, Hong Kong, and Macao, where Traditional Chinese is written, for a long time it was not an official national or regional standard but only a de facto industry standard. The character sets of major systems such as the ETen Chinese System and the Traditional Chinese version of Windows are all based on Big5, but each vendor added its own extra characters and user-defined character areas, producing different versions.
In 2003, Big5 was included in an appendix of CNS 11643, the Chinese Standard Interchange Code, which gave it a more official status. This latest version is called Big5-2003. [1]

History and name

Big5 is a common Chinese internal code designed by Taiwan's Institute for Information Industry (a foundation under Taiwanese law) for the Five Chinese Software Packages. It was announced in December 1983. In March of the following year, the Institute signed the "16-bit Personal Computer Package Software Cooperative Development (BIG-5) Project" (the Five Chinese Software Packages) with 13 Taiwanese vendors. Because the internal code was designed for Taiwan's self-developed Five Chinese Software Packages, it was named the Big5 Chinese internal code. Although the five packages did not replace foreign software packages as expected, the success of the Guoqiao Chinese System and the ETen Chinese System in the Taiwan market gave Big5 a profound influence on Traditional Chinese computer internal codes that lasts to this day. The English name "Big5" of the "Five Big Codes" was later translated back into Chinese following English word order, so there are now two Chinese names: "Five Big Codes" and "Big Five Code".
Big5 was created because personal computers at the time had no common internal code, so vendors' Chinese application software could not spread, and internal codes such as those of the IBM 5550 and the Wang An code were mutually incompatible. In addition, Taiwan had not yet issued a Chinese coding standard. Against this background, the project was adopted so that Taiwan could enter the information age as soon as possible. It also had a long-term influence on the Traditional Chinese world centered on Taiwan.
Before Big5 was created, Zhu Bangfu, who had been developing Chinese computers, held that an internal-code character set should include all standard and variant characters in order to serve applications such as household registration. At the internal-code meetings of the time, he therefore proposed using his font of more than 50,000 characters. Engineers judged the approach technically feasible, but an internal code three bytes long (that is, longer than two bytes) would misalign text when an English screen was mapped to a Chinese one. The then-popular ETen Chinese System mapped each Chinese character to a glyph two bytes wide, so in English software the screen stayed aligned as long as each Chinese character occupied the width of two English characters. Chinese system vendors therefore preferred a two-byte internal code. In addition, his internal code, compressed from Cangjie input codes, could not be used for sorting, so it was not adopted. In 1983, Zhu Bangfu was falsely accused of being a Communist, which made the adoption of his research even less likely.
After Big5 appeared, most computer software in Taiwan adopted it, helped by the popularity of the ETen Chinese System and, later, Microsoft Windows 3.x. Various schemes in Taiwan have since tried to replace Big5, such as the ETen code promoted by the ETen Chinese System and the Guild Code promoted by the Taipei Computer Association. However, Big5 had been in use for many years and habits were hard to change, so none of them became mainstream. Taiwan later developed the national standard CNS 11643 (Chinese Standard Interchange Code), but it is an interchange code rather than an ordinary internal code, and by design it needs at least three bytes to represent one Chinese character, so it is far less widespread than Big5.
In the early 1990s, when e-mail and transcoding software were not yet common in mainland China, Hong Kong and Taiwan companies in Shenzhen also used Big5 systems, both to ease document exchange with headquarters and to avoid writing a separate system with a different internal code for mainland offices. (Simplified Chinese communities most commonly use GB 2312, GBK, and the later national standard code GB 18030.)
Besides Taiwan, other regions that use Traditional Chinese characters, such as Hong Kong (Hong Kong Supplementary Character Set) and Macao (Macao Supplementary Character Set), as well as overseas Chinese who write Traditional Chinese, have used Big5 as both internal code and interchange code. [2]

Byte structure

Big5 is a double-byte character set (DBCS): each character is stored in two bytes (a double-octet scheme). The first byte is called the "high byte" and the second the "low byte".
The high byte uses 0x81-0xFE; the low byte uses 0x40-0x7E and 0xA1-0xFE. Big5 is partitioned as follows:
0x8140-0xA0FE: Reserved for user-defined characters (user-defined character area).
0xA140-0xA3BF: Punctuation, Greek letters, and special symbols, including nine Chinese characters for units of measure at 0xA259-0xA261 (兙兛兞兝兡兣嗧瓩糎).
0xA3C0-0xA3FE: Reserved. This area is not open for user-defined characters.
0xA440-0xC67E: Frequently used Chinese characters, sorted first by stroke count and then by radical.
0xC6A1-0xC8FE: Reserved for user-defined characters (user-defined character area).
0xC940-0xF9D5: Less frequently used Chinese characters, likewise sorted by stroke count and then by radical.
0xF9D6-0xFEFE: Reserved for user-defined characters (user-defined character area).
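The byte ranges and partition listed above can be turned into a small lookup. The following is a minimal sketch in Python, not a reference implementation: the names `BIG5_REGIONS`, `is_valid_pair`, and `big5_region` are made up for illustration, and the table is simply transcribed from the list above.

```python
# Partition table transcribed from the list above:
# (first code, last code, description). Names here are illustrative only.
BIG5_REGIONS = [
    (0x8140, 0xA0FE, "user-defined characters"),
    (0xA140, 0xA3BF, "punctuation, Greek letters, special symbols"),
    (0xA3C0, 0xA3FE, "reserved"),
    (0xA440, 0xC67E, "frequently used Chinese characters"),
    (0xC6A1, 0xC8FE, "user-defined characters"),
    (0xC940, 0xF9D5, "less frequently used Chinese characters"),
    (0xF9D6, 0xFEFE, "user-defined characters"),
]


def is_valid_pair(high: int, low: int) -> bool:
    """Check the byte ranges: high 0x81-0xFE, low 0x40-0x7E or 0xA1-0xFE."""
    return 0x81 <= high <= 0xFE and (0x40 <= low <= 0x7E or 0xA1 <= low <= 0xFE)


def big5_region(high: int, low: int):
    """Return the description of the region holding this code, or None."""
    if not is_valid_pair(high, low):
        return None
    code = (high << 8) | low  # combine the two bytes into one 16-bit code
    for first, last, description in BIG5_REGIONS:
        if first <= code <= last:
            return description
    return None
```

For example, `big5_region(0xA2, 0x59)` falls in the symbol region, while a pair with a low byte of 0x7F is rejected because 0x7F lies outside both low-byte ranges.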
It is worth noting that Big5 includes two characters twice: "兀" (0xA461 [U+5140] and 0xC94A [U+FA0C]) and "嗀" (0xDCD1 [U+55C0] and 0xDDFC [U+FA0D]). In addition, "十" (ten) and "卅" (thirty) are repeated in the symbol area. These duplicates can cause a retrieval system to find no match for a query.
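One possible way to keep the duplicated characters from breaking searches, once the text has been decoded to Unicode with the mappings given above, is to normalize both the stored text and the query. The sketch below assumes the decoder produced U+FA0C and U+FA0D for the second copies (this depends on the decoder used); the function name `normalize_for_search` is hypothetical. It does not address the "ten" and "thirty" duplicates in the symbol area.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Fold the duplicate ideographs onto their primary code points.

    Unicode defines U+FA0C and U+FA0D as compatibility ideographs with
    canonical mappings to U+5140 and U+55C0, so NFC normalization
    replaces them with the primary characters.
    """
    return unicodedata.normalize("NFC", text)
```

Applying the same function to the index and to the query means either copy of the character matches the other.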

Code conflict problem

The low byte range includes characters that have special meaning in programming languages, shell scripts, and command strings, such as 0x5C "\" and 0x7C "|". "\" is commonly used as the escape character in strings, as in \n (newline), \r (carriage return), \t (tab), \\ (the backslash itself), and \" (quotation mark). "|" is used as the command pipe on UNIX-like operating systems, as in "ls -la | more". When these bytes occur inside a string, a program or interpreter treats them as special characters. Because the byte is actually the second half of a Chinese character, the escape cannot be interpreted as intended, so the program may drop the escape character or abort. Either way, this defeats the user's intent, which was for the byte to be part of a Chinese character.
The low byte overlaps the following ASCII characters:
@ A-Z [ \ ] ^ _ ` a-z { | } ~ 
These bytes occur in common characters such as "功" (Gong, 0xA55C), "許" (Xu, 0xB35C), "蓋" (Gai, 0xBB5C), and "育" (Yu, 0xA87C), so much software cannot correctly process Big5-encoded strings or files. The problem is jokingly personified as "Xu Gonggai" (許功蓋) or "Xu Gaigong" (許蓋功), since all three characters have it.
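The overlap can be checked directly. The sketch below uses Python's built-in `big5` codec to list the characters in a string whose low byte is 0x5C or 0x7C; the names `ASCII_TRAIL_RISK` and `risky_characters` are illustrative, and only the two bytes discussed above are checked rather than the full overlap list.

```python
# Low bytes singled out in the text above, with the ASCII character they equal.
ASCII_TRAIL_RISK = {0x5C: "\\", 0x7C: "|"}


def risky_characters(text: str):
    """Return (character, ASCII look-alike) pairs for affected characters.

    Raises UnicodeEncodeError if a character is not in Python's big5 codec.
    """
    found = []
    for ch in text:
        encoded = ch.encode("big5")
        # Only two-byte characters have a low byte that can collide.
        if len(encoded) == 2 and encoded[1] in ASCII_TRAIL_RISK:
            found.append((ch, ASCII_TRAIL_RISK[encoded[1]]))
    return found
```

Calling `risky_characters("許功蓋")` reports all three characters as ending in a backslash byte.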
The usual workaround is to add an extra "\" after the affected character. Because "\\" is interpreted as "\", a string such as "成功\因素" ("success factor" with the added backslash) is correctly treated by the program as "成功因素". The drawback is that some output functions do not treat "\" as a special character, so some programs or web pages wrongly show a stray "\" after characters such as "許", "功", and "蓋".
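The workaround can be sketched on raw bytes. In the example below, `escape_trailing_backslashes` doubles a 0x5C that is the low byte of a double-byte character, and `naive_unescape` is a stand-in for a byte-oriented program that processes backslash escapes without knowing about Big5. Both functions are hypothetical illustrations, and the escaper deliberately leaves genuine single-byte backslashes alone.

```python
def escape_trailing_backslashes(data: bytes) -> bytes:
    """Add an extra 0x5C after each double-byte character ending in 0x5C."""
    out = bytearray()
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            out += data[i:i + 2]          # copy the high and low byte
            if data[i + 1] == 0x5C:
                out.append(0x5C)          # the extra backslash
            i += 2
        else:
            out.append(data[i])           # single-byte (ASCII) character
            i += 1
    return bytes(out)


def naive_unescape(data: bytes) -> bytes:
    """Stand-in for a Big5-unaware program: drop each backslash and keep
    the byte that follows it."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C and i + 1 < len(data):
            out.append(data[i + 1])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

With the bytes 0xA5 0x5C followed by "A", the naive program alone swallows the low byte, while escaping first lets the original bytes survive the round trip.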
Conflict with box-drawing characters
The first byte of a Big5 character can coincide with the box-drawing characters of DOS code page 437, which results in garbled text.
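As a small illustration, the bytes 0xB3 0x5C (the code given above for "Xu") can be decoded with Python's `cp437` codec to see what a code page 437 display would show. The helper name `as_cp437` is made up for this sketch.

```python
def as_cp437(data: bytes) -> str:
    """Interpret raw bytes the way a DOS code page 437 screen would."""
    return data.decode("cp437")
```

Here the high byte 0xB3 appears as the vertical box-drawing line and the low byte as a backslash, instead of one Chinese character.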

Private user-defined character areas

The ETen Chinese System, and later Windows 3.1, 95, and 98, define four private character areas: 0xFA40-0xFEFE, 0x8E40-0xA0FE, 0x8140-0x8DFE, and 0xC6A1-0xC8FE.
The private areas were originally meant to let users add characters missing from the code table. However, when each user assigns different characters to different positions, it becomes hard to know what a given code is meant to represent when data is exchanged. [2]
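Before exchanging a file, it can help to check whether it contains codes from these private areas, since their meaning is not portable. The following sketch scans raw bytes for such codes using the four ranges listed above. `PRIVATE_AREAS` and `private_codes` are illustrative names, and the scanner assumes the input is well-formed Big5 (it does not validate low bytes).

```python
# The four private areas listed above, as (first code, last code).
PRIVATE_AREAS = [
    (0xFA40, 0xFEFE),
    (0x8E40, 0xA0FE),
    (0x8140, 0x8DFE),
    (0xC6A1, 0xC8FE),
]


def private_codes(data: bytes):
    """Return the two-byte codes in data that fall inside a private area."""
    codes = []
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            code = (data[i] << 8) | data[i + 1]
            if any(first <= code <= last for first, last in PRIVATE_AREAS):
                codes.append(code)
            i += 2  # skip both bytes of the double-byte character
        else:
            i += 1  # single-byte character
    return codes
```

An empty result suggests the text uses only codes outside the private areas; any reported code needs the sender's own character table to interpret.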

Development

Because the Big5 extensions released by vendors and governments are incompatible with one another, garbled text results. Since Unicode can correctly handle more than 70,000 Chinese characters, operating systems and applications (such as Apple's Mac OS X and programs written with the Cocoa API, Microsoft Windows 2000 and later, Microsoft Office 2000 and later, the Mozilla browser, the Internet Explorer browser, and the Java language) have switched to Unicode. Unfortunately, some older software (such as Visual Basic 6 and some Telnet or BBS software) does not support Unicode, so the problem of characters missing from Big5 will probably keep troubling users for some time, until all programs move to Unicode. [2]
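Moving existing Big5 data to Unicode is often a one-line conversion for text that stays within standard Big5. The sketch below uses Python's built-in `big5` codec and writes UTF-8; the function name `big5_to_utf8` is hypothetical, and the choice of `errors="replace"` is an assumption made here so that unmapped codes (for example, vendor or user-defined characters) become U+FFFD instead of raising an exception. Text that relies on a particular extension would need a matching codec, such as Python's `cp950` or `big5hkscs`.

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Convert standard Big5 bytes to UTF-8 bytes.

    Codes the big5 codec cannot map are replaced with U+FFFD
    rather than aborting the conversion.
    """
    text = data.decode("big5", errors="replace")
    return text.encode("utf-8")
```

After conversion, bytes such as 0x5C and 0x7C only ever mean the ASCII characters themselves, because UTF-8 never uses ASCII values inside a multi-byte sequence.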

Input method

  • VimIM: in the Vim environment, codes can be typed directly in decimal or hexadecimal. No guided input method or code table is required. [2]

See also

  • CCCII
  • GB 18030 (extension of the basic set of the Chinese coded character set for information interchange)
  • CJK Unified Ideographs
  • Chinese garbled text (mojibake)
  • National Standard Chinese Interchange Code (CNS11643)