Big5, also known as "Big Five Codes" (大五碼) or "Five Big Codes" (五大碼), is the most widely used Chinese character set standard on computers in the Traditional Chinese community. It contains 13,060 Chinese characters.
Big5 is widely used in Traditional Chinese regions such as Taiwan, Hong Kong and Macao. For a long time, however, it was not a national, regional or official standard in any of them, only an industry standard. The character sets of the ETen (Yitian) Chinese System, the Traditional Chinese edition of Windows and other major systems are all based on Big5. Each vendor added its own characters in the user-defined (character-creation) areas, which produced several incompatible variants.
In 2003, Big5 was included in an appendix of CNS 11643, the Chinese Standard Interchange Code, which gave it a more formal status. This latest version is called Big5-2003.[1]
History and name
Big5 is a common Chinese internal code designed by Taiwan's Institute for Information Industry (財團法人資訊工業策進會) for the "Five Chinese Software Packages". It was announced in December 1983. In March of the following year, the Institute signed the "16-bit personal computer software package cooperative development (BIG-5) project (Five Chinese Software Packages)" with 13 Taiwanese manufacturers. The code was designed for these five self-developed Taiwanese packages, so it became known as the Big5 Chinese internal code. The five packages did not replace foreign software as expected. However, the Guoqiao and ETen Chinese systems later succeeded in the Taiwanese market, and that success gave Big5 a lasting influence on Traditional Chinese internal codes that continues today. The English name "Big5", a rendering of 五大碼 ("Five Big Codes"), was later translated back into Chinese in English word order. As a result, the code now has two Chinese names: 五大碼 ("Five Big Codes") and 大五碼 ("Big Five Codes").
Big5 was created because personal computers had no common internal code. Chinese application software from different vendors could not spread widely, and the internal codes of the IBM 5550, Wang Laboratories' Wang code and other systems were mutually incompatible. Taiwan, meanwhile, had not yet introduced a Chinese coding standard. Against this background, Big5 was adopted as a project to help Taiwan enter the information age as quickly as possible. The project also had a long-term impact on the wider Traditional Chinese computing world centered on Taiwan.
Before Big5 was created, Zhu Bangfu, a researcher and developer of Chinese computing, argued that the internal code character set should include all standard and variant characters to meet needs such as household registration. At the internal code conference of the time, he therefore proposed using his font of more than 50,000 characters. The engineers considered this technically feasible. However, an internal code longer than two bytes (three bytes) would misalign screens when English display layouts were mapped to Chinese. The popular ETen Chinese System rendered each Chinese character as a glyph two bytes wide. As long as a Chinese character occupied the width of two English characters, English software displayed correctly, so Chinese-system developers preferred two-byte internal codes. In addition, an internal code compressed from Cangjie input codes provides no sorting order, so the proposal was not adopted. In 1983, Zhu Bangfu was falsely accused of being a Communist, which made the adoption of his research even less likely.
After Big5 appeared, most computer software in Taiwan adopted it. The high market share of the ETen Chinese System then led later products such as Microsoft Windows 3.x to use Big5 as well. There have been various attempts in Taiwan to replace Big5, such as the ETen code implemented by the ETen Chinese System and the Guild code promoted by the Taipei Computer Association. None became mainstream, because Big5 had been in use for years and habits are hard to change. Taiwan later developed the national standard CNS 11643 Chinese Standard Interchange Code. It is an interchange code rather than an ordinary internal code, and by its nature it needs at least three bytes to represent a Chinese character, so its adoption has remained far lower than that of Big5.
In the early 1990s, email and transcoding software were not yet widespread in mainland China. During that period, Hong Kong and Taiwanese companies in Shenzhen also used Big5 to simplify document exchange with their headquarters and to avoid maintaining a separate internal code system for mainland offices. Communities that use Simplified Chinese mostly use GB 2312, GBK and the later national standard GB 18030.
Big5 has also been used as the Chinese internal code and interchange code outside Taiwan. Other Traditional Chinese regions adopted it, such as Hong Kong (with the Hong Kong Supplementary Character Set) and Macao (with the Macao Supplementary Character Set), as did overseas Chinese communities that use Traditional Chinese.[2]
Byte structure
Big5 is a double-byte character set: each character is stored in two bytes (octets). The first byte is called the "high byte" and the second byte the "low byte".
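The two-byte layout can be inspected with Python's built-in `big5` codec. This is a minimal sketch. The helper names `big5_bytes` and `big5_code` are hypothetical, and Python's codec follows one particular Big5 mapping table, so vendor-specific characters may fail to encode.

```python
def big5_bytes(ch: str) -> tuple[int, int]:
    """Return the (high byte, low byte) pair of one character in Big5."""
    encoded = ch.encode("big5")  # raises UnicodeEncodeError if ch is not in Big5
    if len(encoded) != 2:
        raise ValueError(f"{ch!r} is not a double-byte Big5 character")
    high, low = encoded  # iterating over bytes yields ints
    return high, low


def big5_code(ch: str) -> int:
    """Combine both bytes into the 16-bit code used in Big5 tables."""
    high, low = big5_bytes(ch)
    return (high << 8) | low


if __name__ == "__main__":
    high, low = big5_bytes("功")
    print(f"high=0x{high:02X} low=0x{low:02X} code=0x{big5_code('功'):04X}")
    # prints: high=0xA5 low=0x5C code=0xA55C
```

Single-byte ASCII text encodes to one byte, so `big5_bytes("A")` raises `ValueError`.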
The high byte uses 0x81-0xFE. The low byte uses 0x40-0x7E and 0xA1-0xFE. Big5 is partitioned as follows:
0x8140-0xA0FE: Reserved for user-defined characters (character-creation area)
0xA140-0xA3BF: Punctuation, Greek letters and special symbols
0xA3C0-0xA3FE: Reserved
0xA440-0xC67E: Common Chinese characters, sorted first by stroke count, then by radical
0xC6A1-0xC8FE: Reserved for user-defined characters (character-creation area)
0xC940-0xF9D5: Less common Chinese characters, also sorted first by stroke count, then by radical
0xF9D6-0xFEFE: Reserved for user-defined characters (character-creation area)
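The byte ranges and the partition can be expressed as a small lookup. This is a minimal sketch. The function names are hypothetical, and the region boundaries follow the commonly published Big5 layout. Vendor variants place additional characters inside the user-defined areas, and this check cannot distinguish them.

```python
def is_lead_byte(b: int) -> bool:
    """High (lead) byte range: 0x81-0xFE."""
    return 0x81 <= b <= 0xFE


def is_trail_byte(b: int) -> bool:
    """Low (trail) byte ranges: 0x40-0x7E and 0xA1-0xFE."""
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


# (start, end, description) with inclusive bounds, per the usual Big5 layout.
REGIONS = [
    (0x8140, 0xA0FE, "user-defined"),
    (0xA140, 0xA3BF, "symbols"),
    (0xA3C0, 0xA3FE, "reserved"),
    (0xA440, 0xC67E, "common Chinese characters"),
    (0xC6A1, 0xC8FE, "user-defined"),
    (0xC940, 0xF9D5, "less common Chinese characters"),
    (0xF9D6, 0xFEFE, "user-defined"),
]


def classify(code: int) -> str:
    """Return the region of a 16-bit Big5 code; raise if the byte pair is malformed."""
    high, low = code >> 8, code & 0xFF
    if not (is_lead_byte(high) and is_trail_byte(low)):
        raise ValueError(f"0x{code:04X} is not a valid Big5 byte pair")
    for start, end, name in REGIONS:
        if start <= code <= end:
            return name
    return "unassigned"  # guard only; the regions cover every valid pair


if __name__ == "__main__":
    for code in (0xA55C, 0xC94A, 0xFA40):
        print(f"0x{code:04X}: {classify(code)}")
```

The same lead/trail checks are what a Big5-aware parser needs in order to step over a two-byte character as one unit, instead of treating each byte as text.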
Note that Big5 encodes two characters twice: 兀 (0xA461 [U+5140] and 0xC94A [U+FA0C]) and 嗀 (0xDCD1 [U+55C0] and 0xDDFC [U+FA0D]). In addition, 十 ("ten") and 卅 ("thirty") appear a second time in the symbol area. When a query and a document use different codes for the same character, a retrieval system can fail to find the match.
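One way a search system can cope with these duplicates is to normalize text before comparing it. This is a minimal sketch, assuming the text has already been decoded to Unicode and that the duplicates arrive as the compatibility ideographs U+FA0C/U+FA0D. It also assumes that the symbol-area 十 and 卅 arrive as the Hangzhou numerals U+3038/U+303A. Which code points a given converter actually produces depends on its mapping table. NFKC normalization folds all four onto the ordinary ideographs. It also folds other compatibility characters, such as full-width letters, which may or may not be wanted.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold compatibility duplicates onto their unified ideographs (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that treats the duplicate code points as equal."""
    return search_key(needle) in search_key(haystack)


if __name__ == "__main__":
    # U+FA0C is the compatibility duplicate of 兀 (U+5140).
    print(contains("\ufa0c", "\u5140"))  # prints True
```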
Code clash problem
The low-byte range includes special characters commonly used in programming-language strings, shells, scripts and commands, such as 0x5C "\" and 0x7C "|". In many languages, "\" is the escape character in strings, as in \n (newline), \r (carriage return), \t (tab), \\ (a literal backslash) and \" (a quotation mark). In most UNIX operating systems, "|" is the command pipe, as in "ls -la | more". When a string contains these characters, the program or interpreter gives them their special meaning. If the byte is actually the second half of a Chinese character, the program misreads it and may drop the supposed escape character or abort. Either way, the result defeats the user's intent of writing that Chinese character.
The low bytes that overlap ASCII characters (0x40-0x7E) are:
@ A-Z [ \ ] ^ _ ` a-z { | } ~
These bytes occur in common characters such as 功 ("gong", 0xA55C), 許 ("xu", 0xB35C) and 蓋 ("gai", 0xBB5C), whose low byte is 0x5C "\", and "yu" (0xA87C), whose low byte is 0x7C "|". As a result, many programs cannot correctly process Big5-encoded strings or files. The problem is jokingly personified as "Xu Gonggai" (許功蓋) or "Xu Gaigong" (許蓋功), since all three characters are affected.
The usual workaround is to insert an extra "\" after the character. Because "\\" is interpreted as a literal "\", a string written as "成功\因素" (with an extra "\" after 功) is correctly treated by the program as "成功因素" ("success factor"). This creates a new problem: some output functions do not treat "\" as special, so some programs and web pages show a stray "\" after characters such as 許, 功 and 蓋.
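The clash is easy to reproduce with Python's `big5` codec. This is a minimal sketch. The function name `special_byte_clashes` is hypothetical. By default it flags only "\" and "|", the two characters discussed here, though any character in the overlapping range can cause trouble for some tool.

```python
def special_byte_clashes(text: str, specials: str = "\\|") -> list[tuple[str, str]]:
    """Return (character, ASCII look-alike) for each Big5 character whose
    low byte equals one of the given ASCII special characters."""
    clashes = []
    for ch in text:
        encoded = ch.encode("big5")  # UnicodeEncodeError if ch is not in Big5
        if len(encoded) == 2 and chr(encoded[1]) in specials:
            clashes.append((ch, chr(encoded[1])))
    return clashes


if __name__ == "__main__":
    raw = "許功蓋".encode("big5")
    # A byte-oriented tool finds three backslashes in text that contains none.
    print(raw.count(b"\\"))  # prints 3
    print(special_byte_clashes("許功蓋"))
```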
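The extra-backslash workaround, and the stray "\" it can leave behind, can be sketched at the byte level. The function names are hypothetical. `naive_unescape` is a stand-in for any byte-oriented escaper that reads "\\" as a literal "\", not a model of a particular language's parser.

```python
def add_extra_backslash(data: bytes) -> bytes:
    """Append 0x5C after every Big5 character whose low byte is 0x5C."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0xFE and i + 1 < len(data):
            # Lead byte: copy the two-byte character as one unit.
            out += data[i:i + 2]
            if data[i + 1] == 0x5C:
                out.append(0x5C)  # escaper will turn 0x5C 0x5C back into 0x5C
            i += 2
        else:
            out.append(b)  # single-byte ASCII (or a truncated lead byte)
            i += 1
    return bytes(out)


def naive_unescape(data: bytes) -> bytes:
    """Stand-in for a byte-oriented escaper: "\\" becomes a literal "\"."""
    return data.replace(b"\\\\", b"\\")


if __name__ == "__main__":
    escaped = add_extra_backslash("成功因素".encode("big5"))
    # With the escaper in the pipeline, the original text is restored.
    print(naive_unescape(escaped).decode("big5"))
    # Printed directly, without that step, a stray "\" follows 功.
    print(escaped.decode("big5"))
```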
Conflict with line-drawing characters
The lead byte of a Big5 character can coincide with the box-drawing (line-drawing) characters of DOS code page 437, which produces garbled output.
The user-defined character-creation areas were meant to let users add characters missing from the code table. However, different users add different characters at different code points. When data is exchanged, it is therefore hard to know which character a given code is meant to represent.[2]
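The effect can be reproduced by decoding Big5 bytes as code page 437, roughly what a DOS-era program unaware of Big5 would display. This is a minimal sketch; the helper name `show_as_cp437` is hypothetical.

```python
def show_as_cp437(text: str) -> str:
    """Encode text as Big5, then misread the bytes as DOS code page 437."""
    return text.encode("big5").decode("cp437")


if __name__ == "__main__":
    # 許 is 0xB3 0x5C; 0xB3 is the box-drawing "│" in CP437.
    print(show_as_cp437("許"))  # prints │\
```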
Development
The Big5 extensions released by vendors and governments are mutually incompatible, which causes garbled text. Unicode, by contrast, can correctly handle more than 70,000 Chinese characters. Many operating systems and applications have therefore switched to Unicode, including Apple's Mac OS X and programs written with the Cocoa API, Microsoft Windows 2000 and later, Microsoft Office 2000 and later, the Mozilla browser, Internet Explorer and the Java language. Unfortunately, some older software still does not support Unicode, such as Visual Basic 6 and some Telnet or BBS clients. The Big5 missing-character problem is therefore likely to trouble users for some time, until all programs move to Unicode.[2]
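Moving existing Big5 data to Unicode is usually a decode/re-encode step. This is a minimal sketch using Python's standard codecs. The helper names are hypothetical. Python also ships `cp950` (Microsoft's variant) and `big5hkscs` (with the Hong Kong supplement). Which codec matches a given file is an assumption you must check against its origin, and characters from private user-defined areas may still fail to map.

```python
from pathlib import Path


def big5_to_utf8(data: bytes, errors: str = "strict") -> bytes:
    """Decode Big5 bytes and re-encode them as UTF-8.

    errors="strict" raises on bytes the codec cannot map; "replace"
    substitutes U+FFFD so the conversion always finishes.
    """
    return data.decode("big5", errors=errors).encode("utf-8")


def convert_file(src: Path, dst: Path, codec: str = "big5") -> None:
    """Hypothetical helper: rewrite a Big5-family text file as UTF-8."""
    text = src.read_bytes().decode(codec, errors="replace")
    dst.write_text(text, encoding="utf-8")
```

Using `errors="strict"` first is a reasonable choice: it surfaces vendor-specific characters that need manual mapping instead of silently turning them into U+FFFD.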
Input method
VimIM: in the Vim environment, you can type a character directly by its decimal or hexadecimal code, with no other input method or code table required.[2]
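Typing a character by its code amounts to turning the number back into two Big5 bytes and decoding them. The sketch below illustrates that lookup only. It does not reflect how VimIM is implemented, and the function name `char_from_code` is hypothetical.

```python
def char_from_code(code: str, base: int = 16) -> str:
    """Turn a typed Big5 code (hexadecimal by default, or decimal) into its character."""
    value = int(code, base)
    if not 0x8140 <= value <= 0xFEFE:
        raise ValueError(f"{code!r} is outside the Big5 double-byte range")
    # Split into high and low bytes, then let the codec look the pair up.
    return bytes([value >> 8, value & 0xFF]).decode("big5")


if __name__ == "__main__":
    print(char_from_code("A55C"))      # prints 功
    print(char_from_code("42332", 10))  # 0xA55C in decimal, also 功
```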
See also
CCCII
GB 18030 (Chinese coded character set for information interchange, extension of the basic set)