Decoding

[yì mǎ]

This entry has been reviewed by the "Science Popularization China" scientific encyclopedia entry compilation and application project.

Decoding is the reverse process of encoding; it also removes the noise mixed into the bitstream during transmission. The process of using a code table to translate text into groups of digits, or to translate a series of signals representing some piece of information back into text, is called decoding.

A decoder is a multiple-input, multiple-output combinational logic circuit in electronics that translates binary codes into specific objects (such as logic levels); its function is the opposite of an encoder's. Decoders are generally divided into general-purpose decoders and digital display decoders.

In digital circuits, a decoder (such as an n-line to 2^n-line decoder or a BCD decoder) can act as a multiple-input, multiple-output logic gate that converts an encoded input into an encoded output, where the input and output encodings differ. The decoder's enable input must be driven active for it to work normally; otherwise its output will be an invalid codeword. Decoding is essential in applications such as multiplexing, seven-segment displays, and memory address decoding.
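The behaviour of an n-line to 2^n-line decoder, including its enable input, can be modelled in a few lines of Python. This is a minimal behavioural sketch rather than a circuit description: it assumes active-high outputs for readability (many real decoder chips, such as the 74x138, have active-low outputs), and the function name `decode_n_to_2n` is hypothetical.

```python
from typing import List


def decode_n_to_2n(inputs: List[int], enable: bool = True) -> List[int]:
    """Behavioural model of an n-line to 2**n-line decoder.

    `inputs` holds the n input bits, most significant bit first.
    The result lists the 2**n output lines Y0 .. Y(2**n - 1).
    """
    outputs = [0] * (1 << len(inputs))
    if not enable:
        # Disabled: no line is selected, so the output is not a valid codeword.
        return outputs
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)  # build the binary number bit by bit
    outputs[index] = 1                     # exactly one output line goes high
    return outputs


print(decode_n_to_2n([0, 1, 0]))                # [0, 0, 1, 0, 0, 0, 0, 0]
print(decode_n_to_2n([0, 1, 0], enable=False))  # [0, 0, 0, 0, 0, 0, 0, 0]
```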

Chinese name: 译码 (decoding)
Definition: the reverse process of encoding

Nature: written words
Mixed in: noise
Field: Digital and Analog Electronic Technology

Criterion

Assume that a code sequence is $C_m = (c_{m1}, c_{m2}, \ldots)$ and that the receiver receives the signal $R$ (an analog or a digital signal, depending on how the channel is defined). The receiver then naturally searches all possible code sequences for the one with the largest conditional probability $P(C_m \mid R)$ and takes it as the most likely transmitted sequence, that is:

$$\hat{C} = \arg\max_{C_m} P(C_m \mid R)$$

This decision criterion is called the maximum a posteriori probability (MAP) criterion.
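To make the MAP criterion concrete, the sketch below enumerates a small codebook and picks the codeword that maximises $P(R \mid C_m)\,P(C_m)$; by Bayes' rule this is equivalent to maximising $P(C_m \mid R)$, because $P(R)$ is the same for every candidate. It assumes, for illustration only, a binary symmetric channel with crossover probability p and a hypothetical two-codeword repetition code; `likelihood` and `map_decode` are made-up names.

```python
from typing import Dict, Sequence, Tuple

Codeword = Tuple[int, ...]


def likelihood(received: Sequence[int], codeword: Codeword, p: float) -> float:
    """P(R | C) on a binary symmetric channel with crossover probability p."""
    flips = sum(r != c for r, c in zip(received, codeword))
    return (p ** flips) * ((1 - p) ** (len(codeword) - flips))


def map_decode(received: Sequence[int], priors: Dict[Codeword, float], p: float) -> Codeword:
    """Return arg max over C_m of P(C_m | R), computed as P(R | C_m) * P(C_m)."""
    return max(priors, key=lambda c: likelihood(received, c, p) * priors[c])


received = (1, 1, 0)
skewed = {(0, 0, 0): 0.95, (1, 1, 1): 0.05}
uniform = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(map_decode(received, skewed, 0.1))   # (0, 0, 0)
print(map_decode(received, uniform, 0.1))  # (1, 1, 1)
```

With equal priors the MAP rule reduces to maximum-likelihood decoding and picks (1, 1, 1); a strong prior on (0, 0, 0) is enough to flip the decision.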

Algorithm

The Viterbi algorithm is a decoding algorithm for convolutional codes. Its drawback is that its complexity grows rapidly with the constraint length: when the constraint length N is 7 there are 64 paths (states) to compare, and when N is 8 the number becomes 128, i.e. 2^(N-1) (in code, 1 << (N-1)). Therefore, Viterbi decoding is generally used when the constraint length is less than 10.

For the data received at time t, the algorithm performs 64 comparisons. Each of the 64 states has two branches (because the input bit is 0 or 1) leading to two different states. For each state, the outputs of the two competing branches are compared with the actually received output; the branch with the larger metric (that is, the larger discrepancy) is discarded, and the one retained is called the survivor path. The branch metric is added to the metric of the survivor path from the previous time step and saved, which extends each of the 64 survivor paths by one step. At the end of decoding, the survivor path with the smallest metric is selected from the 64, and tracing it backward (called traceback) yields the corresponding decoded output.
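The add-compare-select and traceback steps can be sketched as follows. To keep the trellis small, this assumes a rate-1/2 convolutional code with constraint length 3 and the common generator pair (7, 5) in octal, hard-decision decoding with a Hamming-distance metric, and two zero tail bits; with these parameters there are 2^(3-1) = 4 states rather than the 64 of the N = 7 case above. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

GENS: Tuple[int, ...] = (0b111, 0b101)  # assumed generators (7, 5) in octal
K = 3                                    # assumed constraint length


def branch_output(reg: int, gens: Tuple[int, ...] = GENS) -> List[int]:
    """Encoder output bits for a K-bit shift-register value: parity of each tap set."""
    return [bin(reg & g).count("1") & 1 for g in gens]


def conv_encode(bits: Sequence[int], k: int = K, gens: Tuple[int, ...] = GENS) -> List[int]:
    state, out = 0, []
    for b in bits:
        reg = (b << (k - 1)) | state   # newest bit enters at the top
        out += branch_output(reg, gens)
        state = reg >> 1                # keep the k-1 most recent bits
    return out


def viterbi_decode(received: Sequence[int], k: int = K, gens: Tuple[int, ...] = GENS) -> List[int]:
    n_out, n_states = len(gens), 1 << (k - 1)   # 2**(k-1) trellis states
    inf = float("inf")
    metric = [0.0] + [inf] * (n_states - 1)     # encoder starts in state 0
    history = []                                 # per step: state -> (prev state, bit)
    for t in range(0, len(received), n_out):
        chunk = received[t:t + n_out]
        new_metric = [inf] * n_states
        back = [(0, 0)] * n_states
        for s in range(n_states):
            if metric[s] == inf:
                continue
            for b in (0, 1):                     # two branches: input 0 or 1
                reg = (b << (k - 1)) | s
                dist = sum(e != r for e, r in zip(branch_output(reg, gens), chunk))
                nxt, m = reg >> 1, metric[s] + dist
                if m < new_metric[nxt]:          # compare-select: keep the survivor
                    new_metric[nxt], back[nxt] = m, (s, b)
        metric = new_metric
        history.append(back)
    state = min(range(n_states), key=metric.__getitem__)  # smallest final metric
    decoded = []
    for back in reversed(history):               # traceback
        state, b = back[state]
        decoded.append(b)
    return decoded[::-1]


msg = [1, 0, 1, 1, 0, 0]          # the last two zeros flush the encoder
coded = conv_encode(msg)
noisy = coded.copy()
noisy[2] ^= 1                     # flip one channel bit
print(viterbi_decode(noisy) == msg)  # True
```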

Definition of encoding

Encoding converts text, numbers, or other objects into digital codes by a predetermined method, or converts information and data into specified electrical pulse signals. Encoding is widely used in computers, television, remote control, and communications. It is the process of converting analog information into a bitstream according to a certain protocol or format.

In computer hardware, encoding is the process of converting information about a subject or unit into coded values (typically numbers) for data storage, management, and analysis. In software, coding means implementing a program's logic in a specific language such as C or C++. In cryptography, encoding refers to writing in a code or cipher.

Encoding converts data into codes or coded characters that can be translated back into the original data form. It is the process of writing computer instructions and is part of programming. In automated cartography, encoding is the process of representing map content with numbers and letters according to certain rules; through encoding, a computer can identify the geographic elements of a map.

An n-bit binary number can form 2^n different combinations, and each piece of information can be assigned one of them as its code group. This process is also called encoding.

Two kinds of codes are commonly used in digital systems: one is binary encoding, and the other is binary-coded decimal (BCD) encoding.
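The two code families can be compared directly. The sketch below, with hypothetical helper names, converts the same number to plain binary and to 8421 BCD (the usual BCD weighting, assumed here), and also shows the n-bit code assignment described above, choosing n as the smallest value with 2^n at least the number of symbols.

```python
from typing import Dict, Sequence


def to_binary(n: int) -> str:
    """Plain binary: the whole number is converted to base 2."""
    return format(n, "b")


def to_bcd(n: int) -> str:
    """8421 BCD: each decimal digit is encoded separately as a 4-bit group."""
    return " ".join(format(int(d), "04b") for d in str(n))


def assign_codes(symbols: Sequence[str]) -> Dict[str, str]:
    """Give each symbol a distinct n-bit code, n being the smallest value with 2**n >= count."""
    n = max(1, (len(symbols) - 1).bit_length())
    return {s: format(i, f"0{n}b") for i, s in enumerate(symbols)}


print(to_binary(59))         # 111011
print(to_bcd(59))            # 0101 1001
print(assign_codes("ABCD"))  # {'A': '00', 'B': '01', 'C': '10', 'D': '11'}
```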

Coding system

1. ASCII and Binary

The files we encounter daily fall into two classes: ASCII and binary. ASCII is the acronym for "American Standard Code for Information Interchange", which can be called the "American Standard". It assigns the 128 numbers from 0 to 127 as standard codes for information: 33 control codes, one space code, and 94 graphic codes. The graphic codes include upper- and lowercase English letters, Arabic numerals, punctuation marks, and so on. The English computer text we usually read is transmitted and stored as graphic codes. The American Standard is the common code of most computers in the world.

However, computers mostly represent a character with an 8-bit binary number, so each character can take 256 different values. Since ASCII defines only 128 codes, the remaining 128 are not standardized and their use varies from vendor to vendor. In addition, manufacturers use the 33 ASCII control codes differently. So when exchanging files between different computers, two types of files must be distinguished. In the first type, every character is an ASCII graphic code or the space code; such files are called ASCII text files, or simply text files, and can be exchanged directly between different computer systems. The second type, files containing control codes or non-ASCII codes, cannot be exchanged directly between different computer systems; these are generally called binary files.
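The text/binary distinction above can be expressed as a byte-level check. This is a sketch of the strict definition given in the text (only graphic codes and the space code); the `allow_line_breaks` option is an assumption added because practical text files also contain line-break control codes. All names are hypothetical.

```python
CONTROL_CODES = set(range(0, 32)) | {127}   # 33 control codes
SPACE_CODE = 32                              # 1 space code
GRAPHIC_CODES = set(range(33, 127))          # 94 graphic codes


def is_ascii_text(data: bytes, allow_line_breaks: bool = False) -> bool:
    """Apply the text-file definition: only graphic codes and the space code.

    allow_line_breaks is an assumed extension: real text files also contain
    LF/CR control codes, which the strict definition does not admit.
    """
    allowed = GRAPHIC_CODES | {SPACE_CODE}
    if allow_line_breaks:
        allowed |= {0x0A, 0x0D}
    return all(b in allowed for b in data)


print(len(CONTROL_CODES), len(GRAPHIC_CODES))  # 33 94
print(is_ascii_text(b"Hello, world!"))         # True
print(is_ascii_text("中".encode("gb2312")))     # False: bytes 0xD6 0xD0 are above 127
```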

2. GB ("national standard") code, area-position code, and "quasi-GB" code

"GB" (国标, "national standard") is the short name of the People's Republic of China national standard "Chinese Character Coded Character Set for Information Interchange". Its basic table arranges more than 7,000 Chinese characters, punctuation marks, foreign letters, and other symbols into a square array of 94 rows and 94 columns. Each row of the array is called an "area" (区), and each area has 94 "positions" (位). The coordinates of a Chinese character in the array are called its "area-position code" (区位码). For example, the character 中 ("Zhong") is at position 48 of area 54, so its area-position code is 5448.

In fact, 94 is exactly the number of ASCII graphic codes. The GB table uses this number so that each Chinese character can be represented by two ASCII graphic symbols. Since the ASCII graphic codes run from 33 to 126, adding 32 to each of a character's area and position numbers maps them into the ASCII graphic code range. For example, adding 32 to the area and position numbers of 中 gives 86 and 80. Writing these two numbers in hexadecimal side by side gives 5650, called the "GB code" (国标码) of 中, and the corresponding two ASCII symbols, "VP", are its "GB symbols".

This raises the question of how to tell GB symbols from ASCII symbols. In a mixed Chinese-English file, does "VP" stand for 中 or for an English abbreviation? When developing CCDOS, the Sixth Research Institute of the Ministry of Electronics Industry adopted a simple solution: add 128 to each of the two bytes of the GB code, raising them into the non-ASCII range. (This modified GB code is still traditionally called "GB".)

Although this solution solved the original problem, it created new ones. Chinese documents became "binary files", which can neither be reliably exchanged between different computer systems nor work with most of the ASCII-based software on the market.

To distinguish the two kinds of "GB", we call the original GB code, which overlaps the ASCII graphic codes, "pure GB", and the CCDOS GB code with 128 added "quasi-GB".
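The arithmetic from area-position code to pure GB and quasi-GB can be checked with a short script. It assumes that Python's built-in `gb2312` codec produces EUC-CN bytes, which equal the quasi-GB form (area + 0xA0, position + 0xA0); the helper names are hypothetical.

```python
from typing import Tuple


def to_pure_gb(area: int, pos: int) -> bytes:
    """'Pure GB' code: add 32 (0x20) to area and position, landing in ASCII 33..126."""
    return bytes([area + 32, pos + 32])


def to_quasi_gb(area: int, pos: int) -> bytes:
    """'Quasi-GB' code (CCDOS style): pure GB plus 128 (0x80) in each byte."""
    return bytes(b + 128 for b in to_pure_gb(area, pos))


def area_position_of(ch: str) -> Tuple[int, int]:
    """Recover area/position from Python's 'gb2312' codec, which emits the +0xA0 form."""
    hi, lo = ch.encode("gb2312")
    return hi - 0xA0, lo - 0xA0


area, pos = 54, 48                  # 中: area 54, position 48 (area-position code 5448)
print(to_pure_gb(area, pos))        # b'VP'
print(to_pure_gb(area, pos).hex())  # 5650
print(to_quasi_gb(area, pos).hex()) # d6d0
```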

3. GBK code

GBK is a character encoding that extends the GB code. It encodes more than 20,000 simplified and traditional Chinese characters, and the Simplified Chinese versions of Windows 95 and Windows 98 use GBK as the system's internal code.

GB stands for 国标 ("national standard"), and K is the first letter of 扩展 (kuòzhǎn, "extension") in pinyin. GBK is in fact a separate Chinese character coding specification, whose full name is the "Chinese Internal Code Specification", issued in 1995.

In practice, Microsoft has used GBK since the Simplified Chinese version of Windows 95. The system includes TrueType GBK fonts in the Song and Hei (bold) typefaces, supplied by Beijing Zhongyi Electronics Company, which can be used for display and printing, and it provides four GBK Chinese input methods. In addition, the Simplified and Traditional Chinese versions of Internet Explorer 4.0 include a built-in bidirectional GBK-BIG5 conversion function. In the language packs Microsoft provides for IE, the Simplified Chinese language support kit includes two fonts, Song and Hei, which are also GBK fonts (provided by Zhuhai Stone Computer Typesetting System Development Company). Other Chinese font vendors have also begun to offer TrueType or PostScript GBK fonts.

Many add-on Chinese platforms, such as NJStar (南极星) and RichWin, provide GBK support, including fonts, input methods, and converters between GBK and other Chinese encodings.

On the Internet, many websites use GBK codes.

However, most search engines do not support searching for GBK Chinese characters well, and even some search engines in mainland China do not fully support it.

GBK is backward compatible with GB 2312 and forward compatible with the international standard ISO 10646.1, serving as a transitional standard from the former to the latter.

The GBK specification includes all the CJK Chinese characters and symbols in ISO 10646.1, with some supplements. It includes: all Chinese and non-Chinese characters in GB 2312; the other CJK Chinese characters in GB 13000.1 (20,902 GB Chinese characters in total together with the above); 52 Chinese characters from the Simplified Character Summary Table not included in GB 13000.1; 28 radicals and important components from the Kangxi Dictionary and Cihai not included in GB 13000.1; 13 Chinese character structure symbols; 139 graphic symbols from BIG-5 that are not in GB 2312 but are in GB 13000.1; 6 phonetic symbols added in GB 12345; 19 vertical-form symbols added in GB 12345 (GB 12345 adds 29 vertical punctuation symbols relative to GB 2312, 10 of which are not in GB 13000.1 and are therefore not included in GBK); 21 Chinese characters selected from the CJK compatibility area of GB 13000.1; and 31 IBM OS/2 special symbols included in GB 13000.1. GBK uses a double-byte representation with an overall code range of 0x8140-0xFEFE: the first byte is between 0x81 and 0xFE, and the second byte is between 0x40 and 0xFE, excluding the column 0xXX7F. This gives 23,940 code positions in total, of which 21,886 are assigned to Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
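The size of the GBK code space follows directly from the byte ranges just given. This sketch only counts code positions; it does not know which of them are assigned (21,886 per the text). The helper name is hypothetical, and Python's built-in `gbk` codec is used only to show that 中 keeps its GB 2312 bytes.

```python
def is_gbk_double_byte(lead: int, trail: int) -> bool:
    """True if the byte pair lies in GBK's double-byte range 0x8140-0xFEFE (minus xx7F)."""
    return 0x81 <= lead <= 0xFE and 0x40 <= trail <= 0xFE and trail != 0x7F


code_positions = sum(
    is_gbk_double_byte(lead, trail)
    for lead in range(0x81, 0xFF)
    for trail in range(0x40, 0xFF)
)
print(code_positions)                           # 23940 = 126 lead bytes x 190 trail bytes
print("中".encode("gbk").hex())                  # d6d0, the same bytes as in GB 2312
print(is_gbk_double_byte(*"中".encode("gbk")))   # True
```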

4. BIG5 code

BIG5 is a Chinese character encoding for traditional Chinese characters and is widely used in computer systems in Taiwan and Hong Kong. The coding range of BIG5 is shown below.

5. HZ code

HZ is a Chinese character encoding widely used on the Internet. The HZ scheme mixes "pure GB" Chinese text with ASCII. How, then, does HZ tell pure GB apart from ASCII? The answer is simple: when a run of GB code is inserted into a string of ASCII, "~{" is added before the GB code and "~}" after it. These added codes are called the "escape-in code" and the "escape-out code" respectively. Since they are themselves ASCII graphic codes, the whole document looks just like an ASCII text file, can be safely transferred over computer networks, and is compatible with most software that processes English text.

6. ISO-2022 CJK code

ISO-2022 is a character encoding standard developed by the International Organization for Standardization (ISO). The two-byte Chinese encoding is called ISO-2022-CN, and the Japanese and Korean encodings are called ISO-2022-JP and ISO-2022-KR respectively; they are generally referred to collectively as CJK codes. At present, CJK codes are mainly used on the Internet.
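The escape-sequence mechanism shared by the ISO-2022 family can be seen with Python's standard codecs. As far as we know, the standard library ships ISO-2022-JP and ISO-2022-KR codecs but no ISO-2022-CN codec, so this sketch uses the Japanese variant; `escape_sequences` is a hypothetical helper that assumes the 3-byte escape forms used by ISO-2022-JP.

```python
from typing import List


def escape_sequences(data: bytes) -> List[bytes]:
    """Collect ESC-introduced designations, assuming ISO-2022-JP's 3-byte forms."""
    return [data[i:i + 3] for i, b in enumerate(data) if b == 0x1B]


text = "日本 Japan"
data = text.encode("iso2022_jp")
print(data)
print(escape_sequences(data))             # the first one switches to JIS X 0208
print(all(b < 0x80 for b in data))        # True: the stream is pure 7-bit
print(data.decode("iso2022_jp") == text)  # True
```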

7. UCS and ISO 10646

In 1993, the international standard ISO 10646 defined the Universal Character Set (UCS). UCS is a superset of all other character set standards and guarantees round-trip compatibility with them: if any text string is translated to UCS and then back to its original encoding, no information is lost.

UCS contains the characters needed to represent practically all known languages. This includes not only the Latin, Greek, Cyrillic, Hebrew, Arabic, Armenian, and Georgian scripts, but also Chinese, Japanese, and Korean ideographs, as well as Hiragana, Katakana, Bengali, Gurmukhi, Tamil, Kannada, Malayalam, Thai, Lao, Bopomofo, Hangul, Devanagari, Gujarati, Oriya, Telugu, and other scripts. Scripts not yet included will eventually be added, since work is under way on how best to encode them. These include Tibetan, Khmer, Runic, Ethiopic, other hieroglyphic scripts, and various Indo-European scripts, as well as selected artificial scripts such as Tengwar, Cirth, and Klingon. UCS also includes a large number of graphical, typographical, mathematical, and scientific symbols, including all the characters provided by TeX, PostScript, MS-DOS, MS-Windows, Macintosh, OCR fonts, and many other word-processing and publishing systems.

ISO 10646 defines a 31-bit character set. However, in this huge code space only the first 65,534 code positions (0x0000 to 0xFFFD) have been assigned so far. This 16-bit subset of UCS is called the Basic Multilingual Plane (BMP). Characters to be encoded outside the BMP are very special ones (such as hieroglyphics) that only experts in history and science will use. According to current plans, no characters may ever be assigned beyond the 21-bit code space from 0x000000 to 0x10FFFF, which still covers more than one million potential future characters. ISO 10646-1, first published in 1993, defines the architecture of the character set and the contents of the BMP. A second part, ISO 10646-2, which defines characters encoded outside the BMP, is under preparation but may take several years to complete. New characters are still being added to the BMP, but the existing characters are stable and will not change.

UCS not only assigns a code to each character but also gives it an official name. A hexadecimal number representing a UCS or Unicode value is usually prefixed with "U+"; for example, U+0041 represents the character "Latin capital letter A". UCS characters U+0000 to U+007F are identical to US-ASCII (ISO 646), and U+0000 to U+00FF are identical to ISO 8859-1 (Latin-1). The range U+E000 to U+F8FF, along with larger ranges outside the BMP, is reserved for private use.

The four-byte UCS-4 defined in ISO 10646 (1993) is wide enough to accommodate an enormous number of characters, but it is too bloated. At the time, and even now, this had an unrealistic side: it would take up too much storage space and reduce the efficiency of information transmission. Meanwhile, the Unicode organization had begun developing a 16-bit character standard. To avoid competition between the two codes, the two organizations began negotiating in 1992, seeking common ground through compromise. The results are today's UCS-2 (the BMP, Basic Multilingual Plane, 16 bits) and Unicode, although they remain different schemes.
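The round-trip property can be spot-checked with a few legacy encodings, treating a Python `str` as UCS text. This only checks the sample strings shown (assumed examples) and the hypothetical helper `round_trips`; it does not prove the property for every character of every encoding.

```python
SAMPLES = {
    "gbk": "中文编码",
    "big5": "中文",
    "shift_jis": "日本語",
    "latin-1": "café",
}


def round_trips(codec: str, text: str) -> bool:
    legacy = text.encode(codec)   # bytes in the original encoding
    ucs = legacy.decode(codec)    # translate to UCS (a Python str)
    return ucs.encode(codec) == legacy  # translate back and compare


for codec, text in SAMPLES.items():
    print(codec, round_trips(codec, text))  # True for these samples
```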
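The "U+" notation and the ASCII/Latin-1 overlap can be checked with Python's standard `unicodedata` module. A minimal sketch; `u_plus` is a hypothetical helper name.

```python
import unicodedata


def u_plus(ch: str) -> str:
    """Format a character's code point in the conventional U+XXXX style."""
    return f"U+{ord(ch):04X}"


print(u_plus("A"), unicodedata.name("A"))  # U+0041 LATIN CAPITAL LETTER A
print(u_plus("中"))                         # U+4E2D

# U+0000..U+007F coincide with US-ASCII, U+0000..U+00FF with ISO 8859-1.
ascii_same = all(chr(cp).encode("ascii") == bytes([cp]) for cp in range(0x80))
latin1_same = all(chr(cp).encode("latin-1") == bytes([cp]) for cp in range(0x100))
print(ascii_same, latin1_same)             # True True

print(unicodedata.category("\ue000"))      # Co: private use, inside the BMP
print(0x10FFFF + 1)                        # 1114112 code points in 0x000000..0x10FFFF
```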

8. Unicode code

We need to trace the origin of Unicode.

When computers spread to East Asia, they met China, Japan, South Korea, and other countries whose languages use ideographic rather than alphabetic characters. These languages commonly use thousands of characters, but the original character sets were single-byte codes, and one code page can hold at most 2^8 = 256 characters, which is hopeless for ideographic scripts. Since one byte was not enough, people naturally turned to double-byte character sets (DBCS). In a DBCS, however, although ideographic characters use two bytes, ASCII and Japanese katakana are still represented by a single byte. This causes programmers a lot of trouble: whenever DBCS strings are processed, the program must decide whether a given byte is a whole character or half of one, and if it is half, whether it is the first half or the second half. DBCS is clearly not a very good solution.

People kept looking for a better character encoding scheme, and the result was Unicode. Unicode is in fact a wide-character set that, in its original design, uses two bytes (16 bits) for every character, so character processing never has to deal with half a character.

At present, Unicode is used in networks, Windows systems and many large-scale software.

Among the GB coding standards, GB 2312 and GBK are the most commonly used. GB 2312 is a subset of GBK, and the GB 2312 code range is 0xA1A1-0xFEFE. Pure GB 2312 text is simple to process, but the GBK character set calls for a few tricks. First, the GBK coding standard:

GBK uses a double-byte representation with an overall code range of 8140-FEFE: the first byte is between 81 and FE, the second byte is between 40 and FE, and the column xx7F is excluded. There are 23,940 code points in total, of which 21,886 are Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.

Classification of all GBK codes

1. Chinese character area, including:

a. GB 2312 Chinese character area, i.e. GBK/2: B0A1-F7FE. It contains the 6,763 Chinese characters of GB 2312 in their original order.

b. GB 13000.1 extended Chinese character area, including:

(1) GBK/3: 8140-A0FE. It contains 6,080 CJK Chinese characters from GB 13000.1.

(2) GBK/4: AA40-FEA0. It contains 8,160 CJK Chinese characters and supplementary Chinese characters.

The CJK Chinese characters come first, arranged in UCS code order; the supplementary Chinese characters (including radicals and components) follow, arranged by page number and character position in the Kangxi Dictionary.

2. Graphic symbol area, including:

a. GB 2312 non-Chinese symbol area, i.e. GBK/1: A1A1-A9FE. In addition to the symbols in GB 2312, it contains 10 lowercase Roman numerals and the symbols supplemented by GB 12345, 717 symbols in total.

b. GB 13000.1 extended non-Chinese area, i.e. GBK/5: A840-A9A0. BIG-5 non-Chinese symbols, structure symbols, and "○" are placed in this area, 166 symbols in total.

3. User-defined area, divided into three sub-areas, (1), (2), and (3):

(1) AAA1-AFFE, 564 code points.

(2) F8A1-FEFE, 658 code points.

(3) A140-A7A0, 672 code points.

Although sub-area (3) is open to users, its use is restricted, because new characters may be added there in the future.
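The zone boundaries listed above can be turned into a lookup. This sketch assumes each zone is the rectangle "lead-byte range x trail-byte range" implied by its start and end codes; it counts code positions, which can exceed the number of assigned characters (for example 846 positions in GBK/1 versus 717 symbols). All names are hypothetical.

```python
from collections import Counter
from typing import Optional

# (name, lead_lo, lead_hi, trail_lo, trail_hi), taken from the ranges listed above
REGIONS = [
    ("GBK/1", 0xA1, 0xA9, 0xA1, 0xFE),
    ("GBK/2", 0xB0, 0xF7, 0xA1, 0xFE),
    ("GBK/3", 0x81, 0xA0, 0x40, 0xFE),
    ("GBK/4", 0xAA, 0xFE, 0x40, 0xA0),
    ("GBK/5", 0xA8, 0xA9, 0x40, 0xA0),
    ("user (1)", 0xAA, 0xAF, 0xA1, 0xFE),
    ("user (2)", 0xF8, 0xFE, 0xA1, 0xFE),
    ("user (3)", 0xA1, 0xA7, 0x40, 0xA0),
]


def gbk_region(code: int) -> Optional[str]:
    lead, trail = code >> 8, code & 0xFF
    if trail == 0x7F:                       # the xx7F column is never used
        return None
    for name, lead_lo, lead_hi, trail_lo, trail_hi in REGIONS:
        if lead_lo <= lead <= lead_hi and trail_lo <= trail <= trail_hi:
            return name
    return None


counts = Counter(
    gbk_region((lead << 8) | trail)
    for lead in range(0x81, 0xFF)
    for trail in range(0x40, 0xFF)
    if trail != 0x7F
)
print(gbk_region(0xD6D0))    # GBK/2 (the character 中)
print(sum(counts.values()))  # 23940
print(None in counts)        # False: the eight regions tile the whole code space
```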

Here are some tips:

1. In PHP, the character encoding is whatever encoding the data was sent in: the encoding entered by the user is used as is and is not changed automatically. In ASP, however, the default encoding is Unicode, so a GBK-to-Unicode mapping table is easy to obtain, which makes it easy to convert from GBK to UTF-8 even without a base library.

2. Because the lowest byte value GBK uses in a double-byte character is 0x40 (64), when processing strings that contain Chinese it is best to use ASCII characters below 64 as delimiters; then replacing or splitting will never produce garbled text. Commonly used ones include ",", ";", and ":"; characters like these will never be confused with GB code bytes.
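The mapping-table approach from this tip can be sketched as follows. The table is filled from Python's `gbk` codec as a stand-in for whatever table the server environment exposes (an assumption), and the UTF-8 step is done by hand to show that no base library is needed for it. All names are hypothetical.

```python
from typing import Dict


def build_gbk_table() -> Dict[int, int]:
    """Map each decodable GBK double-byte code to a Unicode code point."""
    table = {}
    for lead in range(0x81, 0xFF):
        for trail in range(0x40, 0xFF):
            if trail == 0x7F:
                continue
            try:
                ch = bytes([lead, trail]).decode("gbk")
            except UnicodeDecodeError:
                continue                    # unassigned or unsupported by the codec
            table[(lead << 8) | trail] = ord(ch)
    return table


def utf8_bytes(cp: int) -> bytes:
    """Hand-rolled UTF-8 encoder for one code point."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
    if cp < 0x10000:
        return bytes([0xE0 | cp >> 12, 0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])
    return bytes([0xF0 | cp >> 18, 0x80 | (cp >> 12) & 0x3F,
                  0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])


def gbk_to_utf8(data: bytes, table: Dict[int, int]) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] < 0x80:                  # single-byte ASCII
            out += data[i:i + 1]
            i += 1
        else:                               # double-byte GBK character
            out += utf8_bytes(table[(data[i] << 8) | data[i + 1]])
            i += 2
    return bytes(out)


TABLE = build_gbk_table()
print(gbk_to_utf8("A中".encode("gbk"), TABLE).hex())  # 41e4b8ad
```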
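The delimiter rule can be demonstrated with Python's `gbk` codec. `is_safe_gbk_delimiter` is a hypothetical helper, and the sample records are made up for illustration.

```python
def is_safe_gbk_delimiter(ch: str) -> bool:
    """A byte below 0x40 is never part of a GBK double-byte character."""
    return len(ch) == 1 and ord(ch) < 0x40


data = "张三,北京;李四,上海".encode("gbk")
rows = [[field.decode("gbk") for field in row.split(b",")] for row in data.split(b";")]
print(rows)                     # [['张三', '北京'], ['李四', '上海']]

# By contrast '@' (0x40) can be a GBK trail byte, so splitting on it can cut a character.
print(b"\x81\x40".split(b"@"))  # [b'\x81', b'']
```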

Types of encoding

In cognition, encoding is a basic perceptual process of interpreting incoming stimuli. Technically, it is a complex, multi-stage conversion from relatively objective sensory input (such as light and sound) to subjectively meaningful experience.

Character encoding is a set of rules for pairing the characters of a natural language (such as an alphabet or a syllabary) with a set of other things (such as numbers or electrical pulses).

Text encoding uses a markup language to mark up the structure and other features of a text for computer processing.

Semantic encoding expresses a formal language A in a formal language B: it is a method of using language B to represent all the terms (such as programs or instructions) of language A.

Electronic encoding converts a signal into a code optimized for transmission or storage. The conversion is usually performed by a codec.

Neural encoding refers to how information is represented in neurons.

Memory encoding is the process of converting sensations into memories.

Encryption is the process of transforming information for confidentiality.

Transcoding is the process of converting encoding from one format to another.

Physics

In physics, encoders and decoders are treated as gate circuits.

Converting an analog signal into binary digits is called analog-to-digital conversion and is performed by an analog-to-digital converter (ADC); a digital-to-analog converter (DAC) converts binary numbers back into analog quantities. Encoders and decoders are generally used for chip address selection. A 3-8 decoder converts a 3-bit input code into an 8-bit output in which exactly one bit differs from the others. For example, the input 010 becomes 00000100 after decoding (outputs written from Y7 down to Y0).
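The ADC/DAC relationship can be sketched for an ideal converter. This assumes a unipolar input range from 0 to v_ref, truncation rather than rounding, and no noise or nonlinearity; real converters differ on all three, and the function names are hypothetical.

```python
def adc(voltage: float, v_ref: float, bits: int) -> int:
    """Ideal n-bit ADC: scale 0..v_ref onto 2**bits levels, truncate, then clamp."""
    levels = 1 << bits
    code = int(voltage / v_ref * levels)
    return max(0, min(levels - 1, code))


def dac(code: int, v_ref: float, bits: int) -> float:
    """Ideal n-bit DAC: each code step is worth v_ref / 2**bits volts."""
    return code * v_ref / (1 << bits)


code = adc(0.5, 1.0, 8)
print(code, format(code, "08b"))  # 128 10000000
print(dac(code, 1.0, 8))          # 0.5
```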
