Garbled code

Garbled code refers to a series of unreadable characters that appear when the local computer opens a source file in a text editor using an inappropriate character set. It can arise for a variety of reasons.
Chinese name: Garbled code
Cause: The system or software lacks support for the character encoding
Common problem: Conflict between GB code and BIG5 code
Classification: Text garbling, document garbling
Symptom: Some or all characters cannot be read
Main body: Local computer
English name: messy code [3]

Brief introduction

Types

There are four types of garbled Chinese text:
Text garbling: garbled text displayed by the Windows system itself, such as in menus, on the desktop, and in prompt boxes. It is caused by incorrect settings in the font section of the registry;
File garbling: garbled text in places where an application originally displayed Chinese. The causes are complex: they include the first type of garbling, or a Chinese dynamic link library used by the software having been overwritten by an English one;
Document garbling: mainly refers to garbled e-mail;
Web page garbling: caused by the incompatibility between BIG5, used for traditional Chinese in Hong Kong and Macao, and GB2312, used for simplified Chinese in mainland China.
To correct garbled text, use a system internal-code conversion tool such as "Antarctic Star" to convert the system's internal code to the matching internal code, after which the characters display correctly.
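
The GB2312 and BIG5 conflict can be reproduced in a few lines. The following is a minimal Python sketch, assuming the standard `gb2312` and `big5` codecs; the sample string and variable names are illustrative only, and it does not show how any particular conversion tool works.

```python
# Illustrative sample text; any simplified Chinese string would do.
text = "中文"

# The bytes as a GB2312-encoded page would store them.
gb_bytes = text.encode("gb2312")

# A reader that wrongly assumes BIG5 gets different characters.
# errors="replace" keeps undecodable bytes from raising an exception.
misread = gb_bytes.decode("big5", errors="replace")

# Decoding with the encoding that was actually used restores the text.
restored = gb_bytes.decode("gb2312")

print(repr(misread))
print(restored)
```

The bytes never change; only the character set used to interpret them does, which is why converting to the matching internal code makes the text readable again.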

Causes

Generally, the cause is a decoding error in a software program: for example, a browser displays GBK-encoded text as Big5, or an e-mail program decodes the other party's message incorrectly. If the message was encoded incorrectly when it was sent, the recipient's e-mail program cannot decode it, and the sender's e-mail program must re-encode it before sending again. Other causes are an incorrect font file, incorrect source encoding, or a damaged file.
An operating system in one language version has an application in another language version installed, or the language version of an upgrade patch does not match the language version of the installed application.
When opening a file in a double-byte language, early single-byte Microsoft applications cannot correctly identify character boundaries and split a character in two at a line break, so that the entire following line is garbled.
A file created by a higher version of a program is not recognized by a lower version of the application.
Because of internal conflicts in modified files such as TXD, the "Readme", "Must See!" and other text files shipped with game-modification software, such as MODs (modifications), CLEO, IV patches, real patches, skill patches, upgrade patches, and CCI character patches, become garbled.
Incorrect operation of computer software can also garble an entire file.
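
The line-break case can be illustrated with a small experiment. This is a hedged Python sketch, assuming GBK as the double-byte encoding and an arbitrary split position; it imitates the effect described above and is not a reconstruction of any specific application's behavior.

```python
import codecs

text = "你好世界"            # illustrative sample text
data = text.encode("gbk")    # each of these characters takes two bytes

# Break the byte stream in the middle of the second character,
# as a byte-oriented program might do when wrapping a line.
first, second = data[:3], data[3:]

# Decoding the second chunk on its own pairs up the wrong bytes,
# so everything after the break is misread.
garbled = second.decode("gbk", errors="replace")

# An incremental decoder carries the dangling byte over to the next chunk.
decoder = codecs.getincrementaldecoder("gbk")()
joined = decoder.decode(first) + decoder.decode(second, final=True)

print(repr(garbled))
print(joined)
```

Once a double-byte character is cut in half, every following byte pair is shifted by one, which is why a single bad break can garble a whole line.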
Database causes
The data is correct, but the database is configured incorrectly and uses the wrong character set. This is generally caused by a DBA's mistake during database migration or restoration.
Generally, the client uses the default character set. For example, an application is developed on a GBK machine and then moved to Linux, where the data it reads appears garbled.
The solution is to specify explicitly, in the connection parameters, the character set used for data transmission, instead of relying on the operating system default.
The data itself is wrong. Generally, this is an encoding problem in the data sent by the client. For example, the page sends UTF-8 data, but the back-end program processes it as GBK, so the data saved to the database is garbled.
Solution: use one character set consistently everywhere. For example, use GBK throughout. [1]
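
The UTF-8 versus GBK mismatch can be simulated without a database. In this minimal Python sketch, `decode_request` is a hypothetical helper standing in for the back-end program; the point is only that the charset is named explicitly rather than taken from a platform default.

```python
def decode_request(body: bytes, charset: str) -> str:
    """Hypothetical back-end helper: decode a request body using an
    explicitly named charset instead of a platform default."""
    return body.decode(charset, errors="replace")

text = "编码"                 # illustrative sample text
sent = text.encode("utf-8")   # what the page actually sends

# The back end assumes GBK: the value that would be stored is garbled.
stored_wrong = decode_request(sent, "gbk")

# The back end uses the same charset as the page: the value is intact.
stored_right = decode_request(sent, "utf-8")

print(repr(stored_wrong))
print(stored_right)
```

Once the garbled value has been saved, the database only holds the wrong characters, so the mismatch has to be fixed where the data is decoded, not where it is displayed.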

Relevant information

Avoid garbled code

1. Use one encoding throughout wherever possible. If you are developing a new system, especially in Java, it is recommended to use UTF-8 everywhere, from the pages to the database to the configuration files. Safety first.
2. SetCharacterEncodingFilter is not a cure-all, but doing without it is very troublesome. For Servlet-based development, use it whenever you can. Note, however, that this filter only takes effect for POST requests; GET requests are ignored. If in doubt, step through it in a debugger to see how it works. The filter sets the encoding of the request body, so it has no effect on GET parameters, which travel in the URL.
3. Because of the problem with GET requests mentioned above, prefer POST requests, which is also a basic rule of Web development.
4. Avoid garbled text in JavaScript and Ajax. The encoding used on the JavaScript side may differ from the one the server assumes, so, just as with GET, do not put Chinese in URLs. If you cannot avoid it, transcode the text when generating the links.
5. Unify the development environment as early as possible, and test in a simulation of the real environment as early as possible. This may seem like a digression, since all software development does it, but it is still worth noting.
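
Transcoding when generating links, as point 4 suggests, usually means percent-encoding the non-ASCII text. A minimal Python sketch using the standard `urllib.parse` module follows; the base URL and the parameter name `q` are placeholders, not taken from the source.

```python
from urllib.parse import quote, unquote

keyword = "中文"                       # illustrative sample text
base = "http://example.com/search"     # placeholder URL

# Percent-encode the UTF-8 bytes so the link contains only ASCII.
link = base + "?q=" + quote(keyword, encoding="utf-8")
print(link)  # http://example.com/search?q=%E4%B8%AD%E6%96%87

# The receiving side must decode with the same encoding.
decoded = unquote(link.split("=", 1)[1], encoding="utf-8")
print(decoded)
```

The link itself is now plain ASCII; garbling can still occur if the receiver decodes the escaped bytes with a different character set.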

Countermeasures

1. Garbled text in the development environment
Java projects commonly use UTF-8 as the default encoding, and many people online recommend UTF-8 rather than GBK as the default when developing with Struts. The IDE used here is Eclipse; when using Eclipse for the first time, change the default text editor encoding to UTF-8.
2. Filtering POST requests
This is the most basic measure, and almost every servlet system uses it. However, it is only effective for POST requests, which is a key point. Use SetCharacterEncodingFilter, a very basic filter, to set the encoding of all POST requests from pages to UTF-8.
3. Garbled JSP pages
Change all JSP pages to charset=UTF-8, which ensures that UTF-8 is used when interacting with the back end. For ordinary applications, the steps above are generally sufficient.
4. Converting Chinese characters in resource files to UTF-8 characters
This is an internationalization problem. Because Chinese text in a properties resource file cannot be recognized by the program, it needs to be transcoded. One simple approach is to keep a very simple .bat file alongside the resource file: each time the resource file needs changing, the change is made in a temporary file, and the .bat file is then run to convert it and save it as the required resource file.
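
The source does not say which tool the .bat file calls. The Python sketch below only illustrates the kind of conversion involved, under the assumption that the goal is to replace non-ASCII characters with `\uXXXX` escapes, the form Java properties files traditionally accept. The function name `to_properties_escapes` and the sample key are hypothetical.

```python
def to_properties_escapes(text: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes.

    Characters outside the Basic Multilingual Plane become two
    escapes (a UTF-16 surrogate pair).
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            utf16 = ch.encode("utf-16-be")
            for i in range(0, len(utf16), 2):
                unit = int.from_bytes(utf16[i:i + 2], "big")
                out.append("\\u%04x" % unit)
    return "".join(out)

# Hypothetical line from a temporary, human-editable resource file.
line = "greeting=中文"
print(to_properties_escapes(line))  # greeting=\u4e2d\u6587
```

Keeping the readable file as the one that is edited and generating the escaped file from it mirrors the temporary-file workflow described above.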
5. Garbled GET requests
If the project submits requests with the GET method and attaches parameters, the parameters will be garbled, because Tomcat's default request encoding is ISO8859. A parameter needs to be added to the Connector in Tomcat's configuration file server.xml: URIEncoding="UTF-8". With this setting, the parameters attached to the request are decoded as UTF-8. [2]
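
The effect of the server decoding GET parameters as ISO8859 can be simulated in Python. This is a sketch under stated assumptions: the browser is assumed to have percent-encoded the value as UTF-8, and `unquote` stands in for the server's URL decoding; it is not Tomcat's actual code.

```python
from urllib.parse import unquote

# "中文" as a browser would typically send it: percent-encoded UTF-8 bytes.
query_value = "%E4%B8%AD%E6%96%87"

# Server decodes the bytes as ISO-8859-1: six unrelated characters.
as_latin1 = unquote(query_value, encoding="iso-8859-1")

# Server decodes the bytes as UTF-8: the original two characters.
as_utf8 = unquote(query_value, encoding="utf-8")

# Because ISO-8859-1 maps every byte to one character, the garbled
# string can be turned back into bytes and decoded again correctly.
repaired = as_latin1.encode("iso-8859-1").decode("utf-8")

print(repr(as_latin1))
print(as_utf8, repaired)
```

Setting the decoding charset on the server side removes the need for the repair step in application code.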