
Basic meaning of hash algorithms
Hashing is a foundation of cryptography, and understanding hashes is a prerequisite for understanding digital signatures, encrypted communication, and other technologies.
The English word "hash" originally means "to chop and mix"; there is even a dish called hash, made by chopping up ingredients and mixing them together. The output of a hash function is called a hash value, usually shortened to hash. In some languages, hash functions are also translated as "scatter functions".
According to Wikipedia's definition, a hash function maps data of any size to data of a fixed length. A mapping here means that each input corresponds to exactly one output. A reliable hash algorithm should satisfy three properties.
The first is security: given data M, it is easy to compute its hash value X, but given X it is infeasible to compute M; in other words, a hash algorithm should be one-way. The second is uniqueness: two different pieces of data should have different hashes. The third is fixed length: for a given hash algorithm, the output length is the same no matter how large the input is.
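The fixed-length property is easy to observe in code. The sketch below uses Python's standard `hashlib` module with SHA-256 as a representative algorithm; the helper name `sha256_hex` is our own. It shows that the output length does not depend on the input size, though no code can demonstrate one-wayness, which is a claim about what is computationally infeasible.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different sizes: empty, short, and one megabyte.
inputs = [b"", b"hello", b"x" * 1_000_000]

for data in inputs:
    digest = sha256_hex(data)
    # Every digest is 64 hex digits (256 bits), whatever the input size.
    print(len(data), len(digest), digest[:16])

# Computing the hash is easy and deterministic: same input, same hash.
assert sha256_hex(b"hello") == sha256_hex(b"hello")
```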
But think about it carefully: if the hash length is fixed, the set of possible hash values is finite, while the set of possible inputs is infinite, so there must be two different inputs with the same hash. The security of a hash function is therefore a relative concept. When two different inputs produce the same output, this is called a collision. Across hash algorithms, more hash bits generally means a higher level of security, that is, better collision resistance.
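Collisions are guaranteed once the output range is small enough. As a sketch, the following truncates SHA-256 to 16 bits (a hypothetical `tiny_hash`, not a real algorithm) and hashes distinct inputs until two of them share an output. With only 2^16 possible values, the search usually finishes after a few hundred inputs; the same search against the full 256-bit output is far beyond reach, which is why more bits means better collision resistance.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes, bits: int = 16) -> int:
    """Hypothetical toy hash: the first `bits` bits of SHA-256.
    There are only 2**bits possible outputs, so collisions must exist."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int = 16) -> tuple[bytes, bytes]:
    """Hash distinct inputs until two of them share the same toy hash."""
    seen: dict[int, bytes] = {}
    for i in count():
        data = str(i).encode()
        h = tiny_hash(data, bits)
        if h in seen:
            return seen[h], data
        seen[h] = data
    raise AssertionError("unreachable")  # count() never stops

a, b = find_collision(16)
print(a, b, tiny_hash(a) == tiny_hash(b))
```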
Now for the main use of hash functions. Because hashes are effectively unique, even slight damage to data during storage or transmission changes its hash. The most common use of a hash function is therefore integrity checking, where integrity means that the data has not been damaged. A hash goes by several names: it is sometimes called a digest, sometimes a checksum, and sometimes a fingerprint. They all mean the same thing: a hash can stand in for the data itself.
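To see how sensitive a hash is to damage, this sketch flips a single bit of a message (simulating a transmission error) and compares the SHA-256 hashes before and after. The message text is arbitrary.

```python
import hashlib

original = bytearray(b"The quick brown fox jumps over the lazy dog")
damaged = bytearray(original)
damaged[0] ^= 0x01  # flip one bit, as a transmission error might

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(damaged).hexdigest()

print(h1)
print(h2)
# Count how many of the 64 hex digits differ between the two hashes.
diff = sum(c1 != c2 for c1, c2 in zip(h1, h2))
print(f"{diff} of 64 hex digits differ")
```

Typically most of the hex digits change, even though only one bit of input differs.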
For example, when a friend sends me a copy of some data, we each have a copy. If the two copies have the same hash value, their contents are the same; in other words, the data was not damaged in transit, and the copy I have is complete.
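The friend-to-friend check can be sketched as follows, assuming SHA-256 as the hash and a file on disk as the data. The helper names `file_sha256` and `copies_match` are hypothetical. Reading in chunks lets arbitrarily large files be hashed without loading them into memory, and gives the same result as hashing the whole content at once.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def copies_match(my_copy: Path, friends_hash: str) -> bool:
    """Compare the hash of my copy with the hex hash my friend computed."""
    return file_sha256(my_copy) == friends_hash.lower()

# Demo: simulate my copy of the data and the hash my friend sent me.
payload = b"some data" * 10_000
with tempfile.TemporaryDirectory() as tmp:
    my_copy = Path(tmp) / "data.bin"
    my_copy.write_bytes(payload)
    friends_hash = hashlib.sha256(payload).hexdigest()  # computed on the friend's side
    print(copies_match(my_copy, friends_hash))  # True
```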
Therefore, the basic function of a hash function is to compute a fixed-length summary string, called a hash, for arbitrarily large data. Hashes are mainly used for integrity verification.
Classification of Hash Algorithms
Next, we will classify hash functions and look at their characteristics in more detail. There are many hash algorithms, such as MD5, SHA-256, and so on, but they can generally be divided into two categories: ordinary hash functions and cryptographic hash functions.
Many hash algorithms are used in industry, and we can roughly compare them by the length of the hash they output. Although the security of a hash algorithm depends on more than hash length alone, a longer hash value is generally safer.
For example, the output of CRC-32 is 32 bits, that is, a 32-bit binary number, which takes 8 digits in hexadecimal. The hash produced by MD5 is 128 bits, commonly written as 32 hexadecimal digits. SHA-256 produces 256 bits, or 64 hexadecimal digits. These algorithms can be divided into ordinary hash algorithms and cryptographic hash algorithms, though there is no sharp boundary between the two. For example, MD5 was originally designed as a cryptographic hash, but as computers and attack techniques advanced, finding MD5 collisions became practical, so today MD5 is only suitable as an ordinary hash for data verification.
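The lengths quoted above can be checked with Python's standard `zlib` and `hashlib` modules. Each hexadecimal digit encodes 4 bits, so the number of bits is four times the number of hex digits. The input `b"hello"` is arbitrary.

```python
import hashlib
import zlib

data = b"hello"

# zlib.crc32 returns an unsigned 32-bit integer; format it as 8 hex digits.
crc32_hex = f"{zlib.crc32(data):08x}"
md5_hex = hashlib.md5(data).hexdigest()
sha256_digest = hashlib.sha256(data).hexdigest()

for name, digest in [("CRC-32", crc32_hex), ("MD5", md5_hex), ("SHA-256", sha256_digest)]:
    # Each hex digit encodes 4 bits.
    print(f"{name:8} {len(digest) * 4:4} bits {len(digest):3} hex digits  {digest}")
```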
The difference between a cryptographic hash and an ordinary hash is security. The general rule is that once collisions have been found for a hash algorithm, it is no longer recommended as a cryptographic hash. Only hash algorithms with high security can serve as cryptographic hashes.
Conversely, cryptographic hashes can also be used as ordinary hashes; for example, the Git version control system uses the SHA-1 cryptographic hash algorithm to verify integrity. Generally speaking, the more secure a hash algorithm is, the slower it runs, so it is not appropriate everywhere to replace ordinary hashes with cryptographic ones.
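As an illustration of Git's use of SHA-1, the sketch below computes the ID Git gives to a file's contents (a blob object): the SHA-1 of a header `blob <size>` plus a NUL byte, followed by the raw bytes. It assumes a repository using Git's default SHA-1 object format (Git also has an experimental SHA-256 format); `git_blob_id` is a hypothetical helper name, and its result can be compared with the output of `git hash-object`.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a file's contents.

    Git hashes a header ("blob", a space, the size in bytes, a NUL byte)
    followed by the raw content.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello\n"))  # compare with: echo hello | git hash-object --stdin
```

Because the ID is derived from the content, any corruption of a stored object shows up as a mismatch between its content and its name.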
In cryptography, there are two kinds of algorithms that take data as input and produce output no one can make sense of: hash algorithms and encryption algorithms. Note that a hash algorithm is completely different from an encryption algorithm.
First, the output length of a hash algorithm is fixed, while the output length of an encryption algorithm depends directly on the length of the input data. Second, a hash cannot be reversed, whereas the output of an encryption algorithm must be reversible back to the original data. We won't discuss encryption algorithms further here.
To emphasize: cryptographic hash algorithms are used as building blocks in cryptographic processes, but a hash algorithm is not itself an encryption algorithm.
In short, there are many kinds of hash algorithms, and generally the longer the hash, the safer the algorithm. Hash algorithms with lower security are considered ordinary hash algorithms and are mainly used for integrity verification; those with high security are called cryptographic hash algorithms and are used in cryptographic schemes. "High" and "low" are relative: MD5, for example, used to be a cryptographic hash but is now only used for integrity verification. Since 2017, major browsers have also rejected certificates signed with SHA-1. Today the most widely used cryptographic hash is SHA-2; unlike SHA-1, SHA-2 is not a single algorithm but a family of algorithms, including the SHA-256 mentioned earlier.
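The SHA-2 family can be explored with `hashlib`. This sketch prints the output size of its four classic members, SHA-224, SHA-256, SHA-384, and SHA-512, which differ mainly in output length.

```python
import hashlib

SHA2_FAMILY = ["sha224", "sha256", "sha384", "sha512"]

# digest_size is in bytes; multiply by 8 to get bits.
sizes = {name: hashlib.new(name).digest_size * 8 for name in SHA2_FAMILY}

for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
```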
Practical examples
With the basics covered, this last part looks at how hashes are applied in practice.
The first scenario is website registration. When we submit a user name and password, the user name is saved directly in the website's database, but the password is not. Instead, the password is converted into a hash, and it is the hash that is stored in the database. As a result, even the company's back-end administrators cannot see users' passwords, and if the database is leaked, the passwords themselves are not directly exposed. When the user later logs in, they enter the password and submit it to the server, which performs the same hash operation. Because the input has not changed, the hash matches the stored one, and the login succeeds.
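A minimal sketch of this store-and-compare flow, using only Python's standard library. It goes one step beyond the description above in a way real systems typically do: each password gets a random salt, and the hash is computed with a deliberately slow key-derivation function (PBKDF2-HMAC-SHA256 here) so that a leaked hash is harder to attack by guessing passwords. The iteration count and the function names are assumptions for illustration, not a recommendation for any particular site.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: tune to your hardware and security policy

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) to store in the database instead of the password."""
    salt = os.urandom(16)  # a fresh random salt for each user
    pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, pw_hash

def check_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Recompute the hash from the submitted password and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_hash)  # constant-time comparison
```

At registration the site stores the salt and hash from `hash_password`; at login it calls `check_password` with the submitted password and the stored values.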
Another scenario is blockchain and cryptocurrency. Bitcoin uses the SHA-256 algorithm when generating addresses, and SHA-256 is also the basis of its proof-of-work algorithm.
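Proof of work can be illustrated with a toy version: keep changing a nonce until the hash of the data meets a difficulty condition. This sketch applies SHA-256 twice, as Bitcoin does, but the data layout and the "leading zero hex digits" rule are simplifications; real Bitcoin hashes an 80-byte block header and compares the result, as a number, against a target. `toy_proof_of_work` is a hypothetical name.

```python
import hashlib
from itertools import count

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def toy_proof_of_work(block_data: bytes, difficulty: int) -> tuple[int, str]:
    """Find a nonce whose hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = double_sha256(block_data + nonce.to_bytes(8, "little")).hex()
        if digest.startswith(prefix):
            return nonce, digest
    raise AssertionError("unreachable")  # count() never stops

nonce, digest = toy_proof_of_work(b"example block data", difficulty=4)
print(nonce, digest)
```

Finding the nonce takes about 16^difficulty attempts on average, while anyone can verify it with a single hash computation; that asymmetry is what makes the scheme useful.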
In general, hash functions are used to some degree wherever cryptography is involved.
Summary
To wrap up the discussion of hashes and hash functions, here is a brief summary.
The basic function of a hash is to provide a summary, or fingerprint, of data, and its most common use is integrity verification. There are many kinds of hash algorithms; generally, the longer the hash, the higher the security. Hash algorithms that are secure enough, that is, ones for which no one has succeeded in producing a collision, qualify for cryptographic use and are called cryptographic hash algorithms.
In practice, common hash algorithms include MD5, SHA-1, and SHA-256. MD5 is usually used for data verification. SHA-1 used to be a cryptographic hash algorithm, but it has been removed from that category and is now used only as a somewhat stronger verification algorithm. SHA-256 is still a widely used cryptographic hash algorithm, used for example in Bitcoin address generation and its proof-of-work (PoW) algorithm.