What is a hash function? (1)
In this article, we are going to explain addresses and hash functions. Unlike familiar uses of cryptography such as signing contracts or encrypting files, hash functions can be a little difficult to understand because users rarely interact with them directly. However, because they play a very important role in the construction of the blockchain, they are a concept you need to understand in order to manage your assets safely. Because there is a lot to cover, the article is split into two parts. In this first part, we will look at why hash functions are needed and what their characteristics are.
To help you understand hash functions, let's look at a simple example. Let's say you have 100 important document files on your computer. Because these files are important, a big problem may occur if someone modifies one of them without permission. However, it would take a very long time to open all 100 files and go through them page by page to find any modified parts. Wouldn't it be great if there were a simpler way to check?
Well, we are in luck. You can quickly check whether any of the 100 files has been changed without inspecting each one, and if there is a change, you will notice it immediately. This is done by creating a hash value for each of the 100 files using a hash function. A hash value is the output that a hash function produces for a given input. You record the hash values while the files are known to be correct, recompute them later, and compare: any file whose hash value differs has been modified. The property that data has not been altered or forged is called integrity, and hash values are used for integrity verification.
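The check described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: it uses SHA-256 from the standard hashlib module (the article does not name a specific hash function), and the helper names file_hash, snapshot, changed_files and demo, as well as the file names, are made up for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hash value of a file as a hex string."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(folder: Path) -> dict[str, str]:
    """Record one hash value per file in the folder."""
    return {p.name: file_hash(p) for p in sorted(folder.iterdir()) if p.is_file()}


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List the files whose hash value differs between two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name))


def demo() -> list[str]:
    """Create 100 files, alter one of them, and report what changed."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for i in range(100):
            (folder / f"doc_{i:03d}.txt").write_text(f"Contract number {i}\n")
        before = snapshot(folder)  # taken while the files are known to be good
        # Someone quietly modifies one document.
        (folder / "doc_042.txt").write_text("Contract number 42 (altered)\n")
        after = snapshot(folder)
        return changed_files(before, after)


if __name__ == "__main__":
    print(demo())  # ['doc_042.txt']
```

Note that the first snapshot has to be stored somewhere the person modifying the files cannot reach; otherwise they could simply replace the recorded hash values as well.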
Although its function and purpose are different from those of hashing, compression is a process that looks similar on the surface. Compressing a file reduces its size, and decompressing the compressed file restores the original. Similarly, a hash function transforms its input data into an output. However, the characteristics of hash functions are very different from those of compression, so let's look at them one at a time.
The first characteristic of hash functions is that they return a fixed output length. With compression, a large file is still relatively large after it has been compressed, but a given hash function returns the same output length no matter what input it receives. A hash function with a 160-bit output, for example, produces a value that can be written as 40 hexadecimal digits, which is short enough for the human eye to compare. It is much faster to create hash values for the 100 files and compare 40 digits for each than to compare the full contents of all 100 files.
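To make the contrast with compression concrete, here is a small Python sketch. It assumes zlib for compression and SHA-1 and SHA-256 from hashlib as example hash functions; SHA-1 is used only because its 160-bit output matches the 40-digit figure above, and it is no longer considered collision resistant. The function name output_sizes and the sample inputs are invented for this example.

```python
import hashlib
import os
import zlib


def output_sizes(data: bytes) -> tuple[int, int, int]:
    """Return (compressed bytes, SHA-1 hex digits, SHA-256 hex digits)."""
    compressed = len(zlib.compress(data))
    sha1_digits = len(hashlib.sha1(data).hexdigest())      # 160-bit output
    sha256_digits = len(hashlib.sha256(data).hexdigest())  # 256-bit output
    return compressed, sha1_digits, sha256_digits


samples = {
    "empty input": b"",
    "5-byte input": b"hello",
    "1 MB of random bytes": os.urandom(1_000_000),
}

for label, data in samples.items():
    compressed, sha1_digits, sha256_digits = output_sizes(data)
    print(f"{label}: compressed={compressed} bytes, "
          f"SHA-1={sha1_digits} digits, SHA-256={sha256_digits} digits")
```

The compressed size grows with the input, while each hash function always prints the same number of digits: 40 for the 160-bit hash and 64 for the 256-bit one.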
The second characteristic is that even a small change in the input data produces a completely different output, so any modification is easy to detect. These two features can be seen in the figure below.
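The same effect can be reproduced with a short Python sketch. It assumes SHA-256 from hashlib as the hash function; the two messages and the helper names sha256_hex and differing_bits are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash value of a string as 64 hex digits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex-encoded values differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# The two inputs differ in a single character.
a = sha256_hex("Pay Alice 100 coins")
b = sha256_hex("Pay Alice 900 coins")

print(a)
print(b)
print(differing_bits(a, b), "of 256 bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to change, so the two printed values look unrelated even though the inputs are almost identical.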
The third characteristic is that the input data cannot be estimated from the output data. Unlike compression, where the original file can be recovered from the compressed file through the inverse transformation, a hash function has no practical inverse: it is very difficult to work out what the input data was from the hash output, and the original cannot be reconstructed from the hash value. As a result, even if you tell someone the hash values of the 100 files, the contents of the files will not be leaked.
The fourth characteristic is collision resistance, which means that it is computationally infeasible to find two different inputs that produce an identical output value. Such pairs must exist in principle, because there are far more possible inputs than fixed-length outputs, but nobody should be able to find one in practice. If the 100 original files and 100 forged files produced the same hash values, hashes could not be used for integrity verification. Hash functions are therefore designed to make such collisions practically impossible to find.
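One way to get a feel for why collisions are hard to find is to weaken a hash function on purpose. The Python sketch below is a toy experiment, not an attack on any real hash: it assumes SHA-256 from hashlib and keeps only the first 6 hex digits (24 bits) of its output. The names tiny_hash and find_collision and the "file-N" inputs are invented for this example.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, hex_digits: int = 6) -> str:
    """A deliberately weakened toy hash: the first few hex digits of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:hex_digits]


def find_collision(hex_digits: int = 6) -> tuple[bytes, bytes, int]:
    """Try inputs one by one until two of them share the same toy hash."""
    seen: dict[str, bytes] = {}
    for i in count():
        msg = f"file-{i}".encode()
        h = tiny_hash(msg, hex_digits)
        if h in seen:
            # Two different inputs, same truncated output: a collision.
            return seen[h], msg, i + 1
        seen[h] = msg


first, second, tries = find_collision()
print(first, second, "collide on", tiny_hash(first), "after", tries, "inputs")
# The full 256-bit values of the same two inputs are still different.
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

With only 24 bits of output, a collision typically turns up after a few thousand inputs. The expected effort grows roughly with the square root of the number of possible outputs, which is why real hash functions use outputs hundreds of bits long.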
In this part, we learned about the purpose of hash functions and their characteristics. In summary, hash functions make integrity verification quick and easy. In the next part, we will look at how hash functions are used in blockchain.