Hashing for integrity checking
Integrity Checking
Hashing can be used to check that files haven’t been changed. If you put the same data in, you always get the same hash out. If even a single bit of the input changes, the hash will change a lot. This means you can use it to check that files haven’t been modified, or to make sure that they have downloaded correctly. You can also use hashing to find duplicate files: if two pictures have the same hash, then they are almost certainly the same picture.
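As a minimal sketch of both uses, the Python below hashes files with SHA-256 from the standard hashlib module. The helper names file_digest and find_duplicates are hypothetical, chosen for this illustration, and the choice of SHA-256 is an assumption rather than something the text prescribes.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths by digest; a group with more than one path has matching content."""
    groups = {}
    for p in paths:
        groups.setdefault(file_digest(p), []).append(p)
    return [g for g in groups.values() if len(g) > 1]
```

To verify a download, compare file_digest(path) with the digest published by the source; any mismatch means the file differs from the original.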
Let’s Review: A hash function, or hashing, transforms input data of arbitrary length into a fixed-length value that is, for practical purposes, unique to that input. Input data can be a document, tree data, or block data. Even a slight difference in the input data produces a totally different hash output value.
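Both properties can be observed directly. The sketch below assumes SHA-256 from Python's hashlib; the helper differing_bits is a hypothetical name introduced for this illustration. It hashes a 1-byte and a 1,000,000-byte input, then two inputs that differ in exactly one bit.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Fixed-length output: inputs of very different sizes give 32-byte digests.
short_digest = hashlib.sha256(b"a").digest()
long_digest = hashlib.sha256(b"a" * 1_000_000).digest()
print(len(short_digest), len(long_digest))

# Slight input change: 'd' (0x64) and 'e' (0x65) differ in a single bit.
d1 = hashlib.sha256(b"hello world").digest()
d2 = hashlib.sha256(b"hello worle").digest()
print(differing_bits(d1, d2), "of 256 output bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to flip for any change to the input.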
Two basic requirements of a hash function.
The algorithm chosen for the hash function should be a one-way function, and it should be collision free or exhibit an extremely low probability of collision.
The first requirement is to make certain that no one can derive the original items hashed from the hash value. Can you make potatoes out of mashed potatoes? The second requirement is to make sure that the hash value uniquely represents the original items hashed. There should be an extremely low probability that two different datasets map onto the same hash value. These requirements are achieved by choosing a strong algorithm, such as one from the Secure Hash Algorithm (SHA) family, and by using an appropriately large number of bits in the hash value.
The most common hash size now is 256 bits, and the common functions are SHA-3, SHA-256 and Keccak. How good is a 256-bit hash? A 256-bit hash value space is indeed very large: 2 to the power of 256 possible values. That is approximately 1.16 times 10 to the power of 77, or roughly 1 followed by 77 zeros. The odds of a meteor hitting your house are higher than the odds of generating two identical 256-bit hash values with such an algorithm.
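The size of the value space is easy to check with Python's arbitrary-precision integers. This is only a quick arithmetic check; the variable name space is chosen for this illustration.

```python
# Number of distinct values a 256-bit hash can take.
space = 2 ** 256

print(len(str(space)))   # 78 decimal digits
print(f"{space:.3e}")    # 1.158e+77
```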
Let’s proceed to explore some techniques now. We’ll compare two different approaches for hashing based on how the constituent elements are organized: a simple hash and a Merkle tree hash. Here we illustrate both using the data 10, 4, 6, 21 and 19, with ADD as the hash function. Actual hashing functions are quite complex and are variations of SHA-3, and the data values are much larger, mainly 256-bit to 512-bit values. In the simple hash approach, all the data items are linearly arranged and hashed together, which with ADD gives 60.
In a tree-structured approach, the data is at the leaf nodes of the tree. The leaves are hashed pairwise, and the results are hashed pairwise in turn until a single root value remains. With ADD, this arrives at the same hash value as the simple hash.
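The two arrangements can be sketched in a few lines of Python, using the same five values and ADD as the toy hash. The function names simple_hash and merkle_root are hypothetical, and carrying an unpaired node up to the next level unchanged is an assumption of this sketch (real systems differ on how they handle an odd number of nodes).

```python
def simple_hash(items):
    """Hash all items in one linear pass, using ADD as the toy hash function."""
    total = 0
    for x in items:
        total = total + x
    return total


def merkle_root(items):
    """Combine nodes pairwise, level by level, until a single root remains."""
    level = list(items)
    while len(level) > 1:
        # A pair has two nodes; an unpaired last node is carried up unchanged.
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
    return level[0]


data = [10, 4, 6, 21, 19]
print(simple_hash(data))  # 60
print(merkle_root(data))  # 60: levels are [14, 27, 19], then [41, 19], then [60]
```

The two results agree only because addition gives the same answer regardless of grouping. With a real cryptographic hash, the linear and tree arrangements would generally produce different values, so the structure itself becomes part of what is being hashed.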
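Note that the state is a variable that may be modified by a smart contract execution, and the result of the execution may be returned in a receipt. The tree structure helps the efficiency of repeated operations, such as transaction modification and the state changes from one block to the next. When one of N items changes, only the hashes on the path from that leaf to the root need to be recomputed, which is on the order of log N operations, versus N operations to rehash everything in the simple approach.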
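The log N versus N difference can be made concrete with a small sketch, again using ADD as a stand-in hash. The names build_levels and update_leaf are hypothetical, and the tree layout (a list of levels, leaves first) is an assumption of this illustration, not how any particular blockchain client stores its trees.

```python
def build_levels(leaves):
    """Return every level of the tree, from the leaves up to the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sum(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels


def update_leaf(levels, index, value):
    """Change one leaf, recompute only its ancestors, and count the recomputations."""
    levels[0][index] = value
    recomputed = 0
    for depth in range(1, len(levels)):
        index //= 2  # position of the parent node in the level above
        below = levels[depth - 1]
        levels[depth][index] = sum(below[2 * index:2 * index + 2])
        recomputed += 1
    return recomputed


leaves = list(range(1, 1025))           # N = 1024 leaves
levels = build_levels(leaves)
touched = update_leaf(levels, 500, 0)   # change a single leaf
print(touched)                          # 10 nodes recomputed, i.e. log2(1024)
print(levels[-1][0] == sum(levels[0]))  # True: root matches a full recomputation
```

Rehashing the same data linearly after the change would touch all 1024 items, while the tree update touches 10 nodes.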
Summarizing, in Ethereum, hashing functions are used for generating account addresses, digital signatures, transaction hashes, state hashes, receipt hashes and block header hashes. SHA-3, SHA-256 and Keccak-256 are some of the algorithms commonly used for hash generation in blockchains.