
What’s a hash function?

Hash functions are quite different from encryption. A hash function has no key, and it is meant to be impossible (or at least very difficult) to go from the output (the hash) back to the input (the plaintext).

A hash function takes input data of any size and creates a summary or “digest” of that data. The output is a fixed size. It’s hard to predict what the output will be for any given input, and hard to work out the input from a given output. Good hashing algorithms are (relatively) fast to compute and slow to reverse (that is, to go from the output back to the input). Any small change in the input data (even a single bit) should cause a large change in the output.

The output of a hash function is normally raw bytes, which are then encoded. Common encodings for this are base 64 or hexadecimal. Decoding these won’t give you anything useful.
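As a minimal sketch of the points above, the following Python snippet hashes a short message with SHA-256 from the standard `hashlib` module, shows the raw digest in both hexadecimal and base 64, and counts how many output bits change when a single input bit is flipped. The choice of SHA-256, the sample messages, and the helper names `digest_encodings` and `differing_bits` are illustrative assumptions, not something the text above prescribes.

```python
import base64
import hashlib


def digest_encodings(data: bytes):
    """Return the raw SHA-256 digest plus its hex and base 64 encodings."""
    raw = hashlib.sha256(data).digest()          # raw bytes, always 32 of them
    hex_text = raw.hex()                         # hexadecimal encoding
    b64_text = base64.b64encode(raw).decode("ascii")  # base 64 encoding
    return raw, hex_text, b64_text


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# b"hello" and b"helln" differ in exactly one bit ('o' is 0x6f, 'n' is 0x6e).
raw, hex_text, b64_text = digest_encodings(b"hello")
other_raw, _, _ = digest_encodings(b"helln")

print("digest length in bytes:", len(raw))
print("hex:    ", hex_text)
print("base 64:", b64_text)
print("output bits changed:", differing_bits(raw, other_raw), "of", len(raw) * 8)
```

Decoding the hex or base 64 text only gives back the raw digest bytes, not the original message.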

Hashing for a five-year-old:

Imagine you have a magical treasure chest (the blockchain) where you want to keep track of everyone’s toys (transactions). Now, you want to make sure that no one can mess with the toys or sneak in some fake toys.

So, you decide to use a special magical lock (hash function) that turns each toy into a secret code. This code is unique for every toy, like a fingerprint for toys.

Now, you put the toys into the treasure chest, but you don’t just put them in as they are. You first use the magical lock to turn each toy into a secret code. Then, you arrange the toys in a line (block) and use the lock again to create a special code for the whole line of toys. This special code is like a super-secret password for the treasure chest.

Whenever someone wants to add more toys to the chest, they have to use the magical lock to create codes for those toys and add them to the end of the line. And the special password for the whole chest changes each time you add new toys.

This way, if anyone tries to sneak in fake toys or take some out, the super-secret password won’t match, and everyone will know something fishy is going on!

So, hashing on the blockchain is like using a magical lock to keep all the toys safe and making sure no one can play tricks with them.
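The toy-chest story can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any particular blockchain is implemented (real systems use more elaborate structures such as block headers and Merkle trees). The function names, the all-zero starting value, and the toy names are hypothetical choices made for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """The "magical lock": turn any bytes into a fixed-size hex code."""
    return hashlib.sha256(data).hexdigest()


def block_hash(previous_hash: str, toys: list) -> str:
    """Hash each toy, then hash the whole line together with the previous password."""
    toy_codes = [sha256_hex(toy.encode("utf-8")) for toy in toys]
    combined = previous_hash + "".join(toy_codes)
    return sha256_hex(combined.encode("ascii"))


def chain_hashes(blocks: list) -> list:
    """Return the chest's "password" after each line of toys is added."""
    previous = "0" * 64  # assumed placeholder for the very first block
    passwords = []
    for toys in blocks:
        previous = block_hash(previous, toys)
        passwords.append(previous)
    return passwords


honest = chain_hashes([["teddy bear", "toy car"], ["kite"]])
tampered = chain_hashes([["teddy bear", "fake toy"], ["kite"]])

# Swapping one toy in the first line changes every later password too.
print(honest[-1])
print(tampered[-1])
print("chest still matches:", honest[-1] == tampered[-1])
```

Because each password depends on the one before it, a change anywhere in the line shows up in the final password.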

Why should I care?

Hashing is used very often in cyber security. When you log in to your Facebook account, hashing is used to verify your password. When you log in to your computer, hashing is used to verify your password there too. You interact indirectly with hashing more than you would think, mostly in the context of passwords.
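To make the password case concrete, here is a rough sketch of hash-based password verification using only Python's standard library. It is a generic illustration and does not describe how Facebook or any operating system actually stores passwords. The use of PBKDF2 with SHA-256, the iteration count, the salt length, and the function names are all assumptions chosen for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a fixed-size hash from the password and a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def verify_password(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the login attempt the same way and compare in constant time."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)


# At sign-up, only the salt and the hash are stored, never the password itself.
salt = os.urandom(16)
stored_hash = hash_password("correct horse", salt)

print(verify_password("correct horse", salt, stored_hash))
print(verify_password("wrong guess", salt, stored_hash))
```

The service never needs to reverse the hash: it only checks whether hashing the attempt gives the same output as before.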

What’s a hash collision?

A hash collision is when two different inputs give the same output. Hash functions are designed to make collisions as unlikely as possible, and in particular to make it infeasible to engineer (intentionally create) one. Due to the pigeonhole effect, though, collisions cannot be avoided entirely. (The pigeonhole effect: a hash function has a fixed number of possible output values, but it accepts input of any size. As there are more possible inputs than outputs, some of the inputs must give the same output. If you have 128 pigeons and 96 pigeonholes, some of the pigeons are going to have to share.)
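The pigeonhole argument is easy to see with a deliberately tiny hash. The sketch below defines a hypothetical 8-bit hash, `tiny_hash`, by keeping only the first byte of a SHA-256 digest, so there are just 2^8 = 256 possible outputs. Feeding it 257 different inputs must therefore produce at least one collision. The function names and the choice of inputs are assumptions made for the illustration.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: the first byte of the SHA-256 digest (0..255)."""
    return hashlib.sha256(data).digest()[0]


def find_collision(max_inputs: int = 257):
    """Hash the messages b"0", b"1", ... until two of them share an output."""
    seen = {}  # maps hash value -> first message that produced it
    for i in range(max_inputs):
        message = str(i).encode("ascii")
        value = tiny_hash(message)
        if value in seen:
            return seen[value], message, value
        seen[value] = message
    return None  # unreachable with 257 inputs and only 256 possible outputs


first, second, shared = find_collision()
print(first, "and", second, "both hash to", shared)
```

A real hash function has vastly more pigeonholes (2^128 for MD5, for example), so collisions still exist but are far harder to find.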

MD5 and SHA1 have both been attacked and are now considered insecure, because hash collisions have been engineered for them. However, no attack has yet given a collision in both algorithms at the same time, so if you compare both the MD5 hash AND the SHA1 hash, you will still see that the inputs are different. The MD5 collision example is available from https://www.mscs.dal.ca/~selinger/md5collision/ and details of the SHA1 collision are available from https://shattered.io/. Because of these attacks, you shouldn’t trust either algorithm for hashing passwords or data.

Question 2. What is the output size in bytes of the MD5 hash function?

MD5 processes a variable-length message into a fixed-length output of 128 bits.

128 bits ÷ 8 bits per byte = 16 bytes
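The same answer can be checked with Python's standard `hashlib` module, whose hash objects expose a `digest_size` attribute measured in bytes. The helper name `md5_output_size` and the sample inputs are only illustrative.

```python
import hashlib


def md5_output_size():
    """Return the MD5 output size as (bits, bytes)."""
    size_in_bytes = hashlib.md5().digest_size
    return size_in_bytes * 8, size_in_bytes


bits, size_in_bytes = md5_output_size()
print(bits, "bits =", size_in_bytes, "bytes")

# The output length stays the same no matter how long the input is.
short_digest = hashlib.md5(b"a").digest()
long_digest = hashlib.md5(b"a" * 10_000).digest()
print(len(short_digest), len(long_digest))
```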

Question 3. Can you avoid hash collisions? (Yes/No)

Question 4. If you have an 8 bit hash output, how many possible hashes are there?