If you have heard of Bitcoin or any of the other cryptocurrencies, then you have probably heard the word blockchain or the term “blockchain technology”. But what is it? How does it apply to digital currencies? Does it have other uses? How does it work? Please, let me explain!
The biggest global business problem is trust. When two parties want to do business, the top question is, can one trust the other? As a result of this trust issue every country has institutions and laws which govern how business is done. Banking and financial regulation laws; consumer rights and consumer courts; contract law and arbitration; all these things exist to generate trust and to punish those who break that trust.
When thinking about money specifically, financial institutions, like banks, are the guardians of trust. Without trust, or at least the perception of trust, no financial systems can exist. What cryptocurrencies have done is remove the need for a bank, or a financial institution, or a government to provide that trust. They do this with blockchains.
Now before I go on, I just want to mention that when I talk about trust, I am talking about the trust needed at a transaction level. If person X is paid by person Y, will that transaction actually occur? Will it be recorded somewhere? Is there an institution that will oversee that transaction?
I am not talking about the value of money or the value of stocks and shares, or the value of property. The value of something is also partly due to trust, but it’s a different side of trust: trust in the stability of a company, trust in the growth of an economy, trust in the prosperity of a region, and so on.
What is a blockchain?
A blockchain is a big long list of records that is publicly available for anyone to browse, verify and, to an extent, add to. These records, known as blocks, hold details of transactions, and the blockchain grows when new blocks are added to it. To make sure that the blockchain is valid, each block has a special identifying number, called a hash, and the next block in the chain has the hash of the previous block embedded into it. This means that each block is inextricably linked to the one before it, and that is what forms the chain. A break in the chain is easy to spot, and any attempt to fraudulently manipulate the chain is impractical, as it would require every block from the altered one onwards to be rebuilt.
What is a hash?
A hash is basically a way of reducing a large amount of data into a more manageable chunk for reference, sorting, and comparison. Hashes come in two flavors, unique hashes and non-unique hashes. Take, for example, these two chunks of data, which are actually sentences, but they could be any data:
- “Above all else, guard your heart, for everything you do flows from it.”
- “Speak up for those who cannot speak for themselves, for the rights of all who are destitute.”
A non-unique hash could be the length of the data: 70 for the first and 92 for the second. A quick way to see if the data is the same is to compare this non-unique hash. Is 70 equal to 92? No. Therefore the strings are not the same. Another non-unique hash could be the first letter, ‘A’ and ‘S’. Are they the same? No. Therefore the strings are different. Non-unique hashes have lots of uses in software engineering. However, because they aren’t guaranteed to be unique, there are lots of occasions when you get a hash clash, where two different strings generate the same hash.
For example, this sentence, “Accept that some data will have the same length but will not be equal.” has a length of 70 and also begins with ‘A’. So it generates the same non-unique hashes as the first sentence above, even though the two sentences are different. Using a unique hash system avoids this by generating a value that is, for all practical purposes, unique to the data. Even the slightest change will generate a different hash.
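To make the hash clash concrete, here is a minimal Python sketch of the two non-unique hashes described above. The function names `length_hash` and `first_letter_hash` are my own, purely for illustration.

```python
# Two deliberately weak "hashes" from the text: data length and first character.
def length_hash(data: str) -> int:
    return len(data)


def first_letter_hash(data: str) -> str:
    return data[0]


first = "Above all else, guard your heart, for everything you do flows from it."
second = "Speak up for those who cannot speak for themselves, for the rights of all who are destitute."
third = "Accept that some data will have the same length but will not be equal."

for label, text in (("first", first), ("second", second), ("third", third)):
    print(label, length_hash(text), first_letter_hash(text))

# Different hashes prove the data differs (first vs. second) ...
differs = length_hash(first) != length_hash(second)

# ... but equal hashes prove nothing: first and third clash on both hashes
# even though the sentences themselves are not equal.
clash = (
    length_hash(first) == length_hash(third)
    and first_letter_hash(first) == first_letter_hash(third)
    and first != third
)
print("differs:", differs, "clash:", clash)
```

A mismatch is conclusive, a match is not, which is why a cheap hash like this works as a quick first check but not as proof that two pieces of data are equal.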
The most commonly used unique hashes are from the Secure Hash Algorithms family. Until recently SHA-1 (an early member of that family) was the favorite, however now SHA-256, a member of the SHA-2 group, is the flavor of the month.
Here are the SHA-256 hashes for our two example sentences:
The first thing to notice is that they are both the same length. This is a key characteristic of the SHA family of hashes. No matter what size of data you put in, the length of the hash for a given algorithm will always be the same (64 hexadecimal digits for SHA-256). The second thing to know is that the hash is always the same for the same data. So every time our first sentence is hashed, the same hash will result. The third thing to notice is just how different the two hashes are: they don’t look even remotely alike. This also applies to even the smallest of changes.
If we altered our first sentence so that the word “heart” started with a capital ‘H’ then the SHA-256 becomes:
The new result is clearly very different to the first one. Even a small change to the source data produces a hash that bears no resemblance to the original.
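If you want to try this yourself, the sketch below uses `hashlib` from Python's standard library. I am assuming the sentences are encoded as UTF-8 with plain ASCII punctuation and no trailing newline. A different encoding or a stray newline would give completely different hashes, so treat the output as illustrative. The helper name `sha256_hex` is my own.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text (UTF-8 encoded) as 64 hex digits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = "Above all else, guard your heart, for everything you do flows from it."
second = "Speak up for those who cannot speak for themselves, for the rights of all who are destitute."
altered = first.replace("heart", "Heart")  # capital 'H', nothing else changes

for label, text in (("first", first), ("second", second), ("altered", altered)):
    print(f"{label:8} {sha256_hex(text)}")

# Same data in, same hash out, every time.
print("repeatable:", sha256_hex(first) == sha256_hex(first))

# Count how many of the 64 hex positions differ after the one-letter change.
differing = sum(a != b for a, b in zip(sha256_hex(first), sha256_hex(altered)))
print(differing, "of 64 hex digits differ")
```

All three hashes come out at 64 hexadecimal digits, and the one-letter change should alter most of those positions.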
The role of hashes in a blockchain
The way a blockchain maintains its integrity is that a unique hash is generated for each block. That hash is then embedded in the next block. This means the hash for block 101 is part of the data used to generate the hash for block 102, the hash for block 102 is part of the data used to generate the hash for block 103, and so on. If block 101 is changed then its hash will change, which means it no longer matches the hash embedded in block 102. Fixing block 102 changes its hash too, so this causes a chain reaction and every block after 101 becomes invalid.
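The linking described above can be sketched in a few lines of Python. This is a toy model, not how any real cryptocurrency lays out its blocks: the field names (`index`, `data`, `prev_hash`), the JSON serialization and the all-zero placeholder for the first block are assumptions made for illustration.

```python
import hashlib
import json
from typing import Optional


def block_hash(block: dict) -> str:
    """Hash a block's full contents, including the embedded previous hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(records: list) -> list:
    """Build a list of blocks where each one embeds the hash of the one before."""
    chain = []
    prev = "0" * 64  # placeholder: the first block has no predecessor
    for index, data in enumerate(records):
        block = {"index": index, "data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def first_broken_link(chain: list) -> Optional[int]:
    """Return the index of the first block whose embedded hash no longer
    matches its predecessor, or None if every link is intact."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None


chain = build_chain(["X pays Y 5", "Y pays Z 2", "Z pays X 1"])
print(first_broken_link(chain))  # None: the chain is intact

chain[0]["data"] = "X pays Y 500"  # tamper with the first block
print(first_broken_link(chain))  # 1: block 1 no longer matches block 0
```

To hide the change, the tamperer would have to rewrite the embedded hash in block 1, which changes block 1's own hash, which breaks block 2, and so on to the end of the chain.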
This all sounds good, but the problem is that calculating a hash doesn’t take long. On a single core Raspberry Pi Zero, it takes less than 0.02 seconds. Even a small computer can calculate thousands of hashes in 1 second. This means it wouldn’t be very hard to recalculate all the hashes in a blockchain in just a few seconds, even if the blockchain was tens of thousands of blocks in length.
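You can get a rough feel for this on your own machine with the sketch below, which simply counts how many SHA-256 hashes Python manages in a short time window. The function name `hashes_per_second` and the 0.2 second window are my own choices, and the result will vary a lot between machines.

```python
import hashlib
import time


def hashes_per_second(duration: float = 0.2) -> float:
    """Hash a short message repeatedly for `duration` seconds and
    return the measured rate in hashes per second."""
    data = b"Above all else, guard your heart, for everything you do flows from it."
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        # Append the counter so that each iteration hashes different data.
        hashlib.sha256(data + str(count).encode("utf-8")).hexdigest()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


print(f"{hashes_per_second():,.0f} hashes per second")
```

This is a crude benchmark (the timer calls are part of the loop), but it is enough to show that plain hashing on its own is far too cheap to protect a chain.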
So to get around this the hash of a block needs to be hard to calculate. That way the effort needed to recalculate all the hashes becomes prohibitive. To do that the hashes in a blockchain need to be special. The simplest way to make them special is to insist that the hashes start in a certain way, for example with some leading zeros.
The problem with requiring a special hash is that the hash generated for any fixed piece of data is always the same. You can run that hash function a million times and you will always get the same hash. So how is it possible to generate a hash with leading zeros? The answer is to change the data! Built into each block is a special field called a nonce. A nonce is a small counter that is altered on each iteration in an attempt to generate a hash with leading zeros.
For example, if you added a nonce of 62 to our test string like this, “Above all else, guard your heart, for everything you do flows from it.62” we get the hash: de55b08984e800bbe28fb903cc142c650d137455b1e98e9924ff1a1851e44f26.
Did you notice the “62” at the end of the sentence?
If we make the nonce 63 instead, “Above all else, guard your heart, for everything you do flows from it.63” we get 00501a8b6e96e47ee1f76a2735fa19775224981317f39c24835d06b9e5f77422, a hash with two leading zeros.
Even with two leading zeros, the Pi only took 0.2 seconds to find the hash. What about with three leading zeros? That needs a 1811 at the end, but still only took 0.3 seconds to find. Four leading zeros takes around 3.2 seconds, and five leading zeros needs about 12 seconds. All of these calculations are based on a very affordable Raspberry Pi, so to make sure this is secure we are clearly going to need more zeros!
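Here is a minimal sketch of that nonce search in Python. I am assuming the nonce is a decimal counter starting at 0, appended directly to the end of the text, and that the text is UTF-8 encoded. The function name `find_nonce` is my own, and real cryptocurrencies build their block data differently.

```python
import hashlib
from itertools import count


def find_nonce(data: str, zeros: int) -> tuple:
    """Try nonces 0, 1, 2, ... until the SHA-256 hash of data + nonce
    starts with `zeros` leading zero hex digits.
    Returns the winning nonce and its hash."""
    target = "0" * zeros
    for nonce in count():
        digest = hashlib.sha256(f"{data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return nonce, digest


sentence = "Above all else, guard your heart, for everything you do flows from it."

for zeros in (1, 2, 3):
    nonce, digest = find_nonce(sentence, zeros)
    print(f"{zeros} zeros: nonce={nonce} hash={digest}")
```

The nonces you get depend on byte-for-byte identical input, so they will only match the ones quoted above if the encoding and punctuation are exactly the same. Each extra zero makes the search about 16 times longer on average, so go easy when raising `zeros`.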
Calculating a hash with just six leading zeros takes my humble Raspberry Pi Zero 9 minutes and 38 seconds. Quite a while, to be sure, but that is without any kind of special optimization with the GPU or similar. It took the Pi 9.5 million attempts to find the hash. At the time of writing, the number of leading zeros (counted in hexadecimal digits) needed on a Bitcoin block is 17!
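Why does the effort climb so quickly? Each hex digit of a hash can take 16 values, so a hash starts with n zeros with probability 1 in 16^n, and on average you need about 16^n attempts. The sketch below works through that arithmetic, using the Pi's figures from above (9.5 million attempts in 9 minutes 38 seconds) as a rough hash rate. The function names are my own, and the timings are back-of-envelope estimates only.

```python
# Rough hash rate implied by the article's figures for the Raspberry Pi Zero.
PI_ATTEMPTS = 9_500_000
PI_SECONDS = 9 * 60 + 38  # 9 minutes 38 seconds
PI_RATE = PI_ATTEMPTS / PI_SECONDS  # hashes per second


def expected_attempts(zeros: int) -> int:
    """Average number of hashes needed to hit `zeros` leading zero hex digits."""
    return 16 ** zeros


def expected_seconds(zeros: int, rate: float = PI_RATE) -> float:
    """Average search time at the given hash rate."""
    return expected_attempts(zeros) / rate


for zeros in (2, 4, 6, 17):
    print(
        f"{zeros:2d} zeros: {expected_attempts(zeros):,} attempts on average, "
        f"about {expected_seconds(zeros):,.0f} seconds at the Pi's rate"
    )
```

Six zeros needs about 16.8 million attempts on average, so the Pi was slightly lucky to get there in 9.5 million. At the same rate, 17 zeros would take a single Pi hundreds of millions of years.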
Proof-of-work and P2P
By insisting that the block has a certain type of hash, additional work is needed to generate that hash. However, since the nonce is stored in the block data, the result is very easy to verify. That means a block is, by design, difficult to produce but easy to check. For cryptocurrencies, this work is rewarded in coins of the respective currency. The upshot of all this hard work is that it becomes infeasible to fraudulently manipulate the blockchain, thus making it secure.
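The "hard to produce, easy to check" asymmetry can be sketched like this. As before it is a toy: the nonce is assumed to be a decimal counter appended to the data, the names `verify` and `mine` are my own, and the example transaction text is made up.

```python
import hashlib


def verify(data: str, nonce: int, zeros: int) -> bool:
    """Checking a claimed nonce costs exactly one hash."""
    digest = hashlib.sha256(f"{data}{nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * zeros)


def mine(data: str, zeros: int) -> tuple:
    """Finding a nonce means trying candidates one by one.
    Returns the winning nonce and the number of hashes computed."""
    nonce = 0
    while not verify(data, nonce, zeros):
        nonce += 1
    return nonce, nonce + 1


data = "X pays Y 5 coins"  # made-up transaction data
nonce, attempts = mine(data, 3)
print(f"mining took {attempts} hashes and found nonce {nonce}")
print("checking takes 1 hash:", verify(data, nonce, 3))
```

Anyone who receives the block only has to run `verify` once, no matter how many attempts the search took.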
One other important aspect of a blockchain is that it is distributed. A traditional database (DB) is located on a server somewhere and clients connect to the server to get the data out of the DB. We see this everywhere already, as most websites use a DB. The pages are stored in the database and they are retrieved from the DB and served to the web browser. The same is true when you sign in to a website. Your username and password are stored in the DB and they are checked against the credentials that you enter.
Storing a blockchain on a centralized server is problematic. Who owns the server? Who runs it? Who is responsible for server security? What happens when the server is offline? Part of the role of trust institutions is to run and maintain the servers needed for day-to-day business. However with a blockchain the data is distributed. There isn’t a central server. The data is spread about everywhere. Some peers in this distributed network might have a full copy of the blockchain, others might have fragments, but overall there are multiple copies of the data, available at multiple points. For example, you can browse the Bitcoin blockchain in its entirety at blockchain.info.
More than just cryptocurrencies
As you might have guessed, blockchain technology is bigger than just cryptocurrencies. Every industry where a record of transactions is needed can benefit from blockchains. Some transactions are a matter of public record. For example in many countries, the land registry is a public record. Right now, to find out who owns a plot of land you need to make a request at the appropriate government office and then wait for the reply. But what if the land registry was a blockchain, meaning it was distributed, secure and easily accessible?
Before we envision a utopian world of data transparency and transactional bliss, there are some practical problems. To stop fake transactions entering the blockchain, each participant must use public key cryptography. This is great in theory, however in practice it can be a nightmare. The biggest problem with public key cryptography is that if you lose your private key then you lose everything. So if the land registry was a blockchain and you lost your private key, you would lose the right to that land… ouch!
Blockchains are simple in principle and very powerful in practice. Their secure and distributed nature means that traditional barriers are removed; however, their reliance on public key cryptography can cause a lot of damage if (or when) a private key is lost.
What do you think? Do blockchains have a future outside of cryptocurrencies? Please let me know in the comments below.