Storing Data on Blockchain

Though we are experiencing crypto winter at the moment, with major coins devalued more than 80% in 2018, the underlying blockchain technology is still exciting. The blockchain provides a democratized trust, distributed and validation protocol that has already disrupted banking and financial services and is on the verge of overhauling other industries like healthcare, supply chain, HR and more.

Despite the hype and its promising future, blockchain still has its shortcomings, the issue of data storage is one of them. The transactions based on the POW consensus for bitcoin, Ethereum, and other cryptocurrencies are extremely slow and therefore not suitable for storage of large data. For example, the deployment of dApp Cryptokitties nearly crippled the Ethereum network

The main problem of storing data on a blockchain is the limitation of the amount of data we can store because of its protocol and the high transaction costs. As a matter of fact, a block in blockchain can store data from a few kilobytes to maybe a few megabytes. For example, the block size of the Bitcoin is only 1Mb. The block size limitation has a serious impact on the scalability of most cryptocurrencies and the bitcoin community is debating whether to increase the block size.

Another issue is the high cost of the transactions. Why is storing data on the blockchain so expensive? It is because the data has to be stored by every full node on the blockchain network. When storing data on the blockchain, we do pay the base price for the transaction itself plus an amount per byte we want to store. If smart contracts are involved, we also pay for the execution time of the smart contract. This is why even storing kilobytes of data on the blockchain can cost you a fortune.

Therefore, it is not viable to store large data files like images and videos on the blockchain. Is there a possible solution to solve the storage issue? Yes, there are quite a few solutions but the most promising one is IPFS.

What is IPFS?

IPFS or Interplanetary File System is an innovative open source project created by the developers at Protocol Labs. It is a peer-to-peer filesharing system that aims to change the way information is distributed across a wide area network. IPFS has innovated some communication protocols and distributed systems and combine them to produce a unique file sharing system.

The current HTTP client-server protocol is location-based addressing which faces some serious drawbacks. First of all, location-based addressing consumes a huge amount of bandwidth, and thus costs us a lot of money and time. On top of that, HTTP downloads a file from a single server at a time, which can be slow if the file is big. In addition, it faces single-point of failure. If the web server is down or being hacked, you will encounter 404 Not Found error. Besides that, it also allows for powerful entities like the governments to block access to certain locations.

On the other hand, IPFS is a content-based addressing system. It is a decentralized way of storing files, similar to BitTorrent. In the IPFS network, every node stores a collection of hashed files. The user can refer to the files by their hashes. The process of storing a file on IPFS is by uploading the file to IPFS, store the file in the working directory, generate a hash for the file and his file will be available on the IPFS network. A user who wants to retrieve any of those files simply needs to call the hash of the file he or she wants. IPFS then search all the nodes in the network and deliver the file to the user when it is found.

IPFS will overcome the aforementioned HTTP weaknesses. As files are stored on the decentralized IPFS network, if a node is down, the files are still available on other nodes, therefore there is no single point of failure. Data transfer will be cheaper and faster as you can get the files from the nearest node. On top of that, it is almost impossible for the powerful entities to block access to the files as the network is decentralized.

The following figure shows the difference between the centralized client-server protocol(HTTP) and the peer-to-peer IPFS protocol.


 [Source: https://www.maxcdn.com/one/visual-glossary/interplanetary-file-system/]

Blockchain and IPFS

IPFS is the perfect match for the blockchain. As I have mentioned, the blockchain is inefficient in storing large amounts of data in a block because all the hashes need to be calculated and verified to preserve the integrity of the blockchain. Therefore, instead of storing data on the blockchain, we simply store the hash of the IPFS file. In this way, we only need to store a small amount of data that is required on the blockchain but get to enjoy the file storage and decentralized peer-to-peer properties of IPFS.

One of the real world use cases of blockchain and IPFS is Nebulis. It is a new project exploring the concept of a distributed DNS that supposedly never fails under a overwhelming access requests. Nebulis uses the Ethereum blockchain and the Interplanetary Filesystem (IPFS), a distributed alternative to HTTP, to register and resolve domain names. We shall see more integration of Blockchain and IPFS in the future.

References

Building the blockchain using JavaScript

The blockchain is a data structure that comprises blocks of data that are chained together using a cryptographic hash. In this article, we shall attempt to build a blockchain prototype using JavaScript. This is a bit technical for non-technical people but should be a piece of cake for computer nerds.

First of all, let’s examine the content of a block. A block consists mainly of the block header containing metadata and a list of transactions appended to the block header. The metadata includes the hash of the previous block, the hash of the block, timestamp, nonce, difficulty and block height. For more information, please refer to my earlier article Blockchain in a Nutshell.

Prior to writing the code, you need to install the following software:

  • Chocolatey
  • Visual Studio Code
  • node.js

The installations are based on Windows 10, but you can do the the same thing easily in Ubuntu. Chocolatey is a software management solution for Windows, Visual Studio Code is a streamlined code editor with support for development operations like debugging, task running and version control and Node.js is an open source server environment.

Let’s start writing the code using Visual Studio Code. The first line is

const SHA256 = require("crypto-js/sha256");

This code meas we are using the JavaScript library of crypto standards. We require the crypto-js library because the sha256 hash function is not available in JavaScript.The crypto module provides cryptographic functionality.

SHA-256 is a cryptographic hash algorithm. A cryptographic hash is a kind of ‘signature’ for a text or a data file. SHA-256 generates an unique 256-bit signature for a text.  SHA256 is always 256 bits long, equivalent to 32 bytes, or 64 bytes in an hexadecimal string format. In blockchain hash, we use hexadecimal string format, so it is 64 characters in length.

Next, we create the block using the following scripts:

class Block {
   constructor(index, timestamp, data, previousHash = '') {
       this.index = index;
       this.previousHash = previousHash;
       this.timestamp = timestamp;
       this.data = data;
       this.hash = this.calculateHash();
   }

The class Block comprises a constructor that initializes the properties of the block. Each block is given an index that tells us at what position the block sits on the chain. We also include a timestamp, some data to store in the block, the hash of the block and the hash of the previous block.

We also need to write a function calculateHash() that creates the hash, as follows:

 calculateHash() {
       return SHA256(this.index + this.previousHash + this.timestamp + JSON.stringify(this.data)).toString();
   }
}

Finally, we write the script that creates the blockchain, as follows:

class Blockchain{
   constructor() {
       this.chain = [this.createGenesisBlock()];
   }

   createGenesisBlock() {
       return new Block(0, "01/01/2017", "Genesis block", "0");
   }

   getLatestBlock() {
       return this.chain[this.chain.length - 1];
   }

   addBlock(newBlock) {
       newBlock.previousHash = this.getLatestBlock().hash;
       newBlock.hash = newBlock.calculateHash();
       this.chain.push(newBlock);
   }
   

Notice that the class blockchain comprises a constructor that consists of a few functions, createGenesisBlock(),  getLatestBlock() and  addBlock(newBlock).

Save the file as myblkchain.js or any name you like.

Now, execute the file in the VS terminal using the following command

node myblkchain.js

The output is as follows:

Let’ examine the output. Notice that each block comprises the properties index, previous hash, data and hash.

The index for the Genesis is always 0 . Notice that the hash of the genesis block and the previous hash of the second block are the same. It is the same for other blocks.

References

Plasma-The Solution for Security and Scalability

The Scalability Issue

Blockchain always faces the trade-off issue between security and scalability. Though their PoW consensus protocol guarantees near perfect security, it also slows down the processing speed significantly.  Currently, the Ethereum processing speed is 15 transactions per second while Bitcoin is 7 transactions per second. Both platform’s processing capacities are nowhere near Visa’s processing speed of 45,000 transactions per second. Furthermore, the increase in the number of dapps deployed on the main Ethereum main chain has caused congestion and slows down transactions tremendously. One of the most famous cases is Cryptokitties, it clogged up the Ethereum network in just a few weeks of deployment due to its unprecedented popularity.

In seeking a viable solution to the scalability issue, Vitalik Buterin and Joseph Poon have joined hands in conceptualizing and developing Plasma, a framework that can scale Ethereum processing power.  Joseph is also the co-founder of the Lightning network, a framework that has greatly increase Bitcoin processing speed. Both plasma and Lightning network are trustless multilayered blockchain networks.

Plasma

Plasma is a  system that comprises the main blockchain and the ‘child blockchains’ that branch out from the main blockchain(aka parent blockchain or root blockchain). The child blockchains can co-exist but function independently from the parent chain and each other. 

The Plasma system allows anyone to create their own child blockchains a.k.a plasma chains with their own smart contracts. Therefore, the Plasma system enables the creation of all kinds of use cases based on different business logic in their smart contracts. To ensure security, the root chain monitor and enforces the state in all the plasma chains and penalize the bad actors if there is proof of frauds.  In this way, the Plasma system makes off-chain transactions possible while relying on the Ethereum main blockchain to maintain its security. 

The Plasma Structure

Actually, the Plasma architecture is like a tree structure with the main Ethereum blockchain as the root. The child blockchains then branch out from the root blockchain, similar to branches grown out from the root of a tree. Every child chain, in turn, can spawn new child chains, the process can go on.  Therefore, the plasma structure constitute a hierarchy of blockchains,  as shown below:

Plasma Blockchain Structure

How does Plasma Works?

Plasma can greatly increase processing speed and throughput on the Ethereum blockchain because it allows off-chain transactions, similar to the payment channels of the Lightning network and other off-chain technologies. All the off-chain techniques take operations away from the main Ethereum blockchain.

State Channels

The concept of Plasma was derived from State Channels but improved on the latter. State channel works by creating an off-chain communication channel (a.k.a state channel )where transactions are not sent to the smart contract on the main chain, instead, they are sent through the Internet without touching the main blockchain.  It is only after all the transactions have been completed (for example, a crypto game has finished) that the final state is sent to the smart contract on the main chain, closing the channel in the process. The smart contract will check the legitimacy of the transactions and release the asset (such as some ETH or a prize) to the recipient. 

The state channel technique can improve scalability because it can reduce the number of transactions on the main blockchain. For example, a crypto chess game played between two players may involve hundreds of moves, which means hundreds of transactions will be executed on the Ethereum blockchain. However, if we use the state channel, we need to execute only 3 transactions that include registration of the players to initiate the game, submission of the final state to the blockchain and closing the channel. 

Steps in Implementing Plasma

Plasma works in a similar way but with a different approach. Instead of creating the channels, it creates the child blockchains, as illustrated earlier. Smart contracts are created on the main Ethereum blockchain(The root chain) and they  define the rules in the child blockchains. In other words, the smart contract serves as the root of the child blockchains. The child blockchains can employ their own consensus algorithm, such as proof of stake.  The blocks validator will submit the state of the child chain to the Root Chain smart contract periodically. The smart contract will register the state of each Child Chain in the form of block hashes of the Child Chain.

We can illustrate how Plasma works by examining a crypto game such as crytant crab or cryptokitties. The smart contract on the main chain will set the rules of the game, then deploy the actual game application smart contracts on the child-chain, which contains all of the game logic and rules.  The game assets such as characters or collectibles are created on the Ethereum main chain and then transferred onto the child-chain using the plasma root.  When the players play the games, all the executions are confined to the child chain, without interacting with the root chain. 

Plasma Exits

Plasma Exits is a  security mechanism behind Plasma that allows users in a Plasma Chain to stop participating in the chain, and move their funds or assets back to the root chain. When a user wishes to exit a particular child chain, he or she needs to submit an exit application.  The application is not immediately approved because a proof is required. This waiting period is called the challenge period, which means anyone can challenge the user’s claim by submitting a fraud proof. If the challenge is not valid or there is no challenge, the application will be approved and the user can exit and collect back his assets or funds.

Plasma is still evolving and now the Plasma team has come out with the improved version of Plasma known as Plasma cash.  We shall discuss this new version in coming articles.

References

ETHKL #5 : Security Audits & Scaling

The meetup was at HelloGold office , KL on Friday 23, Nov.

Speakers:

  1. Petar Tsankov-Chief Scientist/co-founder of ChainSecurity AG & Senior Researcher at the ICE center. ETH Zurich. 
  2. Andras Kristof- Founder and Advisor of Akomba Labs
  3. Lai Ying Tong- Researcher at Ethereum Foundation
  4. Ken Chan

The session began with Ken Chan introducing the audience about Zero-Knowledge Proofs. I was sure many developers among the audience understand what it is but the concept sounds strange to me. Fortunately, Ken was good in demonstrating the concept by using the scenario of the American presidential election involving Trump and Clinton as well as a “live demo” with Harith of HelloGold as the co-actor.

Apparently, the Zero-knowledge proof method, or more exactly zk_SNARKS, is a consensus protocol used by Zcash to validate its shielded transactions that are fully encrypted on its blockchain. According to Zcash(https://z.cash/technology/zksnarks/), the acronym zk-SNARK stands for “Zero-Knowledge Succinct Non-Interactive Argument of Knowledge,” and refers to a proof construction where one can prove possession of certain information, e.g. a secret key, without revealing that information, and without any interaction between the prover and the verifier.

Zcash further pointed out that “Zero-knowledge” proofs allow one party (the prover) to prove to another (the verifier) that a statement is true, without revealing any information beyond the validity of the statement itself. For example, given the hash of a random number, the prover could convince the verifier that there indeed exists a number with this hash value, without revealing what it is.

Ken illustrated the process of Succinct and Non-interactive using a diagram, where the prover begins by generating a proof string and then the verifier needs to verify the proof string, as shown below:

The above process is actually more complex than illustrated in the diagram. According to Zcash,  zk-SNARKs work by first turning what you want to prove into an equivalent form about knowing a solution to an algebraic equation, as follows:

Computation → Arithmetic Circuit → R1CS → QAP → zk-SNARK

Here is an example of what an arithmetic circuit looks like for computing the expression (a+b)*(b*c) :

Diagram Adapted from Zcash

The output is then verified by the verifier. However, Ken pointed out that the process might be compromised by some malicious codes which he called toxic waste that produce false proofs. Ken concluded with the following points:

Why ZK SNARKs?

  • Strong cryptography research by Zcash team
  • Math-based- not coin joining
  • Short proofs

Why not ZK SNARKS?

  • Trusted setup for every contract
  • No transparency for counterfeiting
  • Computationally expensive

Next, Dr.Petar from ChainSecurity discussed the importance of security audit. His topic was “How Not to get Hacked”. ChainSecurity is a smart contract auditing platform. They can identify security vulnerabilities and certify the functional correctness of smart contracts and blockchain projects. 

ChainSecuity has developed an Audit platform that can perform Automated Security Check on smart contracts. This platform can test and audit both Ethereum smart contracts (Security Scanner)and the Hyperledger Fabric chaincode(Chaincode Scanner).

According to Dr.Petar,  more USD$1 billion have been stolen this year due to crypto hacks. He stressed that writing secure smart contracts is difficult.  Developers might fail to see bugs and security flaws, therefore we need to audit the smart contracts.  However, currently, most audits are done manually and tend to miss many issues. Furthermore, in the post-development stage, most anomalies are invisible. 

To work around the aforementioned issues, ChainSecurity has developed some AI-based automated tools to help in every stage of smart contract lifeline. At the developmental stage, the automated tools will assist in certifying the correctness of the code. At code audit stage, the machine-checked audit will generate the audit report by committing the smart contract onto the Audit platform which runs security auditing using the security scanner, the symbolic verifier and the AI-based Tester. Finally, in the post development stage, there are monitoring tools to help track the smart contract health.

More information on security audit can be found on ChainSecurity website.

The final topic was scaling presented by Andras Kristof and Lai Ying Tong.  This is a topic where all Ethereum enthusiasts are concerned about. According to the speakers, the solution is to develop a two-layer architecture. Layer 1 is called serenity which comprises sharding, casper, random beacon, and p2p networking. Layer 2 comprises payment channels, state channels, sidechains, and plasma. The solution also comprises succinct proofs using snarks and starks. Furthermore, there are more integrations that include swarm, light clients and client optimizations.

In more details, the layer 1(serenity) structure includes the Main Chain(provides staking and PoW), the Beacon Chain((provides random number and PoS), the Shard Chain(provides data) and VM(provides state execution result).

For the payment channels, there are two channels, the Open Channel and the Close Channel. The transactions include blockchain transactions and Off-chain payments. Besides that, Lai also spoke on payment channels on the lightning network. The layer2 solutions are to move state-modifying operations off-chain, which include payment channels and state channels.

Besides that, Lai also covered topics on sidechains, plasma mvp, morevp, swarm, light clients and more. These are heavy topics and I shall discuss them in future articles.

Hyperledger Fabric Architecture Part 2

In my article ” Hyperledger Fabric Architecture Part 1“,  you have learned about the client applications, endorsing peers and committing peers as well as well as the ordering service. We have also discussed the transaction workflow and how consensus is reached. In this article, I shall explain the channels and membership service provider.

Channels

In permissionless blockchains like Bitcoin and Ethereum, all peers share and have access to the same ledger. However, this kind of blockchain may not be suitable for business applications. For example, a supplier may want to set different prices for different wholesalers, and he would not want everyone in the supply chain to view this information. In this scenario, he or she will prefer to deal with the different wholesalers separately. To solve this issue, Hyperledger Fabric came out with the novel concept of channels that allow private transactions within the same network.

Channels partition the Fabric network in such as way that only the stakeholders can view the transactions. In this way, organizations are able to utilize the same network while maintaining separation between multiple blockchains.  The mechanism works by delegating transactions to different ledgers. Members of the particular channel can communicate and transact privately. Other members of the network cannot see the transactions on that channel. The concept is illustrated in the following diagram:

The diagram above shows two channels, channel 1 and channel 2. Each channel has its own application, peers, ledger and smart contract (chaincode). In this example, channel 1 has two peers, P1 and P2 and channel 2 also has two peers, P3 and P4.  Ordering service is the same across any network and channel.

Application 1 will send transaction proposals to channel 1. P1 and P2 will then simulate and commit transactions to ledger L1 based on chaincode S1. On the other hand, Application 2 will send transaction proposals to channel 2. P3 and P4 will simulate and commit transactions to ledger L2 based on chaincode S2. 

Though our example shows peers belong to two distinct channels, in actual case peers can belong to multiple networks or channels. Peers that participate in multiple channels simulate and commit transactions to different ledgers. In addition, the same chaincode can be applied to multiple channels.

Membership Service Provider (MSP)

Hyperledger Fabric is a permissioned blockchain, therefore, every user needs permission to join the Fabric network. In order to obtain permission to join the Fabric blockchain network, the identity of every user must be validated and authenticated. The identity is  important because it determines the exact permissions over resources and access to information that user has in the Fabric network.

To verify an identity, we must employ a trusted authority. In Hyperledger Fabric, the trusted authority is the membership service provider (MSP).  The membership service provider is a component that defines the rules in which identities are validated, authenticated, and allowed access to a network. The MSP manages user IDs and authenticates clients who want to join the network. This includes providing credentials for these clients to propose transactions, defining specific roles a member might play and defining access privileges in the context of a network and channel.

The MSP uses a Certificate Authority to authenticate or revokes user certificates upon confirmed identity. In Fabric, the default Certificate Authority interface used for the MSP is the Fabric-CA API. However, organizations can choose to implement an External Certificate Authority of their choice.  Hyperledger Fabric supports many types of External Certificate Authority interfaces. As a result, a single Hyperledger Fabric network can be controlled by multiple MSPs.

The Authentication Process

In the authentication process,  the Fabric-CA identifies the application, peer, endorser, and orderer identities, and verifies them. Next, a signature is generated through the use of a Signing Algorithm and a Signature Verification Algorithm.  The Signing Algorithm utilizes the credentials of the entities associated with their respective identities and outputs an endorsement. The generated signature is a byte array that is bound to a specific identity.

In the following step, the Signature Verification Algorithm will accept the request(to join the network) if the signature byte array matches a valid signature for the inputted endorsement, or reject the request if not. If the user is accepted, he or she can see the transactions in the network and perform transactions with other actors in the network. On the other hand, if the user is rejected, he or she will not able to submit transactions to the network or view any previous transactions.

We shall explore chaincode in the next article.