Title: Dynamic-EC: an efficient dynamic erasure coding method for permissioned blockchain systems
Conference (Frontiers of Computer Science’24): Link
Journal (): Link
This paper designs a erasure-coding based permission blockchain system. It reduces the storage overhead from prior works (BFT-Store, ICDE’20, PartitionChain, TKDE’23) by trading off the fault tolerance. Its motivation is that prior works always tolerate f malicious nodes, and use a fixed erasure coding scheme that tolerates f malicious nodes. In reality, there are not as many malicious nodes as f, where in most cases, there are only a very small number of nodes, such that it can use another erasure code with lower storage overhead for further storage saving. Dynamic-EC measure the number of malicious nodes and risky nodes based on a heuristic algorithm, and select a erasure coding scheme to encode the data. Experiments show that it can reduce the storage overhead over BFT-Store and PermissionChain.
In my understanding, the only difference between Dynamic-EC and PartitionChain is the algorithm for voting the malicious nodes, such that the malicious nodes can be removed from the system. It also adds another set of nodes called “risky nodes”, meaning that these nodes can also be categorized unavailable (but they will not be removed from the system).
I don’t agree with the problem proposed by this paper, which to me is an artificial problem. It says that the number of malicious nodes can be smaller than f, then it can use a lower redundancy code for encoding. However, in reality, the number of malicious nodes varies and it’s possible that the number goes up to f. The design is actually trading-off fault tolerance for storage savings. However, we don’t want to trade-off fault tolerance for storage savings, as fault tolerance (i.e., the # of tolerated failures) is often a required metric in practice. If the data are already encoded in lower redundancy codes with fault tolerance lower than f (e.g., 1 or 2), then the data are not recoverable even if the number of malicious nodes is smaller than f (e.g., 3), which leads to data loss.
The experiments does not cover the “Dynamic” in encoding, such that the data is encoded in a fixed coding scheme (although it has lower redundancy than BFT-Store). Once the data are written to disk, the fault tolerance is fixed. It does not apply any redundancy transitioning to other coding schemes, based on the current measured number of malicious nodes (it’s main design). It means that Dynamic-EC may store data with multiple coding schemes when the number of failures varies from time to time. This is not evaluated in the experiments. And I don’t think this is a correct design, as mixing multiple coding schemes does not guarantee the system can tolerate the maximum number of malicious nodes.