Beyond RAID 6 --- Efficient Error Correcting Code for Dual-Disk Corruption
Author
Moussa, Mohamad
Issue Date
2018
Keywords
erasure codes
error correcting codes
fault tolerance
RAID 6
reed solomon code
silent data corruption
Advisor
Rychlik, Marek
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
An error correcting code is a technique for adding extra information to a message so that the message can be recovered even when some of its parts are corrupted by a noisy channel. The three main tasks of an error correcting code are to detect errors, to locate them, and finally to recover the original data by finding the error values. An erasure code is a special type of error correcting code in which the locations of the errors are given, so its only task is to correct those errors. Replication of data is one example of an erasure code: it is very efficient in terms of computational time, since a failed drive is recovered by using one of its replicas. However, the drawback of this technique is its high storage overhead. Another family of error correcting codes, called Reed-Solomon codes, is known to be very efficient in terms of storage overhead. The drawback of using a Reed-Solomon code is its high computational cost. A RAID 6 system implements a Reed-Solomon code efficiently, using two extra parity drives to protect a given set of K data drives. RAID 6 is able to correct Z erasures (errors at known locations) and E random errors (errors at unknown locations) provided that Z+2E < 3. In our paper, we describe a replacement for RAID 6, based on a new linear, systematic code, which detects and corrects any combination of Z erasures and E errors provided that Z+2E < 5. In addition, we investigate some scenarios for error correction beyond the code's minimum distance, using list decoding. We describe a decoding algorithm with quasi-logarithmic time complexity when parallel processing is used: ~ O(log N), where N is the number of disks in the array (similar to RAID 6). By comparison, the error correcting code implemented by RAID 6 allows error detection and correction only when (E,Z)=(1,0), (0,1), or (0,2).
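The two conditions quoted above, Z+2E < 3 for RAID 6 and Z+2E < 5 for the proposed code, can be checked by direct enumeration. The following is a minimal sketch that only evaluates the inequality; it performs no encoding or decoding and is not taken from the dissertation. The function names `correctable_patterns` and `max_random_errors` and the parameter `bound` are illustrative assumptions.

```python
def correctable_patterns(bound):
    """List every (E, Z) pair satisfying Z + 2*E < bound.

    E counts random errors (unknown locations) and Z counts erasures
    (known locations). `bound` is 3 for RAID 6 and 5 for the proposed
    code, as stated in the abstract. Hypothetical helper.
    """
    patterns = []
    for errors in range(bound):
        for erasures in range(bound):
            if erasures + 2 * errors < bound:
                patterns.append((errors, erasures))
    return patterns


def max_random_errors(bound, erasures):
    """Largest E with Z + 2*E < bound for a given number of erasures Z."""
    if erasures >= bound:
        raise ValueError("too many erasures for this bound")
    return (bound - 1 - erasures) // 2


raid6 = correctable_patterns(3)
proposed = correctable_patterns(5)

print("RAID 6   (E, Z):", raid6)
print("Proposed (E, Z):", proposed)

# Random errors still correctable once Z drives are known to have failed.
for z in range(3):
    print(f"Z={z}: RAID 6 tolerates E<={max_random_errors(3, z)}, "
          f"proposed code tolerates E<={max_random_errors(5, z)}")
```

Apart from the trivial pair (0,0), the RAID 6 list contains exactly the three cases named in the abstract. With bound 5, one random error remains correctable with up to two erasures, and the tolerance drops to E=0 only from Z=3 onward.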
Hence, when in degraded mode (i.e., when Z>0), RAID 6 loses its ability to detect and correct random errors (i.e., it requires E=0), leading to data loss known as silent data corruption. In contrast, the proposed code does not experience silent data corruption unless Z>2. The aforementioned properties of our code, namely the relative simplicity of implementation, vastly improved data protection, and low computational complexity of the decoding algorithm, make our code a natural successor to RAID 6. Because this code is based on the use of quintuple parity, we use the name PentaRAID for the RAID technology implementing the ideas of the current paper.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Mathematics