Name: Zhi Chen
Date: May 15, 2018
Time: 2:30pm
Location: Donald Bren Hall 3011
Committee: Professor Alex Nicolau (Chair), Alex Veidenbaum, Nikil Dutt
Abstract:
Modern processors continue to aggressively scale down feature sizes and reduce voltage levels to run faster and be more energy efficient. However, this trend also poses a significant reliability concern, as it makes transistors more susceptible to soft errors. Soft errors are transient: although they do not permanently impair a computing system, they can corrupt the output of a program or even crash the entire system. Hardware or software redundancy techniques can be used to detect errors during the execution of a program. However, hardware redundancy, e.g. DMR (dual-modular redundancy) and TMR (triple-modular redundancy), leads to significant area overhead and very high energy cost. Software redundancy, e.g. instruction duplication, incurs lower performance and energy penalties and virtually no hardware cost, at the price of a small loss in error coverage. This trade-off suits commodity processors, which are not mission-critical and generally do not require “five-nines” reliability; for them, performance and energy consumption take priority. This dissertation proposes a novel approach to instruction duplication that exploits the redundancy within SIMD instructions. The key idea is to pack the original data and its duplicate into different lanes of the same vector register, rather than executing two separate scalar instructions, since these registers are underutilized in most applications. The proposed solution is implemented in the LLVM compiler as a stand-alone pass. Evaluation on a host of benchmarks reveals that the proposed SIMD-based error detection technique incurs much lower performance, code size, and energy overheads.
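To make the lane-packing idea concrete, the following is a minimal sketch in Python that models a vector register as a 64-bit integer with two 32-bit lanes. It is only an illustration of the idea as stated in the abstract, not the dissertation's LLVM pass: the lane width and the helper names (pack, lanes, lane_add, check, flip_bit) are assumptions introduced here, and real SIMD hardware would perform the lane-wise add in a single instruction.

```python
MASK32 = 0xFFFFFFFF  # each modeled lane is 32 bits wide (an assumption)


def pack(value):
    """Put value in lane 0 and its duplicate in lane 1 of a modeled 64-bit register."""
    v = value & MASK32
    return (v << 32) | v


def lanes(reg):
    """Return (lane0, lane1) of the modeled register."""
    return reg & MASK32, (reg >> 32) & MASK32


def lane_add(a, b):
    """Lane-wise 32-bit add: carries never cross the lane boundary."""
    a0, a1 = lanes(a)
    b0, b1 = lanes(b)
    return (((a1 + b1) & MASK32) << 32) | ((a0 + b0) & MASK32)


def check(reg):
    """Compare the two lanes; a mismatch signals a transient fault."""
    lane0, lane1 = lanes(reg)
    if lane0 != lane1:
        raise RuntimeError("lanes disagree: soft error detected")
    return lane0


def flip_bit(reg, bit):
    """Model a soft error by flipping one bit of the register."""
    return reg ^ (1 << bit)


# One lane-wise operation computes the original and the duplicate together.
total = lane_add(pack(3), pack(4))
print(check(total))  # prints 7
```

In this model, calling check(flip_bit(total, 5)) raises RuntimeError, because the flipped bit changes only one of the two lanes and the comparison exposes the mismatch.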
This dissertation further extends the proposed approach as a countermeasure to protect cryptographic algorithms. These algorithms are widely adopted in modern processors and embedded systems to protect information. A number of popular cryptographic algorithms in the Libgcrypt library are protected using the SIMD-based instruction duplication technique. A large number of errors are injected into these algorithms. The results show that almost all injected faults can be detected at a reasonable performance and code size cost.
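The shape of such a fault-injection experiment can be sketched with a toy model. The Python code below is an assumption-laden illustration, not the dissertation's tooling and not a Libgcrypt algorithm: toy_round is a hypothetical mixing function, the two lanes are modeled as a two-element list, and a fault is a single bit flip in one lane's state just before a chosen round.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, r):
    """Rotate a 32-bit value left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def toy_round(x, key):
    """Hypothetical mixing round (not a real cipher): xor, rotate, add."""
    return (rotl32(x ^ key, 7) + 0x9E3779B9) & MASK32


def run_duplicated(x, keys, fault=None):
    """Run all rounds on two lanes that start with the same value.

    fault is an optional (round_index, lane, bit) triple: flip that bit in
    that lane's state just before that round executes.
    Returns (lane 0 result, whether the final lane comparison flagged a fault).
    """
    lane = [x & MASK32, x & MASK32]
    for i, key in enumerate(keys):
        if fault is not None and fault[0] == i:
            lane[fault[1]] ^= 1 << fault[2]
        lane = [toy_round(v, key) for v in lane]
    return lane[0], lane[0] != lane[1]


def campaign(x, keys):
    """Inject every possible single-bit fault once and count detections."""
    total = detected = 0
    for round_index in range(len(keys)):
        for lane_index in (0, 1):
            for bit in range(32):
                _, hit = run_duplicated(x, keys, (round_index, lane_index, bit))
                total += 1
                detected += hit
    return detected, total


detected, total = campaign(0xDEADBEEF, [0x0123, 0x4567, 0x89AB])
print(detected, "of", total, "injected faults detected")  # 192 of 192
```

Every fault is caught in this toy model because each round is a bijection on the 32-bit state, so a lane that diverges never reconverges with its duplicate. The model therefore does not capture the cases behind the "almost all" figure reported for the real algorithms.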