A Hybrid HW-SW Approach for Intermittent Error Mitigation in Streaming-Based Embedded Systems
Recent advances in process technology augment the systems-on-chip (SoCs) functionality per unit area with the substantial decrease of device features. However, features abatement triggers new reliability issues such as the single-event multi-bit upset (SMU) failure rates augmentation. To mitigate these failure rates, we propose a novel error mitigation mechanism that relies on a hybrid HW-SW technique. In our proposal, we enforce SoC SRAMs by implementing a fault-tolerant memory buffer with minimal capacity to ensure error-free operation. We utilize this buffer to temporarily store a portion of the stored data, named a data chunk, that is used to restore another data chunk in a fully demand-driven way, in case the latter is faulty. We formulate the buffer and data chunk size selection as an optimization problem that targets energy overhead minimization, given that timing and area overheads are restricted with hard constraints decided beforehand by the system designers. We show that our proposed mitigation scheme achieves full error mitigation in a real SoC platform with an average of 10.1% energy overhead with respect to a base-line system operation, while guaranteeing all the designtime constraints.