Abstract

Driven by the demand for real-time processing and the need to minimize latency in AI algorithms, edge computing has advanced rapidly. Decision-making AI applications rely heavily on data-centric operations, predominantly matrix and vector manipulations. Because of these computational patterns, conventional computer architectures that separate the CPU from the memory (the von Neumann model) struggle to meet AI's performance and power requirements. In-memory computing (IMC), which executes workloads directly within the memory array, has emerged as a potential solution to these limitations. In particular, SRAM-based architectures can perform digital bit-wise operations by simultaneously accessing two words in memory (i.e., bit-line computing). Despite their promise, however, these architectures also pose challenges for AI computations. This thesis bridges the gap between AI requirements and edge hardware constraints: it exposes the proposed architectures to these workloads and addresses their limitations, turning them into an efficient platform for executing edge AI applications. The research focuses on circuit innovations that enhance the speed, efficiency, and reliability of in-memory linear algebra operations, alongside optimization methods for AI applications that fully leverage the architectures. First, because bit-line computing architectures extract bit-wise operations on each memory access, they require several cycles to perform multiplications with the shift-add algorithm. This thesis therefore proposes accelerated signed two's complement multiplications with minimal area overhead, with overflow support for multiplications and accumulations, to enhance computational efficiency and accuracy.
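The multi-cycle cost mentioned above can be illustrated with a small software model. The sketch below (a simplification, not the thesis's circuit: the function names, the 8-bit word width, and the mask-generation step are illustrative assumptions) builds unsigned shift-add multiplication from the one bit-wise AND per memory access that bit-line computing provides, showing why a full product takes one cycle per multiplier bit:

```python
# Hypothetical model of shift-add multiplication built from the bit-wise
# primitive that bit-line computing exposes: an AND between two words read
# on the same bit-lines. Word width and names are illustrative only.

WORD_BITS = 8

def bitline_and(a: int, b: int) -> int:
    """Model of one in-memory access: bit-wise AND of two stored words."""
    return a & b

def shift_add_multiply(a: int, b: int) -> int:
    """Unsigned shift-add multiplication: one bit-line AND per cycle.

    For each bit i of the multiplier b, a partial product is formed by
    AND-ing the multiplicand with a word replicating bit i of b; the
    partial product is then shifted by i and accumulated.
    """
    acc = 0
    for i in range(WORD_BITS):
        bit = (b >> i) & 1
        # Replicate bit i of b across the whole word (mask generation).
        mask = (1 << WORD_BITS) - 1 if bit else 0
        partial = bitline_and(a, mask)  # one memory access per iteration
        acc += partial << i             # shift-add accumulation
    return acc

assert shift_add_multiply(13, 11) == 143
```

The loop runs `WORD_BITS` times, which is exactly the several-cycle latency the thesis's accelerated signed multiplier aims to reduce.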
Energy consumption is a critical concern in SRAM-based architectures, which motivates a hybrid bit-line computing architecture combining SRAM and RRAM; it leverages the specific features of each memory to achieve higher energy efficiency. Moreover, voltage scaling, a well-established energy-optimization technique, complicates in-memory operations because vulnerability to errors increases at lower voltage levels. To address this, the thesis proposes strategies that increase read margins and introduces an error correction and mitigation scheme compatible with bit-line computing. Notably, enabling edge AI requires a comprehensive co-design approach spanning hardware and software. The system-architecture-level solutions presented in this thesis enable efficient data management on the proposed architectures and allow fine-grained optimizations of convolutional neural networks, directly increasing the efficiency of the overall system. Finally, an end-to-end framework supporting decision-making AI algorithms is proposed, integrating advanced data compression and real-time decompression techniques.

