Keep All in Memory with Maxwell: a Near-SRAM Computing Architecture for Edge AI Applications
Recent advances in machine learning have dramatically increased model sizes and computational requirements, increasingly straining the capabilities of computing systems. This tension is particularly acute in resource-constrained edge scenarios, where careful hardware acceleration of compute-intensive patterns and the optimization of data reuse to limit costly data transfers are key. Addressing these challenges, we herein present a novel compute-memory architecture named Maxwell, which supports the execution of entire inference algorithms near memory. Leveraging the regular structure of memory arrays, Maxwell achieves a high degree of parallelization for both convolutional (CONV) and fully connected (FC) layers, while supporting fine-grained quantization. Additionally, the architecture effectively minimizes data movement by performing all intermediate computations, such as scaling, quantization, activation functions, and pooling, near memory. We demonstrate that this approach leads to speed-ups of up to 8.5x with respect to state-of-the-art near-memory architectures that require data transfers at the boundaries of CONV and/or FC layers. Accelerations of up to 250x with respect to software execution are observed on an edge platform integrating Maxwell logic and a 32-bit RISC-V core, with Maxwell-specific components accounting for only 10.6% of the memory area.
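To make the "keep everything near memory" idea concrete, the following is a minimal, purely illustrative Python sketch of a quantized CONV, scaling/requantization, activation, and pooling chain in which no intermediate tensor leaves the routine, mirroring the pattern of avoiding transfers at layer boundaries. The function names, data types, and scale parameter are assumptions chosen for illustration and do not reflect Maxwell's actual hardware interface or dataflow.

```python
# Illustrative sketch only (not Maxwell's interface): a quantized
# CONV -> requantize -> ReLU -> max-pool chain where every intermediate
# result is consumed in place rather than transferred back to a host.
import numpy as np

def quantized_conv2d(x_q, w_q, bias, scale):
    """Integer 2D convolution (valid padding): accumulate in int32,
    then immediately rescale and requantize to int8 near the data."""
    H, W = x_q.shape
    kH, kW = w_q.shape
    acc = np.zeros((H - kH + 1, W - kW + 1), dtype=np.int32)
    for i in range(acc.shape[0]):
        for j in range(acc.shape[1]):
            window = x_q[i:i + kH, j:j + kW].astype(np.int32)
            acc[i, j] = np.sum(window * w_q.astype(np.int32)) + bias
    # Scaling and quantization applied right away: the 32-bit
    # accumulators never cross a layer boundary.
    return np.clip(np.round(acc * scale), -128, 127).astype(np.int8)

def relu(x_q):
    # Activation function applied on the quantized values in place.
    return np.maximum(x_q, 0)

def max_pool2x2(x_q):
    # 2x2 max pooling, again without exporting intermediate data.
    H, W = x_q.shape
    trimmed = x_q[:H - H % 2, :W - W % 2]
    return trimmed.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    act = rng.integers(-128, 128, size=(8, 8), dtype=np.int8)   # input activations
    wgt = rng.integers(-128, 128, size=(3, 3), dtype=np.int8)   # kernel weights
    out = max_pool2x2(relu(quantized_conv2d(act, wgt, bias=0, scale=1 / 256)))
    print(out)
```

In this sketch, only the final pooled output would need to be moved; a boundary-transfer design would instead ship the full convolution result out and back for scaling, activation, and pooling, which is the data movement the abstract attributes the speed-up to avoiding.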