Equinox: Training (for Free) on a Custom Inference Accelerator

Drumond Lages De Oliveira, Mario Paulo; Coulon, Louis; Pourhabibi Zarandi, Arash; Yüzügüler, Ahmet Caner; Falsafi, Babak; Jaggi, Martin

doi:10.1145/3466752.3480057

2021

Abstract

DNN inference accelerators executing online services exhibit low average loads because of service demand variability, leading to poor resource utilization. Unfortunately, reclaiming idle inference cycles is difficult as other workloads cannot execute on a custom accelerator. With recent proposals for the use of fixed-point arithmetic in training, there are opportunities for training services to piggyback on inference accelerators. We make the observation that a key challenge in doing so is maintaining service-level latency constraints for inference. We show that relaxing latency constraints in an inference accelerator with ALU arrays that are batching-optimized achieves near-optimal throughput for a given area and power envelope while maintaining inference services' tail latency goals. We present Equinox, a custom inference accelerator designed to piggyback training. Equinox employs a uniform arithmetic encoding to accommodate inference and training and a priority hardware scheduler with adaptive batching that interleaves training during idle inference cycles. For a 500 μs inference service time constraint, Equinox achieves 6.67× higher throughput than a latency-optimal inference accelerator. Despite not being optimized for training services, Equinox achieves up to 78% of the throughput of a dedicated training accelerator that saturates the available compute resources and DRAM bandwidth. Finally, Equinox’s controller logic incurs less than 1% power and area overhead, while the uniform encoding (to enable training) incurs 13% power and 4% area overhead compared to a fixed-point inference accelerator.

Details

Title Equinox: Training (for Free) on a Custom Inference Accelerator

Author(s) Drumond Lages De Oliveira, Mario Paulo ; Coulon, Louis ; Pourhabibi Zarandi, Arash ; Yüzügüler, Ahmet Caner ; Falsafi, Babak ; Jaggi, Martin

Published in Proceedings of the 54th International Symposium on Microarchitecture (MICRO'21)

Conference 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21), Virtual Event, Greece, October 18–22, 2021

Date 2021-10-18

Publisher ACM

ISBN 978-1-4503-8557-2

Keywords

DNN accelerators; DNN inference; Systolic arrays

DOI https://doi.org/10.1145/3466752.3480057

Laboratories PARSA
MLO


Record creation date 2021-09-22
