Equinox: Training (for Free) on a Custom Inference Accelerator

Drumond Lages De Oliveira, Mario Paulo; Coulon, Louis; Pourhabibi Zarandi, Arash; Yüzügüler, Ahmet Caner; Falsafi, Babak; Jaggi, Martin

doi:10.1145/3466752.3480057

conference paper

October 18, 2021

Proceedings of the 54th International Symposium on Microarchitecture (MICRO'21)

54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21)

DNN inference accelerators executing online services exhibit low average loads because of service demand variability, leading to poor resource utilization. Unfortunately, reclaiming idle inference cycles is difficult as other workloads cannot execute on a custom accelerator. With recent proposals for the use of fixed-point arithmetic in training, there are opportunities for training services to piggyback on inference accelerators. We make the observation that a key challenge in doing so is maintaining service-level latency constraints for inference. We show that relaxing latency constraints in an inference accelerator with ALU arrays that are batching-optimized achieves near-optimal throughput for a given area and power envelope while maintaining inference services' tail latency goals. We present Equinox, a custom inference accelerator designed to piggyback training. Equinox employs a uniform arithmetic encoding to accommodate inference and training and a priority hardware scheduler with adaptive batching that interleaves training during idle inference cycles. For a 500 μs inference service time constraint, Equinox achieves 6.67× higher throughput than a latency-optimal inference accelerator. Despite not being optimized for training services, Equinox achieves up to 78% of the throughput of a dedicated training accelerator that saturates the available compute resources and DRAM bandwidth. Finally, Equinox’s controller logic incurs less than 1% power and area overhead, while the uniform encoding (to enable training) incurs 13% power and 4% area overhead compared to a fixed-point inference accelerator.

Name

equinox_drumond_MICRO2021.pdf

Type

Preprint

Version

Submitted version (Preprint)

Access type

openaccess

License Condition

Copyright

Size

1.13 MB

Format

Adobe PDF

Checksum (MD5)

0930b1e23042625e8d8c1d5e4273ec58