Infoscience

doctoral thesis

ColTraIn: Co-located DNN training and inference

Drumond Lages De Oliveira, Mario Paulo  
2020

Deep neural network inference accelerators are deployed at scale to accommodate online services, but face low average load because of service demand variability, leading to poor resource utilization. Unfortunately, reclaiming inference idle cycles is difficult, as no other workload can execute on such custom accelerators. DNN training services offer opportunities to reclaim inference accelerator idle cycles. However, the inference services' tight latency constraints and the training algorithms' dependence on floating-point arithmetic limit the opportunities for piggybacking training services on inference accelerators.

In this thesis, we tackle the challenges that prevent inference DNN accelerators from exposing their idle cycles to training services. We first develop an efficient numeric representation that enables DNN training with accuracy similar to single-precision floating point and energy efficiency similar to 8-bit fixed point. Then, we explore the inference accelerator design space to show that, unlike in current latency-optimal platforms, relaxing latency constraints with batching-optimized ALU arrays achieves near-optimal throughput for a given area and power envelope. High-throughput inference accelerators maximize the opportunities to piggyback training. Finally, we present Equinox, a family of inference accelerators designed to piggyback training. Equinox employs a uniform encoding and a priority hardware scheduler that processes training requests during inference idle cycles without affecting inference tail latency. Overall, we show that exposing accelerator idle cycles to training services uncovers significant computing power for training with only a small overhead for inference accelerators, improving overall datacenter efficiency.
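The "efficient numeric representation" mentioned in the abstract relates to block floating point (listed under Subjects below), in which a block of values shares a single exponent and each element stores only a narrow fixed-point mantissa. The Python sketch below is a minimal illustration of that idea only, not the exact hybrid scheme developed in the thesis; the function names and the 8-bit mantissa width are assumptions made for the example.

import numpy as np

def to_block_floating_point(values, mantissa_bits=8):
    """Quantize a 1-D block of values to block floating point:
    one shared exponent per block plus a low-bit signed mantissa
    per element (illustrative only, not the thesis's exact scheme)."""
    values = np.asarray(values, dtype=np.float64)
    # Pick a shared exponent large enough that the largest magnitude
    # in the block fits inside the mantissa range.
    max_mag = np.max(np.abs(values))
    shared_exp = 0 if max_mag == 0.0 else int(np.floor(np.log2(max_mag))) + 1
    # Map each element onto the fixed-point mantissa grid and saturate.
    scale = 2.0 ** (mantissa_bits - 1) / 2.0 ** shared_exp
    mantissas = np.clip(np.round(values * scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return shared_exp, mantissas

def from_block_floating_point(shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floating-point values from a BFP block."""
    return mantissas.astype(np.float64) * 2.0 ** shared_exp / 2.0 ** (mantissa_bits - 1)

# Example: an 8-element block quantized to 8-bit mantissas with one shared exponent.
block = np.array([0.12, -0.07, 0.95, 0.003, -0.4, 0.0, 0.61, -0.88])
exp, mant = to_block_floating_point(block)
print(exp, mant)
print(from_block_floating_point(exp, mant))

Sharing one exponent across a block lets the bulk of the arithmetic (the dot products) run on cheap fixed-point hardware while the shared exponent preserves a wide dynamic range, which is what makes fixed-point-like energy efficiency compatible with training.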

Type
doctoral thesis
DOI
10.5075/epfl-thesis-10265
Author(s)
Drumond Lages De Oliveira, Mario Paulo  
Advisors
Falsafi, Babak • Jaggi, Martin
Jury

Prof. Christoph Koch (president); Prof. Babak Falsafi, Prof. Martin Jaggi (directors); Prof. James Larus, Prof. Andreas Moshovos, Dr Michael Papamichael (examiners)

Date Issued
2020
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2020-09-25
Thesis number
10265
Number of pages
115

Subjects
datacenters • deep neural network accelerators • online services • systolic array • arithmetic representation • block floating point
EPFL units
PARSA  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
September 18, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/171775