Infoscience

doctoral thesis

ColTraIn: Co-located DNN training and inference

Drumond Lages De Oliveira, Mario Paulo  
2020

Deep neural network inference accelerators are deployed at scale to accommodate online services, but face low average load because of service demand variability, leading to poor resource utilization. Unfortunately, reclaiming inference idle cycles is difficult, as no other workload can execute on such custom accelerators. DNN training services offer opportunities to reclaim inference accelerator idle cycles. However, the inference services' tight latency constraints and the training algorithms' dependence on floating-point arithmetic limit the opportunities for piggybacking training services on inference accelerators.

In this thesis, we tackle the challenges that prevent inference DNN accelerators from exposing their idle cycles to training services. We first develop an efficient numeric representation that enables DNN training with accuracy similar to single-precision floating point and energy efficiency similar to 8-bit fixed point. Then, we explore the inference accelerator design space to show that, unlike in current latency-optimal platforms, relaxing latency constraints with batching-optimized ALU arrays achieves near-optimal throughput for a given area and power envelope. High-throughput inference accelerators maximize the opportunities to piggyback training. Finally, we present Equinox, a family of inference accelerators designed to piggyback training. Equinox employs a uniform encoding and a priority hardware scheduler that processes training requests during inference idle cycles without affecting inference tail latency. Overall, we show that exposing accelerator idle cycles to training services uncovers significant computing power for training with only a small overhead for inference accelerators, improving overall datacenter efficiency.
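The "efficient numeric representation" mentioned in the abstract relates to block floating point (listed under Subjects below), in which a block of values shares a single exponent and each element stores only a narrow fixed-point mantissa. The Python sketch below is a minimal illustration of that idea only, not the exact hybrid scheme developed in the thesis; the function names and the 8-bit mantissa width are assumptions made for the example.

import numpy as np

def to_block_floating_point(values, mantissa_bits=8):
    """Quantize a 1-D block of values to block floating point:
    one shared exponent per block plus a low-bit signed mantissa
    per element (illustrative only, not the thesis's exact scheme)."""
    values = np.asarray(values, dtype=np.float64)
    # Pick a shared exponent large enough that the largest magnitude
    # in the block fits inside the mantissa range.
    max_mag = np.max(np.abs(values))
    shared_exp = 0 if max_mag == 0.0 else int(np.floor(np.log2(max_mag))) + 1
    # Map each element onto the fixed-point mantissa grid and saturate.
    scale = 2.0 ** (mantissa_bits - 1) / 2.0 ** shared_exp
    mantissas = np.clip(np.round(values * scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return shared_exp, mantissas

def from_block_floating_point(shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floating-point values from a BFP block."""
    return mantissas.astype(np.float64) * 2.0 ** shared_exp / 2.0 ** (mantissa_bits - 1)

# Example: an 8-element block quantized to 8-bit mantissas with one shared exponent.
block = np.array([0.12, -0.07, 0.95, 0.003, -0.4, 0.0, 0.61, -0.88])
exp, mant = to_block_floating_point(block)
print(exp, mant)
print(from_block_floating_point(exp, mant))

Sharing one exponent across a block lets the bulk of the arithmetic (the dot products) run on cheap fixed-point hardware while the shared exponent preserves a wide dynamic range, which is what makes fixed-point-like energy efficiency compatible with training.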

Type
doctoral thesis
DOI
10.5075/epfl-thesis-10265
Author(s)
Drumond Lages De Oliveira, Mario Paulo  
Advisors
Falsafi, Babak • Jaggi, Martin
Jury

Prof. Christoph Koch (president); Prof. Babak Falsafi, Prof. Martin Jaggi (directors); Prof. James Larus, Prof. Andreas Moshovos, Dr Michael Papamichael (examiners)

Date Issued
2020
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2020-09-25
Thesis number
10265
Number of pages
115

Subjects
datacenters • deep neural network accelerators • online services • systolic array • arithmetic representation • block floating point
EPFL units
PARSA  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
September 18, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/171775