Infoscience

research report

The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving

Kim, Kyoung-Min; Hong, Kijae; Gulcehre, Caglar; Ailamaki, Anastasia
November 19, 2024

The growing use of Large Language Models (LLMs) highlights the demands on, and challenges of, scalable LLM inference systems, which affect both deployment and development. On the deployment side, there is no comprehensive analysis of the conditions under which a particular scheduler performs better or worse, even though performance varies substantially across schedulers, hardware, models, and workloads, and manually testing each configuration on GPUs can be prohibitively expensive. On the development side, unpredictable performance and unknown upper limits can lead to inconclusive trial and error that consumes resources on ideas that turn out to be ineffective. To address these challenges, we introduce INFERMAX, an analytical framework that uses inference cost models to compare various schedulers, including an optimal scheduler formulated as a constraint satisfaction problem (CSP) that establishes an upper bound on performance. Our framework enables in-depth analysis and raises essential questions, challenging assumptions and exploring opportunities for more efficient scheduling. Notably, our findings indicate that preempting requests can reduce GPU costs by 30% compared to avoiding preemption altogether. We believe our methods and insights will facilitate the cost-effective deployment and development of scalable, efficient inference systems and pave the way for cost-based scheduling.
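The abstract's central idea, comparing schedulers analytically with a cost model instead of running every configuration on GPUs, can be illustrated with a toy simulator. This sketch is not INFERMAX and does not use the report's cost model. It assumes a hypothetical linear per-iteration cost (`CostModel`, with made-up coefficients), FCFS continuous batching over a KV cache measured in token slots, and recompute-style preemption of the most recently admitted request. The names `CostModel`, `Request`, and `simulate` are illustrative only. It contrasts a policy that reserves each request's full prompt-plus-output footprint at admission (and so never preempts) with one that admits optimistically and preempts on overflow.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical linear per-iteration cost model (seconds).

    Coefficients are illustrative placeholders, not measurements from the report.
    """
    base: float = 5e-3       # fixed overhead of one forward pass
    per_token: float = 1e-4  # cost per token computed (prefill or decode)
    per_kv: float = 1e-6     # cost per cached KV token read by decode attention

    def iteration_time(self, computed: int, kv_read: int) -> float:
        return self.base + self.per_token * computed + self.per_kv * kv_read


@dataclass
class Request:
    prompt: int
    output: int
    generated: int = 0
    cached: bool = False  # True while the KV cache holds prompt + generated tokens

    def need(self) -> int:
        # Assumed KV slots this request holds once the next iteration completes.
        return self.prompt + self.generated + 1


def simulate(requests, kv_capacity, reserve_full, model=None):
    """Simulate FCFS continuous batching; return (total_time, preemptions).

    reserve_full=True reserves prompt + output slots at admission, so it never preempts.
    reserve_full=False admits whenever the next iteration fits and, on overflow,
    preempts the most recently admitted request (its KV is dropped and recomputed later).
    """
    if model is None:
        model = CostModel()
    reqs = [Request(p, o) for p, o in requests]
    if any(r.output < 1 or r.prompt + r.output > kv_capacity for r in reqs):
        raise ValueError("each request needs output >= 1 and must fit in the cache alone")
    waiting, running = list(reqs), []
    total, preemptions = 0.0, 0

    def footprint(r):
        return r.prompt + r.output if reserve_full else r.need()

    while waiting or running:
        # Optimistic policy: preempt from the back until the running batch fits again.
        while not reserve_full and sum(r.need() for r in running) > kv_capacity:
            victim = running.pop()
            victim.cached = False
            waiting.insert(0, victim)
            preemptions += 1
        # Admit waiting requests in FCFS order while they fit (head-of-line blocking).
        used = sum(footprint(r) for r in running)
        while waiting and used + footprint(waiting[0]) <= kv_capacity:
            r = waiting.pop(0)
            running.append(r)
            used += footprint(r)
        # Uncached requests prefill (or recompute) prompt + generated tokens;
        # cached ones decode one token and read their cached KV.
        computed = sum(1 if r.cached else r.prompt + r.generated for r in running)
        kv_read = sum(r.prompt + r.generated for r in running if r.cached)
        total += model.iteration_time(computed, kv_read)
        for r in running:
            r.cached = True
            r.generated += 1
        running = [r for r in running if r.generated < r.output]
    return total, preemptions
```

For example, with `CostModel(base=1.0, per_token=0.0, per_kv=0.0)` (one time unit per iteration) and two `(4, 4)` requests on a 10-slot cache, `simulate(..., reserve_full=False)` returns `(7.0, 1)` while `reserve_full=True` returns `(8.0, 0)`. This only shows the mechanism by which preemption can raise utilization; it neither reproduces nor supports the 30% figure reported above.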

Type: research report
Author(s):
  • Kim, Kyoung-Min (EPFL)
  • Hong, Kijae (CERES TECHNOLOGIES)
  • Gulcehre, Caglar (EPFL)
  • Ailamaki, Anastasia (EPFL)
Date issued: 2024-11-19
Number of pages: 16
URL (arXiv): https://arxiv.org/abs/2411.07447
Editorial or peer reviewed: non-reviewed
Written at: EPFL
EPFL units: DIAS, CLAIRE
Available on Infoscience: April 7, 2025
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/248784

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.