Infoscience

conference paper

Micro BTB: A High Performance and Storage Efficient Last-Level Branch Target Buffer for Servers

Gupta, Vishal • Panda, Biswabandan
January 1, 2022
Proceedings of the 19th ACM International Conference on Computing Frontiers 2022 (CF 2022)
19th ACM International Conference on Computing Frontiers (CF)

High-performance branch target buffers (BTBs) and the L1 instruction (L1I) cache are key to a high-performance front end. Modern branch predictors are highly accurate, but as the code footprint of modern server workloads grows, BTB and L1I misses remain frequent. Recent industry designs use large BTBs (hundreds of KB per core) that deliver performance close to that of an ideal BTB, along with a decoupled front end that provides efficient fetch-directed L1I instruction prefetching. In contrast, techniques proposed by academia, such as BTB prefetching and learning from the retire-order stream, fail to provide significant performance gains on modern processor cores, which are deeper and wider.

We solve the problem fundamentally by increasing the storage density of the last-level BTB. We observe that not all branch instructions require a full branch target address. Instead, the branch target can be stored as a branch offset relative to the branch instruction. Using branch offsets enables the BTB to store multiple branches per entry. This cuts the BTB storage in half, but we observe that it increases skewness in the BTB. We therefore revisit the need for skewed indexing and propose a skewed-indexed, compressed last-level BTB design called MicroBTB (MBTB) that stores multiple branches per BTB entry. We evaluate MBTB on 100 industry-provided server workloads. A 4K-entry MBTB provides a 17.61% performance improvement over an 8K-entry baseline BTB design while saving 47.5 KB of storage per core.
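The idea of storing a target as an offset relative to the branch can be illustrated with a minimal sketch. This is not the MBTB design from the paper: the function names and the 12-bit offset width are hypothetical choices made for illustration only, and the abstract does not state the field widths actually used.

```python
# Illustrative sketch only: names and the offset width are assumptions,
# not details taken from the paper.

def signed_bits_needed(value: int) -> int:
    """Smallest two's-complement width (in bits) that can hold `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def encode_target(branch_pc: int, target: int, offset_bits: int):
    """Store the target as a signed offset when it fits, else as a full address."""
    delta = target - branch_pc
    if signed_bits_needed(delta) <= offset_bits:
        return ("offset", delta)
    return ("full", target)


def decode_target(branch_pc: int, kind: str, payload: int) -> int:
    """Recover the absolute target from either encoding."""
    return branch_pc + payload if kind == "offset" else payload


if __name__ == "__main__":
    OFFSET_BITS = 12  # hypothetical width of a compressed slot
    near = encode_target(0x401000, 0x401040, OFFSET_BITS)
    far = encode_target(0x401000, 0x7F0000001000, OFFSET_BITS)
    print(near[0], far[0])
```

A nearby target needs only a few bits, so several such short offsets could share the space a single full address would occupy, which is the intuition behind storing multiple branches per entry. A distant target still needs the full address.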

Type: conference paper
DOI: 10.1145/3528416.3530224
Web of Science ID: WOS:000934076200002
Author(s): Gupta, Vishal; Panda, Biswabandan
Date Issued: 2022-01-01
Publisher: Association for Computing Machinery
Publisher place: New York
Published in: Proceedings of the 19th ACM International Conference on Computing Frontiers 2022 (CF 2022)
ISBN of the book: 978-1-4503-9338-6
Start page: 12
End page: 20
Subjects: Computer Science, Theory & Methods • Computer Science • superscalar cores • branch target buffer • performance
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: RS3LAB
Event name: 19th ACM International Conference on Computing Frontiers (CF)
Event place: Turin, Italy
Event date: May 17-19, 2022
Available on Infoscience: March 13, 2023
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/195812