Infoscience
 
conference paper

Confluence: unified instruction supply for scale-out servers

Kaynak, Cansu • Grot, Boris • Falsafi, Babak
2015
Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48

Multi-megabyte instruction working sets of server workloads defy the capacities of the latency-critical instruction-supply components of a core: the instruction cache (L1-I) and the branch target buffer (BTB). Recent work has proposed dedicated prefetching techniques aimed separately at the L1-I and the BTB, resulting in high metadata costs and/or only modest performance improvements due to the complex control-flow histories required to effectively fill the two components ahead of the core's fetch stream. This work makes the observation that the prefetchers for both the L1-I and the BTB require essentially identical metadata: the control-flow history. While the L1-I prefetcher needs the history at block granularity, the BTB requires knowledge of the individual branches inside each block. To eliminate redundant metadata and multiple prefetchers, we introduce Confluence -- a frontend design with unified metadata for prefetching into both the L1-I and the BTB, whose contents are synchronized. Confluence leverages a stream-based prefetcher to proactively fill both components ahead of the core's fetch stream. The prefetcher maintains the control-flow history at block granularity, and for each instruction block brought into the L1-I, it eagerly inserts the set of branch targets contained in the block into the BTB. Confluence provides 85% of the performance improvement of an ideal frontend (with a perfect L1-I and BTB) at a 1% area overhead per core, while the highest-performing alternative delivers only 62% of the ideal improvement at a per-core area overhead of 8%.
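The synchronization mechanism the abstract describes can be sketched in a few lines. The following is a simplified software model, not the hardware design; all class and method names are invented for illustration. It shows the core idea: control-flow metadata is kept at block granularity, and whenever the stream prefetcher brings an instruction block into the L1-I, every branch target recorded for that block is eagerly inserted into the BTB, keeping the two structures synchronized.

```python
class Frontend:
    """Toy model of a Confluence-style unified instruction-supply frontend."""

    def __init__(self):
        self.l1i = set()            # instruction blocks currently resident in L1-I
        self.btb = {}               # branch PC -> branch target
        # Unified metadata: one record per block, listing (branch PC, target)
        # pairs, maintained at block granularity.
        self.block_branches = {}

    def record_branch(self, block_addr, branch_pc, target):
        """Accumulate block-granularity control-flow metadata."""
        self.block_branches.setdefault(block_addr, []).append((branch_pc, target))

    def prefetch_block(self, block_addr):
        """Stream prefetcher fills the L1-I and eagerly bulk-fills the BTB."""
        self.l1i.add(block_addr)
        for branch_pc, target in self.block_branches.get(block_addr, []):
            self.btb[branch_pc] = target

    def evict_block(self, block_addr):
        """Keep BTB contents synchronized with L1-I residency."""
        self.l1i.discard(block_addr)
        for branch_pc, _ in self.block_branches.get(block_addr, []):
            self.btb.pop(branch_pc, None)
```

Because one block-granularity history drives both structures, the BTB needs no separate prefetcher or per-branch history of its own, which is where the metadata savings in the abstract come from.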

Details
Type
conference paper
DOI
10.1145/2830772.2830785
Author(s)
Kaynak, Cansu • Grot, Boris • Falsafi, Babak
Date Issued
2015
Publisher
ACM Press
Publisher place
New York, New York, USA
Published in
Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48
Start page
166
End page
177

Subjects
Instruction streaming • Branch prediction
Editorial or Peer reviewed
REVIEWED

Written at
EPFL
EPFL units
PARSA
Event name
48th International Symposium on Microarchitecture (MICRO-48)
Event place
Waikiki, Hawaii
Event date
05-09 December 2015

Available on Infoscience
August 8, 2016
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/128437

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.