conference paper

Confluence: unified instruction supply for scale-out servers

Kaynak, Cansu • Grot, Boris • Falsafi, Babak
2015
Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48

Multi-megabyte instruction working sets of server workloads defy the capacities of the latency-critical instruction-supply components of a core: the instruction cache (L1-I) and the branch target buffer (BTB). Recent work has proposed dedicated prefetching techniques aimed separately at the L1-I and the BTB, resulting in high metadata costs and/or only modest performance improvements due to the complex control-flow histories required to effectively fill the two components ahead of the core's fetch stream. This work makes the observation that the metadata for both the L1-I and BTB prefetchers requires essentially identical information: the control-flow history. While the L1-I prefetcher requires the history at block granularity, the BTB requires knowledge of the individual branches inside each block. To eliminate redundant metadata and multiple prefetchers, we introduce Confluence -- a frontend design with unified metadata for prefetching into both the L1-I and the BTB, whose contents are synchronized. Confluence leverages a stream-based prefetcher to proactively fill both components ahead of the core's fetch stream. The prefetcher maintains the control-flow history at block granularity and, for each instruction block brought into the L1-I, eagerly inserts the set of branch targets contained in the block into the BTB. Confluence provides 85% of the performance improvement of an ideal frontend (with a perfect L1-I and BTB) at a 1% area overhead per core, while the highest-performance alternative delivers only 62% of the ideal performance improvement at a per-core area overhead of 8%.
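The mechanism described in the abstract can be pictured with a short sketch. The Python below is an illustrative model only, not the paper's implementation: the class and parameter names (Frontend, prefetch_block, BLOCK_SIZE, table capacities) are assumptions chosen to show how one block-grain control-flow history can drive L1-I prefetching while each block fill eagerly installs that block's branch targets into the BTB, keeping the two structures synchronized.

    # Illustrative sketch (assumed names/sizes), not the paper's implementation.
    from collections import OrderedDict

    BLOCK_SIZE = 64  # bytes per instruction block (assumption)

    class Frontend:
        def __init__(self, l1i_blocks=512, btb_entries=2048):
            self.l1i = OrderedDict()   # block address -> block contents (LRU order)
            self.btb = OrderedDict()   # branch PC -> branch target (LRU order)
            self.l1i_blocks = l1i_blocks
            self.btb_entries = btb_entries
            self.history = []          # single block-grain control-flow history

        def record_fetch(self, block_addr):
            """Log a fetched block; this one history feeds both prefetch paths."""
            self.history.append(block_addr)

        def prefetch_block(self, block_addr, predecoded_branches):
            """Bring a block into the L1-I ahead of the fetch stream and, at the
            same time, eagerly insert every branch target found in the block
            into the BTB, so the BTB tracks the L1-I contents."""
            self._insert_lru(self.l1i, block_addr, True, self.l1i_blocks)
            for branch_pc, target in predecoded_branches.items():
                self._insert_lru(self.btb, branch_pc, target, self.btb_entries)

        def predict_next_stream(self, lookahead=4):
            """Toy stand-in for the stream-based prefetcher: replay the most
            recent block-grain history as the candidate blocks to prefetch."""
            return self.history[-lookahead:]

        @staticmethod
        def _insert_lru(table, key, value, capacity):
            if key in table:
                table.move_to_end(key)
            table[key] = value
            if len(table) > capacity:
                table.popitem(last=False)  # evict the least-recently-used entry

    # Example: prefetching block 0x1000 whose (assumed) predecoded branch at
    # 0x1010 targets 0x2000 fills the L1-I and the BTB in one step.
    fe = Frontend()
    fe.prefetch_block(0x1000, {0x1010: 0x2000})
    assert 0x1010 in fe.btb

The point of the sketch is the one the abstract makes: because BTB entries are installed at block granularity alongside L1-I fills, no separate branch-grain prefetcher history is needed, which is where the metadata savings come from.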

Files
Name: p166-kaynak.pdf
Type: Publisher's Version
Version: http://purl.org/coar/version/c_970fb48d4fbd8a85
Access type: openaccess
Size: 1.19 MB
Format: Adobe PDF
Checksum (MD5): 90101c2ff4e7b1fe68709cb07a82c86e
