Confluence: unified instruction supply for scale-out servers

Kaynak, Cansu; Grot, Boris; Falsafi, Babak

doi:10.1145/2830772.2830785

2015

Abstract

Multi-megabyte instruction working sets of server workloads defy the capacities of the latency-critical instruction-supply components of a core: the instruction cache (L1-I) and the branch target buffer (BTB). Recent work has proposed dedicated prefetching techniques aimed separately at the L1-I and the BTB, resulting in high metadata costs and/or only modest performance improvements due to the complex control-flow histories required to effectively fill the two components ahead of the core's fetch stream. This work observes that the metadata for both the L1-I and BTB prefetchers requires essentially identical information: the control-flow history. While the L1-I prefetcher needs the history at block granularity, the BTB requires knowledge of the individual branches inside each block. To eliminate redundant metadata and multiple prefetchers, we introduce Confluence, a frontend design with unified metadata for prefetching into both the L1-I and the BTB, whose contents are synchronized. Confluence leverages a stream-based prefetcher to proactively fill both components ahead of the core's fetch stream. The prefetcher maintains the control-flow history at block granularity and, for each instruction block brought into the L1-I, eagerly inserts the set of branch targets contained in the block into the BTB. Confluence delivers 85% of the performance improvement provided by an ideal frontend (with a perfect L1-I and BTB) at a 1% area overhead per core, while the highest-performance alternative delivers only 62% of the ideal performance improvement at a per-core area overhead of 8%.
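
The mechanism summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class name `UnifiedFrontend`, the 64-byte block size, the LRU replacement, the per-block BTB layout, the replay-on-miss policy, and the `branches_in_block` map (a stand-in for predecoding a fetched block) are all assumptions made for illustration.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes per instruction block (assumed, not from the record)


class UnifiedFrontend:
    """Toy model: one block-granularity history fills both L1-I and BTB."""

    def __init__(self, l1i_blocks, lookahead, branches_in_block):
        self.l1i_blocks = l1i_blocks    # L1-I capacity, in blocks
        self.lookahead = lookahead      # blocks replayed after a miss
        # Hypothetical predecode stand-in: block -> [(branch pc, target)]
        self.branches_in_block = branches_in_block
        self.l1i = OrderedDict()        # resident blocks, in LRU order
        self.btb = {}                   # block -> {branch pc: target}
        self.history = []               # recorded sequence of block addresses
        self.last_seen = {}             # block -> latest index in history

    def _fill(self, block):
        """Bring a block into the L1-I and eagerly install its branches."""
        if block in self.l1i:
            self.l1i.move_to_end(block)
            return
        if len(self.l1i) >= self.l1i_blocks:
            victim, _ = self.l1i.popitem(last=False)
            self.btb.pop(victim, None)  # keep BTB contents in sync with L1-I
        self.l1i[block] = None
        self.btb[block] = dict(self.branches_in_block.get(block, []))

    def fetch(self, pc):
        """Fetch the block holding pc; return True on an L1-I hit."""
        block = pc // BLOCK_SIZE * BLOCK_SIZE
        hit = block in self.l1i
        prev = self.last_seen.get(block)  # previous occurrence, if any
        if not self.history or self.history[-1] != block:
            self.last_seen[block] = len(self.history)
            self.history.append(block)
        self._fill(block)
        if not hit and prev is not None:
            # Replay the blocks that followed this block last time.
            for nxt in self.history[prev + 1 : prev + 1 + self.lookahead]:
                self._fill(nxt)
        return hit

    def predict_target(self, branch_pc):
        """Return the BTB target for a branch, or None on a BTB miss."""
        block = branch_pc // BLOCK_SIZE * BLOCK_SIZE
        return self.btb.get(block, {}).get(branch_pc)
```

In this sketch a single recorded history serves both structures: a replayed block lands in the L1-I, and its branch targets become visible to `predict_target` in the same step, before the core fetches from that block. Evicting a block also drops its BTB entries, which is one simple way to model the synchronized contents the abstract mentions.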

Details

Title Confluence: unified instruction supply for scale-out servers

Author(s) Kaynak, Cansu ; Grot, Boris ; Falsafi, Babak

Published in Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48

Pages 166-177

Conference 48th International Symposium on Microarchitecture (MICRO-48), Waikiki, Hawaii, 05-09 December 2015

Date 2015

Publisher New York, New York, USA, ACM Press

Keywords

Instruction streaming; Branch prediction

DOI https://doi.org/10.1145/2830772.2830785

Laboratories PARSA

Record creation date 2016-08-08
