Optimizing Transactions for Captured Memory

In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory allocated inside a transaction that cannot escape the allocating transaction (i.e., it is captured by that transaction). Accesses to such memory do not require calls to STM memory access functions (i.e., STM barriers). A compiler unaware of this may translate accesses to captured memory into expensive STM barriers, which presents an opportunity to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler access captured memory, including 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers to captured memory. These techniques can also elide barriers for accesses to thread-local and read-only data. We implemented these optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on an Intel Dunnington system (24 cores in a 4-node SMP system) show that these optimizations can improve performance by up to 18% at 16 threads.

Published in:
Proceedings of the 21st Annual Symposium on Parallelism in Algorithms and Architectures, pp. 214-222
Presented at:
21st Annual Symposium on Parallelism in Algorithms and Architectures, Calgary, AB, Canada, August 11-13, 2009

 Record created 2010-02-02, last modified 2019-01-17
