We describe an integrated compile-time and run-time system for efficient shared memory parallel computing on distributed memory machines. The combined system presents the user with a shared memory programming model, with its well-known benefits in terms of ease of use. The run-time system implements a consistent shared memory abstraction using memory access detection and automatic data caching. The compiler improves the efficiency of the shared memory implementation by directing the run-time system to exploit the message passing capabilities of the underlying hardware. To do so, the compiler analyzes shared memory accesses, and transforms the code to insert calls to the run-time system that provide it with the access information computed by the compiler. The run-time system is augmented with the appropriate entry points to use this information to implement bulk data transfer and to reduce the overhead of run-time consistency maintenance. In those cases where the compiler analysis succeeds for the entire program, we demonstrate that the combined system achieves performance comparable to that produced by compilers that directly target message passing. If the compiler analysis is successful only for parts of the program, for instance, because of irregular accesses to some of the arrays, the resulting optimizations can be applied to those parts for which the analysis succeeds. If the compiler analysis fails entirely, we rely on the run-time’s maintenance of shared memory, and thereby avoid the complexity and the limitations of compilers that directly target message passing. The result is a single system that combines efficient support for both regular and irregular memory access patterns.