The Peregrine RPC system provides performance very close to the optimum allowed by the hardware limits, while still supporting the complete RPC model. Implemented on an Ethernet network of Sun-3/60 workstations, Peregrine completes a null RPC between two user-level threads executing on separate machines in 573 microseconds. This time compares well with the fastest network RPC times reported in the literature, which range from about 1100 to 2600 microseconds, and is only 309 microseconds above the measured hardware latency for transmitting the call and result packets in our environment. For large multi-packet RPC calls, the Peregrine user-level data transfer rate reaches 8.9 megabits per second, approaching the Ethernet’s 10 megabit per second network transmission rate. Between two user-level threads on the same machine, a null RPC requires 149 microseconds. This paper identifies some of the key performance optimizations used in Peregrine and quantitatively assesses their benefits.
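
The measurements quoted above imply a few derived quantities that are not stated directly. The following is a minimal sketch of that arithmetic only; the constant and function names are illustrative assumptions and are not part of Peregrine.

```python
# Figures quoted in the abstract (times in microseconds, rates in megabits/s).
NULL_RPC_REMOTE_US = 573           # null RPC between two machines
OVERHEAD_ABOVE_HARDWARE_US = 309   # time above measured hardware latency
NULL_RPC_LOCAL_US = 149            # null RPC on the same machine
THROUGHPUT_MBPS = 8.9              # user-level rate for large multi-packet calls
ETHERNET_MBPS = 10.0               # raw Ethernet transmission rate
LITERATURE_RANGE_US = (1100, 2600) # fastest previously reported network RPC times


def derived_figures():
    """Return quantities implied by the quoted measurements (hypothetical helper)."""
    # Hardware latency for the call and result packets, by subtraction.
    hardware_latency_us = NULL_RPC_REMOTE_US - OVERHEAD_ABOVE_HARDWARE_US
    return {
        "hardware_latency_us": hardware_latency_us,
        # Fraction of the remote null RPC time not accounted for by hardware latency.
        "software_share": OVERHEAD_ABOVE_HARDWARE_US / NULL_RPC_REMOTE_US,
        # User-level transfer rate as a fraction of the raw Ethernet rate.
        "link_utilization": THROUGHPUT_MBPS / ETHERNET_MBPS,
        # How many times longer the previously reported RPC times are.
        "literature_ratio": tuple(t / NULL_RPC_REMOTE_US for t in LITERATURE_RANGE_US),
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(name, value)
```

Under these figures, the implied hardware latency is 264 microseconds, slightly more than half of the remote null RPC time lies above that hardware floor, and large transfers use about 89 percent of the raw Ethernet rate.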