Optimizing TCP Receive Performance
The performance of an operating system's networking stack is determined by two kinds of overhead: the per-byte overhead of data-touching operations, and the per-packet overhead of protocol processing and other operating system routines. While many mechanisms exist for reducing per-byte and per-packet overheads on the TCP transmit path, little progress has been made in addressing them on the TCP receive path. Prior work has shown that TCP receive performance is dominated by the cost of per-byte data-touching operations. In this paper, however, we show that in modern operating systems it is the per-packet cost of TCP processing that dominates receive performance. We show that this high per-packet overhead is caused by the forced one-to-one correspondence between the host packet unit and the network packet unit. We propose a restructuring of the TCP stack that allows the operating system to amortize the cost of per-packet operations over multiple network packets. With this restructuring, we demonstrate performance improvements of 45-67% for TCP receive workloads involving bulk data transfers in native Linux, and of 86% for similar workloads in a virtualized Linux guest OS running on Xen.
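The amortization idea can be illustrated with a small sketch. It is not the authors' implementation: it is a toy Python model that assumes one plausible way of breaking the one-to-one correspondence, namely coalescing consecutive in-order segments of the same flow into a single host packet. The names Segment, HostPacket, aggregate, receive_cost and the max_segments limit are all hypothetical, the cost constants are placeholders rather than measurements, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical flow identifier: (src_ip, src_port, dst_ip, dst_port).
FlowKey = Tuple[str, int, str, int]


@dataclass
class Segment:
    """One network packet as it arrives from the wire."""
    flow: FlowKey
    seq: int          # sequence number of the first payload byte
    payload: bytes


@dataclass
class HostPacket:
    """One unit handed to the (modelled) TCP stack; may cover many segments."""
    flow: FlowKey
    seq: int
    payload: bytes
    segments: int = 1

    @property
    def next_seq(self) -> int:
        # Sequence number expected right after this packet's payload.
        return self.seq + len(self.payload)


def aggregate(segments: List[Segment], max_segments: int = 20) -> List[HostPacket]:
    """Coalesce back-to-back, in-order segments of the same flow.

    A segment is merged into the previous host packet only if it belongs to
    the same flow, starts exactly where that packet ends, and the packet has
    not yet reached max_segments (an assumed, illustrative limit).
    """
    out: List[HostPacket] = []
    for seg in segments:
        last = out[-1] if out else None
        if (last is not None
                and last.flow == seg.flow
                and last.next_seq == seg.seq
                and last.segments < max_segments):
            last.payload += seg.payload
            last.segments += 1
        else:
            out.append(HostPacket(seg.flow, seg.seq, seg.payload))
    return out


def receive_cost(total_bytes: int, host_packets: int,
                 per_byte: float, per_packet: float) -> float:
    """Toy cost model: per-byte work plus per-host-packet work."""
    return total_bytes * per_byte + host_packets * per_packet


if __name__ == "__main__":
    flow = ("10.0.0.1", 5000, "10.0.0.2", 80)
    mss = 1448  # assumed payload size per segment
    segs = [Segment(flow, i * mss, bytes(mss)) for i in range(40)]
    pkts = aggregate(segs, max_segments=20)
    print(len(segs), "network packets ->", len(pkts), "host packets")
```

In this model the per-byte term is unchanged by aggregation, since every byte is still touched once, while the per-packet term shrinks in proportion to the number of segments merged into each host packet. A gap, a reordered segment, or a change of flow ends the current host packet, so aggregation helps most for in-order bulk transfers.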