Cerebros: Evading the RPC Tax in Datacenters
The emerging paradigm of microservices decomposes online services into fine-grained software modules frequently communicating over the datacenter network, often using Remote Procedure Calls (RPCs). Ongoing advancements in the network stack have exposed the RPC layer itself as a bottleneck, that we show accounts for 40–90% of a microservice's total execution cycles. We break down the underlying modules that comprise production RPC layers and demonstrate, based on prior evidence, that CPUs can only expect limited improvements for such tasks, mandating a shift to hardware to remove the RPC layer as a limiter of microservice performance. Although recently proposed accelerators can efficiently handle a portion of the RPC layer, their overall benefit is limited by unnecessary CPU involvement, which occurs because the accelerators are architected as co-processors under the CPU's control. Instead, we show that conclusively removing the RPC layer bottleneck requires all of the RPC layer's modules to be executed by a NIC-attached hardware accelerator. We introduce Cerebros, a dedicated RPC processor that executes the Apache Thrift RPC layer and acts as an intermediary stage between the NIC and the microservice running on the CPU. Our evaluation using the DeathStarBench microservice suite shows that Cerebros reduces the CPU cycles spent in the RPC layer by 37–64×, yielding a 1.8–14× reduction in total cycles expended per microservice request.
Cerebros_Preprint_MICRO21.pdf
Preprint
openaccess
Copyright
782.82 KB
Adobe PDF
3cbd2db23ab0183b3a4e0b8864567320
Cerebros_ACM.pdf
Publisher's version
restricted
Copyright
902.03 KB
Adobe PDF
c26e688339bfb5a10ccec9e62a9fa951