Abstract

Modern data-center network operating systems rely on proprietary user-space daemons that wrap SDKs from switch vendors. Linux-based variants of these operating systems have benefited in recent years from broader and simpler dataplane offloading support: kernel resources such as routes and next hops are offloaded to hardware, and the kernel can, for example, learn new MAC entries seen by the hardware forwarding plane. However, the Linux kernel in these operating systems still has to be extended and customized because it lacks constructs (either exposed to user space or purely in-kernel) needed for complete control plane processing and dataplane offloading. Managing limited hardware resources and keeping the kernel and hardware forwarding planes equivalent are two examples of challenges where the lack of appropriate constructs forces switch operating systems to use user-space solutions coupled with proprietary SDKs and custom kernel modifications. Filling the gaps, i.e., designing and adding the missing constructs, would enable faster control plane processing (fewer user-kernel context switches) and faster dataplane configuration. It would also encourage hardware vendors to provide in-tree kernel drivers instead of the current closed-source user-space SDKs, which would further improve performance while establishing Linux as the standardized API for network switches. This thesis explores the designs, performance challenges and trade-offs of completing the in-kernel switching API, from switch port configuration through faster control path packet processing to in-kernel ASIC resource management. As most hardware vendors are not yet ready to open their drivers in the kernel, workarounds that still support vendors’ SDKs through the same in-kernel API are presented as well.
A user-space switch port driver for Mellanox switches was ported to kernel space to accelerate control plane processing and to bootstrap the in-kernel switch port configuration design. A prefetching scheme that hides PCI latency was designed to manage limited and shared ASIC resources, with synchronous feedback to user-space daemons when resources are exhausted.
