Abstract

Despite the high costs of acquiring and maintaining modern data centers, machine resource utilization is often low. Servers running online interactive services are over-provisioned to support peak load, which occurs only for a fraction of the time, because business continuity and fault-tolerance plans dictate that these services tolerate major unforeseen events (e.g., the failure of another data center). Underutilized computing resources translate into reduced data center efficiency and capital losses for operators. Colocating different types of workloads is a promising way to increase resource utilization. The challenges are protecting interactive services from resource interference and handling limited resource availability: if a best-effort workload competes directly for resources with an interactive service, the latter can fail to meet its service-level objectives (SLOs), hurting user experience and costing revenue.

In this dissertation we focus on mitigating CPU interference and limited memory availability when colocating interactive services with batch workloads such as data-parallel applications. Existing work on mitigating CPU interference assumes that progress and performance metrics are available in real time and that hardware support for low-level isolation mechanisms is prevalent; neither is always true in commercial environments. Furthermore, previous work on improving cluster resource allocation does not attempt to over-commit memory, due to concerns about unpredictable performance degradation and sporadic failures.

We first focus on the colocation of open-source in-memory interactive services and conduct a thorough investigation of how they are impacted by batch jobs. We find that existing isolation techniques do not prevent performance degradation, and we explain in detail why they are insufficient. Second, we study the behavior of a large commercial-scale service (a component of a web search platform) and find that it is highly bursty. We show that, by keeping a headroom of idle cores available at all times, the service’s response tail latency stays within its SLO even under heavy load. Third, we address the impact of limited memory availability on data-parallel applications by identifying an intrinsic property of their tasks: memory elasticity, the ability to handle memory pressure by writing intermediate data to external storage while incurring only moderate performance penalties when memory-constrained. We further show that memory elasticity is prevalent across various data-parallel application frameworks, and we implement a scheduler that leverages it to reduce job completion times and increase memory utilization. We conclude that these techniques make it possible to colocate even highly bursty, latency-sensitive services with batch jobs, significantly increasing resource utilization.
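To make the core-headroom idea concrete, the following is a minimal sketch in Python, not the controller built in this work: a loop that confines a colocated batch workload through a cgroup cpuset so that a fixed number of cores stays idle for the interactive service to burst into. The headroom value, the cgroup path, and the load-average proxy for the service's demand are all assumptions made for illustration.

import os
import time

TOTAL_CORES = os.cpu_count() or 1
HEADROOM = 4  # idle cores kept free at all times (value assumed for the example)
BATCH_CPUSET = "/sys/fs/cgroup/batch/cpuset.cpus"  # hypothetical cgroup for batch jobs

def cores_used_by_service() -> int:
    # Crude proxy: the 1-minute load average approximates busy cores.
    # A real controller would sample the service's own utilization metrics.
    return min(TOTAL_CORES, round(os.getloadavg()[0]))

def confine_batch(n_cores: int) -> None:
    # Restrict the batch workload's cpuset to the first n_cores cores.
    with open(BATCH_CPUSET, "w") as f:
        f.write(f"0-{n_cores - 1}" if n_cores > 0 else "")

if __name__ == "__main__":
    while True:
        busy = cores_used_by_service()
        # Batch gets whatever remains after the service's current demand
        # plus the reserved headroom that absorbs sudden bursts.
        confine_batch(max(0, TOTAL_CORES - busy - HEADROOM))
        time.sleep(0.1)  # react quickly: bursts are short-lived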
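Memory elasticity can likewise be illustrated with a small sketch, again assumed rather than taken from the frameworks studied here: an aggregation task that spills partial results to disk whenever its in-memory state exceeds a budget, then merges the partials at the end. The task still completes correctly, only slower, which is the "moderate performance penalty" referred to above; the key budget and spill format are arbitrary choices for the example.

import pickle
import tempfile
from collections import defaultdict

def elastic_word_count(words, max_keys_in_memory=1_000_000):
    counts = defaultdict(int)
    spill_files = []

    def spill():
        # Write the current partial aggregate to external storage
        # and release its memory.
        f = tempfile.TemporaryFile()
        pickle.dump(dict(counts), f)
        f.seek(0)
        spill_files.append(f)
        counts.clear()

    for word in words:
        counts[word] += 1
        if len(counts) >= max_keys_in_memory:  # memory pressure: spill
            spill()

    # Merge the in-memory partial with all on-disk partials.
    total = defaultdict(int, counts)
    for f in spill_files:
        for word, c in pickle.load(f).items():
            total[word] += c
        f.close()
    return dict(total)

For inputs whose distinct-key count fits in the budget, elastic_word_count(["a", "b", "a"]) behaves like a plain in-memory aggregation; only under pressure does the spill path, and its I/O cost, kick in.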
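Finally, a sketch of how a scheduler might exploit elasticity to over-commit memory; the task model and admission rule are illustrative assumptions, not the scheduler evaluated in this work. An elastic task can be admitted with less than its peak demand, trading spill slowdown for an earlier start, which is how elasticity enables both higher memory utilization and shorter job completion times.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    peak_mem_gb: float  # memory needed to run fully in memory
    min_mem_gb: float   # memory needed to run in spilling mode
    elastic: bool       # can the task spill intermediate data?

def admit(task: Task, free_mem_gb: float) -> Optional[float]:
    """Return the memory grant for the task, or None to queue it."""
    if free_mem_gb >= task.peak_mem_gb:
        return task.peak_mem_gb  # run fully in memory
    if task.elastic and free_mem_gb >= task.min_mem_gb:
        return free_mem_gb       # over-commit: run now, but spill
    return None                  # inelastic and short on memory: wait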
