Dola's Second Colloquium

End-to-End Quality of Service for Large Distributed Storage

Carlos G. Maltzahn
University of California, Santa Cruz
CS Colloquium, Tuesday, October 16, 2007

Storage systems for large, distributed clusters of compute servers are themselves large and distributed. Their complexity and scale make these systems hard to manage, and in particular make it hard to ensure that the applications using them get good, predictable performance. At the same time, shared access by multiple applications and users, along with competition from internal system activities, creates a need for predictable performance.

Dr. Maltzahn described his project, which investigates ways to improve performance in large distributed storage systems through mechanisms that integrate the performance aspects of the entire path that I/O operations take through the system: from the application interface on the compute server, through the network, to the storage servers. These mechanisms cover I/O scheduling at the storage server, storage server cache management, client-to-server network flow control, client-to-server connection management, and client cache management. He gave an overview of the project and then focused on the first piece, I/O scheduling at the storage server, presenting Fahrrad, a universal real-time disk scheduler.

A universal real-time disk scheduler manages the execution of disk requests to provide performance guarantees to a range of applications with differing performance requirements and behaviors. Existing systems handle such mixed workloads by over-provisioning resources, by partitioning (separating the workloads by resource or by time), or by hierarchically combining separate schedulers for each class of work. Fahrrad, by contrast, is a unified disk scheduler, based on proven real-time scheduling principles, that uses disk resources efficiently while supporting mixtures of applications with hard and soft real-time performance requirements.

Fahrrad supports (nearly) arbitrarily hard performance guarantees, with and without timing constraints, which allows a backup application, hard real-time sensor data recording, and soft real-time video playback to share a single system. The results show that Fahrrad manages disk performance well and that it can use its knowledge of timing requirements to yield higher throughput for some workloads than current non-real-time schedulers provide.
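The abstract does not spell out Fahrrad's algorithm. As a rough illustration of how standard real-time scheduling principles can let one scheduler serve hard, soft, and best-effort work together, here is a minimal Python sketch of a generic earliest-deadline-first (EDF) dispatcher with per-stream disk-time reservations and a utilization-based admission test. It is not Fahrrad's implementation: the `Stream` and `EDFDiskScheduler` names, the "budget of disk time per period" model, and the fixed 10 ms cost per request are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Stream:
    """Hypothetical I/O stream with a disk-time reservation (illustrative only).

    reserve_ms: disk time guaranteed per period (0 means best-effort)
    period_ms:  reservation period; the end of the period is the deadline
    """
    name: str
    reserve_ms: int
    period_ms: int
    queue: deque = field(default_factory=deque)
    period_start: int = 0
    budget_ms: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.budget_ms = self.reserve_ms

    def deadline(self) -> int:
        return self.period_start + self.period_ms

    def refresh(self, now: int) -> None:
        # Roll over to the current period and refill the budget
        # (any overrun from the previous period is simply forgiven).
        while now >= self.deadline():
            self.period_start += self.period_ms
            self.budget_ms = self.reserve_ms


def admissible(streams) -> bool:
    # EDF utilization test: periodic reservations whose deadlines equal their
    # periods fit on one device if total reserved utilization is at most 1.
    return sum(Fraction(s.reserve_ms, s.period_ms) for s in streams) <= 1


class EDFDiskScheduler:
    def __init__(self, reserved, best_effort):
        if not admissible(reserved):
            raise ValueError("reservations exceed 100% of disk time")
        self.reserved = reserved
        self.best_effort = best_effort

    def pick(self, now):
        for s in self.reserved:
            s.refresh(now)
        # Reserved streams with pending work and budget left: earliest deadline first.
        eligible = [s for s in self.reserved if s.queue and s.budget_ms > 0]
        if eligible:
            return min(eligible, key=Stream.deadline)
        # Otherwise give the slack to best-effort work (e.g. a backup).
        return next((s for s in self.best_effort if s.queue), None)

    def run(self, service_ms, until_ms):
        """Simulate dispatching fixed-cost requests; return a (time, stream) log."""
        now, log = 0, []
        while now < until_ms:
            s = self.pick(now)
            if s is None:
                now += 1  # disk idle for 1 ms
                continue
            s.queue.popleft()
            s.budget_ms -= service_ms  # charge disk time, not request count
            log.append((now, s.name))
            now += service_ms
        return log


# Usage: hard sensor recording, soft video playback, best-effort backup.
sensor = Stream("sensor", reserve_ms=30, period_ms=100, queue=deque(range(20)))
video = Stream("video", reserve_ms=10, period_ms=50, queue=deque(range(20)))
backup = Stream("backup", reserve_ms=0, period_ms=1, queue=deque(range(20)))
log = EDFDiskScheduler([sensor, video], [backup]).run(service_ms=10, until_ms=100)
print([name for _, name in log])
# prints ['video', 'sensor', 'sensor', 'sensor', 'backup', 'video', 'backup', 'backup', 'backup', 'backup']
```

In this 100 ms run the sensor and video streams receive exactly their reserved 30 ms and 20 ms of disk time, and the backup absorbs the remaining 50 ms instead of the disk sitting idle. A real disk scheduler must also estimate per-request service time, which depends on seek distance and rotational position; the sketch sidesteps that by assuming a constant cost.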

Last modified 14 November 2007 at 8:19 am by sahad