Exposing RDMA NIC Resources for Software-Defined Scheduling

Yibo Huang, Yiming Qiu, Yunming Xiao, Archit Bhatnagar, Sylvia Ratnasamy, Ang Chen

April 2025

Abstract

Remote Direct Memory Access (RDMA) is emerging as a critical utility for large-scale datacenters, delivering significant performance improvements over the traditional TCP networking stack. Recent studies indicate that numerous applications can benefit from RDMA integration, and RDMA hardware resources are increasingly shared among a growing diversity of applications. However, today’s RDMA frameworks mostly treat their software and hardware stacks as two independent subsystems, making it difficult for developers to align the performance objectives of RDMA applications with the limited resources of RDMA hardware.

We are developing a framework called SwiftRDMA, with the vision of enabling software-defined RDMA scheduling. SwiftRDMA treats RDMA resource sharing as a scheduling problem. It pinpoints the root causes of RDMA resource contention and SLO violations, linking them to a set of trackable signals and controllable actions. A software scheduler then translates operator demands into scheduling policies, which use the exposed signals and actions to achieve the intended performance objectives. We describe our progress so far and demonstrate the potential benefit of our approach.

Topic

Cloud Infrastructure

Type

Conference paper

Publication

In APNet'25