OSDI '24 - Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents

  • Published Nov 5, 2024
  • Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
    Qizheng Zhang, Stanford University; Ali Imran, Purdue University; Enkeleda Bardhi, Sapienza University of Rome; Tushar Swamy and Nathan Zhang, Stanford University; Muhammad Shahbaz, Purdue University and University of Michigan; Kunle Olukotun, Stanford University
    Recent work on in-network machine learning (ML) assumes that models trained offline will operate well in modern networking environments. However, once deployed, these models struggle to cope with fluctuating traffic patterns and network conditions and must therefore be validated and updated frequently in an online fashion.
    This paper presents CARAVAN, a practical online learning system for in-network ML models. We tackle two primary challenges in facilitating online learning for networking: (a) the automatic labeling of evolving traffic and (b) the efficient monitoring and detection of model performance degradation to trigger retraining. CARAVAN repurposes existing systems (e.g., heuristics, access control lists, and foundation models), which are not directly suitable for such dynamic environments, into high-quality labeling sources for generating labeled data for online learning. CARAVAN also introduces a new metric, accuracy proxy, to track model degradation and potential drift to efficiently trigger retraining. Our evaluations show that CARAVAN's labeling strategy enables in-network ML models to closely follow the changes in the traffic dynamics with a 30.3% improvement in F1 score (on average), compared to offline models. Moreover, CARAVAN sustains comparable inference accuracy to that of a continuous-learning system while consuming 61.3% less GPU compute time (on average) via accuracy proxy and retraining triggers.
    View the full OSDI '24 program at www.usenix.org...
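
    The abstract does not say how the accuracy proxy is computed, so the following is only a minimal sketch of the general idea, not CARAVAN's implementation. It assumes the proxy is the agreement rate between the model's predictions and a labeling agent's labels over a sliding window of sampled flows, and that retraining is triggered when that rate drops below a fixed threshold. The names `AccuracyProxyMonitor` and `heuristic_agent`, the port-based rule, the window size, and the threshold are all hypothetical.

```python
from collections import deque


def heuristic_agent(flow):
    """Hypothetical labeling agent: a simple port-based heuristic (assumption)."""
    return "suspicious" if flow["dst_port"] in {23, 2323} else "benign"


class AccuracyProxyMonitor:
    """Tracks model/agent agreement over a sliding window (assumed proxy definition)."""

    def __init__(self, window_size=100, threshold=0.8):
        self.window = deque(maxlen=window_size)  # oldest samples fall out automatically
        self.threshold = threshold

    def observe(self, model_prediction, agent_label):
        # Record whether the model agreed with the labeling agent on this sample.
        self.window.append(model_prediction == agent_label)

    def proxy(self):
        # Agreement rate over the current window, or None if nothing was observed.
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def should_retrain(self):
        # Only trigger once the window is full, to avoid reacting to a few samples.
        full = len(self.window) == self.window.maxlen
        return full and self.proxy() < self.threshold


if __name__ == "__main__":
    monitor = AccuracyProxyMonitor(window_size=4, threshold=0.75)
    flows = [{"dst_port": 23}, {"dst_port": 443}, {"dst_port": 2323}, {"dst_port": 80}]
    stale_predictions = ["benign", "benign", "benign", "benign"]  # a drifted model
    for flow, prediction in zip(flows, stale_predictions):
        monitor.observe(prediction, heuristic_agent(flow))
    print(monitor.proxy(), monitor.should_retrain())
```

    In this sketch the retraining decision needs only cheap comparisons per sampled flow, which is the kind of lightweight signal that could stand in for continuous retraining; the real system's sampling and thresholds may differ.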
