StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning

Published in International Conference on Machine Learning (ICML 2026), 2026

Abstract

Accepted to ICML 2026.

We propose StitchCUDA, an automated multi-agent framework for end-to-end GPU program optimization. The system employs three specialized agents: a Planner that coordinates system design, a Coder that implements solutions incrementally, and a Verifier that checks correctness and analyzes performance with profiling tools. By incorporating rubric-based agentic reinforcement learning over two atomic skills (code generation and optimization), StitchCUDA prevents reward manipulation and enables advanced CUDA techniques such as kernel fusion.
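The three-agent loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `planner`, `coder`, `verifier`, `stitch`, `Candidate`, and `Report` are hypothetical, the agents are hard-coded stubs rather than LLM calls, plain Python functions stand in for GPU kernels, and a count of passes over the data stands in for profiled cost. The example task (scale, shift, ReLU) is also an assumption, chosen because fusing its three element-wise stages into one pass mirrors the idea behind kernel fusion.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class Candidate:
    name: str
    run: Callable[[Vector], Vector]
    passes: int  # assumed stand-in for profiled cost: passes over the data


@dataclass
class Report:
    correct: bool
    passes: int


def reference(x: Vector) -> Vector:
    """Ground-truth result for the toy task: relu(2*x + 1)."""
    return [max(2.0 * v + 1.0, 0.0) for v in x]


def unfused(x: Vector) -> Vector:
    """Three separate element-wise stages, each materializing an intermediate."""
    scaled = [2.0 * v for v in x]
    shifted = [v + 1.0 for v in scaled]
    return [max(v, 0.0) for v in shifted]


def fused(x: Vector) -> Vector:
    """The same computation in a single pass, with no intermediates."""
    return [max(2.0 * v + 1.0, 0.0) for v in x]


def planner(task: str) -> List[str]:
    """Hypothetical Planner: order the work as 'get it right, then make it fast'."""
    return ["baseline", "fuse"]


def coder(step: str) -> Candidate:
    """Hypothetical Coder: a real one would generate code; this one picks a stub."""
    if step == "baseline":
        return Candidate("unfused", unfused, passes=3)
    return Candidate("fused", fused, passes=1)


def verifier(candidate: Candidate, x: Vector) -> Report:
    """Hypothetical Verifier: check output against the reference, report cost."""
    return Report(correct=candidate.run(x) == reference(x), passes=candidate.passes)


def stitch(task: str, x: Vector) -> Optional[Candidate]:
    """Run the plan step by step, keeping the cheapest candidate that is correct."""
    best: Optional[Candidate] = None
    for step in planner(task):
        candidate = coder(step)
        report = verifier(candidate, x)
        if report.correct and (best is None or report.passes < best.passes):
            best = candidate
    return best
```

Calling `stitch("scale-shift-relu", [-2.0, 0.0, 1.5])` in this sketch returns the fused candidate, because the Verifier stub accepts both versions as correct and the fused one reports fewer passes. Gating on correctness before comparing cost is the design choice worth noting: a faster candidate that fails verification is never selected.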

Key Results

Near-complete success rate on end-to-end GPU programming tasks
~1.72x speedup over multi-agent baselines on KernelBench
~2.73x improvement over reinforcement learning model baselines

Authors

Shiyang Li, Zijian Zhang, Winson Chen, Yuebo Luo, Mingyi Hong, Caiwen Ding

Recommended citation: S. Li, Z. Zhang, W. Chen, Y. Luo, M. Hong, C. Ding. "StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning." International Conference on Machine Learning (ICML), 2026.
Download Paper | Code
