StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning

Published in arXiv preprint, 2026

Abstract

We propose StitchCUDA, an automated multi-agent framework for end-to-end GPU program optimization. The system employs three specialized agents: a Planner coordinating system design, a Coder implementing solutions incrementally, and a Verifier ensuring correctness through performance profiling tools. By incorporating rubric-based agentic reinforcement learning over two atomic skills (code generation and optimization), StitchCUDA prevents reward manipulation and enables advanced CUDA techniques such as kernel fusion.

Key Results

  • Nearly complete success rate on end-to-end GPU tasks
  • ~1.72x speedup over multi-agent baselines on KernelBench
  • ~2.73x improvement over reinforcement learning model baselines

Authors

Shiyang Li, Zijian Zhang, Winson Chen, Yuebo Luo, Mingyi Hong, Caiwen Ding

Recommended citation: S. Li, Z. Zhang, W. Chen, Y. Luo, M. Hong, C. Ding. "StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning." arXiv preprint arXiv:2603.02637, 2026.
Download Paper