CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
Published as an arXiv preprint, 2025
Abstract
We propose CudaForge, an agentic framework that automatically optimizes CUDA kernels using LLM-based agents with hardware profiling feedback. By integrating real-time GPU performance metrics into the agent’s optimization loop, CudaForge iteratively refines kernel implementations to achieve significant speedups over hand-tuned baselines.
Key Contributions
- An LLM-agent-based framework for automatic CUDA kernel optimization
- Hardware-in-the-loop feedback mechanism using GPU profiling data
- Demonstrated speedups across a range of computational kernels
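The iterative, profiling-driven refinement loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CudaForge implementation. `optimize`, `Profile`, `Candidate`, `toy_propose`, and `toy_profile` are hypothetical names. The toy profiler uses a made-up cost rule in place of real GPU measurements. A real setup would compile each candidate kernel, check it against a reference output, and collect hardware profiling metrics before handing them back to the agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Profile:
    correct: bool       # hypothetical: did the candidate match the reference output?
    runtime_ms: float   # hypothetical: measured kernel runtime
    note: str           # hypothetical: feedback text passed back to the agent


@dataclass
class Candidate:
    source: str
    profile: Profile


def optimize(
    propose: Callable[[Optional[Candidate]], str],
    profile: Callable[[str], Profile],
    max_rounds: int = 5,
) -> Optional[Candidate]:
    """Propose a kernel, profile it, feed the result back, and keep the fastest correct one."""
    best: Optional[Candidate] = None
    last: Optional[Candidate] = None
    for _ in range(max_rounds):
        src = propose(last)      # the agent sees the previous attempt and its profile
        prof = profile(src)      # stand-in for hardware feedback on this attempt
        last = Candidate(src, prof)
        if prof.correct and (best is None or prof.runtime_ms < best.profile.runtime_ms):
            best = last
    return best


# Toy stand-ins (hypothetical): each "kernel" is just a tile size encoded as a string.
def toy_propose(prev: Optional[Candidate]) -> str:
    if prev is None:
        return "tile=8"
    tile = int(prev.source.split("=")[1])
    return f"tile={tile * 2}"


def toy_profile(src: str) -> Profile:
    tile = int(src.split("=")[1])
    if tile > 64:
        # Made-up rule standing in for a failed or invalid kernel.
        return Profile(False, 0.0, "exceeds resource limit (toy rule)")
    runtime = 100.0 / tile
    return Profile(True, runtime, f"runtime {runtime:.2f} ms")


best = optimize(toy_propose, toy_profile, max_rounds=5)
print(best.source, best.profile.runtime_ms)  # prints "tile=64 1.5625"
```

In this sketch the last attempt is always fed back to the agent, even when it fails. That way the proposer can react to negative feedback, while the loop separately tracks the best correct candidate seen so far.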
Authors
Zijian Zhang, Rong Wang, Shiyang Li, Yuebo Luo, Mingyi Hong, Caiwen Ding

Recommended citation: Z. Zhang, R. Wang, S. Li, Y. Luo, M. Hong, C. Ding. "CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization." arXiv preprint arXiv:2511.01884, 2025.
