Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2 
publications

Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs
Published in IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023
Liberator is a data reuse framework that enables efficient out-of-memory graph computing on GPUs by intelligently managing data transfers between host and device memory.
S. Li, R. Tang, J. Zhu, Z. Zhao, X. Gong, W. Wang, J. Zhang, P.-C. Yew. "Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs." IEEE Transactions on Parallel and Distributed Systems (TPDS), 34(6): 1954-1967, 2023.
Download Paper

OneGraph: A Cross-Architecture Framework for Large-Scale Graph Computing on GPUs Based on oneAPI
Published in CCF Transactions on High Performance Computing (CCF-THPC), 2024
OneGraph is a cross-architecture graph computing framework built on oneAPI that enables portable and efficient large-scale graph processing across different GPU architectures.
S. Li, J. Zhu, J. Han, Y. Peng, Z. Wang, X. Gong, G. Wang, J. Zhang, X. Wang. "OneGraph: A Cross-Architecture Framework for Large-Scale Graph Computing on GPUs Based on oneAPI." CCF Transactions on High Performance Computing (CCF-THPC), 6(2): 179-191, 2024.
Download Paper

DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs
Published in ACM International Conference on Supercomputing (ICS), 2025
DR-CircuitGNN accelerates the training of heterogeneous circuit graph neural networks on GPUs through novel data reuse strategies and GPU-optimized computation kernels.
Y. Luo, S. Li, J. Tao, K. G. Thorat, X. Xie, H. Peng, N. Xu, C. Ding, S. Huang. "DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs." In Proceedings of the 39th ACM International Conference on Supercomputing (ICS '25), 2025.
Download Paper

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
Published as an arXiv preprint, 2025
CudaForge is an agentic framework that automatically optimizes CUDA kernels using LLM-based agents with hardware profiling feedback, achieving significant speedups over hand-tuned baselines.
Z. Zhang, R. Wang, S. Li, Y. Luo, M. Hong, C. Ding. "CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization." arXiv preprint arXiv:2511.01884, 2025.
Download Paper

XuanJia: A Comprehensive Virtualization-Based Code Obfuscator for Binary Protection
Published as an arXiv preprint, 2026
We present XuanJia, a comprehensive virtualization-based code obfuscator that leverages virtual machine protection to safeguard binary programs against reverse engineering and tampering attacks.
X. Zou, X. Gong, J. Zhang, S. Li, P.-C. Yew. "XuanJia: A Comprehensive Virtualization-Based Code Obfuscator for Binary Protection." arXiv preprint arXiv:2601.10261, 2026.
Download Paper

StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning
Published as an arXiv preprint, 2026
StitchCUDA is a multi-agent framework for end-to-end GPU program optimization using rubric-based agentic reinforcement learning, achieving ~1.72x speedup over multi-agent baselines and ~2.73x over RL model baselines on KernelBench.
S. Li, Z. Zhang, W. Chen, Y. Luo, M. Hong, C. Ding. "StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning." arXiv preprint arXiv:2603.02637, 2026.
Download Paper

GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph
Published in ACM/IEEE Design Automation Conference (DAC), 2026
GSR-GNN enables training GNNs with up to hundreds of layers on circuit graphs while reducing both compute and memory overhead, achieving up to 87.2% peak memory reduction and over 30x training speedup.
Y. Luo, S. Li, Y. Feng, V. Kancharla, S. Huang, C. Ding. "GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph." In Proceedings of the 63rd ACM/IEEE Design Automation Conference (DAC '26), 2026.
Download Paper
talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk. Note the different value in the type field; you can put anything in this field.
teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.
