Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Posts

Future Blog Post

less than 1 minute read

Published: January 01, 2199

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

Publications

Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs

Published in IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023

Liberator is a data reuse framework that enables efficient out-of-memory graph computing on GPUs by intelligently managing data transfers between host and device memory.

S. Li, R. Tang, J. Zhu, Z. Zhao, X. Gong, W. Wang, J. Zhang, P.-C. Yew. "Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs." IEEE Transactions on Parallel and Distributed Systems (TPDS), 34(6): 1954-1967, 2023.
Download Paper | Code

OneGraph: A Cross-Architecture Framework for Large-Scale Graph Computing on GPUs Based on oneAPI

Published in CCF Transactions on High Performance Computing (CCF-THPC), 2024

OneGraph is a cross-architecture graph computing framework built on oneAPI that enables portable and efficient large-scale graph processing across different GPU architectures.

S. Li, J. Zhu, J. Han, Y. Peng, Z. Wang, X. Gong, G. Wang, J. Zhang, X. Wang. "OneGraph: A Cross-Architecture Framework for Large-Scale Graph Computing on GPUs Based on oneAPI." CCF Transactions on High Performance Computing (CCF-THPC), 6(2): 179-191, 2024.
Download Paper

DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs

Published in ACM International Conference on Supercomputing (ICS), 2025

DR-CircuitGNN accelerates the training of heterogeneous circuit graph neural networks on GPUs through novel data reuse strategies and GPU-optimized computation kernels.

Y. Luo, S. Li, J. Tao, K. G. Thorat, X. Xie, H. Peng, N. Xu, C. Ding, S. Huang. "DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs." In Proceedings of the 39th ACM International Conference on Supercomputing (ICS '25), 2025.
Download Paper

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

Published as an arXiv preprint, 2025

CudaForge is an agentic framework that automatically optimizes CUDA kernels using LLM-based agents with hardware profiling feedback, achieving significant speedups over hand-tuned baselines.

Z. Zhang, R. Wang, S. Li, Y. Luo, M. Hong, C. Ding. "CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization." arXiv preprint arXiv:2511.01884, 2025.
Download Paper

XuanJia: A Comprehensive Virtualization-Based Code Obfuscator for Binary Protection

Published as an arXiv preprint, 2026

We present XuanJia, a comprehensive virtualization-based code obfuscator that leverages virtual machine protection to safeguard binary programs against reverse engineering and tampering attacks.

X. Zou, X. Gong, J. Zhang, S. Li, P.-C. Yew. "XuanJia: A Comprehensive Virtualization-Based Code Obfuscator for Binary Protection." arXiv preprint arXiv:2601.10261, 2026.
Download Paper

StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning

Published in International Conference on Machine Learning (ICML 2026), 2026

Accepted to ICML 2026. StitchCUDA is a multi-agent framework for end-to-end GPU program optimization using rubric-based agentic reinforcement learning, achieving ~1.72x speedup over multi-agent baselines and ~2.73x over RL model baselines on KernelBench.

S. Li, Z. Zhang, W. Chen, Y. Luo, M. Hong, C. Ding. "StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning." International Conference on Machine Learning (ICML), 2026.
Download Paper | Code

GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph

Published in ACM/IEEE Design Automation Conference (DAC), 2026

GSR-GNN enables training GNNs with up to hundreds of layers on circuit graphs while reducing both compute and memory overhead, achieving up to 87.2% peak memory reduction and over 30x training speedup.

Y. Luo, S. Li, Y. Feng, V. Kancharla, S. Huang, C. Ding. "GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph." In Proceedings of the 63rd ACM/IEEE Design Automation Conference (DAC '26), 2026.
Download Paper

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.

Shiyang Li
