Posts by Collection

portfolio

publications

Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs

Published in IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023

Liberator is a data reuse framework that enables efficient out-of-memory graph computing on GPUs by reusing data already resident in device memory, reducing transfers between host and device memory.

S. Li, R. Tang, J. Zhu, Z. Zhao, X. Gong, W. Wang, J. Zhang, P.-C. Yew. "Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs." IEEE Transactions on Parallel and Distributed Systems (TPDS), 34(6): 1954-1967, 2023.

OneGraph: A Cross-Architecture Framework for Large-Scale Graph Computing on GPUs Based on oneAPI

Published in CCF Transactions on High Performance Computing (CCF-THPC), 2024

OneGraph is a cross-architecture graph computing framework built on oneAPI that enables portable and efficient large-scale graph processing across different GPU architectures.

S. Li, J. Zhu, J. Han, Y. Peng, Z. Wang, X. Gong, G. Wang, J. Zhang, X. Wang. "OneGraph: A Cross-Architecture Framework for Large-Scale Graph Computing on GPUs Based on oneAPI." CCF Transactions on High Performance Computing (CCF-THPC), 6(2): 179-191, 2024.

DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs

Published in ACM International Conference on Supercomputing (ICS), 2025

DR-CircuitGNN accelerates the training of heterogeneous circuit graph neural networks on GPUs through novel data reuse strategies and GPU-optimized computation kernels.

Y. Luo, S. Li, J. Tao, K. G. Thorat, X. Xie, H. Peng, N. Xu, C. Ding, S. Huang. "DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs." In Proceedings of the 39th ACM International Conference on Supercomputing (ICS '25), 2025.

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

Published as an arXiv preprint, 2025

CudaForge is an agentic framework that automatically optimizes CUDA kernels using LLM-based agents with hardware profiling feedback, achieving significant speedups over hand-tuned baselines.

Z. Zhang, R. Wang, S. Li, Y. Luo, M. Hone, C. Ding. "CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization." arXiv preprint arXiv:2511.01884, 2025.

XuanJia: A Comprehensive Virtualization-Based Code Obfuscator for Binary Protection

Published as an arXiv preprint, 2026

XuanJia is a comprehensive virtualization-based code obfuscator that uses virtual machine protection to safeguard binary programs against reverse engineering and tampering.

X. Zou, X. Gong, J. Zhang, S. Li, P.-C. Yew. "XuanJia: A Comprehensive Virtualization-Based Code Obfuscator for Binary Protection." arXiv preprint arXiv:2601.10261, 2026.

talks

teaching
