Publications
Journal Paper
T5. Theseus: Exploring Efficient Wafer-Scale Chip Design for Large Language Models
- Jingchen Zhu, Chenhao Xue, Yiqi Chen, Zhao Wang, Chen Zhang, Yu Shen, Yifan Chen, Zekang Cheng, Yu Jiang, Tianqi Wang, Yibo Lin, Wei Hu, Bin Cui, Runsheng Wang, Yun Liang, Guangyu Sun
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (T-CAD 2025)【PDF】
T4. DSTC: Dual-Side Sparsity Tensor Core for DNNs Acceleration on Modern GPU Architectures
- Chen Zhang, Yang Wang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng, Guangyu Sun, Zhigang Ji, Runsheng Wang, Yuan Xie, Ru Huang
- IEEE Transactions on Computers (TC 2025) 【PDF】
- Keywords: CNN, LSTM, LLM, GPU, Sparse Computing
T3. Fine-Grained Structured Sparse Computing for FPGA-Based AI Inference
- Chen Zhang, Shijie Cao, Guohao Dai, Chenbo Geng, Zhuliang Yao, Wencong Xiao, Yunxin Liu, Ming Wu, Lintao Zhang, Guangyu Sun, Zhigang Ji, Runsheng Wang, Ru Huang
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (T-CAD 2025) 【PDF】
- Keywords: CNN, LSTM, LLM, FPGA, Fine-Grained Structured Sparse Computing
T2. TSCompiler: Efficient Compilation Framework for Dynamic-shape Models
- Xiang Luo, Chen Zhang*, Chenbo Geng, Yanzhi Yi, Jiahui Hu, Renwei Zhang, Zhen Zhang, Gianpietro Consolaro, Fan Yang, Tun Lu, Ning Gu, Li Shang*
- SCIENCE CHINA Information Sciences (SCIS 2024)【PDF】
T1. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
- Chen Zhang, Guangyu Sun, Zhenman Fang, Peipei Zhou, Peichen Pan, Jason Cong
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (T-CAD 2018) 【PDF】
- Award: 2017~2019 Donald O. Pederson Best Paper Award
- Keywords: Convolutional Neural Network, FPGA, Design Automation, Caffe, SDAccel
Conference Paper
C28. H^2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
- Cong Li, Yihan Yin, Xintong Wu, Jingchen Zhu, Dimin Niu, Qiang Wu, Xin Si, Yuan Xie, Chen Zhang*, Guangyu Sun*
- Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA 2025)【PDF】
- Award: Best Paper Award
C27. DATIS: DRAM Architecture and Technology Integrated Simulation
- Shiyu Xia, Chen Zhang*, Guangyu Sun, Guohao Dai, Runsheng Wang, Zhigang Ji*, Ru Huang
- Proceedings of the 2025 International Symposium of EDA (ISEDA 2025)【PDF】【Slide】
- Award: Best Paper Award
C26. Tb-STC: Transposable Block-wise N:M Structured Sparse Tensor Core
- Jun Liu, Shulin Zeng, Junbo Zhao, Li Ding, Zeyu Wang, Jinhao Li, Zhenhua Zhu, Xuefei Ning, Chen Zhang, Yu Wang, Guohao Dai*
- Proceedings of the 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA 2025)【PDF】
- Yuanzheng Yao, Chen Zhang*, Chunyu Qi, Ruiyang Chen, Jun Wang, Zhihui Fu, Naifeng Jing, Xiaoyao Liang, Zhuoran Song*
- Proceedings of the 62nd ACM/IEEE Design Automation Conference (DAC 2025)【PDF】
C24. Oltron: Algorithm-Hardware Co-design for Outlier-Aware Quantization of LLMs with Inter-/Intra-Layer Adaptation
- Chenhao Xue, Chen Zhang*, Xun Jiang, Zhutianya Gao, Yibo Lin, Guangyu Sun*
- Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC 2024)【PDF】【Slide】
C23. Amanda: Unified instrumentation framework for deep neural networks
- Yue Guan, Yuxian Qiu, Jingwen Leng, Fan Yang, Shuo Yu, Yunxin Liu, Yu Feng, Yuhao Zhu, Lidong Zhou, Yun Liang, Chen Zhang, Chao Li, Minyi Guo
- Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2024)
C22. Cambricon-R: A fully fused accelerator for real-time learning of neural scene representation
- Xinkai Song, Yuanbo Wen, Xing Hu, Tianbo Liu, Haoxuan Zhou, Husheng Han, Tian Zhi, Zidong Du, Wei Li, Rui Zhang, Chen Zhang, Lin Gao, Qi Guo, Tianshi Chen
- Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2023)
C21. RM-STC: Row-merge dataflow inspired GPU sparse tensor core for energy-efficient sparse acceleration
- Guyue Huang, Zhengyang Wang, Po-An Tsai, Chen Zhang, Yufei Ding, Yuan Xie
- Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2023)
C20. OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
- Cong Guo, Jiaming Tang, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu
- Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA 2023)【PDF】
C19. Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training
- Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo
- 2022 IEEE 40th International Conference on Computer Design (ICCD 2022)【PDF】
C18. ANT: Exploiting adaptive numerical data type for low-bit deep neural network quantization
- Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu
- Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO 2022)【PDF】
- Award: MICRO 2022 Top Picks Honorable Mention
- Keywords: AI acceleration, Tensor Core, Quantization
C17. SQuant: On-the-fly data-free quantization via diagonal Hessian approximation
- Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo
- International Conference on Learning Representations (ICLR 2022) 【PDF】
C16. Dual-side sparse tensor core
- Yang Wang, Chen Zhang*, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng
- 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA 2021)【PDF】
- Keywords: GPGPU, Sparse Tensor Core, AI acceleration
C15. Boosting mobile CNN inference through semantic memory
- Yun Li, Chen Zhang*, Shihao Han, Li Lyna Zhang, Baoqun Yin*, Yunxin Liu, Mengwei Xu
- Proceedings of the 29th ACM International Conference on Multimedia (Multimedia 2021)【PDF】【Web】
C14. Scylla: QoE-aware continuous mobile vision with FPGA-based dynamic deep neural network reconfiguration
- Shuang Jiang, Zhiyao Ma, Xiao Zeng, Chenren Xu, Mi Zhang, Chen Zhang, Yunxin Liu
- Proceedings of the 2020 IEEE Conference on Computer Communications (INFOCOM 2020)【PDF】
C13. LadaBERT: Lightweight adaptation of BERT through hybrid model compression
- Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai
- Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)【PDF】
C12. Live video analytics with FPGA-based smart cameras
- Shang Wang, Chen Zhang*, Yuanchao Shu, Yunxin Liu*
- Proceedings of the 2019 Workshop on Hot Topics in Video Analytics and Intelligent Edges (HotEdges 2019)【PDF】
C11. Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity
- Shijie Cao, Chen Zhang*, Zhuliang Yao, Wencong Xiao, Lanshun Nie, Dechen Zhan, Yunxin Liu, Ming Wu, Lintao Zhang
- Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2019) 【PDF】【Web】
- Industrial Impact: Used by NVIDIA Sparse Tensor Core (Ampere and Hopper Architecture)
- Keywords: Sparse Neural Network, Acceleration, FPGA
C10. Balanced sparsity for efficient DNN inference on GPU
- Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang*, Lanshun Nie
- Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2019) 【PDF】
- Industrial Impact: Used by NVIDIA Sparse Tensor Core (Ampere and Hopper Architecture)
- Keywords: Sparse Neural Network, Acceleration, GPGPU
C9. SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization
- Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang*, Yunxin Liu, Lintao Zhang, Lanshun Nie, Zhi Yang
- Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) 【PDF】
C8. Best-effort FPGA programming: A few steps can go a long way
- Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, Peipei Zhou
- arXiv preprint arXiv:1807.01340 (2018)
C7. Using data compression for optimizing FPGA-based convolutional neural network accelerators
- Yijin Guan, Ningyi Xu, Chen Zhang, Zhihang Yuan, Jason Cong
- International Workshop on Advanced Parallel Processing Technologies (APPT 2017)
C6. Energy-efficient CNN implementation on a deeply pipelined FPGA cluster
- Chen Zhang, Di Wu, Jiayu Sun, Guangyu Sun, Guojie Luo, Jason Cong
- Proceedings of the 2016 International Symposium on Low Power Electronics and Design (ISLPED 2016) 【PDF】
C5. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks
- Chen Zhang, Zhenman Fang, Peipei Zhou, Peichen Pan, Jason Cong
- Proceedings of the 35th International Conference on Computer-Aided Design (ICCAD 2016) 【PDF】
C4. Optimizing FPGA-based accelerator design for deep convolutional neural networks
- Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, Jason Cong
- Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2015)【PDF】
- Citations: 2218 (most-cited paper in FPGA conference history since 1992)
- Award: FPGA-2015 Best Paper Nomination
- Keywords: Convolutional Neural Network, FPGA, Acceleration, Roofline Model
C3. An efficient design and implementation of LSM-tree based key-value store on open-channel SSD
- Peng Wang, Guangyu Sun, Song Jiang, Jian Ouyang, Shiding Lin, Chen Zhang, Jason Cong
- Proceedings of the Ninth European Conference on Computer Systems (EuroSys 2014)【PDF】
C2. Memory partitioning for multidimensional arrays in high-level synthesis
- Yuxin Wang, Peng Li, Peng Zhang, Chen Zhang, Jason Cong
- Proceedings of the 50th Annual Design Automation Conference (DAC 2013)【PDF】
C1. Automatic multidimensional memory partitioning for FPGA-based accelerators
- Yuxin Wang, Peng Li, Peng Zhang, Chen Zhang, Jason Cong
- Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2013)
Patent
- Bin Lin, Tao Peng, Chen Zhang, Minmin Sun, Lanbo Li, Xiafei Qiu, Shen Li, Yong Li, Wei Lin; Task Handling Methods as well as Automated Question Answering Methods, January 02, 2025, PCT/IB2025/050023
- Bin Lin, Tao Peng, Chen Zhang, Minmin Sun, Lanbo Li, Xiafei Qiu, Shen Li, Yong Li, Wei Lin; Task Handling Methods as well as Automated Question Answering Methods, January 03, 2024, China, 202410010610.3
- Haoran Li, Fei Sun, Yuan Gao, Guyue Huang, Ruiguang Zhong, Chen Zhang; GPU and Related Methods, November 24, 2023, China, CN117114960A
- Yuan Gao, Fei Sun, Haoran Li, Guyue Huang, Chen Zhang, Ruiguang Zhong; Thread Warp Execution Method and Related GPU, December 15, 2023, China, CN117237178A
- Haoran Li, Fei Sun, Yuan Gao, Guyue Huang, Ruiguang Zhong, Chen Zhang; GPU and Method of the Same, December 07, 2023, U.S., US-20230367741-A1
- Yuan Gao, Fei Sun, Haoran Li, Guyue Huang, Chen Zhang, Ruiguang Zhong; Warp Execution Method and Associated GPU, December 07, 2023, U.S., US-20230394617-A1
- Chen Zhang, Yunxin Liu; Neural Network Compression Based on Bank-Balanced Sparsity, November 15, 2019, U.S., US20210150362A1
- Chen Zhang, Yunxin Liu; Sparse Convolutional Neural Network, June 18, 2019, China, CN112101511A