Abstract: The high computational complexity of neural network methods limits their application in embedded scenarios. To address this problem, a convolutional neural network accelerator based on a heterogeneous field programmable gate array (FPGA) is proposed. A sliding window is employed to accelerate the convolution computation, so that convolutions with different numbers of input and output channels can be handled. An 8-bit fixed-point accelerator is designed in combination with a network quantization process, reducing the usage of computing resources. Experiments demonstrate that the proposed fixed-point accelerator achieves higher computing speed and lower power consumption with only a small loss in accuracy.
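The sliding-window convolution the abstract refers to can be illustrated with a minimal Python sketch (this is an illustration of the general technique, not the paper's FPGA implementation; the function name and shapes are assumptions). The window slides over the spatial dimensions, and at each position all input channels are multiplied and accumulated for each output channel, which is the loop structure an accelerator unrolls in hardware:

```python
import numpy as np

def conv2d_sliding_window(x, w):
    """2D convolution via an explicit sliding window (illustrative sketch).

    x: input feature maps, shape (C_in, H, W)
    w: kernels, shape (C_out, C_in, K, K)
    returns: output feature maps, shape (C_out, H-K+1, W-K+1), no padding, stride 1
    """
    c_out, c_in, k, _ = w.shape
    _, h, width = x.shape
    out = np.zeros((c_out, h - k + 1, width - k + 1), dtype=x.dtype)
    for oc in range(c_out):              # one pass per output channel
        for i in range(h - k + 1):       # slide the window vertically
            for j in range(width - k + 1):   # and horizontally
                # take the KxK window across all input channels at once
                window = x[:, i:i + k, j:j + k]
                # multiply-accumulate against this output channel's kernels
                out[oc, i, j] = np.sum(window * w[oc])
    return out
```

In a hardware accelerator, the inner multiply-accumulate over the window is what gets parallelized across DSP units; the software loops above only show the data access pattern.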
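The 8-bit fixed-point design mentioned in the abstract rests on quantizing float weights and activations to int8. A minimal sketch of symmetric linear quantization is shown below (an assumed, generic scheme for illustration; the paper's actual quantization procedure may differ):

```python
import numpy as np

def quantize_int8(t):
    """Symmetric linear quantization of a float tensor to int8.

    Returns (q, scale) such that t is approximated by q * scale.
    """
    scale = np.max(np.abs(t)) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for an all-zero tensor
    q = np.clip(np.round(t / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale
```

With all operands in int8, the multiply-accumulate in the convolution can run in integer arithmetic (accumulating in a wider type), which is what reduces DSP and logic usage on the FPGA; the per-tensor scale factors are applied once at the output.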