Abstract: The high computational complexity of neural network methods limits their application in embedded scenarios. To address this problem, a convolutional neural network accelerator based on a heterogeneous field programmable gate array (FPGA) is proposed. A sliding window is employed to accelerate the convolution computation, so that convolutions with different numbers of input and output channels can be handled. An 8-bit fixed-point accelerator is designed in combination with a network quantization process, reducing the usage of computing resources. Experiments demonstrate that the proposed fixed-point accelerator achieves higher computing speed and lower power consumption with only a small loss in accuracy.
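The sliding-window convolution the abstract refers to can be illustrated with a minimal Python sketch (this is an illustration of the general technique, not the paper's FPGA implementation; the function name and shapes are assumptions). The window slides over the spatial dimensions, and at each position all input channels are multiplied and accumulated for each output channel, which is the loop structure an accelerator unrolls in hardware:

```python
import numpy as np

def conv2d_sliding_window(x, w):
    """2D convolution via an explicit sliding window (illustrative sketch).

    x: input feature maps, shape (C_in, H, W)
    w: kernels, shape (C_out, C_in, K, K)
    returns: output feature maps, shape (C_out, H-K+1, W-K+1), no padding, stride 1
    """
    c_out, c_in, k, _ = w.shape
    _, h, width = x.shape
    out = np.zeros((c_out, h - k + 1, width - k + 1), dtype=x.dtype)
    for oc in range(c_out):              # one pass per output channel
        for i in range(h - k + 1):       # slide the window vertically
            for j in range(width - k + 1):   # and horizontally
                # take the KxK window across all input channels at once
                window = x[:, i:i + k, j:j + k]
                # multiply-accumulate against this output channel's kernels
                out[oc, i, j] = np.sum(window * w[oc])
    return out
```

In a hardware accelerator, the inner multiply-accumulate over the window is what gets parallelized across DSP units; the software loops above only show the data access pattern.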
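The 8-bit fixed-point design mentioned in the abstract rests on quantizing float weights and activations to int8. A minimal sketch of symmetric linear quantization is shown below (an assumed, generic scheme for illustration; the paper's actual quantization procedure may differ):

```python
import numpy as np

def quantize_int8(t):
    """Symmetric linear quantization of a float tensor to int8.

    Returns (q, scale) such that t is approximated by q * scale.
    """
    scale = np.max(np.abs(t)) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for an all-zero tensor
    q = np.clip(np.round(t / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale
```

With all operands in int8, the multiply-accumulate in the convolution can run in integer arithmetic (accumulating in a wider type), which is what reduces DSP and logic usage on the FPGA; the per-tensor scale factors are applied once at the output.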