Self-circulation Intelligent Text Recognition Based on Multi-stage Data Generation
MA Xinqiang1,2,3, LIU Lina2, LI Xuewei4, GU Ye4, HUANG Yi1,2,3, LIU Yong2
1. College of Computer Science and Technology, Guizhou University, Guiyang 550025;
2. Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027;
3. Institute of Intelligent Computing and Visualization Based on Big Data, Chongqing University of Arts and Sciences, Chongqing 402160;
4. Material Branch, State Grid Zhejiang Electric Power Co., Ltd., Hangzhou 310000
There are few effective methods for annotating large-scale data for English and Chinese text recognition in complex and diverse scenarios. Therefore, a multi-stage data generation self-circulation training algorithm (MSDG-OCR) for complex and diverse text recognition scenarios is proposed. Text data is generated randomly according to defined generation parameters, so the data annotation process is omitted. Based on the convolutional recurrent neural network (CRNN) model, multi-stage self-circulation training is carried out, and the recognition accuracy on the samples is continuously improved by controlling the data generation strategy during the loop. Experiments show that the proposed algorithm achieves good recognition performance on multiple public English datasets and in Chinese-specific complex text scenarios.
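The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: GenParams, ToyRecognizer, the per-character weights, and the reweighting rule are all hypothetical stand-ins, and the toy recognizer only simulates a CRNN by tracking a per-character error rate. The sketch shows the overall shape of the idea: labels are known by construction (so no annotation is needed), and after each stage the generation parameters are shifted toward characters the model still misreads.

```python
import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GenParams:
    """Hypothetical generation parameters (the paper's real parameter set is not given here)."""
    charset: str = "abcdefghij"
    min_len: int = 3
    max_len: int = 6
    # Per-character sampling weights; raised for characters the model gets wrong.
    weights: dict = field(default_factory=dict)

    def __post_init__(self):
        for ch in self.charset:
            self.weights.setdefault(ch, 1.0)


def generate_labels(params, n, rng):
    """Sample n random label strings; each label is known by construction."""
    chars = list(params.charset)
    w = [params.weights[c] for c in chars]
    labels = []
    for _ in range(n):
        length = rng.randint(params.min_len, params.max_len)
        labels.append("".join(rng.choices(chars, weights=w, k=length)))
    return labels


class ToyRecognizer:
    """Stand-in for a CRNN: a per-character error rate that shrinks with exposure."""

    def __init__(self, charset):
        self.error_rate = {c: 0.5 for c in charset}

    def train(self, labels):
        # More training occurrences of a character lower its simulated error rate.
        for ch, count in Counter("".join(labels)).items():
            self.error_rate[ch] *= 0.99 ** count

    def predict(self, label, rng):
        # Each character is misread as '?' with probability equal to its error rate.
        return "".join("?" if rng.random() < self.error_rate[c] else c for c in label)


def self_circulation(stages, n_per_stage, seed=0):
    """Run the generate -> train -> evaluate -> adjust loop for a number of stages."""
    rng = random.Random(seed)
    params = GenParams()
    model = ToyRecognizer(params.charset)
    history = []
    for _ in range(stages):
        model.train(generate_labels(params, n_per_stage, rng))
        # Evaluate on uniformly generated samples so stages are comparable.
        eval_labels = generate_labels(GenParams(charset=params.charset), n_per_stage, rng)
        errors, correct = Counter(), 0
        for label in eval_labels:
            pred = model.predict(label, rng)
            correct += pred == label
            errors.update(t for t, p in zip(label, pred) if t != p)
        history.append(correct / len(eval_labels))
        # Control the generation strategy: upweight misrecognized characters.
        for ch in params.charset:
            params.weights[ch] = 1.0 + errors[ch]
    return history, params


if __name__ == "__main__":
    accuracies, final_params = self_circulation(stages=5, n_per_stage=200)
    print(accuracies)
```

In a real system the toy recognizer would be replaced by an image renderer plus a CRNN trained with a sequence loss, and the generation parameters would likely cover fonts, backgrounds, and distortions rather than character weights alone.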
马新强, 刘丽娜, 李雪维, 顾晔, 黄羿, 刘勇. 基于多阶段数据生成的自循环文本智能识别[J]. 模式识别与人工智能, 2020, 33(5): 468-476.
MA Xinqiang, LIU Lina, LI Xuewei, GU Ye, HUANG Yi, LIU Yong. Self-circulation Intelligent Text Recognition Based on Multi-stage Data Generation. Pattern Recognition and Artificial Intelligence, 2020, 33(5): 468-476.