Abstract: A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a network security mechanism based on hard artificial intelligence (AI) problems. Research on CAPTCHA recognition drives CAPTCHAs to become more secure and helps solve some hard AI problems. First, state-of-the-art CAPTCHA recognition methods are analyzed. Second, a recognition method is proposed based on a recurrent neural network (RNN) composed of long short-term memory (LSTM) blocks. Third, feature extraction for CAPTCHA recognition is studied. Finally, a decoding algorithm is proposed to improve the recognition rate. Experimental results show that the proposed recognition method is efficient, that the gray values of the image are a good feature for the RNN, and that the proposed decoding algorithm achieves high recognition rates with low time complexity.
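As an illustration of using gray values as the RNN feature, the sketch below turns a grayscale image into a left-to-right sequence of column vectors, one per time step. The abstract does not state how the gray values are arranged for the network, so the column-per-time-step layout, the scaling to [0, 1], and the function name `gray_columns` are assumptions made for this example only, not the authors' method.

```python
from typing import List, Sequence


def gray_columns(image: Sequence[Sequence[int]], max_value: int = 255) -> List[List[float]]:
    """Convert a grayscale image (a list of pixel rows) into a sequence of
    column vectors, read left to right and scaled to [0, 1].

    Hypothetical helper: the column-wise layout is an assumption, chosen
    because a recurrent network consumes one feature vector per time step.
    """
    if not image:
        return []
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same width")
    # One feature vector per image column; each entry is a scaled gray value.
    return [[row[x] / max_value for row in image] for x in range(width)]


# A tiny 2x3 image: two rows, three columns.
image = [
    [0, 255, 0],
    [255, 255, 0],
]
frames = gray_columns(image)
print(len(frames), frames[0])
```

Each element of `frames` would then be fed to the recurrent network as the input for one time step, so the sequence length equals the image width.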