Fast Keyword Spotting in Handwritten Chinese Documents Using Index
YU Geng1, YIN Fei2, CHEN You-Bin1, LIU Cheng-Lin2
1. School of Automation, Huazhong University of Science and Technology, Wuhan 430074; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Abstract: In document retrieval, high precision and high speed are difficult to achieve simultaneously. This paper proposes a fast keyword spotting method for handwritten Chinese documents that accelerates spotting while preserving accuracy. First, compressed index files are generated from the candidate segmentation-recognition lattice produced by text line recognition; keywords are then retrieved from these index files. Experimental results on the handwritten Chinese document database CASIA-HWDB demonstrate the effectiveness of the proposed method, which also reduces both the index size and the retrieval time.
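The index-then-retrieve idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the index file format, the compression scheme, or the scoring rule, so everything below is an assumption made for illustration. The lattice is represented as a list of candidate segments `(start, end, candidates)`, "compression" is approximated by pruning low-confidence candidates (`min_score`), and a keyword hit is scored by the mean character confidence along a path of adjacent segments. The names `build_index`, `search`, `min_score` and `threshold` are hypothetical.

```python
from collections import defaultdict


def build_index(lattices, min_score=0.1):
    """Build an inverted index: character -> list of (line_id, start, end, score).

    lattices maps a text-line id to its candidate segments, each given as
    (start, end, [(char, score), ...]). Candidates scoring below min_score
    are dropped; this pruning stands in for index compression (an assumption).
    """
    index = defaultdict(list)
    for line_id, segments in lattices.items():
        for start, end, candidates in segments:
            for char, score in candidates:
                if score >= min_score:
                    index[char].append((line_id, start, end, score))
    return index


def search(index, keyword, threshold=0.5):
    """Return hits (line_id, start, end, mean_score), best score first.

    A hit is a chain of index entries, one per keyword character, where each
    segment starts where the previous one ends within the same text line.
    """
    if not keyword:
        return []
    # Partial matches: (line_id, first_start, last_end, score_sum).
    partial = list(index.get(keyword[0], []))
    for char in keyword[1:]:
        # Group this character's entries by (line, start) for fast extension.
        by_start = defaultdict(list)
        for line_id, start, end, score in index.get(char, []):
            by_start[(line_id, start)].append((end, score))
        partial = [
            (line_id, first, end, total + score)
            for line_id, first, last_end, total in partial
            for end, score in by_start.get((line_id, last_end), [])
        ]
    hits = [(l, s, e, total / len(keyword)) for l, s, e, total in partial]
    hits = [h for h in hits if h[3] >= threshold]
    return sorted(hits, key=lambda h: h[3], reverse=True)


# Toy lattice for one text line (illustrative scores, not real recognizer output).
lattices = {
    "line0": [
        (0, 1, [("手", 0.75), ("于", 0.05)]),
        (1, 2, [("写", 0.5), ("与", 0.25)]),
        (2, 3, [("字", 0.5)]),
    ],
}
index = build_index(lattices)
hits = search(index, "手写")
print(hits)
```

Because the lattice is consulted only once, at indexing time, each query touches only the entries of the characters in the keyword. This is the kind of saving the abstract attributes to index-based retrieval, although the actual gains depend on details not given here.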