Abstract: This paper studies the extraction of handwritten numeral strings from form documents. A method based on hybrid binarization is proposed to effectively discern and capture characters that overlap form borders. Two key problems are investigated in detail: locating the cell of interest (COI), and extracting its content with broken strokes mended. The extracted handwritten characters remain intact even when written in different styles. Experimental results demonstrate that the proposed method is efficient.