Abstract: The main difference between an English OCR system and OCR systems for other European languages is the character set. Building a European OCR system therefore depends mainly on recognizing European characters. In this paper, the European character set is divided into two parts: English characters and special characters. Two key problems are considered: how to decrease the misclassification rate between English characters and special characters, and how to improve the recognition accuracy for special characters. Experimental results show that the new system is more effective than previous ones. Furthermore, the ideas proposed in this paper can be generalized to distinguish other sets of similar symbols.
WANG Kai, SHI GuangShun, WANG QingRen. Research on European Character Recognition [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2006, 19(4): 491-496.