An Improved MultiPattern String Matching Algorithm
DAI LiuLing1,2, HUANG HeYan 2, CHEN ZhaoXiong2
1.Software Institute, Beijing Institute of Technology, Beijing 100081 2.Language Information Engineering Center, Chinese Academy of Sciences, Beijing 100083
Abstract:A new algorithm for matching multiple strings at the same time is suggested. The new algorithm is based on the ideas of QS and SunWu algorithm, named as QMS (Quick Multipattern Searching) algorithm in this paper. QMS uses hashing and PREFIX table to decrease the number of comparisons. During the computation of the shift distance, the character closely after the current window is considered. Because the shift distance is computed with more accurate technique, larger average shift distance is acquired. More characters can be skipped when the text is scanned, so the algorithm becomes very efficient. Tests on an actual corpus show that QMS algorithm is much more efficient than SunWu algorithm under common circumstances.
[1] Tan J L. String Matching Algorithm and Application of Network Content Analysis. Ph.D Dissertation. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 2003 (in Chinese) (谭建龙. 串匹配算法机器在网络内容分析中的应用.博士学位论文. 中国科学院计算技术研究所,北京, 2003) [2] Boyer R S, Moore J S. A Fast String Searching Algorithm. Communications of the ACM, 1977, 20(10): 762-772 [3] Sunday D M. A Very Fast Substring Search Algorithm. Communications of the ACM, 1990, 33(8): 132-142 [4] Lecroq T. Experimental Results on String Matching Algorithms. Software-Practice & Experience, 1995, 25(7): 727-765 [5] Wang Y C, Shen Z, Xu Y Z. Improved Algorithms for Matching Multiple Patterns. Journal of Computer Research and Development, 2002, 39(1): 55-60 (in Chinese) (王永成, 沈 州, 许一震. 改进的多模式匹配算法. 计算机研究与发展, 2002, 39(1): 55-60) [6] Aho A V, Corasick M J. Efficient String Matching: An Aid to Bibliographic Search. Communication of the ACM, 1975, 18(6): 333-340 [7] Wu S, Manber U. A Fast Algorithm for Multi-Pattern Searching. Technical Report, TR-94-17, Department of Computer Science, University of Arizona, Tucson, USA, 1994 [8] Zhang X, Tan J L, Cheng X Q. An Improved Wu-Manber Multiple Keywords Match Algorithm. Journal of Computer Application, 2003, 23(7): 29-31 (in Chinese) (张 鑫, 谭建龙, 程学旗. 一种改进的Wu-Manber多关键字匹配算法. 计算机应用, 2003, 23(7): 29-31) [9] http://www.research.att.com/~lewis/reuters21578.html