|
|
Impact of Pattern Feature on Pattern Matching Problem with Wildcards and Length Constraints |
WANG Hai-Ping1, HU Xue-Gang1, XIE Fei1,2, GUO Dan1, WU Xin-Dong1,3 |
1. Department of Computer Science and Technology,School of Computer Science and Information,Hefei University of Technology,Hefei 230009 2.Department of Computer Science and Technology,Hefei Normal University,Hefei 230601 3. Department of Computer Science,The University of Vermont,Burlington 05405 |
|
|
Abstract Pattern matching with wildcards and length constraints (PMWL) provides more convenience to users since its flexibility in definition which also leads to difficulties in solving problem. Currently, to our knowledge, no polynomial algorithms obtain the complete solution of this problem, and the analysis for completeness is far from sufficient. In this paper, the pattern feature is proved to be the key factor for the completeness of PMWL and a concept, denoted as rep, is provided which measures the repetitions in the pattern. The completeness of PMWL is proved under a certain condition when rep=0. And the reason of incompleteness under the condition of rep>0 is also explained clearly. In the experiments, approximation ratio is utilized as a measurement to demonstrate the impact of rep on the PMWL problem.
|
Received: 26 July 2011
|
|
|
|
|
[1] Fischer M J,Paterson M S.String Matching and Other Products // Karp R M,ed.Complexity of Computation.Cambridge,USA: MIT Press,1974: 113-125 [2] Zhang Minghua,Kao B,Cheung D W,et al.Mining Periodic Patterns with Gap Requirement from Sequences // Proc of the ACM SIGMOD International Conference on Special Interest Group on Management of Data.Baltimore,USA,2005: 623-633 [3] Cole J R,Chai B,Marsh T L,et al.The Ribosomal Database Project(RDP-II): Sequences and Tools for High-Throughput rRNA Analysis.Nucleic Acids Research,2005,33 (z1): 294-296 [4] He Dan,Wu Xindong,Zhu Xingquan.SAIL-APPROX: An Efficient On-line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints // Proc of the IEEE International Conference on Bioinformatics and Biomedicine.Silicon Valley,USA,2007: 151-158 [5] Xie Fei,Wu Xindong,Hu Xuegang,et al.Sequential Pattern Mining with Wildcards // Proc of the 22nd IEEE International Conference on Tools with Artificial Intelligence.Arras,France,2010: 241-247 [6] Ji Xiaonan,Bailey J,Dong Guozhu.Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints.Knowledge and Information Systems,2007,11(3): 259-286 [7] He Yu,Wu Xindong,Zhu Xingquan,et al.Mining Frequent Patterns with Wildcards from Biological Sequences // Proc of the IEEE International Conference on Information Reuse and Integration.Las Vegas,USA,2007: 329-334 [8] Califf M E,Mooney R J.Bottom-up Relational Learning of Pattern Matching Rules for Information Extraction.Journal of Machine Learning Research,2003,4(6): 177-210 [9] Manber U,Baeza-Yates R.An Algorithm for String Matching with a Sequence of Don’t Cares.Information Processing Letters,1991,37(3): 133-136 [10] Min Fan,Wu Xindong,Lu Zhenyu.Pattern Matching with Independent Wildcard Gaps // Proc of the 8th IEEE International Conference on Dependable,Autonomic and Secure Computing.Chengdu,China,2009: 194-199 [11] Chen Gong,Wu Xindong,Zhu Xingquan,et al.Efficient String Matching with Wildcards and Length Constraints.Knowledge and Information Systems,2006,10(4): 399-419 [12] Guo Dan,Hong Xiaoli,Hu Xuegang,et al.A Bit-Parallel Algorithm for Sequential Pattern Matching with Wildcards.Cybernetics and Systems,2011,42(6): 382-401 [13] Wu Youxi,Wu Xindong,Jiang He,et al.A Heuristic Algorithm for MPMGOOC.Chinese Journal of Computers,2011,34(8): 1452-1462(in Chinese) (武优西,吴信东,江 贺,等.一种求解MPMGOOC问题的启发式算法.计算机学报,2011,34(8): 1452-1462) [14] Zhu Xingquan,Wu Xindong.Mining Complex Patterns across Sequences with Gap Requirements // Proc of the International Joint Conference on Artificial Intelligence.Hyderabad,India,2007: 726-735 |
|
|
|