Abstract: Discovering interesting web access patterns from web logs is a web usage mining problem with many practical applications. Some conventional algorithms, such as GSP, HPrefix and WAP-mine, are inefficient at low support thresholds. A top-down algorithm is proposed for mining web access patterns. Instead of unconditionally building intermediate data at every step of the mining process, it builds intermediate data selectively, according to the features of the current area. Experimental results on various datasets show that the proposed algorithm outperforms WAP-mine.
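To make the terms "web access pattern" and "support threshold" concrete, the following is a minimal sketch of a generic pattern-growth miner over page-visit sequences. It is not the top-down algorithm proposed here, nor WAP-mine: it rebuilds projected data at every step, which is the kind of cost the abstract says the proposed method avoids. The function name `mine_access_patterns` and the session format (one list of page identifiers per user session) are assumptions made for illustration.

```python
from collections import Counter


def mine_access_patterns(sessions, min_support):
    """Return {pattern: support} for every frequent access pattern.

    A pattern is a tuple of pages that occurs as a (not necessarily
    contiguous) subsequence of a session; its support is the number of
    sessions containing it. Illustrative sketch only.
    """
    results = {}

    def grow(prefix, projected):
        # Count each page at most once per projected suffix.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for page, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (page,)
            results[pattern] = support
            # Intermediate data: what follows the first occurrence of `page`.
            next_projected = [
                s[s.index(page) + 1:] for s in projected if page in s
            ]
            grow(pattern, next_projected)

    grow((), [tuple(s) for s in sessions])
    return results
```

Called with a handful of sessions and an absolute support count, for example `mine_access_patterns([["a", "b", "c"], ["a", "c"]], 2)`, it returns every page sequence that appears in at least that many sessions. Lowering `min_support` makes the number of projected lists grow quickly, which is why low thresholds are the hard case.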