1. School of Computer Science and Technology, Fudan University, Shanghai 200433
2. School of Computer Engineering and Science, Shanghai University, Shanghai 200072
Abstract: Mining and analyzing transaction sequences gives decision makers a quantitative basis for devising sales strategies. By studying the structure of transaction sequence sets in terms of commodity sales amounts and their variation trends, a growth pattern is defined that reflects the variation trend of commodity prices. Two similarity measures are also defined: the shifted window combined distance and the angle vector distance. Based on these definitions, clustering is performed using an objective function with time constraints. Experiments are conducted on real commodity transaction sequence datasets. The results show that combining the growth pattern with the two distance functions yields better clustering results under the time constraint, and that the resulting clusters can be readily interpreted in practice.