Abstract:As a typical method for multivariate statistical analysis, multidimensional scaling(MDS) has been widely used in dimensionality reduction and visualization. Based on a distance matrix of n samples, the coordinates of MDS in a low-dimensional Euclidean space are obtained. The time complexity for classical MDS algorithm(CMDS) is Θ(n3), which affects the speed of MDS. A new MDS algorithm based on the idea of divide-and-conquer is proposed. The distance matrix is divided into several submatrices along its diagonal, then the submatrices are solved respectively. With orthogonal transformation and translation transformation, all of these subproblem solutions are aligned to get the global solution to the original distance matrix. The result of the proposed algorithm is completely the same as that of CMDS. When the dimensionality of sample space is far less than the number of samples, the time complexity of the proposed algorithm is only Θ(nlgn). Thus, compared with CMDS, the speed of the proposed algorithm is greatly improved, which makes MDS applicable to larger datasets.
[1] Cox M A A, Cox T F. Multidimensional Scaling. 2nd Edition. Boca Raton, USA: CRC Press Inc, 2000 [2] Zhang R C. Multivariate Statistical Analysis. Beijing, China: Science Press, 2006 (in Chinese) (张润楚.多元统计分析.北京:科学出版社, 2006) [3] Chalmers M. A Linear Iteration Time Layout Algorithm for Visualizing High-Dimensional Data // Proc of the 7th IEEE Visualization Conference. San Francisco, USA, 1996: 127-131 [4] Morrison A, Ross G, Chalmers M. Fast Multidimensional Scaling through Sampling, Springs and Interpolation. Information Visualization, 2003, 2(1): 68-77 [5] Williams M, Munzner T. Steerable, Progressive Multidimensional Scaling // Proc of the IEEE Symposium on Information Visualization. Austin, USA, 2004: 57-64 [6] Faloutsos C, Lin K I. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets // Proc of the ACM SIGMOD International Conference on Management of Data. San Jose, USA, 1995: 163-174 [7] Wang J T L, Wang X, Lin K I, et al. Evaluating a Class of Distance-Mapping Algorithms for Data Mining and Clustering // Proc of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, USA, 1999: 307-311 [8] de Silva V, Tenenbaum J B. Sparse Multidimensional Scaling Using Landmark Points. Technical Report. Palo Alto, USA: Stanford University, 2004 [9] Platt,J C. FastMap, MetricMap, and Landmark MDS are all Nystrm Algorithms // Proc of the 10th International Workshop on Artificial Intelligence and Statistics. The Savannah Hotel, Barbados, 2005: 261-268 [10] Meng D Y, Xu Z B, Zhang L, et al. A Cyclic Weighted Median Method for L1 Low-Rank Matrix Factorization with Missing Entries // Proc of the 27th AAAI Conference on Artificial Intelligence. Bellevue, USA, 2013: 704-710 [11] Mackey L, Talwalkar A, Jordan M I. Divide-and-Conquer Matrix Factorization // Proc of the 25th Annual Conference on Neural Information Processing Systems. Granada, Spain, 2011: 1134-1142 [12] Shi C Y. Mathematical Logic and Set Theory. Beijing, China: Tsinghua University Press, 2000 (in Chinese) (石纯一.数理逻辑与集合论.北京:清华大学出版社, 2000) [13] Young G, Householder A S. Discussion of a Set of Points in Terms of Their Mutual Distances. Psychometrika, 1938, 3(1): 19-22 [14] Gao H X. Applied Multivariate Statistical Analysis. Beijing, China: Peking University Press, 2005 (in Chinese) (高惠璇.应用多元统计分析.北京:北京大学出版社, 2005) [15] Cormen T H, Leiserson C E, Rivest R L, et al. Introduction to Algorithms. 3rd Edition. Cambridge, USA: MIT Press, 2009