Abstract: A new algorithm for estimating the intrinsic dimension of data sets is proposed. The method is built on approximation and separation, which derive from the topological structure of the data set and the distance characteristics of high-dimensional space. The topological structure is introduced through locally linear embedding (LLE). The method reveals the relation between dimension and neighborhood, and in doing so improves LLE. Experiments show that the method is more reasonable and more reliable than PCA.
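For context on the PCA baseline mentioned above, the following is a minimal sketch of how a PCA-based intrinsic dimension estimate is commonly computed. It is not the proposed method. The function name `pca_intrinsic_dimension`, the 0.95 variance threshold, and the synthetic data sets are all assumptions made for illustration.

```python
import numpy as np


def pca_intrinsic_dimension(data, variance_threshold=0.95):
    """Count the principal components needed to explain a share of variance.

    `variance_threshold` is an illustrative choice, not a value from the paper.
    """
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal directions.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    explained = np.cumsum(variances) / variances.sum()
    # First position where the cumulative ratio reaches the threshold.
    return int(np.searchsorted(explained, variance_threshold) + 1)


rng = np.random.default_rng(0)

# A 2-D plane embedded linearly in 10-D space (fixed, orthogonal mixing rows).
mixing = np.zeros((2, 10))
mixing[0, :5] = 1.0
mixing[1, 5:] = 1.0
plane = rng.normal(size=(500, 2)) @ mixing
flat_estimate = pca_intrinsic_dimension(plane)

# A 1-D curve (a helix) in 3-D space: a nonlinear structure.
t = rng.uniform(0, 4 * np.pi, size=500)
helix = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])
curved_estimate = pca_intrinsic_dimension(helix)

print("plane estimate:", flat_estimate)
print("helix estimate:", curved_estimate)
```

On the linear plane the estimate matches the true dimension, while on the helix, a one-dimensional curve, the linear projection needs more than one component. This is the kind of limitation on nonlinear data that motivates neighborhood-based estimators.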