Abstract:A high-dimensional linear indexing method is presented by sorting principal component based on elliptical-shaped clustering. The proposed approach reduces the number of data points accessed during the k-nearest neighbor search. The dataset is partitioned into some elliptical-shaped clusters, and KL transform is performed on each cluster. The approximate vectors are built at the KL transform domain on each cluster. When performing k-nearest neighbor search, the partial distortion searching algorithm is used to reject the improper approximate vectors. The clusters are accessed in increasing order of their lower bound from the query point. The experimental results on large image databases with high dimensions show that compared with other well-known vector approximate method, the proposed approach reduces the number of approximate vectors accessed and provides a higher search speed.
[1] Rui Yong, Huang T S, Chang S F. Image Retrieval: Current Techniques, Promising Directions, and Open Issues. Journal of Visual Communication and Image Representation, 1999, 10(1): 36-92 [2] Bhm C, Berchtold S, Keim D. Searching in High-Dimensional Spaces-Index Structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys, 2001, 33(3): 322-373 [3] Weber R, Schek H J, Blott S. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces // Proc of the 24th International Conference on Very Large Data Bases. New York, USA, 1998: 194-205 [4] Jagadish H V, Ooi B C, Tan K L, et al. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Trans on Database Systems, 2005, 30(2): 364-397 [5] Lejesk H, Asmundsson F, Jansson B, et al. NV-tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections. IEEE Trans on Pattern Analysis and Machine Intelligence, 2009, 31(5): 869-883
[6] Beyer K S, Goldstein J, Ramakrishnan R. When is ‘Nearest Neighbor’ Meaningful? // Proc of the 7th International Conference on Database Theory. Jerusalem, Israel, 1999: 217-235 [7] Blott S, Weber R. Whats Wrong with High-Dimensional Similarity Search? Proc of the VLDB Endowment, 2008, 1(1): 3 [8] Ferhatosmanoglu H, Tuncel E, Agrawal D. Vector Approximation Based Indexing for Non-Uniform High Dimensional Data Sets // Proc of the ACM International Conference on Information and Knowledge Management. McLean, USA, 2000: 202-209 [9] Cha G H, Zhu Xiaoming, Petkovic D, et al. An Efficient Indexing Method for Nearest Neighbor Searches in High-Dimensional Image Databases. IEEE Trans on Multimedia, 2002, 4(1): 76-87 [10] Berchtold S, Bohm C, Jagadish H V, et al. Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces // Proc of the 16th International Conference on Data Engineering. San Diego, USA, 2000: 577-588 [11] Cha G H, Chung C W. The GC-Tree: A High-Dimensional Index Structure for Similarity Search in Image Databases. IEEE Trans on Multimedia, 2002, 4(2): 235-247 [12] Dong Daoguo, Liang Liuhong, Xue Xiangyang. VAR-Tree-A New High-Dimensional Data Index Structure. Journal of Computer Research and Development, 2005, 42(1): 10-17 (in Chinese) (董道国,梁刘红,薛向阳.VAR-Tree——一种新的高维数据索引结构.计算机研究与发展, 2005, 42(1): 10-17) [13] Shen Hengtao, Zhou Xiaofang, Zhou Aoying. An Adaptive and Dynamic Dimensionality Reduction Method for High-Dimensional Indexing. The VLDB Journal-The International Journal on Very Large Data Bases, 2007, 16(2): 219-234 [14] Cui Jiangtao, Zhou Shuisheng, Sun Junding. Efficient High-Dimensional Indexing by Sorting Principal Component. Pattern Recognition Letters, 2007, 28(6): 2412-2418 [15] Chakrabarti K, Mehrotra S. Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces // Proc of the 26th International Conference on Very Large Data Bases. Cairo, Egypt, 2000: 89-100