
Quantization Techniques for Similarity Search in High-Dimensional Data Spaces

  • Christian Garcia-Arellano
  • Ken Sevcik
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2712)

Abstract

In recent years, several techniques have been developed for efficient similarity search in high-dimensional data spaces. Some of these techniques, those based on vector approximation via quantization, have been shown to be the most effective. The VA-file was the first technique to use vector approximation. The IQ-tree and the A-tree are subsequent techniques that impose a directory structure over the quantized VA-file representation. The performance gains of the IQ-tree result mainly from an optimized I/O strategy that the directory structure makes possible, whereas those of the A-tree result mainly from exploiting the clustering of the data itself. In this work, we first evaluate the relative performance of these two enhanced approaches on high-dimensional data sets with different clustering characteristics. Second, we present the Clustered IQ-Tree, an indexing strategy that combines the best features of the IQ-tree and the A-tree. It yields better query performance than the IQ-tree and more stable performance than the A-tree across different types of data sets.
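The vector-approximation idea behind the VA-file can be made concrete with a short sketch. This is a minimal, hypothetical Python illustration, not code from the paper. It assumes Euclidean distance and uniform cells between each dimension's minimum and maximum (the original VA-file may partition dimensions differently), and it runs an exact k-nearest-neighbor search in two phases: a scan of the compact approximations that computes lower and upper distance bounds per cell, followed by a refinement that fetches full vectors in order of increasing lower bound. The names `VAFile`, `euclidean`, `bounds`, and `knn` are our own. The sketch does not model the IQ-tree's directory, the A-tree's approximations, or the Clustered IQ-Tree.

```python
import bisect
import heapq
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Exact Euclidean distance, summed in dimension order."""
    total = 0.0
    for x, y in zip(a, b):
        diff = x - y
        total += diff * diff
    return math.sqrt(total)


class VAFile:
    """Hypothetical VA-file-style index using uniform per-dimension cells."""

    def __init__(self, data: List[Vector], bits: int = 4) -> None:
        self.data = data
        self.cells = 1 << bits  # 2**bits cells per dimension
        self.marks: List[List[float]] = []
        for d in range(len(data[0])):
            lo = min(v[d] for v in data)
            hi = max(v[d] for v in data)
            if hi == lo:
                hi = lo + 1.0  # degenerate dimension: avoid zero-width cells
            step = (hi - lo) / self.cells
            # cells + 1 increasing boundaries; the last one is pinned to the max.
            self.marks.append([lo + i * step for i in range(self.cells)] + [hi])
        # The approximation of a vector is just its tuple of cell numbers.
        self.approx = [tuple(self._cell(d, x) for d, x in enumerate(v)) for v in data]

    def _cell(self, d: int, x: float) -> int:
        i = bisect.bisect_right(self.marks[d], x) - 1
        return min(max(i, 0), self.cells - 1)

    def bounds(self, q: Vector, approx: Tuple[int, ...]) -> Tuple[float, float]:
        """Lower and upper bounds on the distance from q to any point in the cell."""
        lower = upper = 0.0
        for d, c in enumerate(approx):
            lo, hi = self.marks[d][c], self.marks[d][c + 1]
            if q[d] < lo:
                gap = lo - q[d]
            elif q[d] > hi:
                gap = q[d] - hi
            else:
                gap = 0.0
            far = max(abs(q[d] - lo), abs(q[d] - hi))
            lower += gap * gap
            upper += far * far
        return math.sqrt(lower), math.sqrt(upper)

    def knn(self, q: Vector, k: int) -> List[Tuple[float, int]]:
        """Exact k-NN: filter on approximations, then refine the candidates."""
        # Phase 1: scan only the compact approximations.
        bounds = [self.bounds(q, a) for a in self.approx]
        kth_upper = heapq.nsmallest(k, (ub for _, ub in bounds))[-1]
        candidates = sorted((lb, i) for i, (lb, _) in enumerate(bounds) if lb <= kth_upper)
        # Phase 2: fetch full vectors in order of increasing lower bound.
        best: List[Tuple[float, int]] = []  # max-heap via negated distances
        for lb, i in candidates:
            if len(best) == k and lb > -best[0][0]:
                break  # no remaining candidate can beat the current k-th distance
            dist = euclidean(q, self.data[i])
            if len(best) < k:
                heapq.heappush(best, (-dist, i))
            elif dist < -best[0][0]:
                heapq.heapreplace(best, (-dist, i))
        return sorted((-neg, i) for neg, i in best)


if __name__ == "__main__":
    random.seed(0)
    points = [[random.random() for _ in range(16)] for _ in range(1000)]
    index = VAFile(points, bits=4)
    query = [0.5] * 16
    fast = index.knn(query, k=5)
    brute = sorted((euclidean(query, p), i) for i, p in enumerate(points))[:5]
    print(fast == brute)
```

Each approximation stores only `bits` bits per dimension, so phase 1 reads far less data than a scan of the full vectors. Phase 2 then touches only the candidates whose lower bound could still beat the current k-th distance, so the result matches a brute-force search.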

Keywords

Range Query, Query Performance, Vector Approximation, Neighbor Query, Quantization Technique
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Christian Garcia-Arellano (1, 2)
  • Ken Sevcik (1)
  1. Department of Computer Science, University of Toronto, Canada
  2. IBM Toronto Lab, Toronto, Canada
