An unbalanced approach to metric space searching
Proximity queries (the searching problem generalized beyond exact match) is mostly modeled as metric space. A metric space consists of a collection of objects and a distance function defined among them. The goal is to preprocess the data set (a slow procedure) to quickly answer proximity queries. Th...
Autores principales: | , , |
---|---|
Formato: | Objeto de conferencia |
Lenguaje: | Inglés |
Publicado: |
2005
|
Materias: | |
Acceso en línea: | http://sedici.unlp.edu.ar/handle/10915/21087 |
Aporte de: | Aportado por :
SEDICI (UNLP) de
Universidad Nacional de La Plata .
|
Sumario: | Proximity queries (the searching problem generalized beyond exact match) is mostly modeled as metric space. A metric space consists of a collection of objects and a distance function defined among them. The goal is to preprocess the data set (a slow procedure) to quickly answer proximity queries. This problem have received a lot of attention recently, specially in the pattern recognition community.
The Excluded Middle Vantage Point Forest (VP–forest) is a data structure designed to search in high dimensional vector spaces. A VP–forest is built as a collection of balanced Vantage Point Trees (VP–trees).
In this work we propose a novel two-fold approach for searching. Firstly we extend the VP– forest to search in metric spaces, and more importantly we test a counterintuitive modification to the VP–tree, namely to unbalance it. In exact searching an unbalanced data structure perform poorly, and most of the algorithmic effort is directed to obtain a balanced data structure. The unbalancing approach is motivated by a recent data structure (the List of Clusters ) specialized in high dimensional metric space searches, which is an extremely unbalanced data structure (a linked list) outperforming other approaches. |
---|