An unbalanced approach to metric space searching

Proximity queries (the searching problem generalized beyond exact match) is mostly modeled as metric space. A metric space consists of a collection of objects and a distance function defined among them. The goal is to preprocess the data set (a slow procedure) to quickly answer proximity queries. Th...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autores principales: Chávez, Edgar, Ludueña, Verónica, Reyes, Nora Susana
Formato: Objeto de conferencia
Lenguaje:Inglés
Publicado: 2005
Materias:
Acceso en línea:http://sedici.unlp.edu.ar/handle/10915/21087
Aporte de:
Descripción
Sumario:Proximity queries (the searching problem generalized beyond exact match) is mostly modeled as metric space. A metric space consists of a collection of objects and a distance function defined among them. The goal is to preprocess the data set (a slow procedure) to quickly answer proximity queries. This problem have received a lot of attention recently, specially in the pattern recognition community. The Excluded Middle Vantage Point Forest (VP–forest) is a data structure designed to search in high dimensional vector spaces. A VP–forest is built as a collection of balanced Vantage Point Trees (VP–trees). In this work we propose a novel two-fold approach for searching. Firstly we extend the VP– forest to search in metric spaces, and more importantly we test a counterintuitive modification to the VP–tree, namely to unbalance it. In exact searching an unbalanced data structure perform poorly, and most of the algorithmic effort is directed to obtain a balanced data structure. The unbalancing approach is motivated by a recent data structure (the List of Clusters ) specialized in high dimensional metric space searches, which is an extremely unbalanced data structure (a linked list) outperforming other approaches.