Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams

Bibliographic details
Main authors: Rodríguez-Betancourt, Esteban; Casasola-Murillo, Edgar
Format: Conference object
Language: English
Published: 2024
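Subjects: Computer Science (Ciencias Informáticas); Databases; Indexes; Natural Language Processing; Word Embeddings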
Online access: http://sedici.unlp.edu.ar/handle/10915/177179
Contributed by: Universidad Nacional de La Plata
Record ID: I19-R120-10915-177179
Publisher: Sociedad Argentina de Informática e Investigación Operativa
Date issued: 2024-08
Pages: 150-157
Keywords: Databases; Indexes; Natural Language Processing; Word Embeddings
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/
File format: PDF (application/pdf)
Date available: 2025-03-07
Institution: Universidad Nacional de La Plata
Collection: SEDICI (UNLP)
Abstract: With the growing use of vector embeddings in areas such as natural language processing and recommendation systems, effective methods for storing and retrieving them are increasingly important. However, deploying a specialized database for vector indexing can be challenging because of resource limitations or operational constraints. This paper introduces a novel approach that uses the trigram indexes already available in SQL databases to manage vector embeddings efficiently. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure instead of investing in new database systems, which reduces the management complexity and cost of maintaining a separate system for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate its performance and recall empirically. The paper aims to offer a practical solution for researchers and practitioners who want to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.
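The record does not specify how the vectors are converted for trigram indexing. As a rough, hypothetical illustration only (not the authors' method), the Python sketch below quantizes each dimension into a bucket, writes one word per dimension, and compares the resulting strings with a Jaccard similarity over trigrams padded the way the PostgreSQL pg_trgm documentation describes. The `d<dim>q<bucket>` token format, the bucket count of 4, the [-1, 1] value range, and all function names are assumptions made for this example.

```python
"""Hypothetical sketch: encode dense vectors as text so a trigram index
could compare them. The token format ("d<dim>q<bucket>"), the bucket count
and the [-1, 1] value range are illustrative assumptions, not the paper's
encoding."""


def quantize(value: float, levels: int = 4, lo: float = -1.0, hi: float = 1.0) -> int:
    """Clamp value to [lo, hi] and map it to an integer bucket in [0, levels - 1]."""
    clamped = min(max(value, lo), hi)
    bucket = int((clamped - lo) / (hi - lo) * levels)
    return min(bucket, levels - 1)  # value == hi would otherwise land in bucket `levels`


def vector_to_text(vec: list[float], levels: int = 4) -> str:
    """Emit one word per dimension, e.g. [0.9, -0.2] -> "d0q3 d1q1"."""
    return " ".join(f"d{i}q{quantize(v, levels)}" for i, v in enumerate(vec))


def trigrams(text: str) -> set[str]:
    """Collect trigrams per word, padding each word with two leading spaces
    and one trailing space, as the pg_trgm documentation describes."""
    grams: set[str] = set()
    for word in text.lower().split():
        padded = f"  {word} "
        for j in range(len(padded) - 2):
            grams.add(padded[j:j + 3])
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared distinct trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0  # choice made here: two trigram-free strings count as dissimilar
    return len(ta & tb) / len(union)


if __name__ == "__main__":
    query = vector_to_text([0.9, -0.2, 0.4])
    near = vector_to_text([0.85, -0.25, 0.45])  # same buckets as query
    partial = vector_to_text([0.9, -0.2, -0.6])  # last dimension differs
    far = vector_to_text([-0.9, 0.8, -0.6])  # every bucket differs
    for label, other in [("near", near), ("partial", partial), ("far", far)]:
        print(label, round(trigram_similarity(query, other), 3))
    # Expected lines: "near 1.0", "partial 0.733", "far 0.471"
```

In this toy encoding every vector shares the dimension-prefix trigrams (such as ` d0` and `d0q`), and a bucket trigram like `q3 ` can collide across dimensions, so even unrelated vectors score well above zero; a practical encoding would need to account for this. In PostgreSQL, strings like these could in principle be stored in a `text` column with a GIN index using the `gin_trgm_ops` operator class and queried with the `%` operator or `similarity()`; this record does not say which database or operators the paper actually uses.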