Static analysis for optimizing big data queries
Format: | CONF |
---|---|
Online access: | http://hdl.handle.net/20.500.12110/paper_97814503_vPartF130154_n_p932_Garbervetsky |
id |
todo:paper_97814503_vPartF130154_n_p932_Garbervetsky |
record_format |
dspace |
spelling |
todo:paper_97814503_vPartF130154_n_p932_Garbervetsky2023-10-03T16:43:17Z Static analysis for optimizing big data queries Garbervetsky, D. Pavlinovic, Z. Barnett, M. Musuvathi, M. Mytkowicz, T. Zoppi, E. Zisman A. Bodden E. Schafer W. van Deursen A. Special Interest Group on Software Engineering (ACM SIGSOFT) Big Data Query optimization Static analysis UDOs Query languages Software engineering Static analysis Data dependencies Data query Error prones Optimizers Query optimization Real-world Relational queries UDOs Big data Query languages for big data analysis provide user extensibility through a mechanism of user-defined operators (UDOs). These operators allow programmers to write proprietary functionalities on top of a relational query skeleton. However, achieving effective query optimization for such languages is extremely challenging since the optimizer needs to understand data dependencies induced by UDOs. SCOPE, the query language from Microsoft, allows for hand-coded declarations of UDO data dependencies. Unfortunately, most programmers avoid using this facility since writing and maintaining the declarations is tedious and error-prone. In this work, we designed and implemented two sound and robust static analyses for computing UDO data dependencies. The analyses can detect which columns of an input table are never used or pass through a UDO unchanged. This information can be used to significantly improve execution of SCOPE scripts. We evaluate our analyses on thousands of real-world queries and show that we can catch many unused and pass-through columns automatically without relying on any manually provided declarations. © 2017 Association for Computing Machinery. CONF info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_97814503_vPartF130154_n_p932_Garbervetsky |
institution |
Universidad de Buenos Aires |
institution_str |
I-28 |
repository_str |
R-134 |
collection |
Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
topic |
Big Data Query optimization Static analysis UDOs Query languages Software engineering Static analysis Data dependencies Data query Error prones Optimizers Query optimization Real-world Relational queries UDOs Big data |
description |
Query languages for big data analysis provide user extensibility through a mechanism of user-defined operators (UDOs). These operators allow programmers to write proprietary functionalities on top of a relational query skeleton. However, achieving effective query optimization for such languages is extremely challenging since the optimizer needs to understand data dependencies induced by UDOs. SCOPE, the query language from Microsoft, allows for hand-coded declarations of UDO data dependencies. Unfortunately, most programmers avoid using this facility since writing and maintaining the declarations is tedious and error-prone. In this work, we designed and implemented two sound and robust static analyses for computing UDO data dependencies. The analyses can detect which columns of an input table are never used or pass through a UDO unchanged. This information can be used to significantly improve execution of SCOPE scripts. We evaluate our analyses on thousands of real-world queries and show that we can catch many unused and pass-through columns automatically without relying on any manually provided declarations. © 2017 Association for Computing Machinery. |
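The abstract names two column properties: columns a UDO never reads (unused) and columns it copies to its output unchanged (pass-through). The sketch below is a hypothetical toy illustration of those two properties only, not the authors' analysis: the paper targets SCOPE UDOs, whereas here a UDO body is modeled as a tiny Python expression tree, and `Col`, `Const`, `Call`, `ToyUdo`, `reads`, and `analyze` are names invented for this example. The check is purely syntactic, so it errs on the safe side: a column is reported unused only if no expression mentions it, and pass-through only if a same-named output is a direct copy of it.

```python
from dataclasses import dataclass, field
from typing import Union

# Hypothetical toy IR for the body of a row-wise UDO.
# This is NOT SCOPE's representation or the paper's analysis.

@dataclass(frozen=True)
class Col:
    name: str  # reads an input column

@dataclass(frozen=True)
class Const:
    value: object  # a literal; reads nothing

@dataclass(frozen=True)
class Call:
    fn: str  # an opaque function applied to arguments
    args: tuple

Expr = Union[Col, Const, Call]

@dataclass
class ToyUdo:
    outputs: dict  # output column name -> Expr
    predicates: list = field(default_factory=list)  # row filters (Expr)

def reads(e: Expr) -> set:
    """Columns an expression mentions (over-approximates what it reads)."""
    if isinstance(e, Col):
        return {e.name}
    if isinstance(e, Call):
        result = set()
        for arg in e.args:
            result |= reads(arg)
        return result
    return set()  # Const reads no column

def analyze(udo: ToyUdo, input_schema: list):
    """Return (unused, pass_through) input columns, in schema order."""
    used = set()
    for e in list(udo.outputs.values()) + udo.predicates:
        used |= reads(e)
    # Safe for "unused": only columns that no expression mentions are reported.
    unused = [c for c in input_schema if c not in used]
    # Safe for "pass-through": only direct copies into a same-named output.
    pass_through = [c for c in input_schema if udo.outputs.get(c) == Col(c)]
    return unused, pass_through

# Usage: a UDO that keeps `user`, derives `domain` from `url`, filters on `ts`.
schema = ["user", "url", "ts", "payload"]
udo = ToyUdo(
    outputs={
        "user": Col("user"),
        "domain": Call("host_of", (Col("url"),)),
    },
    predicates=[Call("gt", (Col("ts"), Const(0)))],
)
unused, passthrough = analyze(udo, schema)
print(unused, passthrough)  # ['payload'] ['user']
```

Under these toy assumptions, an optimizer could, for example, avoid reading `payload` before the UDO runs and treat `user` as unchanged across it.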
format |
CONF |
author |
Garbervetsky, D. Pavlinovic, Z. Barnett, M. Musuvathi, M. Mytkowicz, T. Zoppi, E. Zisman A. Bodden E. Schafer W. van Deursen A. Special Interest Group on Software Engineering (ACM SIGSOFT) |
title |
Static analysis for optimizing big data queries |