Static analysis for optimizing big data queries

Query languages for big data analysis provide user extensibility through a mechanism of user-defined operators (UDOs). These operators allow programmers to write proprietary functionalities on top of a relational query skeleton. However, achieving effective query optimization for such languages is ext...

Bibliographic Details
Main authors: Garbervetsky, D., Pavlinovic, Z., Barnett, M., Musuvathi, M., Mytkowicz, T., Zoppi, E., Zisman A., Bodden E., Schafer W., van Deursen A., Special Interest Group on Software Engineering (ACM SIGSOFT)
Format: CONF
Subjects:
Online access: http://hdl.handle.net/20.500.12110/paper_97814503_vPartF130154_n_p932_Garbervetsky
Contributed by:
id todo:paper_97814503_vPartF130154_n_p932_Garbervetsky
record_format dspace
spelling todo:paper_97814503_vPartF130154_n_p932_Garbervetsky 2023-10-03T16:43:17Z Static analysis for optimizing big data queries Garbervetsky, D. Pavlinovic, Z. Barnett, M. Musuvathi, M. Mytkowicz, T. Zoppi, E. Zisman A. Bodden E. Schafer W. van Deursen A. Special Interest Group on Software Engineering (ACM SIGSOFT) Big Data Query optimization Static analysis UDOs Query languages Software engineering Static analysis Data dependencies Data query Error prones Optimizers Query optimization Real-world Relational queries UDOs Big data Query languages for big data analysis provide user extensibility through a mechanism of user-defined operators (UDOs). These operators allow programmers to write proprietary functionalities on top of a relational query skeleton. However, achieving effective query optimization for such languages is extremely challenging since the optimizer needs to understand data dependencies induced by UDOs. SCOPE, the query language from Microsoft, allows for hand-coded declarations of UDO data dependencies. Unfortunately, most programmers avoid using this facility since writing and maintaining the declarations is tedious and error-prone. In this work, we designed and implemented two sound and robust static analyses for computing UDO data dependencies. The analyses can detect what columns of an input table are never used or pass through a UDO unchanged. This information can be used to significantly improve execution of SCOPE scripts. We evaluate our analyses on thousands of real-world queries and show we can catch many unused and pass-through columns automatically without relying on any manually provided declarations. © 2017 Association for Computing Machinery. CONF info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_97814503_vPartF130154_n_p932_Garbervetsky
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Big Data
Query optimization
Static analysis
UDOs
Query languages
Software engineering
Static analysis
Data dependencies
Data query
Error prones
Optimizers
Query optimization
Real-world
Relational queries
UDOs
Big data
description Query languages for big data analysis provide user extensibility through a mechanism of user-defined operators (UDOs). These operators allow programmers to write proprietary functionalities on top of a relational query skeleton. However, achieving effective query optimization for such languages is extremely challenging since the optimizer needs to understand data dependencies induced by UDOs. SCOPE, the query language from Microsoft, allows for hand-coded declarations of UDO data dependencies. Unfortunately, most programmers avoid using this facility since writing and maintaining the declarations is tedious and error-prone. In this work, we designed and implemented two sound and robust static analyses for computing UDO data dependencies. The analyses can detect what columns of an input table are never used or pass through a UDO unchanged. This information can be used to significantly improve execution of SCOPE scripts. We evaluate our analyses on thousands of real-world queries and show we can catch many unused and pass-through columns automatically without relying on any manually provided declarations. © 2017 Association for Computing Machinery.
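The description names two kinds of facts about a UDO (columns that are never used, and columns that pass through unchanged) without saying how they are computed. The following is only a toy sketch of what those two facts mean, and it is not the authors' analyses. It assumes a hypothetical UDO written as a Python function that takes a row dictionary and fills an output dictionary named `out`; the names `UDO_SOURCE`, `literal_key`, and `analyze` are invented for this illustration. It inspects the syntax tree, ignores aliasing and control flow, and gives up conservatively whenever `row` is used in a way it does not recognize.

```python
import ast

# Hypothetical UDO (an assumption for illustration, not SCOPE code).
UDO_SOURCE = '''
def udo(row):
    out = {}
    out["id"] = row["id"]
    out["total"] = row["price"] * row["qty"]
    return out
'''


def literal_key(node, var_name):
    """Return k if node is the expression var_name["k"], else None.

    Relies on the Python 3.9+ AST, where the subscript index is stored
    directly in node.slice.
    """
    if (isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == var_name
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)):
        return node.slice.value
    return None


def analyze(source, input_columns, out_name="out"):
    """Return (unused_columns, pass_through_columns) for a one-function source."""
    func = ast.parse(source).body[0]
    row_name = func.args.args[0].arg

    read = set()     # input columns read as row["literal"]
    tracked = set()  # ids of `row` name nodes whose use is understood
    writes = {}      # output column -> ids of nodes that write or delete it
    copies = {}      # output column -> ids of writes shaped out[c] = row[c]

    for node in ast.walk(func):
        col = literal_key(node, row_name)
        if col is not None and isinstance(node.ctx, ast.Load):
            read.add(col)
            tracked.add(id(node.value))
        key = literal_key(node, out_name)
        if key is not None and not isinstance(node.ctx, ast.Load):
            writes.setdefault(key, set()).add(id(node))
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            key = literal_key(target, out_name)
            if key is not None and literal_key(node.value, row_name) == key:
                copies.setdefault(key, set()).add(id(target))

    # Any other use of `row` (passed to a call, dynamic key, mutation) could
    # touch any column, so report nothing rather than guess.
    escaped = any(isinstance(n, ast.Name) and n.id == row_name
                  and id(n) not in tracked for n in ast.walk(func))
    if escaped:
        return set(), set()

    unused = set(input_columns) - read
    # Pass-through: every write to the column is a plain copy of the
    # same-named input column.
    pass_through = {c for c, w in writes.items() if w == copies.get(c)}
    return unused, pass_through


unused, pass_through = analyze(UDO_SOURCE, {"id", "price", "qty", "comment"})
print(sorted(unused), sorted(pass_through))  # prints ['comment'] ['id']
```

In this sketch, an optimizer could skip reading `comment` before the operator runs and could treat `id` as untouched by it, which is the kind of information the description says improves execution.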
format CONF
author Garbervetsky, D.
Pavlinovic, Z.
Barnett, M.
Musuvathi, M.
Mytkowicz, T.
Zoppi, E.
Zisman A.
Bodden E.
Schafer W.
van Deursen A.
Special Interest Group on Software Engineering (ACM SIGSOFT)
author_sort Garbervetsky, D.
title Static analysis for optimizing big data queries
url http://hdl.handle.net/20.500.12110/paper_97814503_vPartF130154_n_p932_Garbervetsky