Design and implementation of ETL processes using BPMN and relational algebra

"Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling and Notation (BPMN) has been proposed for expressing ETL processes...

Bibliographic Details
Main authors: Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban
Format: Journal Articles
Language: English
Published: 2020
Subjects:
ETL
Online access: http://ri.itba.edu.ar/handle/123456789/3080
id I32-R138-123456789-3080
record_format dspace
spelling I32-R138-123456789-3080
2022-12-07T13:05:48Z
2020-09-28T20:01:47Z
2020-09-28T20:01:47Z
2020-06-13
0169-023X
http://ri.itba.edu.ar/handle/123456789/3080
en
info:eu-repo/semantics/altIdentifier/10.1016/j.datak.2020.101837
info:eu-repo/semantics/acceptedVersion
info:eu-repo/grantAgreement/EC/EMJDs/IT4BI-DC/ BE. Bruselas
info:eu-repo/grantAgreement/ANPCyT/PICT/2017-1054/AR. Ciudad Autónoma de Buenos Aires
application/pdf
institution Instituto Tecnológico de Buenos Aires (ITBA)
institution_str I-32
repository_str R-138
collection Repositorio Institucional Instituto Tecnológico de Buenos Aires (ITBA)
language English
topic DATA WAREHOUSES
OLAP
ETL
BPMN
description "Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling and Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. A different approach is studied in this paper, where relational algebra (RA), extended with update operations, is used for specifying ETL processes. In this approach, data tasks in an ETL workflow can be automatically translated into SQL queries to be executed over a DBMS. To illustrate this study, the paper addresses the problem of updating Slowly Changing Dimensions (SCDs) with dependencies, that is, the case when updating a SCD table impacts on associated SCD tables. Tackling this problem requires extending the classic RA with update operations. The paper also shows the implementation of a portion of the TPC-DI benchmark that results from both approaches. Thus, the paper presents three implementations: (a) An SQL implementation based on the extended RA-based specification of an ETL process expressed in BPMN4ETL; and (b) Two implementations of workflows that follow from BPMN4ETL, one that uses the Pentaho DI tool, and another one that uses Talend Open Studio for DI. Experiments over these implementations of the TPC-DI benchmark for different scale factors were carried out, and are described and discussed in the paper, showing that the extended RA approach results in more efficient processes than the ones produced by implementing the BPMN4ETL specification over the mentioned ETL tools. The reasons for this result are also discussed."
format Journal Articles
author Awiti, Judith
Vaisman, Alejandro Ariel
Zimányi, Esteban
title Design and implementation of ETL processes using BPMN and relational algebra
publishDate 2020
url http://ri.itba.edu.ar/handle/123456789/3080