Design and implementation of ETL processes using BPMN and relational algebra
"Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling and Notation (BPMN) has been proposed for expressing ETL processes...
Main authors: | Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban |
---|---|
Format: | Journal articles |
Language: | English |
Published: | 2020 |
Subjects: | DATA WAREHOUSES; OLAP; ETL; BPMN |
Online access: | http://ri.itba.edu.ar/handle/123456789/3080 |
Contributed by: |
id |
I32-R138-123456789-3080 |
---|---|
record_format |
dspace |
spelling |
I32-R138-123456789-3080 2022-12-07T13:05:48Z Design and implementation of ETL processes using BPMN and relational algebra Awiti, Judith Vaisman, Alejandro Ariel Zimányi, Esteban DATA WAREHOUSES OLAP ETL BPMN "Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling and Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. A different approach is studied in this paper, where relational algebra (RA), extended with update operations, is used for specifying ETL processes. In this approach, data tasks in an ETL workflow can be automatically translated into SQL queries to be executed over a DBMS. To illustrate this study, the paper addresses the problem of updating Slowly Changing Dimensions (SCDs) with dependencies, that is, the case when updating a SCD table impacts on associated SCD tables. Tackling this problem requires extending the classic RA with update operations. The paper also shows the implementation of a portion of the TPC-DI benchmark that results from both approaches. Thus, the paper presents three implementations: (a) An SQL implementation based on the extended RA-based specification of an ETL process expressed in BPMN4ETL; and (b) Two implementations of workflows that follow from BPMN4ETL, one that uses the Pentaho DI tool, and another one that uses Talend Open Studio for DI. Experiments over these implementations of the TPC-DI benchmark for different scale factors were carried out, and are described and discussed in the paper, showing that the extended RA approach results in more efficient processes than the ones produced by implementing the BPMN4ETL specification over the mentioned ETL tools. The reasons for this result are also discussed." 
2020-09-28T20:01:47Z 2020-09-28T20:01:47Z 2020-06-13 Journal articles 0169-023X http://ri.itba.edu.ar/handle/123456789/3080 en info:eu-repo/semantics/altIdentifier/10.1016/j.datak.2020.101837 info:eu-repo/semantics/acceptedVersion info:eu-repo/grantAgreement/EC/EMJDs/IT4BI-DC/ BE. Bruselas info:eu-repo/grantAgreement/ANPCyT/PICT/2017-1054/AR. Ciudad Autónoma de Buenos Aires application/pdf |
institution |
Instituto Tecnológico de Buenos Aires (ITBA) |
institution_str |
I-32 |
repository_str |
R-138 |
collection |
Repositorio Institucional Instituto Tecnológico de Buenos Aires (ITBA) |
language |
English |
topic |
DATA WAREHOUSES; OLAP; ETL; BPMN |
spellingShingle |
DATA WAREHOUSES OLAP ETL BPMN Awiti, Judith Vaisman, Alejandro Ariel Zimányi, Esteban Design and implementation of ETL processes using BPMN and relational algebra |
topic_facet |
DATA WAREHOUSES; OLAP; ETL; BPMN |
description |
"Extraction, transformation, and loading (ETL) processes are used to extract data from internal
and external sources of an organization, transform these data, and load them into a data
warehouse. The Business Process Modeling and Notation (BPMN) has been proposed for
expressing ETL processes at a conceptual level. A different approach is studied in this paper,
where relational algebra (RA), extended with update operations, is used for specifying ETL
processes. In this approach, data tasks in an ETL workflow can be automatically translated
into SQL queries to be executed over a DBMS. To illustrate this study, the paper addresses the
problem of updating Slowly Changing Dimensions (SCDs) with dependencies, that is, the case
when updating a SCD table impacts on associated SCD tables. Tackling this problem requires
extending the classic RA with update operations. The paper also shows the implementation
of a portion of the TPC-DI benchmark that results from both approaches. Thus, the paper
presents three implementations: (a) An SQL implementation based on the extended RA-based
specification of an ETL process expressed in BPMN4ETL; and (b) Two implementations of
workflows that follow from BPMN4ETL, one that uses the Pentaho DI tool, and another one
that uses Talend Open Studio for DI. Experiments over these implementations of the TPC-DI
benchmark for different scale factors were carried out, and are described and discussed in the
paper, showing that the extended RA approach results in more efficient processes than the ones
produced by implementing the BPMN4ETL specification over the mentioned ETL tools. The
reasons for this result are also discussed." |
format |
Journal articles |
author |
Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban |
author_facet |
Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban |
author_sort |
Awiti, Judith |
title |
Design and implementation of ETL processes using BPMN and relational algebra |
title_short |
Design and implementation of ETL processes using BPMN and relational algebra |
title_full |
Design and implementation of ETL processes using BPMN and relational algebra |
title_fullStr |
Design and implementation of ETL processes using BPMN and relational algebra |
title_full_unstemmed |
Design and implementation of ETL processes using BPMN and relational algebra |
title_sort |
design and implementation of etl processes using bpmn and relational algebra |
publishDate |
2020 |
url |
http://ri.itba.edu.ar/handle/123456789/3080 |
work_keys_str_mv |
AT awitijudith designandimplementationofetlprocessesusingbpmnandrelationalalgebra AT vaismanalejandroariel designandimplementationofetlprocessesusingbpmnandrelationalalgebra AT zimanyiesteban designandimplementationofetlprocessesusingbpmnandrelationalalgebra |
_version_ |
1765660772536418304 |
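
The abstract describes updating Slowly Changing Dimensions (SCDs) with dependencies, where a change to one SCD table forces changes in the SCD tables that reference it, and notes that such data tasks can be translated into SQL. The following is a minimal sketch of that idea only, not the paper's extended RA operators or its TPC-DI implementation. The tables `dim_customer` and `dim_account`, their columns, and the functions `make_demo_db` and `apply_customer_change` are hypothetical names chosen for illustration, and a simple `is_current` flag stands in for full Type 2 versioning.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with two hypothetical SCD tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (
            sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            customer_id INTEGER,                   -- business key
            tier TEXT,
            is_current INTEGER
        );
        CREATE TABLE dim_account (
            sk INTEGER PRIMARY KEY AUTOINCREMENT,
            account_id INTEGER,
            customer_sk INTEGER REFERENCES dim_customer(sk),
            is_current INTEGER
        );
        INSERT INTO dim_customer (customer_id, tier, is_current)
            VALUES (100, 'silver', 1);
        INSERT INTO dim_account (account_id, customer_sk, is_current)
            VALUES (7, 1, 1), (8, 1, 1);
        """
    )
    return conn


def apply_customer_change(conn, customer_id, new_tier):
    """Version a customer row and propagate the change to dependent accounts."""
    cur = conn.cursor()
    row = cur.execute(
        "SELECT sk FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is None:
        raise ValueError(f"no current row for customer {customer_id}")
    old_sk = row[0]

    # Step 1: expire the current customer version.
    cur.execute("UPDATE dim_customer SET is_current = 0 WHERE sk = ?", (old_sk,))

    # Step 2: insert the new customer version, which gets a new surrogate key.
    cur.execute(
        "INSERT INTO dim_customer (customer_id, tier, is_current) VALUES (?, ?, 1)",
        (customer_id, new_tier),
    )
    new_sk = cur.lastrowid

    # Step 3: the dependency. Each current account that pointed at the old
    # customer version gets a new version pointing at the new one.
    cur.execute(
        "INSERT INTO dim_account (account_id, customer_sk, is_current) "
        "SELECT account_id, ?, 1 FROM dim_account "
        "WHERE customer_sk = ? AND is_current = 1",
        (new_sk, old_sk),
    )

    # Step 4: expire the account versions that still reference the old key.
    cur.execute(
        "UPDATE dim_account SET is_current = 0 "
        "WHERE customer_sk = ? AND is_current = 1",
        (old_sk,),
    )
    conn.commit()
    return new_sk
```

Calling `apply_customer_change(make_demo_db(), 100, "gold")` leaves the old customer and account rows in place as history and adds new current versions, so every step is an ordinary SQL `UPDATE` or `INSERT ... SELECT` that a DBMS can execute directly.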