Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence

This thesis presents the design and implementation of a data-intensive framework to estimate CEO overconfidence by integrating multiple data sources and analytical methods. Drawing from traditional sentiment-based approaches and recent advances in generative artificial intelligence (GenAI), the proj...

Bibliographic Details
Main author: Juárez, Lucio Ignacio
Format: Master's thesis
Language: English
Published: Universidad Torcuato Di Tella, 2025
Online access: https://repositorio.utdt.edu/handle/20.500.13098/13737
id I57-R163-20.500.13098-13737
record_format dspace
spelling I57-R163-20.500.13098-13737 2025-10-22T05:03:10Z
Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence
Juárez, Lucio Ignacio
Artificial intelligence; Innovation; Data analysis; Decision making
Juárez, L. (2025) “Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence”. [Master's thesis, Universidad Torcuato Di Tella]. Repositorio Digital Universidad Torcuato Di Tella https://repositorio.utdt.edu/handle/20.500.13098/13737
Universidad Torcuato Di Tella 2025-10-21T22:22:07Z 2025
info:eu-repo/semantics/masterThesis
eng
Tesis y Trabajos Finales de la Universidad Torcuato Di Tella
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/deed.es
57 p. application/pdf
institution Universidad Torcuato Di Tella
institution_str I-57
repository_str R-163
collection Repositorio Digital Universidad Torcuato Di Tella
language English
orig_language_str_mv eng
topic Artificial intelligence
Innovation
Data analysis
Decision making
description This thesis presents the design and implementation of a data-intensive framework to estimate CEO overconfidence by integrating multiple data sources and analytical methods. Drawing from traditional sentiment-based approaches and recent advances in generative artificial intelligence (GenAI), the project develops a dynamic and scalable index to classify CEO behavior based on press articles, structured financial disclosures, and contextual analysis. The research contributes both conceptually and practically by replicating existing overconfidence indicators—such as the Conf(Press) index—and extending them through natural language processing techniques that account for nuance, context, and industry-specific factors. The methodology combines structured data acquisition from sources like ProQuest’s TDM Studio, EBSCO, and The New York Times with sentiment scoring (VADER), keyword-based classifiers, and a GenAI-powered prompt framework. By applying these techniques to over 7,000 curated CEO-related articles, the thesis constructs a CEO Overconfidence Index that enables comparative analysis across sectors, particularly between innovation-driven and traditional industries. The resulting data product captures how overconfidence varies over time and in response to events, revealing both the limitations of static keyword methods and the added value of contextual AI models. Ultimately, this work contributes to the field of behavioral corporate finance by offering a novel pipeline to estimate executive psychological traits from textual data. It also provides a governance-relevant tool for investors, analysts, and policymakers to identify behavioral risk factors in leadership. While GenAI adds adaptability and interpretive depth, the thesis emphasizes that its primary value lies in the integration of classic and emerging methods into a unified, sector-aware overconfidence framework.
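The static keyword approach that the description contrasts with contextual models can be illustrated with a minimal sketch. It is not the thesis's implementation: the keyword lists, the function names `classify_article` and `press_index`, and the rule that a CEO is flagged when "confident" articles outnumber "cautious" ones are all assumptions made here for illustration, loosely in the spirit of press-based indicators such as Conf(Press).

```python
import re
from collections import Counter

# Illustrative keyword lists (assumptions, not taken from the thesis).
CONFIDENT_TERMS = ("confident", "confidence", "optimistic", "optimism")
CAUTIOUS_TERMS = ("cautious", "conservative", "reliable", "practical", "frugal", "steady")


def classify_article(text: str) -> str:
    """Label one article 'confident', 'cautious', or 'neutral' by keyword counts."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    confident = sum(words[term] for term in CONFIDENT_TERMS)
    cautious = sum(words[term] for term in CAUTIOUS_TERMS)
    if confident > cautious:
        return "confident"
    if cautious > confident:
        return "cautious"
    return "neutral"


def press_index(articles: list[str]) -> int:
    """Return 1 if 'confident' articles outnumber 'cautious' ones, else 0."""
    labels = Counter(classify_article(article) for article in articles)
    return int(labels["confident"] > labels["cautious"])
```

A counter like this ignores negation and context (for example, "not confident" still counts as a confident mention), which is the kind of limitation of static keyword methods that the description attributes to them.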
format Master's thesis
author Juárez, Lucio Ignacio
title Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence
publisher Universidad Torcuato Di Tella
publishDate 2025
url https://repositorio.utdt.edu/handle/20.500.13098/13737