Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence
| Main author: | Juárez, Lucio Ignacio |
|---|---|
| Format: | Master's thesis |
| Language: | English |
| Published: | Universidad Torcuato Di Tella, 2025 |
| Subjects: | Artificial intelligence; Innovation; Data analysis; Decision making |
| Online access: | https://repositorio.utdt.edu/handle/20.500.13098/13737 |

| id | I57-R163-20.500.13098-13737 |
|---|---|
| record_format | dspace |
| citation | Juárez, L. (2025) “Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence”. [Master's thesis. Universidad Torcuato Di Tella]. Repositorio Digital Universidad Torcuato Di Tella https://repositorio.utdt.edu/handle/20.500.13098/13737 |
| series | Tesis y Trabajos Finales de la Universidad Torcuato Di Tella |
| type | info:eu-repo/semantics/masterThesis |
| rights | Open access (info:eu-repo/semantics/openAccess); CC BY-NC-SA 4.0, https://creativecommons.org/licenses/by-nc-sa/4.0/deed.es |
| extent | 57 p. |
| file_format | application/pdf |
| record_dates | 2025-10-21T22:22:07Z; 2025-10-22T05:03:10Z |
| institution | Universidad Torcuato Di Tella |
| collection | Repositorio Digital Universidad Torcuato Di Tella |
| language | English |
| topic | Artificial intelligence; Innovation; Data analysis; Decision making |
| description | This thesis presents the design and implementation of a data-intensive framework to estimate CEO overconfidence by integrating multiple data sources and analytical methods. Drawing from traditional sentiment-based approaches and recent advances in generative artificial intelligence (GenAI), the project develops a dynamic and scalable index to classify CEO behavior based on press articles, structured financial disclosures, and contextual analysis. The research contributes both conceptually and practically by replicating existing overconfidence indicators, such as the Conf(Press) index, and extending them through natural language processing techniques that account for nuance, context, and industry-specific factors. The methodology combines structured data acquisition from sources like ProQuest’s TDM Studio, EBSCO, and The New York Times with sentiment scoring (VADER), keyword-based classifiers, and a GenAI-powered prompt framework. By applying these techniques to over 7,000 curated CEO-related articles, the thesis constructs a CEO Overconfidence Index that enables comparative analysis across sectors, particularly between innovation-driven and traditional industries. The resulting data product captures how overconfidence varies over time and in response to events, revealing both the limitations of static keyword methods and the added value of contextual AI models. Ultimately, this work contributes to the field of behavioral corporate finance by offering a novel pipeline to estimate executive psychological traits from textual data. It also provides a governance-relevant tool for investors, analysts, and policymakers to identify behavioral risk factors in leadership. While GenAI adds adaptability and interpretive depth, the thesis emphasizes that its primary value lies in the integration of classic and emerging methods into a unified, sector-aware overconfidence framework. |
| format | Master's thesis |
| author | Juárez, Lucio Ignacio |
| title | Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence |
| publisher | Universidad Torcuato Di Tella |
| publishDate | 2025 |
| url | https://repositorio.utdt.edu/handle/20.500.13098/13737 |
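The abstract mentions keyword-based classifiers that replicate the Conf(Press) index and a comparison between innovation-driven and traditional sectors. The following is a minimal sketch of that keyword step, assuming a simplified form of the commonly cited press-based rule: a CEO is flagged when more articles use "confident"/"optimistic" language than "cautious"/"conservative" language (including negations such as "not confident"). The term lists, the negation rule, the sector labels, and the functions `article_flags`, `conf_press`, and `sector_shares` are hypothetical illustrations. This record does not state the thesis's actual term lists, time windows, or code, and the sketch omits the VADER and GenAI components.

```python
import re
from collections import Counter

# Hypothetical, illustrative term lists; the thesis's exact lists are not given in this record.
CONFIDENT_RE = re.compile(r"\b(?:confident|confidence|optimistic|optimism)\b")
CAUTIOUS_RE = re.compile(r"\b(?:cautious|conservative|practical|frugal|steady|reliable)\b")
# Assumed negation rule: "not confident" / "not optimistic" counts as cautious language.
NEGATED_RE = re.compile(r"\bnot\s+(?:confident|optimistic)\b")


def article_flags(text: str) -> tuple[bool, bool]:
    """Return (uses_confident_language, uses_cautious_language) for one article."""
    lowered = text.lower()
    negated = NEGATED_RE.search(lowered) is not None
    # Strip negated phrases so "not confident" is not also counted as confident.
    cleaned = NEGATED_RE.sub(" ", lowered)
    confident = CONFIDENT_RE.search(cleaned) is not None
    cautious = negated or CAUTIOUS_RE.search(cleaned) is not None
    return confident, cautious


def conf_press(articles: list[str]) -> int:
    """Return 1 if more articles use confident than cautious language, else 0."""
    n_confident = n_cautious = 0
    for text in articles:
        confident, cautious = article_flags(text)
        n_confident += confident  # bools add as 0/1
        n_cautious += cautious
    return int(n_confident > n_cautious)


def sector_shares(ceo_articles: dict[str, tuple[str, list[str]]]) -> dict[str, float]:
    """Share of CEOs flagged as overconfident, per (hypothetical) sector label."""
    flagged: Counter = Counter()
    totals: Counter = Counter()
    for sector, articles in ceo_articles.values():
        totals[sector] += 1
        flagged[sector] += conf_press(articles)
    return {sector: flagged[sector] / totals[sector] for sector in totals}


if __name__ == "__main__":
    sample = {
        "CEO A": ("innovation", ["The CEO is confident about growth.", "An optimistic outlook."]),
        "CEO B": ("traditional", ["A cautious and conservative leader.", "She is not confident."]),
    }
    print(sector_shares(sample))  # {'innovation': 1.0, 'traditional': 0.0}
```

A static rule like this cannot tell irony, quotation, or context apart, which is the limitation of keyword methods that the abstract says contextual AI models are meant to address.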