Crop yield prediction with ensemble algorithms and Artificial Neural Networks (ANN)

World cereal production is set to grow by around 1% per year over the next decade; because crop areas are not expanding, the main driver of this growth is expected to be yield improvements. Crop yields have commonly been modelled in two ways: process-based modelling (also known a...

Bibliographic Details
Main author: Ruiz Moreno, Tobías
Other authors: Merener, Martín
Format: Master's thesis (acceptedVersion)
Language: English
Published: Universidad Torcuato Di Tella 2023
Online access: https://repositorio.utdt.edu/handle/20.500.13098/11868
id I57-R163-20.500.13098-11868
record_format dspace
spelling I57-R163-20.500.13098-11868
2023-06-07T07:33:49Z
2023-06-06T17:21:23Z
2019
info:eu-repo/semantics/masterThesis
info:ar-repo/semantics/tesis de maestría
info:eu-repo/semantics/acceptedVersion
eng
info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-sa/2.5/ar/
100 p.
application/pdf
Argentina
institution Universidad Torcuato Di Tella
institution_str I-57
repository_str R-163
collection Repositorio Digital Universidad Torcuato Di Tella
language English
orig_language_str_mv eng
topic Technological forecasting
Statistical
Econometric model
World Cereal production
Yield improvements
Machine Learning
Random Forest
Long Short-Term Memory (LSTM) Neural Networks
Monte Carlo simulation
description World cereal production is set to grow by around 1% per year over the next decade; because crop areas are not expanding, the main driver of this growth is expected to be yield improvements. Crop yields have commonly been modelled in two ways: process-based modelling (also known as crop simulation) and statistical modelling. Recently, machine learning has started to deliver interesting results, mainly because it can handle non-linear relationships between factors. Weather plays an important role in determining crop yields, and the ability to simulate realistic weather conditions and predict crop yield has been an important topic in the industry. The objective of this work is to model crop yields using a Random Forest regressor and Long Short-Term Memory (LSTM) Neural Networks (NN) for 9 annual crops in Argentina: wheat, barley, maize, soybean, sunflower, sorghum, rice, cotton and peanut. Soil and weather data were collected and transformed for 80 counties in Argentina. Hyperparameters for the 2 models were optimized and accuracy metrics were compared. Weather information was simulated by estimating the distribution of the historical data with a Kernel Density Estimator (KDE) and drawing random samples from it by Monte Carlo. Feature importance analysis allowed the number of factors to be reduced down to 7 without compromising model accuracy. Of the 9 crops studied, the soybean, maize, sunflower, sorghum, wheat and barley models returned reasonable accuracy metrics. Wheat and barley are winter crops; the remaining 4, all summer crops (soybean, maize, sorghum and sunflower), were forecast by simulating rainfall at different stages of the growing season and returned estimates with an error below 20% (MAPE) before harvest. Random Forest outperformed the classic multiple linear regression (MLR) statistical model by more than 30% on average across all crops, but overfitting was substantial. LSTM did not perform as well as Random Forest: although LSTM did not overfit, its performance was only slightly better than the baseline, with large variations between crops. This work demonstrates that machine learning algorithms are a competitive alternative to statistical modelling for crop yield prediction, and that weather simulations can return reasonably accurate predictions before harvest. This allows the agricultural community to anticipate strategic decisions based on crop production forecasts.
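
The weather-simulation step described in the abstract (estimating the distribution of historical data with a KDE, then drawing Monte Carlo samples) can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the thesis code. The rainfall values, the `toy_yield_model` placeholder and its coefficients, the normal-reference bandwidth rule, and the clipping of negative draws at zero are all assumptions introduced here; the record does not say which kernel, bandwidth or yield model inputs were actually used. The `mape` helper shows the error metric named in the abstract.

```python
import random
import statistics


def kde_bandwidth(samples):
    """Normal-reference rule-of-thumb bandwidth for a Gaussian kernel (assumed choice)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1 / 5)


def sample_from_kde(samples, n_draws, rng):
    """Draw from a Gaussian KDE: pick one observation, then add kernel noise."""
    h = kde_bandwidth(samples)
    draws = []
    for _ in range(n_draws):
        center = rng.choice(samples)
        # Rainfall cannot be negative, so clip at zero (a simplification).
        draws.append(max(0.0, rng.gauss(center, h)))
    return draws


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def toy_yield_model(rain_mm):
    """Hypothetical stand-in for a fitted regressor; NOT the thesis model."""
    return 1.5 + 0.02 * rain_mm  # yield in t/ha, made-up coefficients


# Hypothetical historical rainfall (mm) for one growth stage in one county.
historical_rain_mm = [62.0, 85.5, 40.2, 110.3, 73.8, 95.1, 55.0, 68.4, 120.7, 80.9]

rng = random.Random(42)  # fixed seed so the run is reproducible
simulated_rain_mm = sample_from_kde(historical_rain_mm, n_draws=1000, rng=rng)

# Monte Carlo step: push every simulated scenario through the yield model
# and summarise the resulting distribution of predictions.
simulated_yields = [toy_yield_model(r) for r in simulated_rain_mm]
expected_yield = statistics.mean(simulated_yields)

print(f"expected yield over {len(simulated_yields)} scenarios: {expected_yield:.2f} t/ha")
```

Sampling from a Gaussian KDE reduces to choosing a historical observation at random and perturbing it with noise whose standard deviation equals the bandwidth, which is why no density grid is needed here. In practice the placeholder model would be replaced by a trained regressor, and `mape` would be evaluated against observed yields once harvest data are available.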