Transformer-based automatic music mood classification using multi-modal framework

According to studies, music affects our moods, and we are also inclined to choose a theme based on our current moods. Audio-based techniques can achieve promising results, but lyrics also give relevant information about the moods of a song which may not be present in the audio part. So a multi-modal...

Bibliographic Details
Main authors: Suresh Kumar, Sujeesha Ajithakumari; Rajan, Rajeev
Format: Article
Language: English
Published: 2023
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/152120
Contributed by:
id I19-R120-10915-152120
record_format dspace
spelling I19-R120-10915-152120
2023-04-25T20:03:53Z
http://sedici.unlp.edu.ar/handle/10915/152120
issn:1666-6038
Transformer-based automatic music mood classification using multi-modal framework
Clasificación automática del estado de ánimo de la música basada en transformadores utilizando un marco multimodal
Suresh Kumar, Sujeesha Ajithakumari
Rajan, Rajeev
2023-04
2023-04-25T18:00:47Z
en
Facultad de Informática
Article
http://creativecommons.org/licenses/by-nc/4.0/
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
application/pdf
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
BERT
Bidirectional GRU
Music
Self-attention
Transformer
description According to studies, music affects our moods, and we also tend to choose a theme based on our current mood. Audio-based techniques can achieve promising results, but lyrics also carry relevant information about the mood of a song that may not be present in the audio. A multi-modal system that combines textual and acoustic features can therefore provide better accuracy. Sequential networks such as long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks are widely used in state-of-the-art natural language processing (NLP) models. A transformer model uses self-attention to compute representations of its inputs and outputs; unlike recurrent neural networks (RNNs), which process sequences one step at a time, transformers can parallelize over input positions during training. In this work, we propose a multi-modal music mood classification system based on transformers and compare its performance with that of a bi-directional GRU (Bi-GRU)-based system with and without attention. The performance is also compared with other state-of-the-art approaches. The proposed transformer-based model achieved higher accuracy than the Bi-GRU-based multi-modal system with single-layer attention, reaching a maximum accuracy of 77.94%.
format Article
author Suresh Kumar, Sujeesha Ajithakumari
Rajan, Rajeev
title Transformer-based automatic music mood classification using multi-modal framework
publishDate 2023
url http://sedici.unlp.edu.ar/handle/10915/152120
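
The abstract notes that a transformer uses self-attention to compute representations and can parallelize over input positions. As a rough illustration only, and not the authors' model, the sketch below shows scaled dot-product self-attention in plain Python. It assumes identity query, key, and value projections (a real transformer learns these), and the function names and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of equal-length feature vectors, one per input position.
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    # Each position's output depends only on the inputs, not on any other
    # position's output, so these iterations could run in parallel.
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[d] for w, value in zip(row, tokens))
                        for d in range(dim)])
    return outputs, weights


# Hypothetical 2-dimensional features for three input positions.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(features)
```

Because no iteration of the loop reads another iteration's result, the per-position work can be batched into matrix operations, which is the property the abstract contrasts with the step-by-step processing of recurrent networks.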