Turn-taking cues in task-oriented dialogue
Main authors: | Gravano, A.; Hirschberg, J. |
---|---|
Format: | JOUR (journal article) |
Subjects: | |
Online access: | http://hdl.handle.net/20.500.12110/paper_08852308_v25_n3_p601_Gravano |
Contributed by: |
id |
todo:paper_08852308_v25_n3_p601_Gravano |
---|---|
record_format |
dspace |
affiliation |
Gravano, A.: Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. |
rights |
Open access (info:eu-repo/semantics/openAccess); license http://creativecommons.org/licenses/by/2.5/ar |
record_timestamp |
2023-10-03T15:40:43Z |
institution |
Universidad de Buenos Aires |
institution_str |
I-28 |
repository_str |
R-134 |
collection |
Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
topic |
Dialogue; IVR systems; Prosody; Turn-taking; Back channels; Columbia; Interactive voice response; Interactive voice response systems; System usability; Speech recognition |
description |
As interactive voice response systems become more prevalent and provide increasingly complex functionality, it becomes clear that the challenges facing such systems are not solely in their synthesis and recognition capabilities. Issues such as the coordination of turn exchanges between system and user also play an important role in system usability. In particular, both systems and users have difficulty determining when the other is taking or relinquishing the turn. In this paper, we seek to identify automatically computable turn-taking cues correlated with human-human turn exchanges. We compare the presence of potential prosodic, acoustic, and lexico-syntactic turn-yielding cues in prosodic phrases preceding turn changes (smooth switches) vs. turn retentions (holds) vs. backchannels in the Columbia Games Corpus, a large corpus of task-oriented dialogues, to determine which features reliably distinguish among these three. We identify seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems. Testing Duncan's (1972) hypothesis that these turn-yielding cues are linearly correlated with the occurrence of turn-taking attempts, we further demonstrate that the greater the number of turn-yielding cues present, the greater the likelihood that a turn change will occur. We also identify six cues that precede backchannels, which will also be useful for IVR backchannel generation and recognition; these cues correlate with backchannel occurrence in a quadratic manner. We find similar results for overlapping and for non-overlapping speech. © 2010 Elsevier Ltd. All rights reserved. |
author |
Gravano, A. Hirschberg, J. |