Spoken language recognition based on senone posteriors

This paper explores in depth a recently proposed approach to spoken language recognition based on the estimated posteriors for a set of senones representing the phonetic space of one or more languages. A neural network (NN) is trained to estimate the posterior probabilities for the senones at a fram...

Bibliographic Details
Main author: Ferrer, L.
Other authors: Lei, Y., McLaren, M., Scheffer, N., Chng E.S., Li H., Meng H., Ma B., Xie L., Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
Format: Conference proceedings, Book chapter
Language: English
Published: International Speech and Communication Association 2014
Online access: Scopus record
Handle
Digital Library record
Contributed by: Reference record: Request the resource here
LEADER 08396caa a22008297a 4500
001 PAPER-14654
003 AR-BaUEN
005 20230518204513.0
008 190411s2014 xx ||||fo|||| 10| 0 eng|d
024 7 |2 scopus  |a 2-s2.0-84910030410 
040 |a Scopus  |b spa  |c AR-BaUEN  |d AR-BaUEN 
100 1 |a Ferrer, L. 
245 1 0 |a Spoken language recognition based on senone posteriors 
260 |b International Speech and Communication Association  |c 2014 
270 1 0 |m Ferrer, L.; Speech Technology and Research Laboratory, SRI International, United States 
506 |2 openaire  |e Editorial policy 
504 |a Haizhou, L., Bin, M., Kong, A.L., Spoken language recognition: From fundamentals to practice (2013) Proceedings of the IEEE 
504 |a Martinez, D.G., Plchot, O., Burget, L., Glembek, O., Matejka, P., Language recognition in ivectors space (2013) Proc. Interspeech, , Lyon, France, Aug 
504 |a Matejka, P., Schwarz, P., Cernocky, J., Chytil, P., Phonotactic language identification using high quality phoneme recognition (2005) Interspeech-2005 
504 |a Shen, W., Campbell, W., Gleason, T., Reynolds, D., Singer, E., Experiments with lattice-based pprlm language identification (2006) Odyssey 2006 -The Speaker and Language Recognition Workshop, pp. 1-6 
504 |a Stolcke, A., Akbacak, M., Ferrer, L., Kajarekar, S., Richey, C., Scheffer, N., Shriberg, E., Improving language recognition with multilingual phone recognition and speaker adaptation transforms (2010) Proc. Odyssey-10, , Brno, Czech Republic, June 
504 |a D'Haro, L.F., Glembek, O., Plchot, O., Matejka, P., Soufifar, M., Cordoba, R., Cernocky, J., Phonotactic language recognition using i-vectors and phoneme posteriogram counts (2012) Interspeech-2012, pp. 42-45 
504 |a Diez, M., Varona, A., Penagarikano, M., Rodriguez-Fuentes, L.J., Bordel, G., On the use of log-likelihood ratios as features in spoken language recognition (2012) IEEE Workshop on Spoken Language Technology (SLT 2012), , Miami, Florida, USA 
504 |a Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Odyssey-14, , Joensuu, Finland, June 
504 |a Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) HLT '94 Proceedings of the Workshop on Human Language Technology, pp. 307-312 
504 |a Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Kingsbury, B., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012) Signal Processing Magazine, IEEE, 29 (6), pp. 82-97 
504 |a Dahl, G.E., Yu, D., Deng, L., Acero, A., Context dependent pre-trained deep neural networks for large-vocabulary speech recognition (2012) IEEE Trans. ASLP, 20, pp. 30-42 
504 |a Lecun, Y., Bengio, Y., (1995) Convolutional Networks for Images, Speech, and Time-series, pp. 255-258. , MIT Press 
504 |a Lecun, Y., Bottou, L., Bengio, Y., Haffner, P., Gradient based learning applied to document recognition (1998) Proceedings of the IEEE, pp. 2278-2324 
504 |a Abdel-Hamid, O., Mohamed, A., Jiang, H., Penn, G., Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition (2012) ICASSP-2012, pp. 4277-4280 
504 |a Sainath, T., Mohamed, A., Kingsbury, B., Ramabhadran, B., Deep convolutional neural networks for lvcsr (2013) ICASSP-2013, pp. 8614-8618 
504 |a Abdel-Hamid, O., Deng, L., Yu, D., Exploring convolutional neural network structures and optimization techniques for speech recognition (2013) Interspeech-2013, pp. 3366-3370 
504 |a Scheffer, N., Lei, Y., Ferrer, L., Factor analysis back ends for mllr transforms in speaker recognition (2013) Proc. Interspeech, , Lyon, France, Aug 
504 |a Van Leeuwen, D.A., Brummer, N., Channel dependent gmm and multi-class logistic regression models for language recognition (2006) Proc. Odyssey-06, , Puerto Rico, USA, June 
504 |a Brummer, N., Van Leeuwen, D.A., On calibration of language recognition scores (2006) Proc. Odyssey-06, , Puerto Rico, USA, June 
504 |a Lawson, A., McLaren, M., Lei, Y., Mitra, V., Scheffer, N., Ferrer, L., Graciarena, M., Improving language identification robustness to highly channel-degraded speech through multiple system fusion (2013) Proc. Interspeech, , Lyon, France, Aug 
504 |a Walker, K., Strassel, S., The rats radio traffic collection system (2012) Odyssey 2012: The Speaker and Language Recognition Workshop 
504 |a DARPA RATS Program, , http://www.darpa.mil/OurWork/I2O/Programs/RobustAutomaticTranscriptionofSpeech(RATS).aspx 
504 |a McLaren, M., Scheffer, N., Ferrer, L., Lei, Y., Effective use of dcts for contextualizing features for speaker recognition (2014) Proc. ICASSP, , Florence, May 
504 |a McLaren, M., Scheffer, N., Graciarena, M., Ferrer, L., Lei, Y., Improving speaker identification robustness to highly channel-degraded speech through multiple system fusion (2013) Proc. ICASSP, , Vancouver, May 
504 |a Kim, C., Stern, R.M., Power-normalized cepstral coefficients (pncc) for robust speech recognition (2012) Proc. ICASSP, , Kyoto, Mar 
504 |a NIST LRE09 Evaluation Plan, , http://www.itl.nist.gov/iad/mig/tests/lre/2009/LRE09EvalPlanv6.pdf 
520 3 |a This paper explores in depth a recently proposed approach to spoken language recognition based on the estimated posteriors for a set of senones representing the phonetic space of one or more languages. A neural network (NN) is trained to estimate the posterior probabilities for the senones at a frame level. A feature vector is then derived for every sample using these posteriors. The effect of the language used in training the NN and the number of senones are studied. Speech-activity detection (SAD) and dimensionality reduction approaches are also explored and Gaussian and NN backends are compared. Results are presented on heavily degraded speech data. The proposed system is shown to give over 40% relative gain compared to a state-of-the-art language recognition system at sample durations from 3 to 120 seconds. Copyright © 2014 ISCA.  |l eng 
593 |a Speech Technology and Research Laboratory, SRI International, CA, United States 
593 |a Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina 
690 1 0 |a SPEECH COMMUNICATION 
690 1 0 |a ACTIVITY DETECTION 
690 1 0 |a DIMENSIONALITY REDUCTION 
690 1 0 |a FEATURE VECTORS 
690 1 0 |a LANGUAGE RECOGNITION 
690 1 0 |a NEURAL NETWORK (NN) 
690 1 0 |a POSTERIOR PROBABILITY 
690 1 0 |a SPEECH DATA 
690 1 0 |a SPOKEN LANGUAGE RECOGNITION 
690 1 0 |a SPEECH RECOGNITION 
700 1 |a Lei, Y. 
700 1 |a McLaren, M. 
700 1 |a Scheffer, N. 
700 1 |a Chng E.S. 
700 1 |a Li H. 
700 1 |a Meng H. 
700 1 |a Ma B. 
700 1 |a Xie L. 
700 1 |a Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat 
711 2 |d 14 September 2014 through 18 September 2014  |g Conference code: 108771 
773 0 |d International Speech and Communication Association, 2014  |h pp. 2150-2154  |p Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH  |n Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH  |x 2308457X  |t 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 
856 4 1 |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84910030410&partnerID=40&md5=b075fdac7d57e2798a0d973b5dd08e1b  |y Scopus record 
856 4 0 |u https://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p2150_Ferrer  |y Handle 
856 4 0 |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer  |y Digital Library record 
961 |a paper_2308457X_v_n_p2150_Ferrer  |b paper  |c PE 
962 |a info:eu-repo/semantics/conferenceObject  |a info:ar-repo/semantics/documento de conferencia  |b info:eu-repo/semantics/publishedVersion 
999 |c 75607