Spoken language recognition based on senone posteriors
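| Main author: | Ferrer, L. |
|---|---|
| Other authors: | Lei, Y.; McLaren, M.; Scheffer, N.; Chng, E.S.; Li, H.; Meng, H.; Ma, B.; Xie, L.; Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat |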
| Format: | Conference paper; Book chapter |
| Language: | English |
| Published: | International Speech and Communication Association, 2014 |
| Online access: | Scopus record; Handle; Digital Library record |
| Contributed by: | Reference record |
| LEADER | 08396caa a22008297a 4500 | ||
|---|---|---|---|
| 001 | PAPER-14654 | ||
| 003 | AR-BaUEN | ||
| 005 | 20230518204513.0 | ||
| 008 | 190411s2014 xx ||||fo|||| 10| 0 eng|d | ||
| 024 | 7 | |2 scopus |a 2-s2.0-84910030410 | |
| 040 | |a Scopus |b spa |c AR-BaUEN |d AR-BaUEN | ||
| 100 | 1 | |a Ferrer, L. | |
| 245 | 1 | 0 | |a Spoken language recognition based on senone posteriors |
| 260 | |b International Speech and Communication Association |c 2014 | ||
| 270 | 1 | 0 | |m Ferrer, L.; Speech Technology and Research Laboratory, SRI International, United States |
| 506 | |2 openaire |e Editorial policy | ||
| 504 | |a Li, H., Ma, B., Lee, K.A., Spoken language recognition: From fundamentals to practice (2013) Proceedings of the IEEE | ||
| 504 | |a Martinez, D.G., Plchot, O., Burget, L., Glembek, O., Matejka, P., Language recognition in iVectors space (2013) Proc. Interspeech, Lyon, France, Aug | ||
| 504 | |a Matejka, P., Schwarz, P., Cernocky, J., Chytil, P., Phonotactic language identification using high quality phoneme recognition (2005) Interspeech-2005 | ||
| 504 | |a Shen, W., Campbell, W., Gleason, T., Reynolds, D., Singer, E., Experiments with lattice-based PPRLM language identification (2006) Odyssey 2006 - The Speaker and Language Recognition Workshop, pp. 1-6 | ||
| 504 | |a Stolcke, A., Akbacak, M., Ferrer, L., Kajarekar, S., Richey, C., Scheffer, N., Shriberg, E., Improving language recognition with multilingual phone recognition and speaker adaptation transforms (2010) Proc. Odyssey-10, Brno, Czech Republic, June | ||
| 504 | |a D'Haro, L.F., Glembek, O., Plchot, O., Matejka, P., Soufifar, M., Cordoba, R., Cernocky, J., Phonotactic language recognition using i-vectors and phoneme posteriogram counts (2012) Interspeech-2012, pp. 42-45 | ||
| 504 | |a Diez, M., Varona, A., Penagarikano, M., Rodriguez-Fuentes, L.J., Bordel, G., On the use of log-likelihood ratios as features in spoken language recognition (2012) IEEE Workshop on Spoken Language Technology (SLT 2012), Miami, Florida, USA | ||
| 504 | |a Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Odyssey-14, Joensuu, Finland, June | ||
| 504 | |a Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) HLT '94 Proceedings of the Workshop on Human Language Technology, pp. 307-312 | ||
| 504 | |a Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Kingsbury, B., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012) IEEE Signal Processing Magazine, 29 (6), pp. 82-97 | ||
| 504 | |a Dahl, G.E., Yu, D., Deng, L., Acero, A., Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition (2012) IEEE Trans. ASLP, 20, pp. 30-42 | ||
| 504 | |a LeCun, Y., Bengio, Y., (1995) Convolutional Networks for Images, Speech, and Time-series, pp. 255-258, MIT Press | ||
| 504 | |a LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., Gradient-based learning applied to document recognition (1998) Proceedings of the IEEE, pp. 2278-2324 | ||
| 504 | |a Abdel-Hamid, O., Mohamed, A., Jiang, H., Penn, G., Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition (2012) ICASSP-2012, pp. 4277-4280 | ||
| 504 | |a Sainath, T., Mohamed, A., Kingsbury, B., Ramabhadran, B., Deep convolutional neural networks for LVCSR (2013) ICASSP-2013, pp. 8614-8618 | ||
| 504 | |a Abdel-Hamid, O., Deng, L., Yu, D., Exploring convolutional neural network structures and optimization techniques for speech recognition (2013) Interspeech-2013, pp. 3366-3370 | ||
| 504 | |a Scheffer, N., Lei, Y., Ferrer, L., Factor analysis back ends for MLLR transforms in speaker recognition (2013) Proc. Interspeech, Lyon, France, Aug | ||
| 504 | |a Van Leeuwen, D.A., Brummer, N., Channel-dependent GMM and multi-class logistic regression models for language recognition (2006) Proc. Odyssey-06, Puerto Rico, USA, June | ||
| 504 | |a Brummer, N., Van Leeuwen, D.A., On calibration of language recognition scores (2006) Proc. Odyssey-06, Puerto Rico, USA, June | ||
| 504 | |a Lawson, A., McLaren, M., Lei, Y., Mitra, V., Scheffer, N., Ferrer, L., Graciarena, M., Improving language identification robustness to highly channel-degraded speech through multiple system fusion (2013) Proc. Interspeech, Lyon, France, Aug | ||
| 504 | |a Walker, K., Strassel, S., The RATS radio traffic collection system (2012) Odyssey 2012: The Speaker and Language Recognition Workshop | ||
| 504 | |a DARPA RATS Program, http://www.darpa.mil/OurWork/I2O/Programs/RobustAutomaticTranscriptionofSpeech(RATS).aspx | ||
| 504 | |a McLaren, M., Scheffer, N., Ferrer, L., Lei, Y., Effective use of DCTs for contextualizing features for speaker recognition (2014) Proc. ICASSP, Florence, May | ||
| 504 | |a McLaren, M., Scheffer, N., Graciarena, M., Ferrer, L., Lei, Y., Improving speaker identification robustness to highly channel-degraded speech through multiple system fusion (2013) Proc. ICASSP, Vancouver, May | ||
| 504 | |a Kim, C., Stern, R.M., Power-normalized cepstral coefficients (PNCC) for robust speech recognition (2012) Proc. ICASSP, Kyoto, Mar | ||
| 504 | |a NIST LRE09 Evaluation Plan, http://www.itl.nist.gov/iad/mig/tests/lre/2009/LRE09EvalPlanv6.pdf | ||
| 520 | 3 | |a This paper explores in depth a recently proposed approach to spoken language recognition based on the estimated posteriors for a set of senones representing the phonetic space of one or more languages. A neural network (NN) is trained to estimate the posterior probabilities for the senones at a frame level. A feature vector is then derived for every sample using these posteriors. The effect of the language used in training the NN and the number of senones are studied. Speech-activity detection (SAD) and dimensionality reduction approaches are also explored and Gaussian and NN backends are compared. Results are presented on heavily degraded speech data. The proposed system is shown to give over 40% relative gain compared to a state-of-the-art language recognition system at sample durations from 3 to 120 seconds. Copyright © 2014 ISCA. |l eng | |
| 593 | |a Speech Technology and Research Laboratory, SRI International, CA, United States | ||
| 593 | |a Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina | ||
| 690 | 1 | 0 | |a SPEECH COMMUNICATION |
| 690 | 1 | 0 | |a ACTIVITY DETECTION |
| 690 | 1 | 0 | |a DIMENSIONALITY REDUCTION |
| 690 | 1 | 0 | |a FEATURE VECTORS |
| 690 | 1 | 0 | |a LANGUAGE RECOGNITION |
| 690 | 1 | 0 | |a NEURAL NETWORK (NN) |
| 690 | 1 | 0 | |a POSTERIOR PROBABILITY |
| 690 | 1 | 0 | |a SPEECH DATA |
| 690 | 1 | 0 | |a SPOKEN LANGUAGE RECOGNITION |
| 690 | 1 | 0 | |a SPEECH RECOGNITION |
| 700 | 1 | |a Lei, Y. | |
| 700 | 1 | |a McLaren, M. | |
| 700 | 1 | |a Scheffer, N. | |
| 700 | 1 | |a Chng, E.S. | |
| 700 | 1 | |a Li, H. | |
| 700 | 1 | |a Meng, H. | |
| 700 | 1 | |a Ma, B. | |
| 700 | 1 | |a Xie, L. | |
| 700 | 1 | |a Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat | |
| 711 | 2 | |d 14 September 2014 through 18 September 2014 |g Conference code: 108771 | |
| 773 | 0 | |d International Speech and Communication Association, 2014 |h pp. 2150-2154 |p Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH |n Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |x 2308457X |t 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 | |
| 856 | 4 | 1 | |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84910030410&partnerID=40&md5=b075fdac7d57e2798a0d973b5dd08e1b |y Scopus record |
| 856 | 4 | 0 | |u https://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p2150_Ferrer |y Handle |
| 856 | 4 | 0 | |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer |y Digital Library record |
| 961 | |a paper_2308457X_v_n_p2150_Ferrer |b paper |c PE | ||
| 962 | |a info:eu-repo/semantics/conferenceObject |a info:ar-repo/semantics/documento de conferencia |b info:eu-repo/semantics/publishedVersion | ||
| 999 | |c 75607 | ||
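The abstract in field 520 says only that a neural network produces frame-level senone posteriors, that a feature vector is "derived for every sample" from them, and that Gaussian and NN backends are compared. The following is a minimal Python sketch of one plausible pipeline, not the authors' implementation. It assumes the per-sample vector is the log of the senone posteriors averaged over the frames kept by speech-activity detection, and it stands in for the Gaussian backend with one Gaussian per language sharing a diagonal covariance. The names `sample_feature` and `GaussianBackend`, the flooring constants, and the toy numbers are hypothetical; the dimensionality-reduction step the abstract mentions is omitted.

```python
import math
from typing import Dict, List, Sequence


def sample_feature(posteriors: Sequence[Sequence[float]],
                   speech_mask: Sequence[bool],
                   floor: float = 1e-10) -> List[float]:
    """Assumed recipe: log of the mean senone posterior over SAD-selected frames.

    posteriors[t][k] is the NN's posterior for senone k at frame t;
    speech_mask[t] is True when a (hypothetical) SAD marks frame t as speech.
    """
    speech = [frame for frame, keep in zip(posteriors, speech_mask) if keep]
    if not speech:
        raise ValueError("SAD kept no frames for this sample")
    n_senones = len(speech[0])
    mean = [sum(frame[k] for frame in speech) / len(speech) for k in range(n_senones)]
    # Floor before taking the log so senones never seen in the sample do not give -inf.
    return [math.log(max(p, floor)) for p in mean]


class GaussianBackend:
    """One Gaussian per language with a shared diagonal covariance (a simplification)."""

    def __init__(self, var_floor: float = 1e-6) -> None:
        self.var_floor = var_floor
        self.means: Dict[str, List[float]] = {}
        self.var: List[float] = []

    def fit(self, feats: Sequence[Sequence[float]],
            labels: Sequence[str]) -> "GaussianBackend":
        dim = len(feats[0])
        by_lang: Dict[str, List[Sequence[float]]] = {}
        for x, lang in zip(feats, labels):
            by_lang.setdefault(lang, []).append(x)
        # Class means, one per language.
        self.means = {
            lang: [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
            for lang, xs in by_lang.items()
        }
        # Pooled within-class variance per dimension, floored for stability.
        sq = [0.0] * dim
        for x, lang in zip(feats, labels):
            mu = self.means[lang]
            for d in range(dim):
                sq[d] += (x[d] - mu[d]) ** 2
        self.var = [max(s / len(feats), self.var_floor) for s in sq]
        return self

    def score(self, x: Sequence[float]) -> Dict[str, float]:
        """Per-language log-likelihood, up to a constant shared by all languages."""
        return {
            lang: -0.5 * sum((x[d] - mu[d]) ** 2 / self.var[d] for d in range(len(x)))
            for lang, mu in self.means.items()
        }


if __name__ == "__main__":
    # Toy 3-senone posteriors for one sample; the last frame is non-speech.
    frames = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
    print(sample_feature(frames, [True, True, False]))

    backend = GaussianBackend().fit(
        [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]],
        ["eng", "eng", "spa", "spa"],
    )
    scores = backend.score([0.1, 0.0])
    print(max(scores, key=scores.get))
```

Because the covariance is shared across languages, the log-determinant and normalization terms are identical for every class, which is why `score` can drop them without changing which language ranks highest.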