Robust estimation of multivariate location and scatter in the presence of missing data
Two main issues regarding data quality are data contamination (outliers) and data completion (missing data). These two problems have attracted much attention and research but surprisingly, they are seldom considered together. Popular robust methods such as S-estimators of multivariate location and s...
| Main Author: | Danilov, M. |
|---|---|
| Other Authors: | Yohai, V.J., Zamar, R.H. |
| Format: | Book chapter |
| Language: | English |
| Published: | 2012 |
| Online Access: | Scopus record DOI Handle Digital Library record |
| Contributed by: | Reference record: Request the resource here |
| LEADER | 08948caa a22007217a 4500 | ||
|---|---|---|---|
| 001 | PAPER-9064 | ||
| 003 | AR-BaUEN | ||
| 005 | 20230518203856.0 | ||
| 008 | 190411s2012 xx ||||fo|||| 00| 0 eng|d | ||
| 024 | 7 | |2 scopus |a 2-s2.0-84870669694 | |
| 040 | |a Scopus |b spa |c AR-BaUEN |d AR-BaUEN | ||
| 100 | 1 | |a Danilov, M. | |
| 245 | 1 | 0 | |a Robust estimation of multivariate location and scatter in the presence of missing data |
| 260 | |c 2012 | ||
| 270 | 1 | 0 | |m Danilov, M.; Quantitative Analyst at Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States; email: mikedanilov@google.com |
| 506 | |2 openaire |e Editorial policy | ||
| 504 | |a Cheng, T.C., Victoria-Feser, M.P., High-Breakdown Estimation of Multivariate Mean and Covariance With Missing Observations (2002) British Journal of Mathematical and Statistical Psychology, 55, pp. 317-335. , [1178] | ||
| 504 | |a Copt, S., Victoria-Feser, M.P., (2003) Fast Algorithms For Computing High Breakdown Covariance Matrices With Missing Data, , Technical Report 2003.04, Universite de Geneve. [1184] | ||
| 504 | |a Croux, C., Filzmoser, P., Joossens, K., Classification Efficiencies for Robust Discriminant Analysis (2008) Statistica Sinica, 18, pp. 588-599. , [1178] | ||
| 504 | |a Danilov, M., (2010) Robust Estimation of Multivariate Scatter Under Non-Affine Equivariant Scenarios, , Ph.D. dissertation, Department of Statistics, University of British Columbia. [1182] | ||
| 504 | |a Davies, P., Asymptotic Behaviour of S-Estimates of Multivariate Location Parameters and Dispersion Matrices (1987) The Annals of Statistics, 15, pp. 1269-1292. , [1178,1179,1181] | ||
| 504 | |a Dempster, A., Laird, N., Rubin, D., Maximum Likelihood From Incomplete Data via the EM Algorithm (1977) Journal of the Royal Statistical Society, Series B, 39, pp. 1-38. , [1178] | ||
| 504 | |a Frahm, G., Jaekel, U., A Generalization of Tyler's M-Estimators to the Case of Incomplete Data (2010) Computational Statistics and Data Analysis, 54, pp. 374-393. , [1178] | ||
| 504 | |a Harrison, D., Rubinfeld, D.L., Hedonic Housing Prices and the Demand for Clean Air (1978) Journal of Environmental Economics and Management, 5, pp. 81-102. , [1184] | ||
| 504 | |a Kenward, M.G., Molenberghs, G., Likelihood Based Frequentist Inference When Data Are Missing at Random (1998) Statistical Science, 13, pp. 236-247. , [1182] | ||
| 504 | |a Little, R.J.A., Robust Estimation of the Mean and Covariance Matrix From Data With Missing Values (1988) Journal of the Royal Statistical Society, Series C, 37, pp. 23-38. , [1178] | ||
| 504 | |a Little, R.J.A., Rubin, D.B., (2002) Statistical Analysis With Missing Data, , (2nd ed.), New York: Wiley, [1178,1182] | ||
| 504 | |a Little, R.J.A., Smith, P.J., Editing and Imputing for Quantitative Survey Data (1987) Journal of the American Statistical Association, 82, pp. 58-68. , [1178] | ||
| 504 | |a Maronna, R.A., Robust M-Estimators of Multivariate Location and Scatter (1976) The Annals of Statistics, 4, pp. 51-67. , [1178] | ||
| 504 | |a Maronna, R.A., Martin, R.D., Yohai, V.J., (2006) Robust Statistics: Theory and Methods, , Chichester: Wiley, [1178,1182,1184] | ||
| 504 | |a Rocke, D.M., Robustness Properties of S-Estimators of Multivariate Location and Shape in High Dimension (1996) The Annals of Statistics, 24, pp. 1327-1345. , [1178] | ||
| 504 | |a Rousseeuw, P., Multivariate Estimation With High Breakdown Point (1985) Mathematical Statistics and Applications, 8, pp. 283-297. , [1178] | ||
| 504 | |a Rousseeuw, P.J., van Driessen, K., A Fast Algorithm for the Minimum Covariance Determinant Estimator (1999) Technometrics, 41, pp. 212-223. , [1178] | ||
| 504 | |a Salibian-Barrera, M., van Aelst, S., Willems, G., PCA Based on Multivariate MM-Estimators With Fast and Robust Bootstrap (2006) Journal of the American Statistical Association, 101, pp. 1198-1211. , [1178,1181] | ||
| 504 | |a Schafer, J.L., (1997) Analysis of Incomplete Multivariate Data, , London: Chapman and Hall. [1182] | ||
| 504 | |a Tanner, M.A., (1993) Tools For Statistical Inference: Methods For the Exploration of Posterior Distributions and Likelihood Functions, , (2nd ed.), New York: Springer, [1182] | ||
| 504 | |a Taskinen, S., Croux, C., Kankainen, A., Ollila, E., Oja, H., Influence Functions and Efficiencies of the Canonical Correlation and Vector Estimates Based on Scatter and Shape Matrices (2006) Journal of Multivariate Analysis, 97, pp. 359-384. , [1178,1181] | ||
| 504 | |a Tatsuoka, K., Tyler, D., The Uniqueness of S and M-Functionals Under Non-Elliptical Distributions (2000) The Annals of Statistics, 28, pp. 1219-1243. , [1180,1181] | ||
| 504 | |a Templ, M., Kowarik, A., Filzmoser, P., Iterative Stepwise Regression Imputation Using Standard and Robust Methods (2011) Computational Statistics & Data Analysis, 55, pp. 2793-2806. , [1178] | ||
| 504 | |a Tyler, D., A Distribution-Free M-Estimator of Multivariate Scatter (1987) The Annals of Statistics, 15, pp. 234-251. , [1178] | ||
| 520 | 3 | |a Two main issues regarding data quality are data contamination (outliers) and data completion (missing data). These two problems have attracted much attention and research but surprisingly, they are seldom considered together. Popular robust methods such as S-estimators of multivariate location and scatter offer protection against outliers but cannot deal with missing data, except for the obviously inefficient approach of deleting all incomplete cases. We generalize the definition of S-estimators of multivariate location and scatter to simultaneously deal with missing data and outliers. We show that the proposed estimators are strongly consistent under elliptical models when data are missing completely at random. We derive an algorithm similar to the Expectation-Maximization algorithm for computing the proposed estimators. This algorithm is initialized by an extension for missing data of the minimum volume ellipsoid. We assess the performance of our proposal by Monte Carlo simulation and give some real data examples. This article has supplementary material online. © 2012 American Statistical Association. |l eng | |
| 536 | |a Funding details: Natural Sciences and Engineering Research Council of Canada | ||
| 536 | |a Funding details: Universidad de Buenos Aires, PIP 5505 | ||
| 536 | |a Funding details: Agencia Nacional de Promoción Científica y Tecnológica | ||
| 536 | |a Funding details: Consejo Nacional de Investigaciones Científicas y Técnicas, PICT 00899 | ||
| 536 | |a Funding details: Mike Danilov is a Quantitative Analyst at Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA (E-mail: mikedanilov@google.com), Víctor J. Yohai is Emeritus Professor, Departamento de Matemáticas, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 1, 1428 Buenos Aires, Argentina (E-mail: vyohai@dm.uba.ar), and Ruben H. Zamar is Professor, Department of Statistics, University of British Columbia, 333-6356 Agricultural Road, Vancouver, BC V6T 1Z2, Canada (E-mail: ruben@stat.ubc.ca). This research was partially supported by grants X-018 and X-447 from the University of Buenos Aires, PIP 5505 from CONICET, PICT 00899 from ANPCyT, and Discovery grant from NSERC. We thank the Associate Editor and two referees for their comments and suggestions which resulted in several important improvements on the first version of this article. | ||
| 593 | |a Quantitative Analyst at Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States | ||
| 593 | |a Departamento de Matemáticas, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 1, 1428 Buenos Aires, Argentina | ||
| 593 | |a Department of Statistics, University of British Columbia, 333-6356 Agricultural Road, Vancouver, BC V6T 1Z2, Canada | ||
| 690 | 1 | 0 | |a CONSISTENT |
| 690 | 1 | 0 | |a ELLIPTICAL DISTRIBUTION |
| 690 | 1 | 0 | |a EM ALGORITHM |
| 690 | 1 | 0 | |a FIXED POINT EQUATION |
| 700 | 1 | |a Yohai, V.J. | |
| 700 | 1 | |a Zamar, R.H. | |
| 773 | 0 | |d 2012 |g v. 107 |h pp. 1178-1186 |k n. 499 |p J. Am. Stat. Assoc. |x 01621459 |w (AR-BaUEN)CENRE-264 |t Journal of the American Statistical Association | |
| 856 | 4 | 1 | |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84870669694&doi=10.1080%2f01621459.2012.699792&partnerID=40&md5=95ea079940f449c735306e3ccff4de04 |y Scopus record |
| 856 | 4 | 0 | |u https://doi.org/10.1080/01621459.2012.699792 |y DOI |
| 856 | 4 | 0 | |u https://hdl.handle.net/20.500.12110/paper_01621459_v107_n499_p1178_Danilov |y Handle |
| 856 | 4 | 0 | |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_01621459_v107_n499_p1178_Danilov |y Digital Library record |
| 961 | |a paper_01621459_v107_n499_p1178_Danilov |b paper |c PE | ||
| 962 | |a info:eu-repo/semantics/article |a info:ar-repo/semantics/artículo |b info:eu-repo/semantics/publishedVersion | ||
| 999 | |c 70017 | ||