Speeding up the execution of a large number of statistical tests of independence

Bibliographic Details
Main authors: Schlüter, Federico; Bromberg, Facundo; Pérez, Diego Sebastián
Format: Conference object
Language: English
Published: 2010
Online access: http://sedici.unlp.edu.ar/handle/10915/152584
http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-05.pdf
Description
Summary: Learning the structure of probabilistic graphical models with the independence-based approach requires performing a massive number of conditional independence tests on data. An intermediate step in computing each test is the construction of contingency tables from the data. In this work we present an intelligent cache of contingency tables that allows stored tables to be reused not only for the same test, in the not uncommon case that the test must be performed again, but also for an exponential number of other tests: all those involving a subset of the variables of the stored test. In practice, however, not so many tests actually reuse the stored tables. We show results from testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks (also known as undirected graphical models). The experiments show that, in all cases, the cache saves above 95% of the running time that IBMAP-HC spends reading data. The savings in running time for IBMAP-HC were up to 80% for datasets with more than 40,000 datapoints.
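
The reuse idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ContingencyTableCache`, its methods, and the `data_reads` counter are hypothetical, and the sketch assumes discrete data held in memory as a list of tuples. A table requested for a set of variables is served from the cache if present, derived by summing out variables from a cached table over a superset if one exists, and built by scanning the data only as a last resort.

```python
from collections import Counter


class ContingencyTableCache:
    """Hypothetical cache of contingency tables keyed by variable set."""

    def __init__(self, data, columns):
        self.data = data  # list of tuples, one tuple per datapoint
        self.index = {name: i for i, name in enumerate(columns)}
        self.tables = {}  # sorted tuple of variable names -> Counter of cells
        self.data_reads = 0  # number of full passes over the data

    def table(self, variables):
        key = tuple(sorted(variables))
        if key in self.tables:
            # Exact hit: the same table was already built or derived.
            return self.tables[key]

        wanted = set(key)
        supersets = [k for k in self.tables if wanted <= set(k)]
        if supersets:
            # Marginalize the smallest cached superset table:
            # sum the counts over the variables that are not requested.
            source = min(supersets, key=lambda k: len(self.tables[k]))
            positions = [source.index(v) for v in key]
            counts = Counter()
            for cell, n in self.tables[source].items():
                counts[tuple(cell[p] for p in positions)] += n
        else:
            # Cache miss: count value combinations directly from the data.
            self.data_reads += 1
            cols = [self.index[v] for v in key]
            counts = Counter(tuple(row[c] for c in cols) for row in self.data)

        self.tables[key] = counts
        return counts
```

In this sketch, a table over k variables can serve requests for any of its 2^k variable subsets without another pass over the data, which is the source of the exponential reuse mentioned in the summary. How the original cache chooses among stored tables or bounds its memory use is not stated in this record, so the smallest-superset rule here is only an assumption.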