Automated biological sequence description by genetic multiobjective generalized clustering
Main authors: | Zwir, I.; Zaliz, R.R.; Ruspini, E.H. |
---|---|
Format: | SER |
Subjects: | |
Online access: | http://hdl.handle.net/20.500.12110/paper_00778923_v980_n_p65_Zwir |
Contributed by: | |
id |
todo:paper_00778923_v980_n_p65_Zwir |
---|---|
record_format |
dspace |
spelling |
todo:paper_00778923_v980_n_p65_Zwir 2023-10-03T14:54:24Z SER info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_00778923_v980_n_p65_Zwir |
institution |
Universidad de Buenos Aires |
institution_str |
I-28 |
repository_str |
R-134 |
collection |
Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
topic |
Biological DNA sequences; Feature elicitation; Generalized clustering; Hierarchy of evolution programs; Multiobjective genetic algorithms; Pareto optimality; Qualitative description; accuracy; automation; calculation; conference paper; data base; DNA sequence; gene cluster; gene sequence; genetic algorithm; information retrieval; mathematical analysis; nonhuman; performance; qualitative analysis; sequence analysis; short interspersed repeat; Trypanosoma cruzi; Protozoa; Trypanosoma; Trypanosoma cruzi |
description |
Recent advances in the accessibility of databases containing representations of complex objects (exemplified by repositories of time-series data, information about biological macromolecules, or knowledge about metabolic pathways) have not been matched by the availability of tools that facilitate the retrieval of objects of particular interest and aid in understanding their structure and relations. On the other hand, in applications such as the analysis of DNA sequences, requirements to retrieve objects on the basis of qualitative characteristics are poorly met by descriptions that emphasize precision and detail rather than structural features. This paper presents a method for identifying interesting qualitative features in biological sequences. Our approach relies on a generalized clustering methodology in which the features being sought correspond to the solutions of a multivariable, multiobjective optimization problem, with features generally corresponding to fuzzy subsets of the object being represented. Foremost among the optimization objectives are measures of the degree to which features resemble prototypical structures deemed interesting by database users. Other objectives include feature size and, in some cases, performance criteria related to domain-specific constraints. Genetic-algorithm methods are employed to solve the multiobjective optimization problem. These optimization algorithms discover candidate features, as subsets of the object being described, that lie in the set of all Pareto-optimal solutions of that problem. These candidate features are then summarized, again using evolutionary-computation methods, and interrelated using domain-specific relations of interest to the end users. We present results of applying this two-step method to the recognition and summarization of interesting features in DNA sequences of Trypanosoma cruzi. |
format |
SER |
author |
Zwir, I.; Zaliz, R.R.; Ruspini, E.H. |
author_sort |
Zwir, I. |
title |
Automated biological sequence description by genetic multiobjective generalized clustering |
url |
http://hdl.handle.net/20.500.12110/paper_00778923_v980_n_p65_Zwir |
work_keys_str_mv |
AT zwiri automatedbiologicalsequencedescriptionbygeneticmultiobjectivegeneralizedclustering AT zalizrr automatedbiologicalsequencedescriptionbygeneticmultiobjectivegeneralizedclustering AT ruspinieh automatedbiologicalsequencedescriptionbygeneticmultiobjectivegeneralizedclustering |
_version_ |
1807319231608389632 |
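The abstract describes candidate features as Pareto-optimal solutions of a multiobjective problem, scored chiefly by resemblance to a prototype and by feature size. It does not specify the encoding or the objective functions. The Python sketch below therefore only illustrates the Pareto-optimality idea, and it rests on several assumptions that are not from the paper. A candidate feature is taken to be a contiguous interval of a DNA string. Its "resemblance" is the membership of the interval's GC fraction in a hypothetical ramp-shaped fuzzy set "GC-rich" (the thresholds `low`/`high` are illustrative). Its size is its length. Exhaustive enumeration of intervals stands in for the genetic-algorithm search the authors use, so the sketch is only practical for very short sequences. The subsequent summarization step is not shown. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]       # half-open [start, end) over the sequence
Objectives = Tuple[float, int]   # (resemblance, size); both are maximized


def gc_rich_membership(gc_fraction: float, low: float = 0.5, high: float = 0.9) -> float:
    """Hypothetical ramp-shaped fuzzy set 'GC-rich': 0 at or below `low`, 1 at or above `high`."""
    if gc_fraction <= low:
        return 0.0
    if gc_fraction >= high:
        return 1.0
    return (gc_fraction - low) / (high - low)


def objectives(seq: str, interval: Interval) -> Objectives:
    """Score a candidate feature: its resemblance to the prototype and its length."""
    start, end = interval
    window = seq[start:end]
    gc_fraction = sum(base in "GC" for base in window) / len(window)
    return gc_rich_membership(gc_fraction), len(window)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_features(seq: str, min_len: int = 1) -> List[Interval]:
    """Return every interval not dominated by any other (exhaustive search, tiny inputs only)."""
    candidates = [
        (start, end)
        for start in range(len(seq))
        for end in range(start + min_len, len(seq) + 1)
    ]
    scores: Dict[Interval, Objectives] = {c: objectives(seq, c) for c in candidates}
    front = [
        c for c in candidates
        if not any(dominates(scores[other], scores[c]) for other in candidates)
    ]
    return sorted(front)


if __name__ == "__main__":
    seq = "ATATGCGCGCAT"
    for interval in pareto_features(seq):
        print(interval, objectives(seq, interval))
```

For `ATATGCGCGCAT`, the front runs from the fully GC-rich interval `(4, 10)` (`GCGCGC`, resemblance 1.0, size 6) to the whole sequence (resemblance 0.0, size 12). Each longer interval on the front trades away some resemblance for size. A Pareto filter reports this whole trade-off curve rather than a single weighted optimum, which leaves the choice among candidates to a later step.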