Automated biological sequence description by genetic multiobjective generalized clustering

Recent advances in the accessibility of databases containing representations of complex objects - exemplified by repositories of time-series data, information about biological macromolecules, or knowledge about metabolic pathways - have not been matched by availability of tools that facilitate the r...

Bibliographic Details
Main authors: Zwir, I., Zaliz, R.R., Ruspini, E.H.
Format: SER
Subjects:
Online access: http://hdl.handle.net/20.500.12110/paper_00778923_v980_n_p65_Zwir
Contributed by:
id todo:paper_00778923_v980_n_p65_Zwir
record_format dspace
spelling todo:paper_00778923_v980_n_p65_Zwir 2023-10-03T14:54:24Z SER info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_00778923_v980_n_p65_Zwir
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Biological DNA sequences
Feature elicitation
Generalized clustering
Hierarchy of evolution programs
Multiobjective genetic algorithms
Pareto optimality
Qualitative description
accuracy
automation
calculation
conference paper
data base
DNA sequence
gene cluster
gene sequence
genetic algorithm
information retrieval
mathematical analysis
nonhuman
performance
qualitative analysis
sequence analysis
short interspersed repeat
Trypanosoma cruzi
Protozoa
Trypanosoma
description Recent advances in the accessibility of databases containing representations of complex objects - exemplified by repositories of time-series data, information about biological macromolecules, or knowledge about metabolic pathways - have not been matched by the availability of tools that facilitate the retrieval of objects of particular interest and aid in understanding their structure and relations. In applications such as the analysis of DNA sequences, on the other hand, requirements to retrieve objects on the basis of qualitative characteristics are poorly met by descriptions that emphasize precision and detail rather than structural features. This paper presents a method for the identification of interesting qualitative features in biological sequences. Our approach relies on a generalized clustering methodology in which the features being sought correspond to the solutions of a multivariable, multiobjective optimization problem, with features generally corresponding to fuzzy subsets of the object being represented. Foremost among the optimization objectives being considered are measures of the degree to which features resemble prototypical structures deemed to be interesting by database users. Other objectives include feature size and, in some cases, performance criteria related to domain-specific constraints. Genetic-algorithm methods are employed to solve the multiobjective optimization problem. These optimization algorithms discover candidate features as subsets of the object being described that lie in the set of all Pareto-optimal solutions of that problem. These candidate features are then summarized, again employing evolutionary-computation methods, and interrelated by employing domain-specific relations of interest to the end users. We present results of the application of this two-step method to the recognition and summarization of interesting features in DNA sequences of Trypanosoma cruzi.
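The notion of Pareto optimality used in the description above can be illustrated with a minimal sketch. It is not the authors' method: it replaces the genetic algorithm with exhaustive enumeration, uses crisp windows instead of fuzzy subsets, and assumes two objectives only (similarity to a prototype and feature size). The function names, the repeated-motif prototype, and the example sequence are all hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[int, int]    # (start, length) of a window in the sequence
Scores = Tuple[float, float]   # (similarity to prototype, length), both maximized


def similarity(window: str, motif: str) -> float:
    """Fraction of positions in `window` matching `motif` repeated end to end.

    Assumption: the 'prototypical structure' is a simple tandem repeat.
    """
    matches = sum(1 for i, base in enumerate(window) if base == motif[i % len(motif)])
    return matches / len(window)


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(sequence: str, motif: str, min_len: int = 2) -> List[Tuple[Candidate, Scores]]:
    """Enumerate every window and keep those not dominated by any other window.

    Exhaustive enumeration stands in for the genetic search; it is only
    practical for short sequences.
    """
    scored = []
    for start in range(len(sequence)):
        for length in range(min_len, len(sequence) - start + 1):
            window = sequence[start:start + length]
            scored.append(((start, length), (similarity(window, motif), float(length))))
    return [(c, s) for c, s in scored if not any(dominates(t, s) for _, t in scored)]


if __name__ == "__main__":
    # Hypothetical toy sequence containing an "AC" repeat.
    for (start, length), (sim, _) in pareto_front("GGACACACTT", "AC"):
        print(f"start={start} length={length} similarity={sim:.2f}")
```

Each retained window is a trade-off: a shorter window that matches the prototype well, or a longer one that matches it less well. No retained window is beaten on both objectives by another.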
format SER
author Zwir, I.
Zaliz, R.R.
Ruspini, E.H.
author_sort Zwir, I.
title Automated biological sequence description by genetic multiobjective generalized clustering
url http://hdl.handle.net/20.500.12110/paper_00778923_v980_n_p65_Zwir
_version_ 1807319231608389632