The importance of digitized biocollections as a source of trait data and a new VertNet resource

Bibliographic Details
Main authors: Guralnick, R.P., Zermoglio, P.F., Wieczorek, J., LaFrance, R., Bloom, D., Russell, L.
Format: Journal article (JOUR)
Online access: http://hdl.handle.net/20.500.12110/paper_17580463_v2016_n_p_Guralnick
Affiliation: Zermoglio, P.F. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina.
Access: Open access (info:eu-repo/semantics/openAccess)
License: http://creativecommons.org/licenses/by/2.5/ar
Institution: Universidad de Buenos Aires
Collection: Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
Subjects: animal; DNA sequence; genetic database; genetic variation; human; procedures; quantitative trait locus; software; Animals; Databases, Genetic; Genetic Variation; Humans; Quantitative Trait Loci; Sequence Analysis, DNA; Software
Abstract: For vast areas of the globe and large parts of the tree of life, data needed to inform trait diversity is incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which only provide species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. Perhaps under-appreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools to collate a full set of trait descriptions and extract two key traits, body length and mass, from >18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested the success rate of these tools against hand-checked validation data sets and characterized data quality and quantity. A post-processing toolkit was developed to standardize and harmonize data sets, and to integrate this improved content into VertNet for the broadest reuse. The result of this work was to add more than 1.5 million harmonized measurements on vertebrate body mass and length directly to specimen records. Rates of false positives and negatives for extracted data were extremely low. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for view and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as tools introduced here become part of publication workflows. We close by noting how this effort extends to new communities already developing similar digitized content. © The Author(s) 2016.