Integration of Functional and Taxonomic Annotations With Physicochemical Measurements via Ontologies
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Metagenomic sequencing technologies have advanced rapidly over the past 15 years, enabling the characterization of the functional genomic potential and taxonomic structure of microbial communities. However, efforts to draw novel insights across different communities and datasets have been hampered by shortcomings in the cyberinfrastructure hosting these metagenomes. Existing repositories have struggled to meet the FAIR data principles: Findable, Accessible, Interoperable, and Reusable. Interoperability, in particular, poses a significant challenge, as it requires both data and metadata to be encoded using standardized, publicly available ontologies or vocabularies. In marine metagenomics, the interoperability problem has hindered analysis within and across expeditions. Oceanographic research cruises have gathered samples across the globe, with many different teams aboard characterizing the biological, geological, and chemical processes in ocean systems over space and time. However, no systematic, unifying framework exists to harmonize the microbiome data with the physical, biological, geological, and geochemical data. To address this problem, we previously developed Planet Microbe, a data repository that connects metagenomic and metatranscriptomic data with physical, geological, geochemical, and biological datasets. In addition to building the infrastructure, we integrated several historical oceanographic ’omics datasets, such as the Hawaii Ocean Time-series (HOT), and reconnected them with their physicochemical measurements. Encoding the physicochemical measurements using ontologies from the Open Biomedical and Biological Ontologies (OBO) Foundry enables data discovery based on environmental parameters. By leveraging the hierarchical structure of ontologies, Planet Microbe provides a search interface that enables researchers to discover datasets relevant to specific biological questions based on environmental and physicochemical contextual data.
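The idea of searching by ontology hierarchy can be illustrated with a minimal sketch. It is not the Planet Microbe implementation: the term labels, the sample names, and the `descendants` and `search` helpers are all hypothetical, and a real deployment would load an OBO Foundry ontology rather than a hand-written dictionary. The sketch only shows why a query for a broad term can also return samples annotated with its more specific subclasses.

```python
from collections import defaultdict

# Toy is-a hierarchy (child -> parents). Labels are illustrative only,
# not identifiers taken from a real ontology.
IS_A = {
    "sea water": ["saline water"],
    "saline water": ["water"],
    "fresh water": ["water"],
    "water": ["environmental material"],
}

# Hypothetical sample annotations: sample name -> ontology term label.
SAMPLES = {
    "sample_A": "sea water",
    "sample_B": "fresh water",
    "sample_C": "saline water",
}


def descendants(term, is_a):
    """Return the term itself plus every subclass reachable through is-a links."""
    # Invert the child -> parents map so the hierarchy can be walked downward.
    children = defaultdict(list)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].append(child)

    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term, samples, is_a):
    """Return sorted sample names annotated with the term or any of its subclasses."""
    wanted = descendants(term, is_a)
    return sorted(name for name, label in samples.items() if label in wanted)


print(search("saline water", SAMPLES, IS_A))
```

Under these assumptions, a query for the broad term "water" would match all three samples, whereas a query for "sea water" would match only the sample annotated with that exact term.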
However, Planet Microbe did not include the capability to search ontologies programmatically for functional and taxonomic information. Integration of taxonomic and functional information with contextual data could empower the discovery of associations between genes, species, and physicochemical factors, the identification of highly constrained ecosystems, and the facilitation of comparisons between environments. To address this challenge, we developed a standardized pipeline for analyzing the datasets hosted at Planet Microbe, utilizing the Gene Ontology (GO) and NCBITaxon for functional and taxonomic ontological representation, respectively.
Type
Electronic Thesis
text
Degree Name
M.S.
Degree Level
masters
Degree Program
Graduate College
Molecular & Cellular Biology
