Author
Vanni, C.
Schechter, M.S.
Acinas, S.G.
Barberán, A.
Buttigieg, P.L.
Casamayor, E.O.
Delmont, T.O.
Duarte, C.M.
Eren, A.M.
Finn, R.D.
Kottmann, R.
Mitchell, A.
Sanchez, P.
Siren, K.
Steinegger, M.
Glöckner, F.O.
Fernandez-Guerra, A.
Affiliation
Department of Environmental Science, University of Arizona
Issue Date
2022
Publisher
eLife Sciences Publications Ltd
Citation
Vanni, C., Schechter, M. S., Acinas, S. G., Barberán, A., Buttigieg, P. L., Casamayor, E. O., Delmont, T. O., Duarte, C. M., Eren, A. M., Finn, R. D., Kottmann, R., Mitchell, A., Sanchez, P., Siren, K., Steinegger, M., Glöckner, F. O., & Fernandez-Guerra, A. (2022). Unifying the known and unknown microbial coding sequence space. eLife.
Journal
eLife
Rights
Copyright © 2022, Vanni et al. This article is distributed under the terms of the Creative Commons Attribution License.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40%-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1,749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data. © 2022, eLife Sciences Publications Ltd. All rights reserved.
Note
Open access journal
ISSN
2050-084X
PubMed ID
35356891
Version
Final published version
DOI
10.7554/eLife.67667