Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons
Author
Choi, Illyoung
Ponsero, Alise J
Bomhoff, Matthew
Youens-Clark, Ken
Hartman, John H
Hurwitz, Bonnie L
Affiliation
Univ Arizona, Dept Comp Sci
Univ Arizona, Dept Biosyst Engn
Univ Arizona, BIO5 Inst
Issue Date
2019-02-01
Metadata
Publisher
OXFORD UNIV PRESS
Citation
Illyoung Choi, Alise J Ponsero, Matthew Bomhoff, Ken Youens-Clark, John H Hartman, Bonnie L Hurwitz, Libra: scalable k-mer–based tool for massive all-vs-all metagenome comparisons, GigaScience, Volume 8, Issue 2, February 2019, giy165, https://doi.org/10.1093/gigascience/giy165
Journal
GIGASCIENCE
Rights
© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Background: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content.
Results: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community.
Conclusions: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes.
Note
Open access journal
ISSN
2047-217X
PubMed ID
30597002
Version
Final published version
Sponsors
National Science Foundation [1640775]
Additional Links
https://academic.oup.com/gigascience/article/8/2/giy165/5266304
DOI
10.1093/gigascience/giy165
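
Illustrative sketch (not part of the repository record)
The abstract describes comparing metagenomes by k-mer abundance using Cosine Similarity, which normalizes for sequencing depth. The following is a minimal single-machine sketch of that idea in Python, not Libra's Hadoop implementation. The function names, the toy reads, the choice of k = 4, and the decision to skip k-mers containing ambiguous bases are all assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter
from math import sqrt


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a collection of reads.

    Assumption for this sketch: k-mers containing anything other than
    A, C, G, or T are skipped.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse k-mer abundance vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items() if kmer in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile shares nothing with any other profile
    return dot / (norm_a * norm_b)


def all_vs_all(samples, k):
    """Score every pair of samples; `samples` maps a name to a list of reads."""
    profiles = {name: kmer_counts(reads, k) for name, reads in samples.items()}
    return {
        (x, y): cosine_similarity(profiles[x], profiles[y])
        for x in profiles
        for y in profiles
    }


if __name__ == "__main__":
    # Hypothetical toy data: "deep" is "shallow" sequenced three times as deeply.
    shallow = ["ACGTACGT", "TTGACCA"]
    samples = {
        "shallow": shallow,
        "deep": shallow * 3,
        "other": ["GGGGCCCC"],
    }
    for pair, score in sorted(all_vs_all(samples, k=4).items()):
        print(pair, round(score, 3))
```

Because the cosine depends only on the direction of the abundance vector, multiplying every count by a constant (as in the "deep" sample) leaves the score unchanged. In this simplified setting, that is what normalizing for sequencing depth means. A presence/absence metric would discard the abundance information that this score retains.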
Except where otherwise noted, this item's license is described as © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License.
Related articles
- ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads.
- Authors: Maarala AI, Bzhalava Z, Dillner J, Heljanko K, Bzhalava D
- Issue date: 2018 Mar 15
- Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis.
- Authors: Dubinkina VB, Ischenko DS, Ulyantsev VI, Tyakht AV, Alexeev DG
- Issue date: 2016 Jan 16
- Improving the sensitivity of long read overlap detection using grouped short k-mer matches.
- Authors: Du N, Chen J, Sun Y
- Issue date: 2019 Apr 4
- Comparison of k-mer-based de novo comparative metagenomic tools and approaches.
- Authors: Ponsero AJ, Miller M, Hurwitz BL
- Issue date: 2023
- AFITbin: a metagenomic contig binning method using aggregate l-mer frequency based on initial and terminal nucleotides.
- Authors: Darabi A, Sobhani S, Aghdam R, Eslahchi C
- Issue date: 2024 Jul 16

