
dc.contributor.author: Liu, Yang
dc.contributor.author: Khan, Saad M.
dc.contributor.author: Wang, Juexin
dc.contributor.author: Rynge, Mats
dc.contributor.author: Zhang, Yuanxun
dc.contributor.author: Zeng, Shuai
dc.contributor.author: Chen, Shiyuan
dc.contributor.author: Maldonado dos Santos, Joao V.
dc.contributor.author: Valliyodan, Babu
dc.contributor.author: Calyam, Prasad P.
dc.contributor.author: Merchant, Nirav
dc.contributor.author: Nguyen, Henry T.
dc.contributor.author: Xu, Dong
dc.contributor.author: Joshi, Trupti
dc.date.accessioned: 2017-07-06T23:18:59Z
dc.date.available: 2017-07-06T23:18:59Z
dc.date.issued: 2016-10-06
dc.identifier.citation: PGen: large-scale genomic variations analysis workflow and browser in SoyKB 2016, 17 (S13) BMC Bioinformatics (en)
dc.identifier.issn: 1471-2105
dc.identifier.doi: 10.1186/s12859-016-1227-y
dc.identifier.uri: http://hdl.handle.net/10150/624651
dc.description.abstract: Background: With advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of crop germplasm to detect genome-scale genetic variations and to apply that knowledge towards trait improvement. To facilitate efficient large-scale analysis of genomic variations in NGS resequencing data, we have developed "PGen", an integrated and optimized workflow that uses the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources, and the Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotation, and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. Results: We have developed both a Linux version on GitHub (https://github.com/pegasus-isi/PGen-GenomicVariationsWorkflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. This analysis also identified 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions. SNPs identified using PGen from additional soybean resequencing projects, bringing the total to more than 500 soybean germplasm lines, have also been integrated. These SNPs are being used for trait improvement through genotype-to-phenotype prediction approaches developed in-house. To make NGS data easy to browse and access, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB that gives soybean researchers easy access to SNP and downstream analysis results. Conclusion: Through thorough testing and validation, the PGen workflow has been optimized for the most efficient analysis of soybean data. This research serves as an example of best practices for developing genomics data analysis workflows that integrate remote HPC resources and efficient data management while remaining easy to use for biological users. The PGen workflow can also be easily customized to analyze data from other species.
dc.description.sponsorship: Missouri Soybean Merchandising Council [368]; United Soybean Board [1320-532-5615] (en)
dc.language.iso: en (en)
dc.publisher: BIOMED CENTRAL LTD (en)
dc.relation.url: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1227-y (en)
dc.rights: © 2016 The Author(s). Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License. (en)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: PGen: large-scale genomic variations analysis workflow and browser in SoyKB (en)
dc.type: Article (en)
dc.contributor.department: Univ Arizona, iPlant Collaborat (en)
dc.identifier.journal: BMC Bioinformatics (en)
dc.description.collectioninformation: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. (en)
dc.eprint.version: Final published version (en)
refterms.dateFOA: 2018-09-11T21:08:57Z


Files in this item

Name: s12859-016-1227-y.pdf
Size: 1.726Mb
Format: PDF
Description: Final Published Version



© 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License.