Integrated genome sizing (IGS) approach for the parallelization of whole genome analysis

Peter Sona, Jong Hui Hong, Sunho Lee, Byong Joon Kim, Woon Young Hong, Jongcheol Jung, Han Na Kim, Hyung Lae Kim, David Christopher, Laurent Herviou, Young Hwan Im, Kwee Yum Lee, Tae Soon Kim, Jongsun Jung

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The use of whole genome sequence data has increased recently with the rapid progression of next-generation sequencing (NGS) technologies. However, storing raw sequence reads to perform large-scale genome analysis poses hardware challenges. Despite advances in genome analytic platforms, efficient approaches remain relevant, especially as applied to the human genome. In this study, an Integrated Genome Sizing (IGS) approach is adopted to speed up the analysis of multiple whole genomes in a high-performance computing (HPC) environment. The approach splits a genome (GRCh37) into 630 chunks (fragments) so that multiple chunks can be processed in parallel for sequence analyses across cohorts.

Results: IGS was integrated on the Maha-Fs (HPC) system to provide the parallelization required to analyze 2504 whole genomes. Using a single reference pilot genome, NA12878, we compared the NGS process time between Maha-Fs (NFS SATA hard disk drive) and SGI-UV300 (solid state drive memory). SGI-UV300 was faster, with a process time of 32.5 min, compared with 55.2 min for Maha-Fs.

Conclusions: The implementation of IGS can leverage the ability of HPC systems to analyze multiple genomes simultaneously. We believe this approach will accelerate research advancement in personalized genomic medicine. Our method is comparable to the fastest methods for sequence alignment.
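The chunk-and-parallelize idea described in the Background can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the 630 chunk boundaries are chosen, so the fixed-size windowing, the toy contig lengths, and the names `split_into_chunks`, `analyze_chunk`, and `run_parallel` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(contig_lengths, chunk_size):
    """Split each contig into half-open [start, end) intervals.

    Assumption: fixed-size windows. The actual IGS chunking rule
    is not given in the abstract.
    """
    chunks = []
    for name, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            chunks.append((name, start, min(start + chunk_size, length)))
    return chunks


def analyze_chunk(chunk):
    """Hypothetical stand-in for per-chunk sequence analysis.

    Here it only returns the number of bases the chunk covers.
    """
    _name, start, end = chunk
    return end - start


def run_parallel(chunks, workers=4):
    """Process chunks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_chunk, chunks))


# Toy contig lengths (hypothetical, not GRCh37 values).
TOY_CONTIGS = {"chrA": 2500, "chrB": 1000}

chunks = split_into_chunks(TOY_CONTIGS, chunk_size=1000)
results = run_parallel(chunks)
```

Because each chunk is an independent interval, the same chunk list could be applied to every genome in a cohort, which is what makes the work straightforward to distribute across HPC nodes.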

Original language: English
Article number: 462
Journal: BMC Bioinformatics
Volume: 19
Issue number: 1
State: Published - 3 Dec 2018

Bibliographical note

Funding Information:
This work was supported by the INNOPOLIS Foundation, a grant-in-aid from the Korean government through Syntekabio, Inc. [grant number A2014DD101]; the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI); and the Ministry of Health & Welfare, Republic of Korea [grant number HI14C0072]. The funding bodies had no role in the design, collection, analysis, or interpretation of data in this study.

Publisher Copyright:
© 2018 The Author(s).

Keywords

  • Genome analysis
  • Genome sizing
  • Infrastructure
  • Sequencing
  • Statistics
  • Storage
  • Whole genome
