TY - JOUR
T1 - HLAscan: Genotyping of the HLA region using next-generation sequencing data
AU - Ka, Sojeong
AU - Lee, Sunho
AU - Hong, Jonghee
AU - Cho, Yangrae
AU - Sung, Joohon
AU - Kim, Han Na
AU - Kim, Hyung Lae
AU - Jung, Jongsun
N1 - Publisher Copyright: © 2017 The Author(s).
PY - 2017/5/12
Y1 - 2017/5/12
AB - Background: Several recent studies showed that next-generation sequencing (NGS)-based human leukocyte antigen (HLA) typing is a feasible and promising technique for variant calling of highly polymorphic regions. To date, however, no method with sufficient read depth has completely solved the allele phasing issue. In this study, we developed a new method (HLAscan) for HLA genotyping using NGS data. Results: HLAscan performs alignment of reads to HLA sequences from the international ImMunoGeneTics project/human leukocyte antigen (IMGT/HLA) database. The distribution of aligned reads was used to calculate a score function to determine correctly phased alleles by progressively removing false-positive alleles. Comparative HLA typing tests using public datasets from the 1000 Genomes Project and the International HapMap Project demonstrated that HLAscan could perform HLA typing more accurately than previously reported NGS-based methods such as HLAreporter and PHLAT. In addition, the results of HLA-A, -B, and -DRB1 typing by HLAscan using data generated by NextGen were identical to those obtained using a Sanger sequencing-based method. We also applied HLAscan to a family dataset with various coverage depths generated on the Illumina HiSeq X-TEN platform. HLAscan identified allele types of HLA-A, -B, -C, -DQB1, and -DRB1 with 100% accuracy for sequences at ≥ 90× depth, and the overall accuracy was 96.9%. Conclusions: HLAscan, an alignment-based program that takes read distribution into account to determine true allele types, outperformed previously developed HLA typing tools. Therefore, HLAscan can be reliably applied for determination of HLA type across the whole-genome, exome, and target sequences.
KW - HLA typing
KW - HLAscan
KW - Next-generation sequencing
KW - Phasing issue
UR - http://www.scopus.com/inward/record.url?scp=85019196340&partnerID=8YFLogxK
U2 - 10.1186/s12859-017-1671-3
DO - 10.1186/s12859-017-1671-3
M3 - Article
C2 - 28499414
AN - SCOPUS:85019196340
SN - 1471-2105
VL - 18
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 258
ER -