Characterization of distinct histone methylation and acetylation binding patterns in promoters and prediction of novel regulatory regions remain important areas of genomic research, as it is hypothesized that distinct chromatin signatures may specify unique genomic functions. However, the methods proposed in the literature are either descriptive in nature or fully parametric and hence more restrictive in pattern discovery. In this article, we propose a two-step non-parametric statistical inference procedure to characterize unique histone modification patterns and apply it to the binding patterns of four histone marks, H3K4me2, H3K4me3, H3K9ac, and H4K20me1, in human B-lymphoblastoid cells. In the first step, we use a functional principal component analysis method to represent the concatenated binding patterns of these four histone marks around the transcription start sites as smooth curves. In the second step, we cluster these curves to reveal several unique classes of binding patterns. The uncovered patterns are then used to scan the whole genome to predict novel and alternative promoters. Our analyses show that there are three distinct promoter binding patterns of active genes. Further, 19,654 regions not within known gene promoters were found to overlap with human ESTs, CpG islands, or common SNPs, indicative of their potential role in gene regulation, including as potential novel promoter regions.
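The two-step idea described above (a principal component representation of concatenated curves, followed by clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a discretized approximation of functional PCA via an SVD of the centered curves, plain k-means with a farthest-point initialization as the clustering step, and entirely synthetic data. The grid size, peak positions, amplitudes, noise level, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def fpca_scores(curves, n_components=2):
    """Discretized stand-in for functional PCA (assumption, not the paper's method):
    center the curves and project them onto the leading principal directions."""
    centered = curves - curves.mean(axis=0)
    # Rows of vt are the discretized principal component functions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic data (hypothetical): 3 pattern classes, 4 marks, 40 positions per mark.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)           # relative position around a TSS, arbitrary units
mark_amplitudes = [1.0, 0.8, 0.6, 0.4]   # one made-up amplitude per histone mark
peak_positions = [-0.5, 0.0, 0.5]        # one made-up peak location per class

curves, true_classes = [], []
for cls, peak in enumerate(peak_positions):
    bump = np.exp(-((x - peak) / 0.2) ** 2)
    # Concatenate the four marks into one long curve, as in the first step.
    template = np.concatenate([a * bump for a in mark_amplitudes])
    for _ in range(30):
        curves.append(template + rng.normal(0.0, 0.05, size=template.size))
        true_classes.append(cls)
curves = np.array(curves)
true_classes = np.array(true_classes)

scores = fpca_scores(curves, n_components=2)  # step 1: low-dimensional representation
labels = kmeans(scores, k=3)                  # step 2: cluster the represented curves
print(np.bincount(labels))                    # cluster sizes
```

In this toy setting the clusters are well separated by construction, so the choice of two components and three clusters is trivial; with real ChIP data both would need to be selected and validated.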
Bibliographical note
Funding Information: This work was supported in part by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (url: https://nrf.
© 2020 Kim, Lin. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.