JOURNAL ARTICLE
RESEARCH SUPPORT, N.I.H., EXTRAMURAL

Computational inference of a genomic pluripotency signature in human and mouse stem cells.

Biology Direct 2016 September 18
Recent analyses of next-generation sequencing datasets have shown that cell-specific regulatory elements in stem cells are marked with distinguishable patterns of transcription factor (TF) binding and epigenetic marks. For example, we recently demonstrated that promoters of cell-specific genes are covered with expanded trimethylation of histone H3 at lysine 4 (H3K4me3) marks (i.e., broad H3K4me3 domains). Moreover, binding of specific TFs, such as OCT4, NANOG, and SOX2, has been shown to play a critical role in maintaining the pluripotency of stem cells. Despite these observations, a systematic exploration of genomic and epigenomic features of stem-cell-specific gene promoters has not been conducted. Advanced machine-learning models can capture distinguishable genomic and epigenomic characteristics of stem-cell-specific promoters by taking advantage of the wealth of publicly available datasets. Here, we propose a three-step framework to discover novel data characteristics of high-throughput next-generation sequencing datasets that distinguish pluripotency genes in human and mouse embryonic stem cells (ESCs). Our framework involves: i) feature extraction to identify novel features of genomic datasets; ii) feature selection using a logistic regression model combined with the Least Absolute Shrinkage and Selection Operator (LASSO) method to find the most critical datasets and features; and iii) cross-validation with the features selected by the LASSO method to assess the predictive power of the selected data features in distinguishing pluripotency genes. We show that specific epigenetic marks, and specific features of these marks, are enriched at pluripotency gene promoters. Moreover, we assess both the individual and combined effects of TF binding, epigenetic mark deposition, and gene expression datasets for marking pluripotency genes.
Our findings are consistent with the existence of a conserved, complex and integrative genomic signature in ESCs that can be exploited to flag important candidate pluripotency genes. They also validate our computational framework for fostering a deeper understanding of genomic datasets in stem cells, a framework that could be extended in the future to study cell-type-specific genomic landscapes in other cell types.
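
Steps ii) and iii) of the framework described above can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the feature names are hypothetical placeholders, the L1-penalized logistic regression is fitted with a simple proximal gradient loop, and the penalty strength, learning rate, and fold count are assumed values chosen only for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrinks v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=300):
    # Proximal gradient descent on mean log-loss + lam * ||w||_1.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def predict(row, w, b):
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) >= 0.5 else 0

def cross_val_accuracy(X, y, cols, k=5, lam=0.01):
    # k-fold cross-validation restricted to the selected feature columns.
    Xs = [[row[j] for j in cols] for row in X]
    n, correct = len(Xs), 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_X = [Xs[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        w, b = fit_l1_logistic(train_X, train_y, lam)
        correct += sum(predict(Xs[i], w, b) == y[i] for i in test_idx)
    return correct / n

# Synthetic stand-in for a gene-by-feature matrix (NOT real data).
# Hypothetical names: two informative features and four pure-noise features.
random.seed(0)
names = ["informative_1", "informative_2",
         "noise_1", "noise_2", "noise_3", "noise_4"]
X, y = [], []
for _ in range(200):
    row = [random.gauss(0, 1) for _ in names]
    score = 2.0 * row[0] - 2.0 * row[1] + random.gauss(0, 0.5)
    X.append(row)
    y.append(1 if score > 0 else 0)

# Step ii: the L1 penalty drives uninformative weights toward zero.
weights, bias = fit_l1_logistic(X, y, lam=0.15)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]

# Step iii: cross-validate a model that uses only the selected features.
accuracy = cross_val_accuracy(X, y, selected)
print("selected:", [names[j] for j in selected])
print("cross-validated accuracy:", round(accuracy, 3))
```

Selecting features on the full dataset and then cross-validating, as done here for brevity, can give optimistic accuracy estimates; nesting the selection step inside each fold is a common safeguard.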

REVIEWERS: This article was reviewed by Zoltan Gaspari and Piotr Zielenkiewicz.
