Control of human embryonic stem cell pluripotence and fate choice
A major goal for California's supporters of stem cell research is development of stem cell-based products that have medical use, and the mandate for the research community is to provide the best possible fundamental information to help guide clinical applications. Our research plan is to lay the groundwork for medical use of stem cells by an ambitious program to address the most important basic questions about stem cells. How do different stem cell lines differ from each other, and which one is best for a particular therapy or product? What gives stem cells the ability to self-renew? Can we make stem cells into the adult cell types we need? Can stem cells be made safe, and what causes some stem cells to form tumors? We will use "high-throughput" techniques developed for the Human Genome Project to examine a large group of human embryonic stem (ES) cells and related cells. The techniques show us how all 25,000 genes in the human genome are turned on and off in different cell types. We were surprised to learn that embryonic stem cells from all over the world all have almost the same gene activity pattern: they have more genes "on" than any other cell type. As the ES cells develop into particular types of cells, like nerve cells or muscle cells, they turn off particular genes and turn on others. We used bioinformatics, which merges computer science and biology, to examine huge amounts of data (millions of data points), and discovered that the processes of self-renewal and differentiation seem to be tightly regulated by many types of molecular signals. The goal of our research proposal is to understand how these signals work, and to intervene in the signaling to discover how to control the development of stem cells so that they can be both safe and effective for medical applications.
We believe that scientific data aren't useful unless they can be used by other scientists to guide their experiments, so we will publish our molecular information on a web-based knowledge database, which can be shared by stem cell scientists all over the world. If we achieve our aims, scientists and clinicians will be able to find out with a click of a mouse which stem cells they should use for studying Alzheimer's disease, which cells to use for testing the effects of new drugs, and which cells will be safe and efficacious when transplanted to the brain of a child with a fatal neurological disease.
Statement of Benefit to California:
Californians are a large and diverse population that poses unique challenges for the future of medical care. Fortunately, California has a tradition of taking the lead in technology and medical breakthroughs and following through from the first idea to the final product. A major goal for California's supporters of stem cell research is development of stem cell-based products that have medical use, and the mandate for the research community is to provide the best possible fundamental information to help guide clinical applications. Our research plan is to lay the groundwork for medical use of stem cells by an ambitious program to address the most important basic questions about stem cells. How do different stem cell lines differ from each other, and which one is best for a particular therapy or product? What gives stem cells the ability to self-renew? Can we make stem cells into the adult cell types we need? Can stem cells be made safe, and what causes some stem cells to form tumors? We will use "high-throughput" techniques developed for the Human Genome Project to examine a large group of human embryonic stem cells and related cells. The huge amount of information we generate will be used to create a web-based shared knowledge database. The publicly accessible knowledge database will provide answers for many of the existing questions and inspire new ones. This innovative program has the potential to make California a significant stem cell and life science content provider. It will be a magnet for other researchers, inside and outside California, to contribute their own information and expertise, which will leverage the power of the California stem cell community to explore novel approaches. The proposed project will also be a springboard to new commercial ventures, and attract investment in research and development. Ultimately the project will speed the development of clinical applications for stem cells that will benefit all Californians.
SYNOPSIS: A large number of related cell types, including well-characterized pluripotent human embryonic stem cell (hESC) lines, early-stage embryonic cells, lineage-restricted and differentiated hESC derivatives, germ cell cancers, and multipotent somatic stem cells, will be purified and characterized using SNP genotyping, DNA methylation, and mRNA and miRNA expression. The aims are: 1) Identify the genes whose expression levels are consistently linked to the maintenance of pluripotence and specific fate choices by hESCs; 2) Integrate epigenetic, genetic, and expression information to create experimentally testable models of regulation of self-renewal, pluripotence and specific differentiation; 3) Validate the models of regulatory networks by functional analysis; and 4) Produce an accessible interactive stem cell knowledge base.
IMPACT AND SIGNIFICANCE: This is a high-throughput systems biology-based approach that utilizes state-of-the-art technology to descriptively characterize a wide range of hESC lines at various stages of differentiation and transformation. The analysis is highly multifactorial, including SNP genotyping, methylation analysis and transcriptomics. The project will make the data (and presumably models) available to the stem cell community via a web resource. The data generated in Aim 1 would be extremely valuable to the stem cell community in understanding the nature of hESC behavior (e.g., self-renewal, pluripotency, genetic stability). These phenotypes appear to be regulated by multiple coordinated signals, and the network-based approach proposed by the PIs will shed quantitative insight into how these pathways synergize. This type of basic analysis of hESC biology will have implications for the directed differentiation efforts of others, as it may help define culture conditions and identify optimal cell lines for generating cells in desired lineages.
There is, however, little innovation or originality in this proposal in that it is simply cataloging the molecular traits of a variety of hESC lines and their derivatives and compiling these in a database. There is, essentially, no hypothesis-driven research described here. However, as a baseline, the authors argue that having this information gained in a highly controlled manner, compiled and compared, would be a useful tool for stem cell biologists worldwide.
QUALITY OF THE RESEARCH PLAN: The overall research plan is highly descriptive and aimed at cataloging a huge number of variations and systematically comparing them among different hESCs and hESC-derived cell lines using bioinformatics and systems biology approaches. The investigators seek to identify genetic and epigenetic changes using high-throughput screening methods on established cell lines and their derivatives and then validate them using single-gene perturbation approaches. The ultimate goal is to produce an accessible interactive stem cell knowledge base that can be accessed on the web. The investigators simply list about 40 different cell lines on which they will perform all of these analyses and provide a relatively cohesive bioinformatics and systems approach to evaluating and comparing these cell lines. The key issue is whether they will be able to identify genotypic or transcriptomic patterns from otherwise identical hESC lines that predispose a particular line to differentiate down a particular pathway or maintain pluripotency. This apparently would be the main value of such a broad descriptive study. In preliminary data they show that undifferentiated hESCs show a unique epigenetic signature via CpG methylation analysis. This is quite interesting and indicates that methylation regulates the expression of some genes but not others (depending on cell type).
The main problem with this proposal is that it is so vaguely written that it is unclear exactly what this huge and highly collaborative group is going to do and exactly how they are going to do it. The research design and methods section is essentially a complete rehash of the specific aims, with very little detail provided on matters such as prioritization and data analysis. This is clearly a large-scale project that is amenable and responsive to this RFA, but it is almost completely descriptive, and it is unclear whether the mountain of data produced will be interpretable and useful. The PI has assembled an impressive number of collaborators who will provide cell lines and has assembled the necessary tools, both bioinformatic and otherwise, to complete the project. There is very little in this that is innovative or original, but it is guaranteed to generate data, which will likely be useful to the stem cell community. The rationale that molecular “signatures” can be used to identify and characterize cell types is very interesting and compelling. The molecular analysis component of the proposed research is very strong. Global gene expression analysis, SNP genotyping, global DNA methylation, and microRNA analysis will provide a broad characterization of a diverse set of hESC lines, hESC derivatives, embryos, and other differentiated cells. The investigators have an excellent track record in these assays and present preliminary data indicating these efforts are underway. Creation of experimentally testable models of self-renewal, pluripotency, and specific differentiation is less well defined, however. The proposal does not indicate the type(s) of models to be generated, or how the data will be used in modeling. The preliminary data indicate a machine-learning-based cell type classification, but it is not clear how such a model will weight different types of data in the classification algorithm.
Also, these types of models are not very mechanistic at a molecular level and as such may not generate substantial mechanistic insight into network regulation of hESC behaviors. The validation mechanisms will utilize microRNA overexpression and suppression, as well as siRNA gene silencing, to target particular regulatory elements. These assays are very reasonable and likely to provide insight into model validity. It is not clear how these data will be used to refine the model, however. First-generation models will likely possess some correct and some incorrect, or incomplete, features. The PI should consider how experiments in Aim 3 will improve the mechanistic behavior of the models.
STRENGTHS: The strength of this proposal is the environment in which the work is to be accomplished. The applicant institution has the infrastructure needed for hESCs, and is positioning itself as a leader in this field in California. Another noted strength is the collaboration of multiple investigators with distinct areas of expertise in stem cell biology and molecular characterization. The proposal has preliminary data indicating unique molecular signatures of hESCs. The proposal is comprehensive in looking at genetic, epigenetic, and expression profiles of a huge battery of different cell lines. Furthermore, analysis of multiple cell lines will provide insight into similarities and differences of hESC lines, and progeny of these lines. The proposal includes a data sharing plan via a web-based resource, which will be useful to the scientific community.
WEAKNESSES: Several weaknesses were noted for this proposal. Overall, the proposal is vague in its description of what will be done, and the potential outcomes and pitfalls are poorly defined. There is little detail provided on model development, validation, and refinement. This calls into question the ability to extract mechanistic meaning from the experiments.
Furthermore, it is not clear how different culture conditions for each of the distinct cell types will be interpreted. Microenvironmental factors such as media and extracellular matrix will complicate analysis. There are specific weaknesses noted for each aim. One reviewer felt that specific aim one has already been done and reported by many different groups, and discussed and reviewed in detail, so there is nothing innovative in aim one, nor any new information to be gained. Specific aim two is extremely broad, and it is not clear to one reviewer how the profiles of global methylation, microRNA expression, and SNP genotype of a group of stem cells could be used to reprogram adult somatic cell nuclei, and how that fits with the rest of the project. In specific aim three, the expression “triggering differentiation” is loosely used without addressing questions such as: How many time points? How many transition states? What changes in culture conditions will affect these parameters? Specific aim four appears naïve, as setting up a “Google-type” resource for stem cells has been tried by many other groups in the past without success. As anyone who has attempted this type of endeavor knows, it is not setting up a database that is challenging, but maintaining it. The cryopreservation experiments, while addressing an important problem, do not seem integrated with the rest of the proposal. Global molecular analysis of cells following preservation may provide clues into the mechanism of damage experienced by the cells, but how this information will be translated into a robust cryopreservation method, and be validated, is unclear.
DISCUSSION: This is a big systems biology approach to stem cells in which 40 cell lines will be characterized and profiled using various assays to get a baseline for existing cell lines. One reviewer felt that this application was very diffuse. It’s not clear how the data will be analyzed or how experiments will be done.
The applicant wants to create a Google-like database for the field but doesn’t explain how it will be done. There is no hypothesis here except that different hESCs can differentiate down different pathways. There was concern that the applicant uses terms like “triggering differentiation” in a poorly defined way. It would be helpful to address this in more detail, for example, by indicating how many time points will be tested. It was recognized that a database of such information could be useful but the applicant has not addressed how to turn data into experimentally useful models. Genomics/proteomics data conversion to models has historically been very challenging. The applicant has not proposed an acceptable plan. In addition, the applicant did not acknowledge relevant published work such as the creation of the HSC database. Aim 2 is very broad and has been proposed by numerous other groups. Everyone wants a Google database but it is well known that setting it up is the easy part; maintenance and curation are the hard parts. There is no mention made of curation in this proposal. The PI doesn't seem to appreciate the difficulties of what is proposed and needs to outline a better plan. There was a question about the independence of the applicant based on the publications presented. The percent effort listed seems unreasonable.