The Stem Cell Matrix: a map of the molecular pathways that define pluripotent cells

Funding Type: 
Tools and Technologies I
Grant Number: 
Award Value: 
Stem Cell Use: 
Embryonic Stem Cell
iPS Cell
Public Abstract: 

Human embryonic stem cells (hESC) are being considered for a wide range of research and therapeutic uses. Cell therapy is the most challenging of the potential clinical applications and its success will depend on the ability to guide differentiation of hESC into clinically useful cell types. The ideal cell types would possess three features: the capacity to restore lost functions, the ability to survive after transplantation, and the absence of malignant potential.

A major roadblock in the development of stem cell therapies is the lack of tools for quality control, characterization, and identification of human pluripotent stem cells and differentiated populations. As new cell lines are developed and new differentiation techniques are tested, the need for validation of the cells becomes more and more critical if the cells are to be used in a clinical setting. We have developed a new method for unequivocally identifying pluripotent stem cell populations using molecular analysis tools developed for the Human Genome Project. We have identified a molecular fingerprint that is shared by all pluripotent cells, human or mouse, embryo-derived or produced from adult cells through new induced pluripotency technologies. Using the more than 10 million pieces of data we generated by analyzing hundreds of cell lines, we created a database called the Stem Cell Matrix, which is intended to fill a critical knowledge gap in the field of human pluripotent cell biology. By collaborating with a company that has developed a powerful new search engine, we will be able to search these data for clues that will tell us whether a specific cell line is pluripotent, identify chemicals that may improve methods for reprogramming, and eventually link data from clinical trials with data on the genes that are active in the cells before they are transplanted. Our overall goal is to build on our proven technology to grow the database, providing a service that all CIRM-funded investigators can use for quality control and identification of the cells they are developing for research and clinical applications. An advantage of our approach is that the search engine can link our information to a much larger database on cancer cells, which will make it possible for stem cell researchers to develop new insights by comparing stem cells and cancer cells.

Statement of Benefit to California: 

The State of California, like the rest of the nation, faces immense challenges to its health care system, with soaring medical costs and an aging population. Pluripotent stem cells hold the potential to revolutionize medicine and health care by providing new treatments for incurable conditions such as diabetes, Parkinson's disease, and spinal cord injuries. Stem cell therapies, however, are in an early stage, and research conducted over the next few years will be critical to development of therapies that are safe and effective.

We have developed a new technology that harnesses the powerful tools developed for the Human Genome Project to ensure quality control and simplify characterization of human stem cells used for research and clinical therapy. The technology links smoothly with databases and search engines that are being developed by the high tech industry. We propose to further develop this technology and make it available and accessible to stem cell researchers and clinicians throughout California. Ultimately, this technology, the discoveries it will enable, and its synergies with the high tech industry will benefit California by attracting highly skilled jobs and tax revenues, and by making the State a leader in a field that is poised to be the economic engine of the future.

Progress Report: 

In our Tools and Technologies grant application we proposed to remove a major roadblock in the development of stem cell therapies, the lack of reliable tools for identification of human stem cells and cells differentiated from these stem cells.

In the last year we have made considerable progress toward this goal and have met all of our proposed milestones. In fact, we greatly exceeded our goals, generating and analyzing far more data than we had proposed.

Our success was made possible because of the following factors:

1. A novel analysis instrument (Illumina BeadStation) purchased with CIRM funds was installed in our laboratory.

2. A highly skilled staff and rigorous quality control methods led to generation of data of consistently higher quality than had been expected based on other researchers’ experience.

3. We made improvements in data analysis methods, which made it possible to analyze data more quickly.

4. Improvements in the reliability of reprogramming of iPSC lines and generation of hESC lines meant that we obtained more high quality lines for analysis than we had expected.

5. There was an unexpectedly high level of interest in collaboration, which led to addition of samples that were analyzed using other funding mechanisms.

6. We have created improved web tools to make access more efficient and user-friendly.

So far, we have run more than 2000 tests on more than 250 undifferentiated and differentiated stem cell samples. We decided to develop our own database access platform that would allow our data to be used for simplified testing of new stem cell lines by other researchers throughout the world.

At our current benchmark, based on the CIRM-funded Stem Cell Matrix-2 data collection, we are able to predict pluripotency in cell cultures with >99% specificity and >99% sensitivity, using data from a single $80 microarray.
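
For readers less familiar with the two benchmark figures: sensitivity is the fraction of truly pluripotent samples that the test calls pluripotent, and specificity is the fraction of truly non-pluripotent samples that it calls non-pluripotent. The short Python sketch below shows the arithmetic. The function name and the sample counts are hypothetical, chosen only for illustration; they are not the counts from the Stem Cell Matrix-2 data collection.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: pluripotent samples correctly called pluripotent
    fn: pluripotent samples missed by the test
    tn: non-pluripotent samples correctly called non-pluripotent
    fp: non-pluripotent samples wrongly called pluripotent
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only (not data from this project).
sens, spec = sensitivity_specificity(tp=199, fn=1, tn=298, fp=2)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

With these made-up counts, both values come out above 99%, which is the kind of result the benchmark statement describes.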

Currently, CIRM is supporting the generation of hundreds of human iPSC and ESC lines for basic research, translational research, and disease treatments. In the past, the only way to prove that these cell lines are pluripotent, a necessity for peer-reviewed publications, was to transplant a million or so of the cells into the muscle, kidney, or other site in an immunodeficient mouse. A month or two later, if the cells are pluripotent, they will have developed into a large tumor called a teratoma. The teratoma contains recognizable tissues like cartilage, glandular tissue, and nerve cells; if a sufficient diversity of cell types is identified in the tumor, the cell line is said to be pluripotent. The teratoma technique is difficult to learn, unreliable, and uses laboratory animals; one researcher called it the “most ridiculous assay on the planet”, and most stem cell researchers would prefer a different kind of assay for pluripotency of human stem cells. This Tools and Technologies award proposed to develop new methods for determining whether or not human stem cells are pluripotent by analyzing the activities of their genomes. We’ve known for several years that the gene expression profile (which of the 25,000 genes are active) of human pluripotent stem cells (hESCs and iPSCs) is very different from that of any other cell type. We published this information in 2008 in the scientific journal Nature. For the assay of pluripotency, which we call “PluriTest”, we collected gene expression profiles from more than 400 samples of normal pluripotent human cells and developed a new computer algorithm to use the gene expression information to identify the typical features of pluripotency. Pluripotent cells can have a range of different gene expression profiles, but there is overlap among them in which genes are turned on and which are turned off.
PluriTest captures these variations, making it easy for researchers to confirm the pluripotency of their cells by comparing their cells’ gene expression profile to the typical profile generated by PluriTest. This approach is the same one taken by researchers developing diagnostic assays for cancers; “molecular diagnostic” tests are extremely valuable for determining the characteristics of a cancer and how it should be treated. The PluriTest molecular diagnostic test is extremely accurate and sensitive in determining whether or not a cell line is pluripotent. Once the gene expression profile of a cell line is obtained, at a cost of about $150 to $300, the data are uploaded to the PluriTest website, and ten minutes later, the user gets both a “pluripotency score” and a “novelty score” for his or her cell line. If the pluripotency score is high, the cells are definitively pluripotent; if a teratoma assay were performed on the cells, it would be positive every time. However, a cell line may have both a high pluripotency score and a high novelty score; this is the case for aneuploid cell lines (cells with duplicate or missing chromosomes) and embryonic stem cell-like tumors. PluriTest can identify cells that are not fully reprogrammed in iPSC techniques (“partially reprogrammed cells”) and can be used to track the progress of differentiation protocols. When cells differentiate, they move slowly from the “pluripotent, not novel space” to the “non-pluripotent, highly novel space”. We plan to use this approach to develop molecular diagnostic tests for differentiation: we are working on a “CardioTest”, a “NeuroTest” and a “HepatoTest”, using the same approach of collecting huge numbers of samples so that we can accurately identify the common variations among specific cell types.
PluriTest has caught on in the stem cell research community: in the 3 months since the paper was published (in Nature Methods), more than 1500 gene expression profiles have been uploaded to our website and analyzed by PluriTest.
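
The two-score readout described above (a pluripotency score and a novelty score) can be pictured as a simple four-way lookup. The Python sketch below is only an illustration of that reading. It is not the PluriTest algorithm: the function name, the cutoff values, and the label strings are all assumptions made for this example, and the report does not say how a sample with low scores on both axes should be interpreted.

```python
# Hypothetical cutoffs for illustration only; these are NOT PluriTest's thresholds.
PLURIPOTENCY_CUTOFF = 20.0
NOVELTY_CUTOFF = 1.5


def interpret_scores(pluripotency_score, novelty_score,
                     pluri_cutoff=PLURIPOTENCY_CUTOFF,
                     novelty_cutoff=NOVELTY_CUTOFF):
    """Map a (pluripotency, novelty) score pair to one of four regions."""
    high_pluri = pluripotency_score >= pluri_cutoff
    high_novelty = novelty_score >= novelty_cutoff

    if high_pluri and not high_novelty:
        # The "pluripotent, not novel space" named in the report.
        return "pluripotent, not novel"
    if high_pluri and high_novelty:
        # The report gives aneuploid lines and ES cell-like tumors as examples.
        return "pluripotent signature, highly novel"
    if high_novelty:
        # The region differentiating cells move toward, per the report.
        return "non-pluripotent, highly novel"
    # Not described in the report; flagged rather than interpreted.
    return "indeterminate"


# Made-up score pairs, one per region.
for scores in [(35.0, 1.2), (33.0, 2.4), (-40.0, 3.1), (5.0, 1.0)]:
    print(scores, "->", interpret_scores(*scores))
```

Tracking a differentiation protocol would then amount to scoring samples taken at successive time points and watching the label change from the first region to the third.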