All stem cells have similar features, like the ability to differentiate into multiple types of tissues. But not all stem cells are alike. Not only are human embryonic stem cells (hESCs) and somatic (adult) stem cells very different from each other, but different preparations of the same kind of stem cell can have vastly diverse characteristics. This means that some stem cell lines may be preferable to others for particular research projects or clinical therapies. But how can scientists decide which lines to use? The answer is for scientists to share with other scientists the information they gather from their experiments with individual cell lines or types. We pioneered this effort by creating a website called the Stem Cell Community, but while the website is useful for creating a sense of community, scientists need a means to share the immense amounts of data that are generated by modern molecular methods. In particular, we have been using the power of the human genome project to examine the qualities of different types of stem cells. Using a high-throughput gene expression microarray method, we are identifying a distinctive "molecular signature" for each stem cell line. We are also using molecular analysis to study how particular genes are turned on and off during stem cell differentiation, and we are using the tools of personalized medicine to look for genetic changes in the cells. Each of our experiments generates several million data points, and our challenge is to find a way to make these data useful for other scientists. Our solution is to develop a sort of "Google" for stem cells, a web-based database that will allow scientists to browse through molecular profiles; this database will also link this information with a variety of other kinds of information such as results of other scientists' experiments and preclinical and clinical data on stem cell therapies. 
It is our hope that the Stem Cell Matrix will not only aid in design of experiments for fundamental and preclinical research, but will also enable clinical insights, the "eureka" moments that will lead to bold new approaches to diagnostic tests and cures.
Statement of Benefit to California:
“California has the fastest growing population in the U.S., and the fastest growing segment of California’s population is persons age 85 and over. The number of people over 60 years of age will grow from 4.9 million in 2000 to 9.0 million in 2020.” (Governor's Budget Summary, 2001-01, Aging with Dignity)

Californians are a large and diverse population that poses unique challenges for the future of medical care. To its benefit, California has a tradition of taking the lead in technology and medical breakthroughs, not only by invention and innovation, but also by following through with research, development, and commercialization of ideas. Californians have a tradition of encouraging an entrepreneurial spirit, making the state an attractive site to launch new and risky ventures. Our proposal takes advantage of California's greatest strengths: innovative scientific research and high tech expertise. We propose a collaborative effort that integrates cutting edge laboratory research with the imaginative computer technology that launched many successful high tech ventures in California. We will generate and analyze a large amount of information about stem cells and use high-level software analysis to put the molecular signatures of these cells in perspective with scientific data and tests of efficacy in preclinical development and clinical trials. The database that we are building, which we call the "Stem Cell Matrix," will have the unique feature of being accessible to scientists who have little or no experience in molecular biology or bioinformatics.
In order to achieve this goal, we are drawing on the talents of two groups of California researchers: one is actively developing high-throughput molecular analysis methods, and the other is designing accessible web-based knowledge databases. Our goal is to make stem cell data accessible to scientists so that they can make intelligent choices about which of the hundreds of embryonic and somatic stem cell lines will be best for their needs. This will eliminate wasted resources by creating a shared information database, encourage collaboration among a diverse group of scientists, and immediately enable scientists to plan and carry out their experiments in a sound scientific context. This database has the potential to make California a significant stem cell and life science content provider. This project will create a magnet for other researchers, inside and outside California, to contribute their own information and expertise, which will leverage the power of the California stem cell community to explore novel approaches. The proposed project will be a springboard to new commercial ventures and will attract investment in research and development. Ultimately, by encouraging sharing of information, the project will help maintain high standards of scientific research and speed the development of clinical applications for stem cells that will benefit all Californians.
SYNOPSIS: The PI proposes to develop a database of molecular information and a curated stem cell database, for both embryonic and adult stem cells, which he names the “Stem Cell Matrix”. This proposal has three specific aims. The first is to create a database of molecular information that is accessible to researchers without specific expertise in high throughput molecular methods or bioinformatics. The second is to develop a curated stem cell knowledge database, enabling basic scientists and clinicians to access, query and share study results in the context of a worldwide comprehensive dataset. The third specific aim is to expand the knowledge database by generating phenotypic and genotypic data about specific stem cell lines. This grant could be easily funded by the NIH.

SIGNIFICANCE AND INNOVATION: This proposal is innovative and significant in the sense that it will give people in the field access to data collected from multiple sources. Unfortunately, this referee could not check the site suggested by the PI at http://www.stemcellcommunity.org, as the use of the site requires registration and would reveal the identity of the referee. The construction of a general stem cell database with integrated information from numerous stem cell systems and laboratories is a valuable idea. While there are a number of databases focused on stem cells at different institutions, it is not clear how the information sets can be compared. Indeed, a "meta-analysis" of stem cell gene expression datasets and other features is badly needed, and could be an invaluable resource.

STRENGTHS: While there are many databases covering many aspects of both embryonic and adult stem cells (for example: NIH, Princeton, Harvard, UCSF, Rockefeller), the one proposed by the PI represents an integration of all of these many different, individual sites.
The combination of data gathering and short-term curation of datasets is an obvious advantage of such an approach. It is clear that, if successful and managed properly, it could represent a powerful research tool for the stem cell research community in California and abroad. The broad scope of the information to be compiled and curated is a strength. In the short term it is likely that the proposed construction of the database will succeed.

WEAKNESSES: This is a tool-making project which suffers from a number of shortcomings despite the obvious utility of the desired tool. First, there is currently no mechanism in place that would force people in the field to share their raw data in any kind of setting. Some journals make it a requirement, but not all. The private sector certainly will not contribute for IP reasons. Thus, it is not really clear what volume of data this database will cover. Second, because experiments in different laboratories are not standardized or normalized for analysis with the same statistical tools, the collected information is likely to suffer in quality, resulting in an "apples and oranges" comparison problem. As any laboratory that has made its own database or used a third party’s database knows, starting these projects is easy and usually fueled by tremendously motivated people. However, for this tool to be useful in the long term, its most important aspect is curation over time. As the dataset grows, curation becomes more important. It is not very clear how this will be addressed at the end of the two years of CIRM Seed funding. The construction of databases is not particularly difficult or challenging. Indeed, a number of efforts have been reported in the stem cell field. A number of these databases have provided important clues to biological function. What is not very clear about this database is how it will actually, in practice, improve on the existing resources.
It is critical to provide information that demonstrates a commitment and ability to maintain this as a growing, continuously curated and updated resource. This will take considerable time and effort.

DISCUSSION: There was no further discussion following the reviewers' comments.