Proc Natl Acad Sci U S A
We developed a computational method, named Mining Developmentally Regulated Genes (MiDReG), to predict genes involved in stem cell development. Many relationships in biology are asymmetric rather than symmetric. For example, trees bearing fruit almost certainly have leaves, but trees outside of the fruiting season may or may not have leaves, depending on the time of year. MiDReG bases its predictions on asymmetric relationships in gene expression, mined from large publicly available biological datasets. Given two genes known to be involved in stem cell development, MiDReG predicts new genes involved in the same developmental pathway based on the asymmetric relationships among those genes.
We present a method termed mining developmentally regulated genes (MiDReG) to predict genes whose expression is either activated or repressed as precursor cells differentiate. MiDReG does not require gene expression data from intermediate stages of development. Instead, it relies on the gene expression patterns at the initial and terminal stages of the differentiation pathway, coupled with "if-then" rules (Boolean implications) mined from large-scale microarray databases. MiDReG takes two gene expression-based seed conditions that mark the initial and the terminal stages of a given differentiation pathway and combines the statistically inferred Boolean implications from these seed conditions to identify the relevant genes. The method was validated by applying it to B-cell development. The algorithm predicted 62 genes that are expressed after the KIT+ progenitor cell stage and remain expressed through CD19+ and AICDA+ germinal center B cells. qRT-PCR of 14 of these genes on sorted B-cell progenitors confirmed that the expression of 10 genes is indeed stably established during B-cell differentiation. A review of the published literature on knockout mice revealed that, of the predicted genes, knockout of 63.4% results in defects in B-cell differentiation and function, 22% have a role in B cells according to other experiments, and the remaining 14.6% are not characterized. Our method therefore identified novel gene candidates for future examination of their role in B-cell development. These data demonstrate the power of MiDReG in predicting functionally important intermediate genes in a given developmental pathway that is defined by a mutually exclusive gene expression pattern.
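To make the idea of combining Boolean implications from two seed conditions concrete, the following is a minimal sketch in Python and not the published implementation. It assumes that expression values have already been discretized into high (True) and low (False) per sample, and it replaces the statistical inference of implications with a simple violation-rate cutoff (`max_error`, a hypothetical parameter). It also assumes one plausible reading of the rule for genes that are activated during differentiation: a candidate gene X is low whenever the initial-stage seed is high, and high whenever the terminal-stage seed is high. The function names and the genes GENE_X, GENE_Y, and GENE_Z are illustrative only; KIT and CD19 are the seed markers named above.

```python
from typing import Dict, List


def implies(premise: List[bool], conclusion: List[bool],
            max_error: float = 0.1) -> bool:
    """Check the Boolean implication 'premise => conclusion' across samples.

    Simplified stand-in for a statistical test: the implication holds when
    the fraction of premise-positive samples that violate it is small.
    """
    support = sum(premise)
    if support == 0:
        return False  # no sample satisfies the premise, so nothing can be inferred
    violations = sum(1 for p, c in zip(premise, conclusion) if p and not c)
    return violations / support <= max_error


def predict_intermediate(high: Dict[str, List[bool]], start: str, end: str,
                         max_error: float = 0.1) -> List[str]:
    """Return genes X with 'start high => X low' and 'end high => X high'."""
    a, c = high[start], high[end]
    # The seed conditions must be mutually exclusive: start high => end low.
    if not implies(a, [not v for v in c], max_error):
        raise ValueError("seed genes are not mutually exclusive")
    hits = []
    for gene, x in high.items():
        if gene in (start, end):
            continue
        x_low = [not v for v in x]
        if implies(a, x_low, max_error) and implies(c, x, max_error):
            hits.append(gene)
    return hits


# Toy data: eight samples, True means "high expression" (values are made up).
toy = {
    "KIT":    [True, True, True, False, False, False, False, False],
    "CD19":   [False, False, False, False, False, True, True, True],
    "GENE_X": [False, False, False, True, True, True, True, True],
    "GENE_Y": [True, True, False, False, True, False, True, False],
    "GENE_Z": [False, False, False, False, False, False, False, False],
}
hits = predict_intermediate(toy, "KIT", "CD19")
print(hits)
```

In this toy data set only GENE_X satisfies both implications. GENE_Y is high in KIT-high samples, and GENE_Z is never high in CD19-high samples. Note that samples in which both seeds are low place no constraint on X, which is why data from intermediate stages are not required.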