Virtual Gene: Using Correlations Between Genes to Select Informative Genes on Microarray Datasets

  • Xian Xu
  • Aidong Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3680)


Gene selection is one of the most widely used classes of data analysis algorithms on microarray datasets. The goal of gene selection algorithms is to filter out a small set of informative genes that best explain experimental variations. Traditional gene selection algorithms are mostly single-gene based: a discriminative score is calculated for each gene, the genes are sorted by that score, and the top-ranked genes are selected as informative genes for further study. Such algorithms completely ignore correlations between genes, although such correlations are widely known to exist, since genes interact with each other through various pathways and regulatory networks. In this paper, we propose to use, instead of ignoring, such correlations for gene selection. Experiments performed on three publicly available datasets show promising results.
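The traditional single-gene ranking that the abstract contrasts against can be sketched as follows. This is only an illustrative baseline, not the method proposed in the paper: the abstract does not name a specific discriminative score, so the Welch t-statistic used here is an assumption, and the function names `t_score` and `rank_genes` as well as the synthetic data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_score(a, b):
    """Absolute Welch t-statistic between two lists of expression values.

    Assumption: the t-statistic stands in for the unspecified
    "discriminative score" mentioned in the abstract.
    """
    den = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / den if den > 0 else 0.0


def rank_genes(X, y, top_k):
    """Score every gene independently and return the top_k gene indices.

    X: list of samples, each a list of expression values (one per gene).
    y: list of 0/1 class labels, one per sample.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(t_score(a, b))
    # Sort gene indices by descending score; each gene is judged on its own.
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:top_k], scores


# Synthetic example: 40 samples, 100 genes, gene 7 shifted in class 1.
random.seed(0)
y = [0] * 20 + [1] * 20
X = [[random.gauss(0, 1) for _ in range(100)] for _ in y]
for row, label in zip(X, y):
    if label == 1:
        row[7] += 3.0

top, scores = rank_genes(X, y, top_k=5)
```

Because each gene is scored in isolation, a ranking of this kind cannot detect a pair of genes that separates the classes only when considered jointly, which is the limitation the abstract points to.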


Feature Selection; Prediction Accuracy; Gene Pair; Gene Selection; Feature Selection Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Xian Xu (1)
  • Aidong Zhang (1)
  1. Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, USA
