From Publications to Knowledge Graphs

  • Panos Constantopoulos
  • Vayianos Pertsas
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1197)


We address the task of compiling structured documentation of research processes in the form of knowledge graphs by automatically extracting information from publications and associating it with information from other sources. This challenge has not been previously addressed at the level described here. We have developed a process and a system that leverages existing information from DBpedia, retrieves articles from repositories, extracts and interrelates various kinds of named and non-named entities by exploiting article metadata, the structure of text, and syntactic, lexical and semantic constraints, and populates a knowledge base in the form of RDF triples. An ontology designed to represent scholarly practices drives the whole process. Rule-based and machine learning-based methods that account for the nature of scientific texts and a wide variety of writing styles have been developed for the task. Evaluation on datasets from three disciplines, Digital Humanities, Bioinformatics, and Medicine, shows very promising performance.
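As a rough illustration of the last step in that pipeline (turning extracted entities into RDF triples), the sketch below applies one toy lexical rule to a sentence and serializes the result as N-Triples. It is not the authors' system: the namespace `http://example.org/so/`, the property names `hasParticipant` and `employs`, the regular expression, and the helper functions are all hypothetical and chosen only to make the idea concrete.

```python
import re

# Hypothetical namespace and property names, for illustration only;
# they are not taken from the paper or its ontology.
EX = "http://example.org/so/"

# Toy lexical rule: "<Actor> used <method> to <goal>."
PATTERN = re.compile(
    r"^(?P<actor>[A-Z]\w+) used (?P<method>[\w\- ]+?) to (?P<goal>[\w\- ]+)\.$"
)


def iri(label):
    """Build an IRI in the example namespace from a free-text label."""
    return "<" + EX + re.sub(r"\W+", "_", label.strip()) + ">"


def extract_triples(sentence):
    """Return (subject, predicate, object) triples for one sentence.

    Sentences that do not match the toy rule yield an empty list.
    """
    m = PATTERN.match(sentence)
    if m is None:
        return []
    activity = iri("activity_" + m.group("goal"))
    return [
        (activity, iri("hasParticipant"), iri(m.group("actor"))),
        (activity, iri("employs"), iri(m.group("method"))),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    sentence = "Smith used conditional random fields to label entities."
    print(to_ntriples(extract_triples(sentence)))
```

A real pipeline would replace the single regular expression with the rule-based and machine learning-based extractors the abstract mentions, and would take its classes and properties from the ontology rather than from an example namespace.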


Information extraction, Process mining, Knowledge base creation, Machine learning, Ontology population



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Informatics, Athens University of Economics and Business, Athens, Greece
  2. Digital Curation Unit, IMSI-Athena Research Centre, Marousi, Greece
