An Annotated Corpus and a Grammar Model of Theorem Description

  • Yusuke Baba
  • Masakazu Suzuki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2594)


Digitizing documents is becoming increasingly common in many fields, and there is growing interest in enabling computers to understand the contents of digitized documents. Since the early 1990s, research in natural language processing based on large annotated corpora, such as the Penn Treebank, has advanced. Applying corpus-based methods, we built a syntactically annotated corpus of theorem descriptions from a book on set theory and extracted a grammar model of theorems from that corpus, as a first step toward computer understanding of mathematical documents.
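As a rough illustration of what extracting a grammar from a syntactically annotated corpus can look like, the sketch below reads production rules off Penn Treebank style bracketed trees and counts them. The abstract does not describe the authors' annotation scheme or extraction procedure, so everything here is an assumption: the tag set, the two toy sentences, and the helper names (parse_bracketed, productions, extract_grammar) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse one bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read_node()


def productions(tree):
    """Yield one (lhs, rhs) rule for every node of the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


def extract_grammar(bracketed_trees):
    """Count how often each production rule occurs in the corpus."""
    counts = Counter()
    for text in bracketed_trees:
        counts.update(productions(parse_bracketed(text)))
    return counts


# Hypothetical toy corpus; not taken from the paper.
corpus = [
    "(S (NP (DT every) (NN set)) (VP (VBZ has) (NP (DT a) (NN subset))))",
    "(S (NP (DT every) (NN subset)) (VP (VBZ is) (NP (DT a) (NN set))))",
]

grammar = extract_grammar(corpus)
for (lhs, rhs), count in grammar.most_common():
    print(f"{lhs} -> {' '.join(rhs)}  [{count}]")
```

Dividing each count by the total for its left-hand side would turn these counts into relative frequencies, which is one common way to obtain a probabilistic grammar from a treebank; whether the paper does this is not stated in the abstract.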




Authors and Affiliations

  • Yusuke Baba (1)
  • Masakazu Suzuki (2)

  1. Graduate School of Mathematics, Kyushu University, Japan
  2. Faculty of Mathematics, Kyushu University, Higashi-ku, Japan

