A Bare Bones Approach to Literature-Based Discovery: An Analysis of the Raynaud’s/Fish-Oil and Migraine-Magnesium Discoveries in Semantic Space

  • R. J. Cole
  • P. D. Bruza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3735)


Literature-based discovery can be characterized as a goal-directed search for previously unknown, implicit knowledge captured within a collection of scientific articles. Swanson's serendipitous discovery of dietary fish oil as a treatment for Raynaud's disease while browsing Medline, an online collection of biomedical literature, exemplifies such a discovery. In a series of experiments, we study how stop words, weighting schemes, discovery mechanisms, and contextual reduction affect the replication of the Raynaud/fish-oil and migraine-magnesium discoveries by operational means. Two aspects of discovery are brought into focus: (i) the discovery of intermediate terms, or B-terms, and (ii) the discovery of indirect A-C connections via the B-terms. A semantic space representation of the underlying corpus is computed, and discoveries are automated by computing associations between words in both the higher-dimensional space and contextually reduced spaces. The discovery of B-terms and A-C connections can be achieved to an encouraging degree with a standard stop word list. No single weighting scheme seems to suffice: log-likelihood appears potentially effective for discovering B-terms, whereas odds ratio and simple co-occurrence frequencies both facilitate the discovery of A-C connections. With regard to discovery mechanism, both semantic similarity (via cosine) and information flow computation seem promising for computing A-C connections, but more research is needed to understand their relative strengths and weaknesses. Discovery in a contextually reduced semantic space produced mixed results.
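To make the idea of computing A-C associations in a semantic space concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy three-sentence corpus invented for illustration, raw windowed co-occurrence counts as the weighting, and hypothetical helper names (`build_space`, `cosine`). The A-term and C-term never co-occur, yet their vectors overlap on shared context words, which play the role of candidate B-terms.

```python
import math
from collections import defaultdict

def build_space(docs, window=5, stop_words=frozenset()):
    """Map each word to a sparse vector of co-occurrence counts
    with the words found within `window` positions of it."""
    space = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        tokens = [t for t in doc.lower().split() if t not in stop_words]
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[word][tokens[j]] += 1.0
    return space

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus (invented for illustration; not the Medline data used in the paper).
docs = [
    "raynaud patients show high blood viscosity",
    "fish oil reduces blood viscosity",
    "migraine involves cortical spreading depression",
]

space = build_space(docs)
a_term, c_term = "raynaud", "fish"

# Direct co-occurrence count between A and C (zero: they never share a sentence).
direct = space[a_term].get(c_term, 0.0)

# Context words shared by A and C act as candidate B-terms.
b_terms = set(space[a_term]) & set(space[c_term])

# Indirect A-C association via cosine similarity of the two context vectors.
ac_similarity = cosine(space[a_term], space[c_term])

print(direct, sorted(b_terms), round(ac_similarity, 3))
```

Swapping the raw counts for log-likelihood or odds-ratio weights, or projecting the vectors into a reduced space, would change only how the vectors are built; the cosine step stays the same.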


Singular Value Decomposition · Weighting Scheme · Latent Semantic Analysis · Semantic Space · Stop Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • R. J. Cole (1)
  • P. D. Bruza (2)
  1. School of Info. Tech. and Elec. Eng., University of Queensland
  2. Distributed Systems Technology Centre, University of Queensland
