The Arrowsmith Project: 2005 Status Report

  • Neil R. Smalheiser
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3735)


In the 1980s, Don Swanson proposed the concept of “undiscovered public knowledge,” and published several examples in which two disparate literatures (i.e., sets of articles having no papers in common, no authors in common, and few cross-citations) nevertheless held complementary pieces of knowledge that, when brought together, made compelling and testable predictions about potential therapies for human disorders. In the 1990s, Don and I published more predictions together and created a computer-assisted search strategy (“Arrowsmith”). At first, the so-called one-node search was emphasized, in which one begins with a single literature (e.g., that dealing with a disease) and searches for a second unknown literature having complementary knowledge (e.g., that dealing with potential therapies). However, we soon realized that the two-node search is better aligned with the information practices of most biomedical investigators: in this case, the user chooses two literatures and then seeks to identify meaningful links between them. Could typical biomedical investigators learn to carry out Arrowsmith analyses? Would they find routine occasions for using such a sophisticated tool? Would they uncover significant links that affect their experiments? Four years ago, we initiated a project to answer these questions, working with several neuroscience field testers. Initially we expected that investigators would spend several days learning how to carry out searches, and would spend several days analyzing each search. Instead, we completely redesigned the user interface, the back-end databases, and the methods of processing linking terms, so that investigators could use Arrowsmith without any tutorial at all and could carry out a search in only minutes. The Arrowsmith Project now hosts a suite of free, public tools. It has launched new research spanning medical informatics, genomics and social informatics, and has, indeed, assisted investigators in formulating new experiments, with direct impact on basic science and neurological diseases.
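
To make the two-node search described above concrete, here is a minimal sketch in Python. It is not the Arrowsmith implementation: it simply assumes that each literature is available as a list of article titles and treats any non-stopword word occurring in titles from both literatures as a candidate linking term. The function names, the stopword list, and the four titles are all invented for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "by", "in", "is", "of", "on", "the", "with"}


def terms(title):
    """Return the set of lowercase, non-stopword words in one title."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}


def doc_freq(titles):
    """Count, for each term, how many titles in a literature contain it."""
    freq = {}
    for title in titles:
        for w in terms(title):
            freq[w] = freq.get(w, 0) + 1
    return freq


def linking_terms(literature_a, literature_c):
    """Map each term found in both literatures to (count in A, count in C)."""
    fa, fc = doc_freq(literature_a), doc_freq(literature_c)
    shared = fa.keys() & fc.keys()
    return {w: (fa[w], fc[w]) for w in sorted(shared)}


# Invented toy titles; real input would be two sets of retrieved article titles.
LIT_A = [
    "Dietary fish oil reduces blood viscosity",
    "Fish oil and platelet aggregation in healthy volunteers",
]
LIT_C = [
    "Increased blood viscosity in Raynaud patients",
    "Platelet aggregation abnormalities in Raynaud syndrome",
]

for term, (n_a, n_c) in linking_terms(LIT_A, LIT_C).items():
    print(f"{term}: {n_a} title(s) in A, {n_c} title(s) in C")
```

A usable tool would need far more than this, such as phrase handling and ways to filter and rank the shared terms; the sketch only shows the basic shape of the search, in which the user supplies both literatures and the program proposes the links between them.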


MeSH Term, Biomedical Literature, Unified Medical Language System, Latent Semantic Indexing, MeSH Heading
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Neil R. Smalheiser (1)

  1. UIC Psychiatric Institute, University of Illinois-Chicago, Chicago, USA
