Improving Image Annotations Using WordNet

  • Yohan Jin
  • Lei Wang
  • Latifur Khan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3665)


Advances in technology generate huge amounts of non-textual information, such as images, so an efficient image annotation and retrieval system is highly desirable. Clustering algorithms make it possible to represent the visual features of images with a finite set of symbols. Building on this, many statistical models have been published that analyze the correspondence between visual features and words and discover hidden semantics. These models improve the annotation and retrieval of large image databases. However, the current state of the art, including our previous work, produces too many irrelevant keywords for images during annotation. In this paper, we propose a novel approach that augments the classical model with a generic knowledge base, WordNet. Our approach strives to prune irrelevant keywords using WordNet. To identify irrelevant keywords, we investigate various semantic similarity measures between keywords and then fuse the outcomes of all these measures to make a final decision. We have implemented various models that link visual tokens with keywords based on the WordNet knowledge base, and we evaluated their performance in terms of precision and recall on a benchmark dataset. The results show that augmenting the classical model with a knowledge base improves annotation accuracy by removing irrelevant keywords.
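The pruning idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `PARENT` hierarchy is a hypothetical stand-in for WordNet, the two similarity measures (a path-length measure and a Wu-Palmer-style depth measure) are not necessarily the ones investigated in the paper, and fusing by averaging normalized scores against a fixed `threshold` is only one possible fusion rule. The candidate keywords and ground truth are toy data, not results from the paper.

```python
# Hypothetical toy is-a hierarchy (child -> parent) standing in for WordNet.
PARENT = {
    "tiger": "cat", "cat": "animal", "horse": "animal",
    "grass": "plant", "tree": "plant",
    "animal": "organism", "plant": "organism",
    "car": "vehicle", "vehicle": "artifact",
    "organism": "entity", "artifact": "entity",
}

def ancestors(word):
    """Chain from word up to the root, word included."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(word):
    """Number of nodes from word to the root (the root has depth 1)."""
    return len(ancestors(word))

def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of both words (assumes a shared root)."""
    seen = set(ancestors(b))
    return next(node for node in ancestors(a) if node in seen)

def path_similarity(a, b):
    """1 / (1 + number of edges on the path between a and b)."""
    common = depth(lowest_common_subsumer(a, b))
    edges = (depth(a) - common) + (depth(b) - common)
    return 1.0 / (1 + edges)

def depth_similarity(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    common = depth(lowest_common_subsumer(a, b))
    return 2.0 * common / (depth(a) + depth(b))

MEASURES = (path_similarity, depth_similarity)

def fused_scores(keywords, measures=MEASURES):
    """Average, over all measures, of each keyword's normalized mean
    similarity to the other candidate keywords (assumed distinct)."""
    fused = {w: 0.0 for w in keywords}
    for sim in measures:
        mean = {
            w: sum(sim(w, o) for o in keywords if o != w) / (len(keywords) - 1)
            for w in keywords
        }
        best = max(mean.values())
        for w in keywords:
            fused[w] += mean[w] / best / len(measures)
    return fused

def prune(keywords, threshold=0.7):
    """Drop keywords whose fused score falls below the (assumed) threshold."""
    if len(keywords) < 2:
        return list(keywords)
    scores = fused_scores(keywords)
    return [w for w in keywords if scores[w] >= threshold]

def precision_recall(predicted, truth):
    """Precision and recall of a predicted keyword list against ground truth."""
    correct = len(set(predicted) & set(truth))
    return correct / len(predicted), correct / len(truth)

# Toy example: candidate annotations for one image and its true keywords.
candidates = ["tiger", "horse", "grass", "car"]
truth = {"tiger", "grass", "tree"}
kept = prune(candidates)
print(kept)
print(precision_recall(candidates, truth), precision_recall(kept, truth))
```

On this toy input the keyword "car" is the least related to the other candidates under both measures and is pruned, which raises precision from 1/2 to 2/3 while recall stays at 2/3. A real system would query WordNet itself and would need a principled way to choose the threshold.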


Keywords: Semantic Similarity; Image Annotation; Word Sense Disambiguation; Translation Model; Visual Token





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Yohan Jin
    • 1
  • Lei Wang
    • 1
  • Latifur Khan
    • 1
  1. Department of Computer Science, University of Texas at Dallas, Richardson, USA
