Classroom Lecture Recognition

  • Isabel Trancoso
  • Ricardo Nunes
  • Luís Neves
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


The main goal of this work is to provide automatic transcriptions of classroom lectures for e-learning and e-inclusion applications. The first experiments, using a recognition system trained for Broadcast News, resulted in word error rates near 60%, clearly confirming the need both for adaptation to the specific topic of the lectures and for better strategies for handling spontaneous speech. This paper describes the different domain adaptation steps that lowered the error rate to 45% using very little transcribed adaptation material. It also includes a qualitative analysis of the different types of error, focusing on those related to a very high rate of disfluencies.


Keywords: Acoustic Model, Spontaneous Speech, Word Error Rate, Broadcast News, Discourse Marker
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Isabel Trancoso (1)
  • Ricardo Nunes (1)
  • Luís Neves (1)
  1. INESC ID / IST, Lisbon, Portugal
