Tracking Body Parts of Multiple People for Multi-person Multimodal Interface

  • Sébastien Carbini
  • Jean-Emmanuel Viallet
  • Olivier Bernier
  • Bénédicte Bascle
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3766)


Although large displays could allow several users to work together and to move freely in a room, their associated interfaces are limited to contact devices that must generally be shared. This paper describes a novel interface called SHIVA (Several-Humans Interface with Vision and Audio) that allows several users to interact remotely with a very large display using both speech and gesture. The head and both hands of two users are tracked in real time by a stereo-vision-based system. From the positions of these body parts, the direction in which each user points is computed, and selection gestures made with the other hand are recognized. The pointing gesture is fused with the n-best results of speech recognition, taking the application context into account. The system is tested on a chess game in which two users play on a very large display.
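The abstract says that the pointed direction is computed from body-part positions and then fused with n-best speech hypotheses using the application context, but it does not give the formulas. The following is a minimal Python sketch under stated assumptions, not the authors' method: the pointing ray is assumed to run from the head through the pointing hand (a common model in the pointing literature, not confirmed by the abstract); the display is assumed to be the plane z = 0 in a room frame measured in metres; the chessboard is assumed to fill a 2 m square starting at the origin; and the fusion rule (keep the highest-scoring spoken piece name that matches the piece on the pointed square) is hypothetical. The names `pointed_point`, `to_square` and `fuse` are illustrative only.

```python
from typing import Dict, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def pointed_point(head: Vec3, hand: Vec3,
                  plane_point: Vec3, plane_normal: Vec3) -> Optional[Vec3]:
    """Intersect the (assumed) head-to-hand ray with the display plane."""
    direction = sub(hand, head)
    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the display
    t = dot(sub(plane_point, head), plane_normal) / denom
    if t <= 0:
        return None  # display lies behind the user
    return (head[0] + t * direction[0],
            head[1] + t * direction[1],
            head[2] + t * direction[2])


def to_square(p: Vec3, board_origin: Tuple[float, float] = (0.0, 0.0),
              board_size: float = 2.0) -> Optional[str]:
    """Map a point on the display plane to a chess square such as 'e4'.

    Hypothetical layout: file a..h along x, rank 1..8 along y.
    """
    x = p[0] - board_origin[0]
    y = p[1] - board_origin[1]
    if not (0 <= x < board_size and 0 <= y < board_size):
        return None  # pointing outside the board
    cell = board_size / 8
    file_index = int(x // cell)
    rank_index = int(y // cell)
    return "abcdefgh"[file_index] + str(rank_index + 1)


def fuse(nbest: Sequence[Tuple[str, float]], square: Optional[str],
         board: Dict[str, str]) -> Optional[str]:
    """Hypothetical fusion: best-scoring speech hypothesis consistent
    with the piece on the pointed square (the application context)."""
    if square is None:
        return None
    piece = board.get(square)
    for word, _score in sorted(nbest, key=lambda h: h[1], reverse=True):
        if word == piece:
            return word
    return None
```

For example, with the head at (0.5, 1.5, 3.0) and the hand at (0.75, 1.25, 2.25), the ray meets the display at (1.5, 0.5, 0.0), which this layout maps to square g3. If g3 holds a knight and the recognizer's n-best list is bishop (0.6), knight (0.3), queen (0.1), the context filter selects "knight" even though it was not the top speech hypothesis. A weighted combination of gesture and speech scores would be an equally plausible alternative; the abstract does not say which kind of rule SHIVA uses.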







Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Sébastien Carbini (1)
  • Jean-Emmanuel Viallet (1)
  • Olivier Bernier (1)
  • Bénédicte Bascle (1)
  1. France Télécom R&D, Lannion Cedex, France
