A Joint System for Person Tracking and Face Detection

  • Zhenqiu Zhang
  • Gerasimos Potamianos
  • Andrew Senior
  • Stephen Chu
  • Thomas S. Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3766)


Visual detection and tracking of humans in complex scenes is a challenging problem with a wide range of applications, such as surveillance and human-computer interaction. In many such applications, time-synchronous views from multiple calibrated cameras are available, and human location information is desired both at the frame-view level and at the space level. In such scenarios, efficiently combining the strengths of face detection and person tracking is a viable approach that can provide both levels of information and improve robustness. In this paper, we propose a novel vision system that detects and tracks human faces automatically, using input from multiple calibrated cameras. The method uses an AdaBoost algorithm variant combined with mean shift tracking, applied to single camera views, for face detection and tracking, and fuses the results across multiple camera views to check for consistency and obtain the three-dimensional head estimate. We apply the proposed system to a lecture scenario in a smart room, on a corpus collected as part of the CHIL European Union integrated project. We report results on both frame-level face detection and three-dimensional head tracking. For the latter, the proposed algorithm achieves results similar to those of the IBM “PeopleVision” system.
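
The abstract does not spell out how the per-camera results are fused, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes that each camera's face detection has already been back-projected, using the calibration, into a viewing ray (camera center plus direction) in room coordinates. The function names (`fuse_rays`, `point_ray_distance`) and the `max_dist` threshold are hypothetical and chosen for illustration. The sketch finds the 3D point closest to all rays in the least-squares sense and flags each view as consistent if the point lies near its ray.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _solve3(a, b):
    """Solve the 3x3 system a x = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; no unique 3D point")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def point_ray_distance(p, center, direction):
    """Perpendicular distance from point p to the line through center along direction."""
    d = _unit(direction)
    w = [pi - ci for pi, ci in zip(p, center)]
    t = sum(wi * di for wi, di in zip(w, d))
    perp = [wi - t * di for wi, di in zip(w, d)]
    return math.sqrt(sum(x * x for x in perp))


def fuse_rays(rays, max_dist=0.15):
    """Estimate a 3D head position from per-camera viewing rays.

    rays: list of (center, direction) pairs, one per camera (assumed input format).
    max_dist: hypothetical consistency threshold, in the same units as the centers.
    Returns (point, flags), where flags[i] says whether view i agrees with the point.
    """
    # Accumulate the normal equations sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) c_i.
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for center, direction in rays:
        d = _unit(direction)
        for i in range(3):
            for j in range(3):
                p_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p_ij
                b[i] += p_ij * center[j]
    point = _solve3(a, b)
    flags = [point_ray_distance(point, c, d) <= max_dist for c, d in rays]
    return point, flags
```

For example, calling `fuse_rays` with one ray per camera returns the fused position together with one boolean per view. A view flagged `False` would be a candidate false detection to discard before re-estimating the position. How the actual system handles inconsistent views is not stated in the abstract.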





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Zhenqiu Zhang
    • 1
  • Gerasimos Potamianos
    • 1
  • Andrew Senior
    • 1
  • Stephen Chu
    • 1
  • Thomas S. Huang
    • 1
  1. IBM Thomas J. Watson Research Center, Yorktown Heights, USA
