Application and Evaluation of Bayesian Filter for Chinese Spam

  • Zhan Wang
  • Yoshiaki Hori
  • Kouichi Sakurai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4318)


Statistical filtering based on Bayes' theorem, known as Bayesian filtering, gained attention after it was described in Paul Graham's paper “A Plan for Spam” and has since become a popular mechanism for distinguishing spam from legitimate email. Many modern mail programs use Bayesian spam filtering techniques. Implementations of Bayesian filtering for email written in English and Japanese have already been developed. In contrast, little work has been done on implementing Bayesian spam filtering for Chinese email. In this paper, we first adopt a statistical filter called bsfilter and modify it to filter Chinese email. Using Chinese emails as the experimental target, we then analyze the relation between the filter's parameters and its spam judgement accuracy, and consider the optimal parameter values.
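To make the idea of Bayesian filtering concrete, the following is a minimal sketch in the spirit of the combining rule from “A Plan for Spam”. It is not bsfilter's implementation and not the method evaluated in the paper. Tokenizing by overlapping character bigrams is an assumption made here because Chinese text has no spaces between words. The function names and the values `unknown=0.4`, `top_n=15` and the clamping range are illustrative, and Graham's extra weighting of legitimate-mail counts and his minimum-occurrence cutoff are omitted for brevity.

```python
from collections import Counter
from math import prod


def bigrams(text):
    """Split text into overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


def train(messages):
    """Count, for each bigram, the number of messages it appears in."""
    counts = Counter()
    for msg in messages:
        counts.update(set(bigrams(msg)))
    return counts


def token_prob(tok, spam_counts, ham_counts, n_spam, n_ham, unknown=0.4):
    """Estimate P(spam | token), clamped to [0.01, 0.99].

    Tokens never seen in training get the (assumed) neutral-ish value `unknown`.
    """
    s = spam_counts[tok] / n_spam
    h = ham_counts[tok] / n_ham
    if s + h == 0:
        return unknown
    return max(0.01, min(0.99, s / (s + h)))


def spam_score(text, spam_counts, ham_counts, n_spam, n_ham, top_n=15):
    """Combine the top_n most decisive token probabilities into one score."""
    probs = [
        token_prob(t, spam_counts, ham_counts, n_spam, n_ham)
        for t in set(bigrams(text))
    ]
    # Keep the tokens whose probability is farthest from the neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:top_n]
    p_spam = prod(top)
    p_ham = prod(1 - p for p in top)
    return p_spam / (p_spam + p_ham)
```

A message would then be judged as spam when `spam_score(...)` exceeds a chosen threshold. The threshold, `top_n` and `unknown` are the kind of parameters whose effect on accuracy can be studied experimentally.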


Keywords: Bayesian filtering, spam, Chinese email




  1. Graham, P.: A Plan For Spam (August 2002)
  2.
  3. CCERT Data Sets of Chinese Emails,
  4. Robinson, G.: A statistical approach to the spam problem. Linux Journal 107 (2003)
  5. Graham, P.: Better Bayesian filtering. In: Spam Conference (2003)
  6. Zhang, L., Zhu, J., Yao, T.: An Evaluation of Statistical Spam Filtering Techniques. ACM Transactions on Asian Language Information Processing 3(4), 243–269 (2004)
  7. Maosong, S., Dayang, S., Changning, H.: CSeg&Tag1.0: A Practical Word Segmenter and POS Tagger for Chinese Texts, A97-1018, A Digital Archive of Research Papers in Computational Linguistics
  8. Hovold, J.: Naive Bayes Spam Filtering Using Word-Position-Based Attributes. In: Second Conference on Email and Anti-Spam, CEAS 2005 (2005)
  9. Iwanaga, M., Tabata, T., Sakurai, K.: Comparison with Implementations of Bayesian Filtering for Anti-spam. In: SCIS 2004, vol. 2, pp. 1025–1028 (2004) (in Japanese)
  10. Ohfuku, H., Matsuura, K.: Optimization of Bayesian filtering for Anti-spam. In: SCIS 2005, vol. 1, pp. 199–204 (2005) (in Japanese)
  11.
  12. Support Vector Machine,
  13.
  14.
  15. Nie, J.-Y., Ren, F.: Chinese Information Retrieval: Using Characters or Words? Information Processing and Management 35(4), 443–462 (1999)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Zhan Wang (1)
  • Yoshiaki Hori (2)
  • Kouichi Sakurai (2)
  1. Graduate School of Information Science and Electrical Engineering, Kyushu University
  2. Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
