
An Experiment with Association Rules and Classification: Post-Bagging and Conviction

  • Alípio M. Jorge
  • Paulo J. Azevedo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3735)

Abstract

In this paper we study a new technique we call post-bagging, which consists of resampling parts of a classification model rather than the data. We apply it to a particular kind of model, large sets of classification association rules, in combination with the ordinary best-rule and weighted-voting approaches. We empirically evaluate the effects of the technique in terms of classification accuracy. We also discuss the predictive power of different metrics used for association rule mining, such as confidence, lift, conviction and χ². We conclude that, for the described experimental conditions, post-bagging improves classification results and that the best metric is conviction.
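As a rough illustration of the ideas named in the abstract, the sketch below computes confidence, lift and conviction for a rule A → C using their standard textbook definitions, and then resamples a rule set (rather than the data) before voting. The abstract does not give the paper's exact procedure, so everything here is an assumption made for illustration: the `Rule` class, the function names, the number of bags, bootstrap sampling with replacement, and the majority vote over best-rule predictions are all hypothetical choices, not the authors' implementation.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule representation, not taken from the paper.
    antecedent: frozenset  # items that must all be present in a case
    label: str             # class predicted when the rule fires
    conviction: float      # quality score used to rank rules


def rule_metrics(n, n_a, n_c, n_ac):
    """Textbook metrics for a rule A -> C, from counts over n cases.

    n_a: cases containing A, n_c: cases of class C, n_ac: cases with both.
    Assumes n > 0, n_a > 0 and n_c > 0.
    """
    supp_c = n_c / n
    confidence = n_ac / n_a
    lift = confidence / supp_c
    # Conviction is infinite when the rule is never wrong (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - supp_c) / (1 - confidence)
    return confidence, lift, conviction


def best_rule_predict(rules, case, default):
    """Best-rule strategy: answer with the highest-conviction rule that fires."""
    firing = [r for r in rules if r.antecedent <= case]
    if not firing:
        return default
    return max(firing, key=lambda r: r.conviction).label


def post_bagging_predict(rules, case, default, n_bags=10, seed=0):
    """Assumed post-bagging sketch: bootstrap the rules, then majority-vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_bags):
        bag = rng.choices(rules, k=len(rules))  # sample rules with replacement
        votes[best_rule_predict(bag, case, default)] += 1
    return votes.most_common(1)[0][0]
```

For example, a rule whose antecedent covers 40 of 100 cases, 30 of them in a class that holds for 50 cases overall, gets `rule_metrics(100, 40, 50, 30)`, that is, confidence 0.75, lift 1.5 and conviction 2.0. A weighted-voting variant could replace `best_rule_predict` inside each bag; that variant is not shown here.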

Keywords

Association Rule, Frequent Itemset, Association Rule Mining, Frequent Pattern Mining, Decision Tree Inducer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Alípio M. Jorge (1)
  • Paulo J. Azevedo (2)
  1. LIACC, Faculdade de Economia, Universidade do Porto, Porto, Portugal
  2. Departamento de Informática, Universidade do Minho, Portugal
