An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations
- 550 Downloads
Given a transaction database as a global set of transactions and its local database obtained by some conditioning to the global one, we consider a pair of itemsets whose degrees of correlations are higher in the local database than in the global one. A problem of finding paired itemsets with high correlation in one database is known as Discovery of Correlation, and some algorithms to search for such characteristic paired itemsets are already proposed. However, even non-characteristic paired itemsets in the local database are also meaningful, provided the degree of correlation increases much higher in the local database than in the global one. They can be an implicit and hidden evidence showing that something particular to the local database occurs even though they are not yet realized as characteristic ones in the local. From this viewpoint, we have already proposed to measure the significance of paired itemsets by the difference of two correlations before and after the conditioning to the local database, and define a notion of DC pairs whose degrees of differences of correlations are high. As DC pairs are regarded as compound itemsets consisting of two component itemsets, we can have two basic strategies for finding them. One strategy firstly examines the compound itemsets and then the components, while another one does the component itemsets and then the compound ones. According to the former strategy, which we have already proposed and tested for its effectiveness, we have to enumerate many number of candidate compound itemsets that cannot be decomposable to components. For this reason, this paper presents a new algorithm according to the second strategy. It firstly enumerate possible component itemsets based on a new pruning rule for cutting off useless components. Secondly it forms the compound itemsets by combining the components thus detected, while we also make use of a constraint for preventing our algorithm from checking meaningless combinations.
Unable to display preview. Download preview PDF.
- 1.Taniguchi, T., Haraguchi, M., Okubo, Y.: Discovery of hidden correlations in a local transaction database based on differences of correlations. In: Proc. of the IAPR Int’l Conf. on Machine Learning and Data Mining in Pattern Recognition (2005) (to appear)Google Scholar
- 2.Bayardo, R.J.: Efficiently mining long patterns from databases. In: Proc. of the ACM-SIGMOD Int’l Conf. on Management of Data, pp. 85–93 (1998)Google Scholar
- 3.Uno, T., Kiyomi, M., Arimura, H.: LCM ver.2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets. In: Proc. of the IEEE Int’l Conf. on data mining, 2nd Workshop on Frequent Itemset Mining Implementations (FIMI 2004) (2004)Google Scholar
- 4.Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. of the Int’l Conf. on Very Large Data Bases, pp. 487–499 (1994)Google Scholar
- 5.Dong, G., Li, J.: Efficient mining of emerging patterns: discovering trends and differences. In: Proc. of the 5th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 43–52 (1999)Google Scholar
- 7.Webb, G.I., Butler, S., Newlands, D.: On detecting differences between groups. In: Proc. of the 9th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 256–265 (2003)Google Scholar
- 8.Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: generalizing association rules to correlations. In: Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, vol. 26(2), pp. 265–276 (1997)Google Scholar
- 9.Brin, S., Motwani, R., Ullman, J.D., Tsur, S.: Dynamic itemset counting and implication rules for market basket data. In: Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, vol. 26(2), pp. 255–264 (1997)Google Scholar
- 10.Aggarwal, C.C., Yu, P.S.: A new framework for itemset generation. In: Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1998), pp. 18–24 (1998)Google Scholar
- 11.Morishita, S., Sese, J.: Traversing itemset lattices with statistical metric pruning. In: Proc. of the ACM SIGACT-SIGMOD-SIGART Symposium on Database Systems (PODS), pp. 226–236 (2000)Google Scholar
- 12.Ohsawa, Y., Nara, Y.: Understanding internet users on double helical model of chance-discovery process. In: Proc. of the IEEE Int’l Symposium on Intelligent Control, pp. 844–849 (2002)Google Scholar
- 13.Hettich, S., Bay, S.D.: The UCI KDD Archive, Department of Information and Computer Science, University of California, Irvine, CA (1999), http://kdd.ics.uci.edu