@nrailgun 2015-10-28T21:58:48.000000Z Words: 1313 Views: 1627

Frequent Itemset Mining and Association Rules

Machine Learning


Frequent Itemset

Question: Find sets of items that appear together frequently in baskets.

Support for itemset $I$: the number of baskets containing all items in $I$. Given a support threshold $s$, the sets of items that appear in at least $s$ baskets are called frequent itemsets.
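
A minimal sketch of the support count, assuming each basket is a Python set. The basket contents and the names `BASKETS`, `support`, and `is_frequent` are invented for illustration and are not from the original notes.

```python
# Hypothetical toy baskets, invented for illustration only.
BASKETS = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "beer"},
    {"milk", "bread", "jam"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for basket in baskets if target <= basket)


def is_frequent(itemset, baskets, s):
    """True if `itemset` appears in at least `s` baskets."""
    return support(itemset, baskets) >= s


print(support({"milk", "bread"}, BASKETS))   # 3
print(is_frequent({"jam"}, BASKETS, s=2))    # False
```

Scanning every basket for each itemset is fine for a handful of baskets; avoiding that cost on large data is what A-Priori is for.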

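Association Rules: $\{i_1, \dots, i_k\} \rightarrow j$ means "if a basket contains all of $i_1, \dots, i_k$, then it is likely to contain $j$". The confidence of an association rule $I \rightarrow j$ is

$$\mathrm{conf}(I \rightarrow j) = \frac{\mathrm{support}(I \cup \{j\})}{\mathrm{support}(I)}$$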
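
As a rough illustration, the confidence can be computed directly from two support counts. The baskets below and the function names are assumptions made for this sketch, not part of the original notes.

```python
# Hypothetical toy baskets, invented for illustration only.
BASKETS = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "beer"},
    {"milk", "bread", "jam"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for basket in baskets if target <= basket)


def confidence(lhs, j, baskets):
    """conf(lhs -> j) = support(lhs ∪ {j}) / support(lhs)."""
    lhs_support = support(lhs, baskets)
    if lhs_support == 0:
        raise ValueError("left-hand side never occurs; confidence is undefined")
    return support(set(lhs) | {j}, baskets) / lhs_support


# milk occurs in 4 baskets, milk and bread together in 3.
conf_milk_bread = confidence({"milk"}, "bread", BASKETS)
print(conf_milk_bread)  # 0.75
```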

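Not all high-confidence rules are interesting: if $j$ appears in most baskets anyway, $\mathrm{conf}(I \rightarrow j)$ is high no matter what $I$ is. Interest therefore compares the confidence with $P[j]$, the fraction of all baskets that contain $j$:

$$\mathrm{Interest}(I \rightarrow j) = \mathrm{conf}(I \rightarrow j) - P[j]$$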

Interesting rules are those with high positive or negative interest values.
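
A small sketch of the interest measure, taking $P[j]$ to be the fraction of baskets that contain $j$. The toy baskets and all function names are hypothetical and chosen only to show a rule whose confidence is high while its interest is close to zero.

```python
# Hypothetical toy baskets, invented for illustration only.
BASKETS = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "beer"},
    {"milk", "bread", "jam"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for basket in baskets if target <= basket)


def interest(lhs, j, baskets):
    """Interest(lhs -> j) = conf(lhs -> j) - P[j]."""
    conf = support(set(lhs) | {j}, baskets) / support(lhs, baskets)
    p_j = support({j}, baskets) / len(baskets)  # fraction of baskets with j
    return conf - p_j


# conf(milk -> bread) is 0.75, but bread is in 80% of all baskets,
# so the rule carries almost no information (interest is about -0.05).
print(round(interest({"milk"}, "bread", BASKETS), 3))
# jam only ever appears together with milk, so interest is clearly positive.
print(round(interest({"jam"}, "milk", BASKETS), 3))
```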

A-Priori

A-Priori finds frequent itemsets.

Key idea (monotonicity): if a set of items $I$ appears in at least $s$ baskets, so does every subset $J \subseteq I$. Contrapositively, if item $i$ does not appear in $s$ baskets, then no pair containing $i$ can appear in $s$ baskets.

  1. Read baskets and count in main memory the occurrences of each individual item.
  2. Read baskets again and count in main memory only those pairs (2-tuples) in which both elements are frequent.
  3. For each larger $k$, construct two sets of $k$-tuples:
    1. $C_k$: the candidate $k$-tuples, generated from $L_{k-1}$ and $L_1$.
    2. $L_k$: the set of truly frequent $k$-tuples, found by counting the candidates in $C_k$ during another pass over the baskets.
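
The passes above can be sketched in a few lines of Python. This is a small in-memory illustration under stated assumptions, not a tuned implementation: the function name `apriori`, the toy baskets, and the choice to extend each frequent $(k-1)$-tuple by one frequent item are all assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, s):
    """Return {itemset: support} for every itemset in at least `s` baskets."""
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count individual items and keep the frequent ones (L1).
    item_counts = Counter(item for b in baskets for item in b)
    frequent_items = {i for i, c in item_counts.items() if c >= s}
    current = {frozenset([i]): item_counts[i] for i in frequent_items}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)  # L_{k-1}
        # Build C_k: extend each frequent (k-1)-tuple by one frequent item
        # and keep it only if every (k-1)-subset is itself frequent.
        candidates = set()
        for itemset in prev:
            for item in frequent_items - itemset:
                cand = itemset | {item}
                if all(frozenset(sub) in prev
                       for sub in combinations(cand, k - 1)):
                    candidates.add(cand)

        # Pass k: count only the candidates, then filter to get L_k.
        counts = Counter()
        for b in baskets:
            for cand in candidates:
                if cand <= b:
                    counts[cand] += 1
        current = {c: n for c, n in counts.items() if n >= s}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, invented for illustration only.
BASKETS = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "beer"},
    {"milk", "bread", "jam"},
]
result = apriori(BASKETS, s=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `s=2` on this toy data, the three frequent single items and all three pairs among them survive, while the triple {milk, bread, beer} is generated as a candidate but dropped because it occurs in only one basket.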

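Take care when generating $C_k$ from $L_{k-1}$ and $L_1$: a candidate can be discarded as soon as any of its $(k-1)$-subsets is missing from $L_{k-1}$. For example, $\{b, m, j\}$ need not be placed in $C_3$: it cannot be frequent, since $\{m, j\}$ is not frequent.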
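
The pruning check can be sketched as follows. The sets `L1` and `L2` below are hypothetical values chosen so that $\{m, j\}$ is not frequent, matching the example; they are not taken from the original notes, and `generate_candidates` is an illustrative name.

```python
from itertools import combinations


def generate_candidates(prev_frequent, frequent_items):
    """Build C_k from L_{k-1} (`prev_frequent`) and L_1 (`frequent_items`).

    A k-tuple is kept only if all of its (k-1)-subsets are in L_{k-1}.
    """
    prev = {frozenset(x) for x in prev_frequent}
    if not prev:
        return set()
    k = len(next(iter(prev))) + 1
    candidates = set()
    for itemset in prev:
        for item in set(frequent_items) - itemset:
            cand = itemset | {item}
            if all(frozenset(sub) in prev for sub in combinations(cand, k - 1)):
                candidates.add(cand)
    return candidates


# Hypothetical frequent items and pairs; note that {m, j} is absent from L2.
L1 = {"b", "c", "j", "m"}
L2 = [{"b", "m"}, {"b", "j"}, {"b", "c"}, {"c", "m"}]

C3 = generate_candidates(L2, L1)
print([sorted(c) for c in C3])  # [['b', 'c', 'm']]
```

Here $\{b, m, j\}$ is rejected without touching the baskets, because its subset $\{m, j\}$ is not in `L2`; only $\{b, c, m\}$ remains to be counted.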
