@nrailgun 2015-10-28T13:58:48.000000Z

Frequent Itemset Mining and Association Rules

Machine Learning

Frequent Itemset

Question: Find sets of items that appear together frequently in baskets.

Support for itemset $I$: the number of baskets containing all items in $I$. Given a support threshold $s$, the sets of items that appear in at least $s$ baskets are called frequent itemsets.
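To make the definition concrete, here is a minimal Python sketch. The baskets, the item names, and the helper names `support` and `is_frequent` are made up for illustration and are not part of the original notes.

```python
# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "beer", "juice"},
    {"milk", "bread", "juice"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def is_frequent(itemset, baskets, s):
    """True if `itemset` appears in at least `s` baskets."""
    return support(itemset, baskets) >= s


print(support({"milk", "bread"}, baskets))   # 3 (baskets 1, 2 and 5)
print(is_frequent({"juice"}, baskets, s=3))  # False (juice is in 2 baskets)
```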

Association Rules: $\{ i_1, \dots, i_k \} \to j$ means "if a basket contains all of $i_1, \dots, i_k$, then it is likely to contain $j$". The confidence of an association rule $I \to j$, with $I = \{ i_1, \dots, i_k \}$, is

$\mathrm{conf}(I \to j) = \frac{\mathrm{support}(I \cup \{j\})}{\mathrm{support}(I)}$
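The confidence formula can be evaluated directly from two support counts. The sketch below assumes the same kind of made-up toy baskets; `support` and `confidence` are illustrative helper names, not an established API.

```python
# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "beer", "juice"},
    {"milk", "bread", "juice"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(antecedent, j, baskets):
    """conf(I -> j) = support(I ∪ {j}) / support(I)."""
    antecedent = frozenset(antecedent)
    denominator = support(antecedent, baskets)
    if denominator == 0:
        raise ValueError("the antecedent does not occur in any basket")
    return support(antecedent | {j}, baskets) / denominator


# milk is in 4 baskets, and 3 of those also contain bread.
print(confidence({"milk"}, "bread", baskets))  # 0.75
```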

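Not all high-confidence rules are interesting: if $j$ appears in most baskets, $I \to j$ has high confidence regardless of $I$. Interest measures how far the confidence departs from the overall frequency of $j$:

$\mathrm{Interest}(I \to j) = \mathrm{conf}(I \to j) - P[j]$

where $P[j]$ is the fraction of baskets that contain $j$. Interesting rules are those with high positive or negative interest values.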
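As a rough illustration of why confidence alone can mislead, the sketch below computes interest for a few rules on made-up baskets. It uses `Fraction` so the values are exact. The data and the helper names are assumptions for the example only.

```python
from fractions import Fraction

# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "beer", "juice"},
    {"milk", "bread", "juice"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def interest(antecedent, j, baskets):
    """Interest(I -> j) = conf(I -> j) - P[j], with P[j] the fraction of baskets containing j."""
    antecedent = frozenset(antecedent)
    conf = Fraction(support(antecedent | {j}, baskets), support(antecedent, baskets))
    p_j = Fraction(support({j}, baskets), len(baskets))
    return conf - p_j


# {milk} -> bread has confidence 3/4, but bread is in 4/5 of all baskets,
# so the rule says almost nothing beyond "bread is popular".
print(interest({"milk"}, "bread", baskets))  # -1/20
print(interest({"juice"}, "milk", baskets))  # 1/5
```

On this toy data the rule with confidence 3/4 has interest close to zero, which is the situation the definition is meant to flag.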

A-Priori

A-Priori finds frequent itemsets.

Key idea (monotonicity): if a set of items $I$ appears in at least $s$ baskets, so does every subset $J$ of $I$. Contrapositive for pairs: if item $i$ does not appear in $s$ baskets, then no pair including $i$ can appear in $s$ baskets.
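The practical payoff of this idea is that fewer pairs need counters. A small sketch on made-up baskets (the data, the threshold `s = 3`, and the variable names are all assumptions for illustration):

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"milk", "bread", "beer"},
    {"milk", "bread"},
    {"bread", "beer"},
    {"milk", "beer", "juice"},
    {"milk", "bread", "juice"},
]
s = 3  # assumed support threshold

# Count every individual item, then keep only the frequent ones.
item_counts = Counter(item for basket in baskets for item in basket)
frequent_items = sorted(item for item, n in item_counts.items() if n >= s)

# Without pruning, every pair of distinct items would need a counter.
all_pairs = list(combinations(sorted(item_counts), 2))
# With pruning, only pairs of frequent items can possibly be frequent.
candidate_pairs = list(combinations(frequent_items, 2))

print(frequent_items)                        # ['beer', 'bread', 'milk']
print(len(all_pairs), len(candidate_pairs))  # 6 3
```

Here `juice` appears in only 2 baskets, so all three pairs containing it are dropped without ever being counted.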

Pass 1: Read the baskets and count in main memory the occurrences of each individual item.
Pass 2: Read the baskets again and count in main memory only those pairs in which both elements are frequent.
In general, for each $k$ we construct two sets of $k$-tuples:
1. $C_k =$ the candidate $k$-tuples, generated from $L_{k-1}$ and $L_1$.
2. $L_k =$ the set of truly frequent $k$-tuples, i.e. the members of $C_k$ whose support is at least $s$.

Note that generating $C_k$ from $L_{k-1}$ and $L_1$ requires care: a candidate should be kept only if every one of its $(k-1)$-element subsets is in $L_{k-1}$. For example, $\{b,m,j\}$ cannot be frequent, and so is left out of $C_3$, when $\{m,j\}$ is not frequent.
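The passes described above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not a tuned implementation: the function names `generate_candidates` and `apriori`, the toy baskets, and the threshold `s = 3` are all invented for the example, and each "pass" is simply a loop over a list rather than a scan of a file on disk.

```python
from collections import Counter
from itertools import combinations


def generate_candidates(prev_frequent, frequent_items, k):
    """Build C_k by extending each set in L_{k-1} with one frequent item from L_1."""
    candidates = set()
    for itemset in prev_frequent:
        for single in frequent_items:
            candidate = itemset | single
            if len(candidate) != k:
                continue  # the item was already in the itemset
            # Keep the candidate only if every (k-1)-subset is itself frequent.
            if all(frozenset(sub) in prev_frequent
                   for sub in combinations(candidate, k - 1)):
                candidates.add(candidate)
    return candidates


def apriori(baskets, s):
    """Return {k: set of frequent k-itemsets} for support threshold s."""
    baskets = [frozenset(b) for b in baskets]
    # Pass 1: count individual items.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {1: {frozenset([i]) for i, n in item_counts.items() if n >= s}}
    k = 2
    while frequent[k - 1]:
        candidates = generate_candidates(frequent[k - 1], frequent[1], k)
        # Pass k: count only the candidates, never arbitrary k-tuples.
        counts = Counter(c for basket in baskets for c in candidates if c <= basket)
        frequent[k] = {c for c, n in counts.items() if n >= s}
        k += 1
    return {size: sets for size, sets in frequent.items() if sets}


# Hypothetical baskets over items b, m, j, c, chosen so that {m, j} is not frequent.
baskets = [{"b", "m"}, {"b", "m"}, {"b", "j"}, {"b", "j"},
           {"m", "j"}, {"b", "m", "j"}, {"c"}]
result = apriori(baskets, s=3)
print(sorted(sorted(x) for x in result[2]))  # [['b', 'j'], ['b', 'm']]
```

With this data $\{m,j\}$ appears in only 2 baskets, so $\{b,m,j\}$ is pruned during candidate generation and $C_3$ is empty.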
