@HaomingJiang
2016-08-03T16:08:39.000000Z
Introduction to Data Mining
Notes
Introduce a new “item” for each distinct attribute-value pair
Example: replace Browser Type attribute with
Browser Type = Internet Explorer
Browser Type = Mozilla
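A minimal sketch of this transformation, assuming each record is a plain dict from attribute name to value. The helper name `to_items` and the sample sessions are made up for illustration:

```python
def to_items(record):
    """Turn one record (attribute -> value) into a set of 'attribute=value' items."""
    return {f"{attr}={value}" for attr, value in record.items()}


# Hypothetical web-session records, invented for this example.
sessions = [
    {"Browser Type": "Internet Explorer", "Buy": "No"},
    {"Browser Type": "Mozilla", "Buy": "Yes"},
]

# Each session becomes a transaction that a standard frequent-itemset miner can consume.
transactions = [to_items(r) for r in sessions]
```

Each distinct attribute-value pair is now its own item, so the usual support counting applies unchanged.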
Potential Issues:
What if the attribute has many possible values?
Example: attribute country has more than 200 possible values
Many of the attribute values may have very low support
Potential solution: Aggregate the low-support attribute values
What if the distribution of attribute values is highly skewed?
Example: 95% of the visitors have Buy = No
Most of the items will be associated with the (Buy=No) item
Potential solution: drop the highly frequent items
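Both potential solutions can be sketched in a few lines. The thresholds `min_support` and `max_support`, the label `"Other"`, and the sample data are assumptions chosen for illustration; the notes do not prescribe specific values:

```python
from collections import Counter


def aggregate_rare_values(values, min_support, other_label="Other"):
    """Replace values whose relative frequency is below min_support with other_label."""
    counts = Counter(values)
    n = len(values)
    return [v if counts[v] / n >= min_support else other_label for v in values]


def drop_frequent_items(transactions, max_support):
    """Remove items that occur in more than max_support of all transactions."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    too_frequent = {item for item, c in counts.items() if c / n > max_support}
    return [set(t) - too_frequent for t in transactions]


# Hypothetical country column: "IS" covers only 10% of the records.
countries = ["US"] * 6 + ["CN"] * 3 + ["IS"]
grouped = aggregate_rare_values(countries, min_support=0.2)

# Hypothetical transactions: Buy=No appears in 95% of them.
transactions = (
    [{"Buy=No", "Browser=IE"}] * 10
    + [{"Buy=No", "Browser=Mozilla"}] * 9
    + [{"Buy=Yes", "Browser=Mozilla"}]
)
filtered = drop_frequent_items(transactions, max_support=0.9)
```

Here `IS` is folded into `Other`, and `Buy=No` is removed from every transaction while the rarer `Buy=Yes` item is kept.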
It is hard to choose the discretization intervals for a continuous attribute.
We could use all possible intervals, but that is computationally expensive and generates redundant rules.
(PS: Srikant & Agrawal proposed a more advanced approach.)
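To see why using all possible intervals is expensive, here is a small sketch that enumerates every interval obtainable by merging adjacent base intervals. With k base intervals there are k(k+1)/2 candidates. The function name and the age cut points are hypothetical:

```python
def all_merged_intervals(cut_points):
    """List every interval formed by merging adjacent base intervals.

    cut_points must be sorted; k + 1 cut points define k base intervals.
    """
    k = len(cut_points) - 1
    return [
        (cut_points[i], cut_points[j])
        for i in range(k)
        for j in range(i + 1, k + 1)
    ]


age_cuts = [10, 20, 30, 40, 50]  # hypothetical: 4 base intervals
intervals = all_merged_intervals(age_cuts)
print(len(intervals))  # 4 * 5 / 2 = 10 candidate intervals
```

Each candidate becomes its own item, and wide intervals such as (10, 50) contain the narrow ones, which is where the redundant rules come from.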
For each frequent itemset, compute the descriptive statistics of the corresponding target variable (a frequent itemset becomes a rule by introducing the target variable as the rule consequent).
Apply a statistical test to determine the interestingness of the rule (e.g. a t-test for the mean of the random variable).
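A rough sketch of this statistics-based idea, assuming each record stores its items as a set plus a numeric target. Welch's t statistic is my choice of test here, and comparing covered records against the rest is also an assumption; the notes only say "e.g. t-test". All names and data are invented:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


def rule_statistic(records, itemset, target):
    """Return the target mean over records covered by itemset, and the t statistic
    comparing covered records against the remaining ones."""
    covered = [r[target] for r in records if itemset <= r["items"]]
    rest = [r[target] for r in records if not itemset <= r["items"]]
    return mean(covered), welch_t(covered, rest)


# Hypothetical records: items plus a continuous target variable (Age).
records = [
    {"items": {"Browser=Mozilla", "Buy=Yes"}, "Age": 22},
    {"items": {"Browser=Mozilla", "Buy=Yes"}, "Age": 24},
    {"items": {"Browser=Mozilla", "Buy=Yes"}, "Age": 26},
    {"items": {"Browser=IE", "Buy=No"}, "Age": 40},
    {"items": {"Browser=IE", "Buy=No"}, "Age": 44},
    {"items": {"Browser=Mozilla", "Buy=No"}, "Age": 48},
]

mean_age, t = rule_statistic(records, {"Browser=Mozilla", "Buy=Yes"}, "Age")
```

The resulting rule reads {Browser=Mozilla, Buy=Yes} -> Age: mean = `mean_age`, and a large |t| suggests the covered group really differs from the rest. A proper test would also compare t against a critical value for the chosen significance level.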