@HaomingJiang 2016-08-03T16:08:39.000000Z

Chp7 Extended Association Analysis

Introduction to Data Mining: notes



7.1 Categorical Attribute

Transform the attribute into asymmetric binary variables: introduce a new “item” for each distinct attribute-value pair.

Example: replace the Browser Type attribute with one item per browser, such as
Browser Type = Internet Explorer
Browser Type = Mozilla
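A minimal Python sketch of this transformation. It assumes each record is a dict from attribute name to value. It also uses a hypothetical `attr=value` string as the item name. The sample records are made up for illustration.

```python
from typing import Dict, Set


def to_items(record: Dict[str, str]) -> Set[str]:
    """Map each (attribute, value) pair to one asymmetric binary item.

    An item is present in the transaction only when the record has that
    value; absent values simply produce no item.
    """
    return {f"{attr}={value}" for attr, value in record.items()}


# Hypothetical records, for illustration only
records = [
    {"Browser Type": "Internet Explorer", "Buy": "No"},
    {"Browser Type": "Mozilla", "Buy": "Yes"},
]
transactions = [to_items(r) for r in records]
print(sorted(transactions[0]))  # ['Browser Type=Internet Explorer', 'Buy=No']
```

After this step, each record is an ordinary transaction, so a standard frequent-itemset algorithm such as Apriori can be applied unchanged.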

Potential Issues:
What if an attribute has many possible values?
Example: the attribute country has more than 200 possible values.
Many of the attribute values may have very low support.
Potential solution: aggregate the low-support attribute values.

What if the distribution of attribute values is highly skewed?
Example: 95% of the visitors have Buy = No.
Most of the items will then be associated with the (Buy = No) item.
Potential solution: drop the highly frequent items.
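Both remedies can be sketched in a few lines of Python. This sketch makes several assumptions. Rare values are merged into a single hypothetical "Other" value. The thresholds `min_support` and `max_support` are illustrative parameters, not values from the notes. The sample data is made up.

```python
from collections import Counter
from typing import List, Set


def aggregate_rare_values(values: List[str], min_support: float,
                          other: str = "Other") -> List[str]:
    """Replace values whose support (fraction of records) is below
    min_support with one aggregated value."""
    n = len(values)
    counts = Counter(values)
    return [v if counts[v] / n >= min_support else other for v in values]


def drop_frequent_items(transactions: List[Set[str]],
                        max_support: float) -> List[Set[str]]:
    """Remove every item whose support exceeds max_support."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    frequent = {item for item, c in counts.items() if c / n > max_support}
    return [t - frequent for t in transactions]


# Many values: "IS" and "MT" each have support 0.1 < 0.2, so they are merged
countries = ["US"] * 5 + ["CN"] * 3 + ["IS", "MT"]
grouped = aggregate_rare_values(countries, min_support=0.2)

# Skewed values: 19 of 20 visitors (95%) have Buy = No
txns = [
    {"Buy=No" if i < 19 else "Buy=Yes",
     "Browser Type=Mozilla" if i % 2 == 0 else "Browser Type=Internet Explorer"}
    for i in range(20)
]
cleaned = drop_frequent_items(txns, max_support=0.9)  # drops "Buy=No"
```

Dropping an item also removes any rule that could have mentioned it. The `max_support` threshold therefore trades fewer trivial rules against losing patterns that involve the dominant value.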

7.2 Continuous Attributes

7.2.1 Discretization-based

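It is hard to choose the discretization intervals. If an interval is too wide, some patterns may be lost for lack of confidence. If it is too narrow, some patterns may be lost for lack of support.
One option is to consider all possible intervals, but this is computationally expensive and generates many redundant rules.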
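A minimal Python sketch of the interval problem. It makes two assumptions for illustration. First, equal-width binning is used, since the notes do not fix a discretization method. Second, "all possible intervals" means every contiguous range of the base bins. The `ages` data is made up.

```python
from typing import List, Tuple


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Assign each value to one of k equal-width base intervals (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # min(..., k - 1) keeps the maximum value inside the last interval
    return [min(int((v - lo) / width), k - 1) for v in values]


def all_intervals(k: int) -> List[Tuple[int, int]]:
    """Every contiguous range [i, j] of base intervals: k*(k+1)/2 candidates."""
    return [(i, j) for i in range(k) for j in range(i, k)]


ages = [21, 25, 33, 38, 45, 52, 60, 67]
bins = equal_width_bins(ages, 4)   # [0, 0, 1, 1, 2, 2, 3, 3]
print(len(all_intervals(4)))       # 10
print(len(all_intervals(100)))     # 5050
```

The number of candidate interval items grows quadratically in the number of base bins. Nested ranges such as (0, 1) and (0, 2) also tend to produce near-duplicate rules, which is the redundancy mentioned above.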

(PS: A more advanced approach was proposed by Srikant & Agrawal.)

7.2.2 Statistics-based

7.2.3 Non-discretization-based
