@nrailgun · 2015-10-22

Decision Tree

Machine Learning


Decision Tree

Decision Tree is a basic classification and regression method. Learning a decision tree involves three steps: feature selection, tree generation, and tree pruning.


Select Features

Define Entropy as

$$H(p) = H(X) = -\sum_{i=1}^{n} p_i \log p_i,$$

where X is a random variable with probability pi = P(X = xi), and 0 ≤ H(p) ≤ log n. The larger the uncertainty is, the larger the entropy H(X) will be.

Define Conditional Entropy as

$$H(Y \mid X) = \sum_{i=1}^{n} p_i\, H(Y \mid X = x_i),$$

where X and p are defined as above.

Define Information Gain as

$$g(D, A) = H(D) - H(D \mid A),$$

which measures how much uncertainty is reduced by introducing feature A for classification. We should prefer the features that reduce uncertainty the most.
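
The following is a minimal Python sketch of these three quantities; the function names and the toy data are illustrative, not part of the original note.

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy H(D) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """H(D|A): entropy within each value of the feature, weighted by its frequency."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(feature_values, labels)

# Toy example: a feature that perfectly separates the classes gains the full 1 bit.
labels = ['yes', 'yes', 'no', 'no']
feature = ['a', 'a', 'b', 'b']
print(information_gain(feature, labels))  # 1.0
```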


Generating Decision Tree

ID3

Input: Training data set D, feature set A, threshold ϵ;
Output: Decision Tree T.

  1. If all instances belong to the same class Ck, return T as a single-node tree labeled Ck;
  2. If A = ∅, return T as a single-node tree labeled with the class Ck that has the most instances in D;
  3. Calculate the information gain of each feature Ai, and select the feature Ag with the largest gain;
    • If the information gain of Ag is less than ϵ, return T labeled with the class Ck that has the most instances in D;
    • Else, partition D into subsets Di according to the values ai of Ag, and construct the sub-trees recursively from each Di with feature set A − {Ag}.

This approach tends to over-fit.
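
As a concrete illustration of steps 1–3, here is a recursive ID3 sketch; it assumes categorical features, reuses the entropy/information_gain helpers above, and the dict-based tree representation is just one possible choice.

```python
from collections import Counter

def id3(rows, labels, features, eps=1e-3):
    """rows: list of dicts mapping feature name -> value; labels: parallel class labels;
    features: set of feature names still available. Returns a class label or a nested dict."""
    # 1. All instances share one class: return a single-node tree.
    if len(set(labels)) == 1:
        return labels[0]
    # 2. No features left: label with the majority class.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # 3. Pick the feature with the largest information gain.
    gains = {f: information_gain([r[f] for r in rows], labels) for f in features}
    best = max(gains, key=gains.get)
    if gains[best] < eps:  # gain below the threshold: stop splitting
        return Counter(labels).most_common(1)[0][0]
    tree = {best: {}}
    for value in set(r[best] for r in rows):  # partition D by the values of the chosen feature
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                features - {best}, eps)
    return tree
```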

C4.5

C4.5 is similar to ID3 but selects features with the information gain ratio

$$g_R(D, A) = \frac{g(D, A)}{H_A(D)}$$

instead of the information gain, where H_A(D) is the entropy of D with respect to the values of feature A (computed from the sizes of the subsets Di induced by A). Normalising by H_A(D) penalises features with many values, which plain information gain tends to favor.
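
As a sketch reusing the helpers above, the gain ratio only adds a normalisation step:

```python
def gain_ratio(feature_values, labels):
    """g_R(D, A) = g(D, A) / H_A(D), where H_A(D) is the entropy of A's own value distribution."""
    split_info = entropy(feature_values)  # H_A(D)
    if split_info == 0:  # the feature takes a single value, so it cannot split D
        return 0.0
    return information_gain(feature_values, labels) / split_info
```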


Pruning Decision Tree

Cost function

Let T denote a decision tree with |T| leaves. Each leaf t has Nt samples, of which Ntk belong to class k. Define the loss function as:

$$C_\alpha(T) = -\sum_{t=1}^{|T|} \sum_{k=1}^{K} N_{tk} \log \frac{N_{tk}}{N_t} + \alpha |T|.$$

Pruning algorithm

  1. Calculate the empirical entropy of every node;
  2. If pruning a group of leaves back into their parent node makes the cost Cα(T) smaller, then prune it.
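
A small sketch of evaluating Cα(T), assuming each leaf is summarised by its per-class counts (that representation is an illustrative assumption):

```python
import math

def leaf_cost(class_counts):
    """-sum_k N_tk * log(N_tk / N_t) for one leaf, i.e. N_t times its empirical entropy."""
    n_t = sum(class_counts)
    return -sum(n * math.log(n / n_t) for n in class_counts if n > 0)

def tree_cost(leaves, alpha):
    """C_alpha(T): summed leaf costs plus the complexity penalty alpha * |T|.
    `leaves` is a list of per-class count lists, one entry per leaf."""
    return sum(leaf_cost(c) for c in leaves) + alpha * len(leaves)

# Pruning check: collapse two sibling leaves into their parent if the cost does not grow.
before = tree_cost([[8, 2], [1, 9]], alpha=2.0)
after = tree_cost([[9, 11]], alpha=2.0)  # class counts merged into the parent node
print(after <= before)  # prune only if this is True
```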

CART Algorithm

Classification And Regression Tree (CART) is a widely used approach for learning decision trees. It can be used for both classification and regression.

Generate regression tree

Input: Training set D
Output: Regression tree f(x)

  1. Scan for the most appropriate splitting variable j and splitting point s, solving

    $$\min_{j,s}\left[\,\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right],$$

    where R1 and R2 are the regions split by j and s, and cm = E(yi | xi ∈ Rm).

  2. Go back to step 1 for R1 and R2, until the terminating condition is satisfied.

  3. Split the input space into M regions R1, R2, …, RM, and generate the decision tree

    $$f(x) = \sum_{m=1}^{M} \hat{c}_m\, I(x \in R_m).$$
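
A sketch of the split search in step 1 for numeric features; the names and the toy data are illustrative.

```python
import numpy as np

def best_split(X, y):
    """Scan every feature j and candidate point s, minimising the summed squared error
    over the two regions R1 = {x : x_j <= s} and R2 = {x : x_j > s}."""
    best = (None, None, np.inf)  # (j, s, loss)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            # The region means are the optimal constants c1, c2 for squared error.
            loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if loss < best[2]:
                best = (j, s, loss)
    return best

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.9])
print(best_split(X, y))  # splits the single feature around x <= 3.0
```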

Gini Index

The Gini index of a probability distribution is defined as

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2.$$

For a two-class distribution, the equivalent form of the Gini index is Gini(p) = 2p(1 − p).

The Gini index of sample set D is

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2.$$

Split D into D1 and D2 by letting D1 = {(x, y) ∈ D : A(x) = a} and D2 = D − D1. Then, under the condition of feature A, the Gini index is defined as

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|} \mathrm{Gini}(D_1) + \frac{|D_2|}{|D|} \mathrm{Gini}(D_2).$$

The higher the Gini index is, the more unpredictable the samples are.
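
Both quantities in a short sketch (class labels as a list, the binary split induced by the test A(x) = a; names are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_k (|C_k| / |D|)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(feature_values, labels, a):
    """Gini(D, A): weighted Gini index after the binary split D1 = {A(x) = a}, D2 = D - D1."""
    d1 = [y for v, y in zip(feature_values, labels) if v == a]
    d2 = [y for v, y in zip(feature_values, labels) if v != a]
    n = len(labels)
    return sum(len(d) / n * gini(d) for d in (d1, d2) if d)
```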

Generate classification tree

Input: Training set D
Output: CART classification decision tree

  1. Split D with A = a for every feature A and every possible value a, and calculate the Gini index Gini(D, A = a).
  2. Split D with the pair (A, a) that produces the smallest Gini index.
  3. Repeat steps 1 and 2 for the subsets D1 and D2, until the stopping condition is satisfied.
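
Steps 1 and 2 amount to a scan over every (feature, value) pair; a sketch reusing gini_split from above:

```python
def best_gini_split(rows, labels, features):
    """Return the (feature, value) test with the smallest Gini(D, A = a).
    rows: list of dicts mapping feature name -> value; labels: parallel class labels."""
    best = (None, None, float('inf'))
    for f in features:
        values = [r[f] for r in rows]
        for a in set(values):
            g = gini_split(values, labels, a)
            if g < best[2]:
                best = (f, a, g)
    return best
```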

CART Pruning

The quantity g(t) in equation (3) below comes from comparing the cost of keeping the sub-tree Tt rooted at an internal node t with the cost of collapsing it into the single node t: Cα(t) = C(t) + α, while Cα(Tt) = C(Tt) + α|Tt|. The two are equal when α = (C(t) − C(Tt)) / (|Tt| − 1) = g(t), and for any larger α pruning is preferable. Repeatedly pruning the sub-tree with the smallest g(t) therefore yields a nested sequence of sub-trees T0, T1, …, from which the best one is chosen (e.g. by cross-validation).

Input: Decision Tree generated by CART
Output: Optimal sub-tree Tα

  1. Set k=0, T=T0.
  2. Set α = +∞.
  3. For each internal node t, calculate C(Tt), |Tt|, and
    $$g(t) = \frac{C(t) - C(T_t)}{|T_t| - 1},\tag{3}$$

    and
    $$\alpha = \min\bigl(\alpha,\ g(t)\bigr),\tag{4}$$

    from the bottom up, where Tt is the sub-tree with root t, C(Tt) is its prediction error on the training data, and |Tt| is its number of leaves.
  4. Visit internal nodes t from the top down; if g(t) = α, prune Tt and label the new leaf t by majority vote, obtaining the tree T.
  5. Set k=k+1, αk=α, and Tk=T.
  6. If T is not a tree consisting of the root node alone, go back to step 2.
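
One pass of steps 3–4 can be sketched as follows, assuming the per-node statistics C(t), C(Tt) and |Tt| have already been computed (the dict representation is illustrative):

```python
def weakest_links(nodes):
    """nodes: dict mapping node id -> (C(t), C(T_t), |T_t|) for each internal node.
    Returns alpha = min g(t) and the nodes achieving it, which are pruned to obtain T_{k+1}."""
    g = {name: (c_t - c_sub) / (leaves - 1) for name, (c_t, c_sub, leaves) in nodes.items()}
    alpha = min(g.values())
    return alpha, [name for name, val in g.items() if val == alpha]
```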