Naive Bayes Classification
Machine Learning
Bayes' Theorem
Let $A$ and $B$ denote two events with $P(B) > 0$. Then we have:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},$$
where $P(A)$ and $P(B)$ are the marginal probabilities of $A$ and $B$, respectively.
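As a quick numerical check of the theorem, here is a minimal Python sketch; all the probability values below are made-up assumptions for illustration, not numbers from the text:

```python
# Minimal numerical check of Bayes' theorem.
# All numbers here are illustrative assumptions.
p_a = 0.01              # P(A): prior probability of event A
p_b_given_a = 0.9       # P(B | A)
p_b_given_not_a = 0.05  # P(B | not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.4f}")  # ~0.1538
```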
Naive Bayes Classification
Let the input space be $\mathcal{X} \subset \mathbb{R}^n$ and the output space be $\mathcal{Y} = \{c_1, c_2, \dots, c_K\}$. The conditional probability distribution $P(X = x \mid Y = c_k)$ has $K \prod_{j=1}^{n} S_j$ parameters, where $S_j$ is the number of distinct values the $j$-th feature can take. Estimating them all directly is practically infeasible.
Suppose all features $x^{(j)}$ are conditionally independent given the class. The distribution then factorizes as:
$$P(X = x \mid Y = c_k) = \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right).$$
By Bayes' theorem, we have:
$$P(Y = c_k \mid X = x) = \frac{P(X = x \mid Y = c_k)\, P(Y = c_k)}{P(X = x)},$$
where the denominator $P(X = x)$ is the same for every $c_k$, so it can be dropped when comparing classes.
Finally, Naive Bayes classification can be represented as:
$$y = f(x) = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right).$$
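To make the decision rule concrete, here is a minimal Python sketch. The table formats for `prior` and `cond` are my own assumption (`prior[c]` holds $P(Y=c)$ and `cond[c][j][v]` holds $P(X^{(j)}=v \mid Y=c)$); in practice both come from the estimates in the next section:

```python
def predict(x, prior, cond):
    """Return argmax_c P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c)."""
    best_class, best_score = None, -1.0
    for c in prior:
        score = prior[c]
        for j, v in enumerate(x):
            # Unseen feature values get probability 0 under MLE.
            score *= cond[c][j].get(v, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy single-feature example with made-up probabilities:
prior = {"spam": 0.4, "ham": 0.6}
cond = {
    "spam": [{"free": 0.8, "hello": 0.2}],
    "ham":  [{"free": 0.1, "hello": 0.9}],
}
print(predict(["free"], prior, cond))  # -> "spam" (0.32 vs 0.06)
```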
Parameter Estimation
Maximum Likelihood Estimation
The maximum likelihood estimate of $P(Y = c_k)$ is:
$$P(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{N}, \quad k = 1, 2, \dots, K.$$
The maximum likelihood estimate of $P(X^{(j)} = a \mid Y = c_k)$ is:
$$P(X^{(j)} = a \mid Y = c_k) = \frac{\sum_{i=1}^{N} I\left(x_i^{(j)} = a,\, y_i = c_k\right)}{\sum_{i=1}^{N} I(y_i = c_k)}.$$
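These estimates are just normalized counts, which the following sketch spells out; the dataset `X`, `y` is made-up toy data:

```python
from collections import Counter, defaultdict

# Toy categorical data (made up for illustration only).
X = [["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"], ["rainy", "hot"]]
y = ["no", "yes", "yes", "no"]
N = len(y)

# MLE of the prior: P(Y = c_k) = sum_i I(y_i = c_k) / N
class_counts = Counter(y)
prior = {c: n / N for c, n in class_counts.items()}

# MLE of the conditionals:
# P(X^(j) = a | Y = c_k) = count(x_i^(j) = a, y_i = c_k) / count(y_i = c_k)
raw = defaultdict(lambda: defaultdict(Counter))
for xi, yi in zip(X, y):
    for j, a in enumerate(xi):
        raw[yi][j][a] += 1

cond = {c: [{a: n / class_counts[c] for a, n in raw[c][j].items()}
            for j in range(len(X[0]))]
        for c in class_counts}

print(prior)           # {'no': 0.5, 'yes': 0.5}
print(cond["yes"][0])  # {'rainy': 0.5, 'sunny': 0.5}
```

The resulting `prior` and `cond` tables are in the same format the `predict` sketch above expects.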
Bayesian Estimation
Under maximum likelihood, $P(X^{(j)} = a \mid Y = c_k)$ may be $0$ for a value $a$ never seen in the training data, which zeroes out the whole product and causes classification errors. Use the Bayesian estimate instead.
The Bayesian estimate of the conditional probability is:
$$P_\lambda(X^{(j)} = a \mid Y = c_k) = \frac{\sum_{i=1}^{N} I\left(x_i^{(j)} = a,\, y_i = c_k\right) + \lambda}{\sum_{i=1}^{N} I(y_i = c_k) + S_j \lambda},$$
where $S_j$ is the number of values $X^{(j)}$ can take, and usually $\lambda = 1$ (Laplace smoothing).
The Bayesian estimate of $P(Y = c_k)$ is:
$$P_\lambda(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k) + \lambda}{N + K\lambda}, \quad k = 1, 2, \dots, K.$$
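The counting code above extends directly to the smoothed estimates; here is a sketch with $\lambda = 1$, where the per-feature value-space sizes `S` and the toy data are again made-up assumptions:

```python
from collections import Counter, defaultdict

LAMBDA = 1.0  # lambda = 1 gives Laplace smoothing

# Same made-up toy data as before.
X = [["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"], ["rainy", "hot"]]
y = ["no", "yes", "yes", "no"]
N, K = len(y), len(set(y))
S = [len({xi[j] for xi in X}) for j in range(len(X[0]))]  # S_j per feature

class_counts = Counter(y)
# P_lambda(Y = c_k) = (count(y_i = c_k) + lambda) / (N + K * lambda)
prior = {c: (n + LAMBDA) / (N + K * LAMBDA) for c, n in class_counts.items()}

counts = defaultdict(lambda: defaultdict(Counter))
for xi, yi in zip(X, y):
    for j, a in enumerate(xi):
        counts[yi][j][a] += 1

def cond_prob(c, j, a):
    """P_lambda(X^(j) = a | Y = c) with Laplace smoothing."""
    return (counts[c][j][a] + LAMBDA) / (class_counts[c] + S[j] * LAMBDA)

print(prior)                       # {'no': 0.5, 'yes': 0.5}
print(cond_prob("yes", 1, "hot"))  # 0.25: unseen value gets 1/4, not 0
```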