Naive Bayes Classification
Machine Learning
Bayes' Theorem
Let $A$ and $B$ denote two events with $P(B) > 0$. Then we have:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},$$
where $P(A)$ and $P(B)$ are the marginal probabilities of $A$ and $B$, respectively.
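As a quick numerical check of the theorem, here is a minimal Python sketch; all the probability values below are made-up assumptions for illustration, not numbers from the text:

```python
# Minimal numerical check of Bayes' theorem.
# All numbers here are illustrative assumptions.
p_a = 0.01              # P(A): prior probability of event A
p_b_given_a = 0.9       # P(B | A)
p_b_given_not_a = 0.05  # P(B | not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.4f}")  # ~0.1538
```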
Naive Bayes Classification
Let the input space be $\mathcal{X} \subset \mathbb{R}^n$ and the output space be $\mathcal{Y} = \{c_1, c_2, \dots, c_K\}$. The conditional probability distribution $P(X = x \mid Y = c_k)$ has $K \prod_{j=1}^{n} S_j$ parameters, where $S_j$ is the number of distinct values the $j$-th feature can take. Estimating them all directly is practically infeasible.
Suppose all features $x^{(j)}$ are conditionally independent given the class. The distribution then factorizes as:
$$P(X = x \mid Y = c_k) = \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right).$$
By Bayes' theorem, we have:
$$P(Y = c_k \mid X = x) = \frac{P(X = x \mid Y = c_k)\, P(Y = c_k)}{P(X = x)},$$
where the denominator $P(X = x)$ is the same for every $c_k$, so it can be dropped when comparing classes.
Finally, Naive Bayes classification can be represented as:
$$y = f(x) = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right).$$
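To make the decision rule concrete, here is a minimal Python sketch. The table formats for `prior` and `cond` are my own assumption (`prior[c]` holds $P(Y=c)$ and `cond[c][j][v]` holds $P(X^{(j)}=v \mid Y=c)$); in practice both come from the estimates in the next section:

```python
def predict(x, prior, cond):
    """Return argmax_c P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c)."""
    best_class, best_score = None, -1.0
    for c in prior:
        score = prior[c]
        for j, v in enumerate(x):
            # Unseen feature values get probability 0 under MLE.
            score *= cond[c][j].get(v, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy single-feature example with made-up probabilities:
prior = {"spam": 0.4, "ham": 0.6}
cond = {
    "spam": [{"free": 0.8, "hello": 0.2}],
    "ham":  [{"free": 0.1, "hello": 0.9}],
}
print(predict(["free"], prior, cond))  # -> "spam" (0.32 vs 0.06)
```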
Parameter Estimation
Maximum Likelihood Estimation
The maximum likelihood estimate of $P(Y = c_k)$ is:
$$P(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{N}, \quad k = 1, 2, \dots, K.$$
The maximum likelihood estimate of $P(X^{(j)} = a \mid Y = c_k)$ is:
$$P(X^{(j)} = a \mid Y = c_k) = \frac{\sum_{i=1}^{N} I\left(x_i^{(j)} = a,\, y_i = c_k\right)}{\sum_{i=1}^{N} I(y_i = c_k)}.$$
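These estimates are just normalized counts, which the following sketch spells out; the dataset `X`, `y` is made-up toy data:

```python
from collections import Counter, defaultdict

# Toy categorical data (made up for illustration only).
X = [["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"], ["rainy", "hot"]]
y = ["no", "yes", "yes", "no"]
N = len(y)

# MLE of the prior: P(Y = c_k) = sum_i I(y_i = c_k) / N
class_counts = Counter(y)
prior = {c: n / N for c, n in class_counts.items()}

# MLE of the conditionals:
# P(X^(j) = a | Y = c_k) = count(x_i^(j) = a, y_i = c_k) / count(y_i = c_k)
raw = defaultdict(lambda: defaultdict(Counter))
for xi, yi in zip(X, y):
    for j, a in enumerate(xi):
        raw[yi][j][a] += 1

cond = {c: [{a: n / class_counts[c] for a, n in raw[c][j].items()}
            for j in range(len(X[0]))]
        for c in class_counts}

print(prior)           # {'no': 0.5, 'yes': 0.5}
print(cond["yes"][0])  # {'rainy': 0.5, 'sunny': 0.5}
```

The resulting `prior` and `cond` tables are in the same format the `predict` sketch above expects.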
Bayesian Estimation
Under maximum likelihood, $P(X^{(j)} = a \mid Y = c_k)$ may be $0$ for a value $a$ never seen in the training data, which zeroes out the whole product and causes classification errors. Use the Bayesian estimate instead.
The Bayesian estimate of the conditional probability is:
$$P_\lambda(X^{(j)} = a \mid Y = c_k) = \frac{\sum_{i=1}^{N} I\left(x_i^{(j)} = a,\, y_i = c_k\right) + \lambda}{\sum_{i=1}^{N} I(y_i = c_k) + S_j \lambda},$$
where $S_j$ is the number of values $X^{(j)}$ can take, and usually $\lambda = 1$ (Laplace smoothing).
The Bayesian estimate of $P(Y = c_k)$ is:
$$P_\lambda(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k) + \lambda}{N + K\lambda}, \quad k = 1, 2, \dots, K.$$
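The counting code above extends directly to the smoothed estimates; here is a sketch with $\lambda = 1$, where the per-feature value-space sizes `S` and the toy data are again made-up assumptions:

```python
from collections import Counter, defaultdict

LAMBDA = 1.0  # lambda = 1 gives Laplace smoothing

# Same made-up toy data as before.
X = [["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"], ["rainy", "hot"]]
y = ["no", "yes", "yes", "no"]
N, K = len(y), len(set(y))
S = [len({xi[j] for xi in X}) for j in range(len(X[0]))]  # S_j per feature

class_counts = Counter(y)
# P_lambda(Y = c_k) = (count(y_i = c_k) + lambda) / (N + K * lambda)
prior = {c: (n + LAMBDA) / (N + K * LAMBDA) for c, n in class_counts.items()}

counts = defaultdict(lambda: defaultdict(Counter))
for xi, yi in zip(X, y):
    for j, a in enumerate(xi):
        counts[yi][j][a] += 1

def cond_prob(c, j, a):
    """P_lambda(X^(j) = a | Y = c) with Laplace smoothing."""
    return (counts[c][j][a] + LAMBDA) / (class_counts[c] + S[j] * LAMBDA)

print(prior)                       # {'no': 0.5, 'yes': 0.5}
print(cond_prob("yes", 1, "hot"))  # 0.25: unseen value gets 1/4, not 0
```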