CNNVR: Neural Networks, backpropagation
Machine Learning
Chain Rule
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial x}$$
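A quick numerical check of the chain rule (a minimal NumPy sketch; the example functions f and g are arbitrary choices, not from the notes):

```python
import numpy as np

# Example: f(g) = g**2, g(x) = sin(x), so df/dx = 2*sin(x)*cos(x)
x = 1.3
g = np.sin(x)
f = g ** 2

df_dg = 2 * g          # local gradient of f w.r.t. g
dg_dx = np.cos(x)      # local gradient of g w.r.t. x
df_dx = df_dg * dg_dx  # chain rule: multiply the local gradients

# Compare against a centered numerical gradient
h = 1e-6
df_dx_num = (np.sin(x + h) ** 2 - np.sin(x - h) ** 2) / (2 * h)
print(df_dx, df_dx_num)  # should agree to several decimals
```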
NN
Neuron output (carried along the axon):
$$a = f\Big(\sum_i w_i x_i + b\Big)$$
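A minimal sketch of a single neuron's forward pass in NumPy (the weights, inputs, and the choice of sigmoid as the activation f are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.7, 0.1, -0.4])   # weights w_i
b = 0.2                          # bias

a = sigmoid(np.dot(w, x) + b)    # a = f(sum_i w_i x_i + b)
print(a)
```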
Sigmoid Function
$$\frac{\partial \sigma(x)}{\partial x} = \sigma(x)\,\big[1 - \sigma(x)\big]$$
for the sigmoid function
$$\sigma(x) = \frac{1}{1 + \exp(-x)}.$$
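A small sketch checking that this derivative formula matches a numerical gradient (illustrative value of z):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.8
analytic = sigmoid(z) * (1.0 - sigmoid(z))            # sigma(z) * (1 - sigma(z))
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(analytic, numeric)  # should agree closely
```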
Using the sigmoid as an activation function has two problems:
- Saturated neurons kill the gradients
- Sigmoid outputs are not zero-centered
If the input to a neuron is always positive, then since $\frac{\partial \sigma(x)}{\partial x} > 0$ and $x_j > 0$, the gradients on all the weights always share the same sign (all positive or all negative at once). That's not good: the weight updates can only zig-zag toward the optimum.
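To see both problems numerically: at large |z| the sigmoid's local gradient is essentially zero, and its outputs always lie in (0, 1), never negative (minimal sketch, arbitrary values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
grad = sigmoid(z) * (1.0 - sigmoid(z))
print(grad)        # ~[4.5e-05, 0.105, 0.25, 0.105, 4.5e-05] -> saturated ends kill gradients
print(sigmoid(z))  # all outputs in (0, 1) -> not zero-centered
```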
TanH
- Squashes numbers to range [−1,1]
- Zero centered (nice)
- Still kills gradients when saturated
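Same check for tanh: outputs span (−1, 1) and are centered around zero, but the gradient $1 - \tanh^2(z)$ still vanishes at the ends (minimal sketch):

```python
import numpy as np

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
out = np.tanh(z)
grad = 1.0 - np.tanh(z) ** 2   # d tanh(z) / dz
print(out)   # values in (-1, 1), centered around 0
print(grad)  # ~[8.2e-09, 0.07, 1.0, 0.07, 8.2e-09] -> still saturates
```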
ReLU
$$f(x) = \max(0, x)$$
Pros:
- Doesn't saturate (in the positive region)
- Computationally efficient
- Converges much faster than sigmoid and tanh
But:
Neurons that fall into the $x < 0$ region never update (dead ReLUs). Use Leaky ReLU instead.
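A minimal sketch of ReLU and Leaky ReLU with their gradients (the slope 0.01 is a common but illustrative choice):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)          # 0 for z < 0: those neurons never update

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)    # small nonzero gradient keeps updates alive

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(z), relu_grad(z))
print(leaky_relu(z), leaky_relu_grad(z))
```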
Maxout
$$\max(w_1 x + b_1,\ w_2 x + b_2)$$
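A minimal sketch of a maxout unit with two linear pieces (the weights are illustrative; maxout generalizes ReLU and Leaky ReLU, at the cost of doubling the parameters per neuron):

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])

# Two sets of (weights, bias); maxout takes the max of the two affine outputs
w1, b1 = np.array([0.7, 0.1, -0.4]), 0.2
w2, b2 = np.array([-0.3, 0.5, 0.8]), -0.1

a = max(np.dot(w1, x) + b1, np.dot(w2, x) + b2)   # max(w1·x + b1, w2·x + b2)
print(a)
```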
In Practice
Use ReLU, and be careful with the learning rate. Try out Leaky ReLU and Maxout. Don't expect too much from tanh. Never use sigmoid.