Avoid using Sigmoid
Machine Learning
Sigmoid used to be a popular activation function, but it should no longer be used:
- Saturated neurons "kill" the gradients.
For example, with σ(x) = 1/(1 + e^(−x)) the local gradient is ∂σ/∂x = σ(x)(1 − σ(x)), which at x = 10 is ≈ 4.5e−5, so almost no gradient flows back through the neuron (see the sketch after this list).
- Sigmoid outputs are not zero-centered.
If the inputs to a neuron are always positive, then the gradients on w are either all positive or all negative (they all share the sign of the upstream gradient), which leads to inefficient zig-zagging updates.
- The exp() function is somewhat expensive to compute.
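A minimal NumPy sketch (the function names are my own) illustrating the first two points: the local gradient at x = 10 is roughly 4.5e−5, and every sigmoid output is strictly positive.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # dσ/dx = σ(x) * (1 - σ(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Saturation: the local gradient is tiny for large |x|,
# so upstream gradients are "killed".
print(sigmoid_grad(10.0))   # ~4.5e-05
print(sigmoid_grad(-10.0))  # ~4.5e-05

# Not zero-centered: every output lies in (0, 1),
# so the inputs to the next layer are always positive.
x = np.linspace(-5.0, 5.0, 11)
print(sigmoid(x).min() > 0)  # True
```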
tanh squashes numbers to the range [−1, 1]. It is zero-centered (nice), but it still kills gradients when it saturates.
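A quick check of both properties, as a sketch: the outputs follow the sign of the input, but the local gradient 1 − tanh²(x) still vanishes for large |x|.

```python
import numpy as np

# Zero-centered: outputs lie in (-1, 1) and take the sign of the input.
x = np.linspace(-5.0, 5.0, 11)
print(np.tanh(x))

# Still saturates: d tanh(x)/dx = 1 - tanh(x)^2 is ~8.2e-09 at x = 10,
# so gradients are killed just like with sigmoid.
def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

print(tanh_grad(10.0))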
ReLU, f(x) = max(0, x), is a much better choice: it does not saturate in the positive region and is very cheap to compute.
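A small sketch of ReLU and its gradient (function names are mine): the local gradient is 1 for x > 0, so gradients are not squashed there, and the forward pass is just a comparison with no exp().

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 for x > 0 and 0 for x < 0:
    # no saturation in the positive region.
    return (x > 0).astype(x.dtype)

x = np.array([-10.0, -1.0, 0.5, 10.0])
print(relu(x))       # [ 0.   0.   0.5 10. ]
print(relu_grad(x))  # [0. 0. 1. 1.]
```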