CNNVR: Neural Networks, backpropagation
Machine Learning
Chain Rule
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial x}$$
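A quick numerical check of the chain rule (a minimal NumPy sketch; the example functions f and g are arbitrary choices, not from the notes):

```python
import numpy as np

# Example: f(g) = g**2, g(x) = sin(x), so df/dx = 2*sin(x)*cos(x)
x = 1.3
g = np.sin(x)
f = g ** 2

df_dg = 2 * g          # local gradient of f w.r.t. g
dg_dx = np.cos(x)      # local gradient of g w.r.t. x
df_dx = df_dg * dg_dx  # chain rule: multiply the local gradients

# Compare against a centered numerical gradient
h = 1e-6
df_dx_num = (np.sin(x + h) ** 2 - np.sin(x - h) ** 2) / (2 * h)
print(df_dx, df_dx_num)  # should agree to several decimals
```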
NN
Neuron output (carried along the axon):
$$a = f\Big(\sum_i w_i x_i + b\Big)$$
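A minimal sketch of a single neuron's forward pass in NumPy (the weights, inputs, and the choice of sigmoid as the activation f are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.7, 0.1, -0.4])   # weights w_i
b = 0.2                          # bias

a = sigmoid(np.dot(w, x) + b)    # a = f(sum_i w_i x_i + b)
print(a)
```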
Sigmoid Function
$$\frac{\partial \sigma(x)}{\partial x} = \sigma(x)\,\big[1 - \sigma(x)\big]$$
for the sigmoid function
$$\sigma(x) = \frac{1}{1 + \exp(-x)}.$$
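A small sketch checking that this derivative formula matches a numerical gradient (illustrative value of z):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.8
analytic = sigmoid(z) * (1.0 - sigmoid(z))            # sigma(z) * (1 - sigma(z))
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(analytic, numeric)  # should agree closely
```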
Using the sigmoid as an activation function has two problems:
- Saturated neurons kill the gradients
- Sigmoid outputs are not zero-centered
If the input to a neuron is always positive, then since $\frac{\partial \sigma(x)}{\partial x} > 0$ and $x_j > 0$, the gradients on all the weights always share the same sign (all positive or all negative at once). That's not good: the weight updates can only zig-zag toward the optimum.
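To see both problems numerically: at large |z| the sigmoid's local gradient is essentially zero, and its outputs always lie in (0, 1), never negative (minimal sketch, arbitrary values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
grad = sigmoid(z) * (1.0 - sigmoid(z))
print(grad)        # ~[4.5e-05, 0.105, 0.25, 0.105, 4.5e-05] -> saturated ends kill gradients
print(sigmoid(z))  # all outputs in (0, 1) -> not zero-centered
```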
TanH
- Squashes numbers to range [−1,1]
- Zero centered (nice)
- Still kills gradients when saturated
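Same check for tanh: outputs span (−1, 1) and are centered around zero, but the gradient $1 - \tanh^2(z)$ still vanishes at the ends (minimal sketch):

```python
import numpy as np

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
out = np.tanh(z)
grad = 1.0 - np.tanh(z) ** 2   # d tanh(z) / dz
print(out)   # values in (-1, 1), centered around 0
print(grad)  # ~[8.2e-09, 0.07, 1.0, 0.07, 8.2e-09] -> still saturates
```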
ReLU
$$f(x) = \max(0, x)$$
Pros:
- Doesn't saturate (in the positive region)
- Computationally efficient
- Converges much faster than sigmoid and tanh
But:
Neurons that fall into the $x < 0$ region never update (dead ReLUs). Use Leaky ReLU instead.
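A minimal sketch of ReLU and Leaky ReLU with their gradients (the slope 0.01 is a common but illustrative choice):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)          # 0 for z < 0: those neurons never update

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)    # small nonzero gradient keeps updates alive

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(z), relu_grad(z))
print(leaky_relu(z), leaky_relu_grad(z))
```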
Maxout
$$\max(w_1 x + b_1,\ w_2 x + b_2)$$
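A minimal sketch of a maxout unit with two linear pieces (the weights are illustrative; maxout generalizes ReLU and Leaky ReLU, at the cost of doubling the parameters per neuron):

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])

# Two sets of (weights, bias); maxout takes the max of the two affine outputs
w1, b1 = np.array([0.7, 0.1, -0.4]), 0.2
w2, b2 = np.array([-0.3, 0.5, 0.8]), -0.1

a = max(np.dot(w1, x) + b1, np.dot(w2, x) + b2)   # max(w1·x + b1, w2·x + b2)
print(a)
```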
In Practice
Use ReLU, and be careful with the learning rate. Try out Leaky ReLU and Maxout. Don't expect too much from tanh. Never use sigmoid.