@nrailgun 2016-04-07T07:56:21.000000Z Words 515 Views 1529

Caffe Solver

Machine Learning

The Caffe solvers are:

SGD
AdaDelta
$\vdots$
RMSProp

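The optimization objective over all $|D|$ instances of the dataset $D$ is the mean per-instance loss $f_W(X^{(i)})$ plus a regularization term $\gamma(W)$ weighted by $\lambda$: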

$L(W) = \frac 1 {|D|} \sum_i^{|D|} f_W(X^{(i)}) + \lambda \gamma(W)$
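
The objective can be sketched in a few lines of plain Python. This is only an illustration, not Caffe code: the function names, the squared-error loss of a one-weight linear model, and the L2 penalty are all assumptions chosen to keep the example small.

```python
def objective(weights, data, loss_fn, reg_fn, weight_decay):
    """Mean per-instance loss over the dataset plus a weighted regularizer."""
    data_term = sum(loss_fn(weights, x) for x in data) / len(data)
    return data_term + weight_decay * reg_fn(weights)


# Hypothetical per-instance loss f_W: squared error of the model y = w * x.
def squared_error(weights, instance):
    x, y = instance
    return 0.5 * (weights[0] * x - y) ** 2


# Hypothetical regularizer gamma(W): half the squared L2 norm of the weights.
def l2_penalty(weights):
    return 0.5 * sum(w * w for w in weights)


dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
value = objective([1.0], dataset, squared_error, l2_penalty, weight_decay=0.1)
print(value)
```

Here `weight_decay` plays the role of $\lambda$, and `len(data)` is $|D|$.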

SGD

The update value $V_{t+1}$ and the updated weights $W_{t+1}$ are computed as follows:

$V_{t+1} = \mu V_t - \alpha \nabla L(W_t)$

$W_{t+1} = W_t + V_{t+1}$
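
A minimal sketch of one such step in plain Python, assuming the update value is the momentum term minus the learning rate times the gradient (so that adding it to the weights moves downhill). The function name and the toy quadratic objective are hypothetical and are not part of Caffe.

```python
def sgd_momentum_step(weights, velocity, grad, lr=0.01, momentum=0.9):
    """Return (new_weights, new_velocity) after one SGD step with momentum."""
    # V_{t+1} = mu * V_t - alpha * grad L(W_t)
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    # W_{t+1} = W_t + V_{t+1}
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective (an assumption): L(W) = 0.5 * ||W||^2, whose gradient is W.
weights = [1.0, -2.0]
velocity = [0.0, 0.0]
for _ in range(200):
    grad = list(weights)
    weights, velocity = sgd_momentum_step(weights, velocity, grad)
print(weights)  # both entries end up close to the minimum at 0
```

The velocity has to be carried from one iteration to the next, which is why the function returns it alongside the weights.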

As a starting point, a momentum of $\mu = 0.9$ and a learning rate of $\alpha = 0.01$ are reasonable choices. If you increase $\mu$, it is a good idea to decrease $\alpha$ accordingly, because momentum amplifies the effective size of each update.
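
The trade-off between $\mu$ and $\alpha$ can be checked numerically. Under the simplifying assumption of a constant gradient $g$, repeatedly applying $V_{t+1} = \mu V_t - \alpha g$ drives the update value toward $-\alpha g / (1 - \mu)$, so raising $\mu$ from 0.9 to 0.99 makes the long-run step about ten times larger unless $\alpha$ is scaled down. The helper below is a hypothetical illustration, not Caffe code.

```python
def steady_state_step(lr, momentum, grad=1.0, iterations=5000):
    """Iterate V <- momentum * V - lr * grad with a constant gradient."""
    velocity = 0.0
    for _ in range(iterations):
        velocity = momentum * velocity - lr * grad
    return velocity


v_default = steady_state_step(lr=0.01, momentum=0.9)     # approaches -0.01 / (1 - 0.9)
v_high_mu = steady_state_step(lr=0.01, momentum=0.99)    # approaches -0.01 / (1 - 0.99)
v_rescaled = steady_state_step(lr=0.001, momentum=0.99)  # approaches -0.001 / (1 - 0.99)
print(v_default, v_high_mu, v_rescaled)
```

With the learning rate reduced by the same factor of ten, the long-run step with $\mu = 0.99$ matches the one obtained with the default settings.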
