@lyc102 · 2018-01-26

Stochastic Gradient Descent (SGD) Methods

machine_learning


Problems and Algorithms

Consider the minimization problem

$$\min_{x\in\mathbb{R}^n} f(x) := \frac{1}{N}\sum_{i=1}^{N} f_i(x),$$

where each $f_i:\mathbb{R}^n\to\mathbb{R}$ is the loss associated with the $i$-th data sample.

SGD methods:

$$x_{k+1} = x_k - \alpha_k\,\frac{1}{|S_k|}\sum_{i\in S_k}\nabla f_i(x_k),$$

where $S_k\subset\{1,\dots,N\}$ is a sampled index set and $\frac{1}{|S_k|}\sum_{i\in S_k}\nabla f_i(x_k)$ is the average of the gradients over this sample set.
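As a concrete illustration, a minimal mini-batch SGD loop might look as follows. The function `grad_fi`, the batch size, and the fixed step size `alpha` are illustrative choices, not part of the original notes.

```python
import numpy as np

def sgd(grad_fi, x0, N, alpha=0.01, batch_size=32, n_iters=1000, seed=0):
    """Minimal mini-batch SGD sketch for f(x) = (1/N) * sum_i f_i(x).

    grad_fi(x, idx) should return the average gradient of f_i over the
    index array `idx`, evaluated at x. Hyperparameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iters):
        idx = rng.choice(N, size=batch_size, replace=False)  # sampled index set S_k
        x -= alpha * grad_fi(x, idx)                          # x_{k+1} = x_k - alpha * averaged gradient
    return x
```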

Convergence Analysis of SGD

Theorem. If $f$ is 1) first-order Lipschitz continuous (i.e., $\nabla f$ is $L$-Lipschitz) and 2) strongly convex, then with a fixed step size $\alpha_k \equiv \alpha$ small enough, there exists $\rho\in(0,1)$ such that

$$\mathbb{E}\bigl[f(x_k) - f(x^*)\bigr] \le \rho^{\,k}\bigl(f(x_0) - f(x^*)\bigr) + C,$$

where

$$C = \mathcal{O}\!\left(\frac{\alpha\,\sigma^2}{s}\right),$$

with $s = |S_k|/N$ the sample ratio and $\sigma^2$ the sampled variance of the stochastic gradient.
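A quick numerical check of this behavior (not from the original notes): for the one-dimensional quadratic finite sum $f(x) = \frac{1}{2N}\sum_i (x - a_i)^2$, fixed-step SGD decreases geometrically and then stalls at a noise floor whose size shrinks with the step size and grows as the batch size shrinks. The data and hyperparameters below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=1000)            # synthetic data; the minimizer is x* = mean(a)
x_star = a.mean()

def run_sgd(alpha, batch_size, n_iters=2000):
    x = 5.0                           # arbitrary starting point
    for _ in range(n_iters):
        idx = rng.choice(a.size, size=batch_size, replace=False)
        grad = x - a[idx].mean()      # gradient of (1/(2|S|)) * sum_{i in S} (x - a_i)^2
        x -= alpha * grad
    return abs(x - x_star)

for alpha in (0.5, 0.05):
    for batch in (1, 100):
        print(f"alpha={alpha:<5} batch={batch:<4} final error ~ {run_sgd(alpha, batch):.2e}")
```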

Improvements

Pros and Cons

Pros: fast and easy to implement.
Cons:
- Hard to choose the learning rate $\alpha$
- The learning rate may need to be a vector (one entry per coordinate) rather than a scalar
- Can get stuck at a local minimum or a saddle point

Momentum
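The classical heavy-ball momentum update, in its standard form (the notation $\alpha$, $\beta$ below is the usual one and may differ from the original notes):

$$v_{k+1} = \beta v_k - \alpha\,\nabla f(x_k), \qquad x_{k+1} = x_k + v_{k+1},$$

with momentum coefficient $0\le\beta<1$ (commonly $\beta\approx 0.9$).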

NAG (Nesterov Accelerated Gradient)
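A standard form of the Nesterov update, which evaluates the gradient at the look-ahead point rather than at the current iterate (same notation as above, added here for reference):

$$v_{k+1} = \beta v_k - \alpha\,\nabla f(x_k + \beta v_k), \qquad x_{k+1} = x_k + v_{k+1}.$$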

Adaptive Learning Rate

AdaGrad
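AdaGrad in its usual per-coordinate form (standard formulation, not necessarily the exact notation of the original notes), where $g_k$ is the (stochastic) gradient at step $k$, $\odot$ is the element-wise product, and $\epsilon$ is a small constant for numerical stability:

$$G_k = G_{k-1} + g_k\odot g_k, \qquad x_{k+1} = x_k - \frac{\alpha}{\sqrt{G_k}+\epsilon}\odot g_k.$$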

RMSProp
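RMSProp replaces AdaGrad's running sum by an exponential moving average with decay $\rho$ (commonly $\rho\approx 0.9$); standard form shown for reference:

$$E[g^2]_k = \rho\,E[g^2]_{k-1} + (1-\rho)\,g_k\odot g_k, \qquad x_{k+1} = x_k - \frac{\alpha}{\sqrt{E[g^2]_k}+\epsilon}\odot g_k.$$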

AdaDelta
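AdaDelta additionally keeps a moving average of past squared updates, which removes the global learning rate (standard form, for reference):

$$\Delta x_k = -\frac{\sqrt{E[\Delta x^2]_{k-1}+\epsilon}}{\sqrt{E[g^2]_k+\epsilon}}\odot g_k, \qquad x_{k+1} = x_k + \Delta x_k, \qquad E[\Delta x^2]_k = \rho\,E[\Delta x^2]_{k-1} + (1-\rho)\,\Delta x_k\odot\Delta x_k.$$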

Adam
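Adam combines a first-moment (momentum-style) and a second-moment (RMSProp-style) estimate, with bias correction. The standard form from Kingma & Ba (defaults $\beta_1=0.9$, $\beta_2=0.999$) is:

$$m_k = \beta_1 m_{k-1} + (1-\beta_1)\,g_k, \qquad v_k = \beta_2 v_{k-1} + (1-\beta_2)\,g_k\odot g_k,$$

$$\hat m_k = \frac{m_k}{1-\beta_1^k}, \qquad \hat v_k = \frac{v_k}{1-\beta_2^k}, \qquad x_{k+1} = x_k - \frac{\alpha}{\sqrt{\hat v_k}+\epsilon}\odot \hat m_k.$$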

Convergence Analysis of Adam

Define the regret

$$R(T) = \sum_{t=1}^{T}\bigl[f_t(x_t) - f_t(x^*)\bigr], \qquad x^* = \arg\min_{x}\sum_{t=1}^{T} f_t(x).$$

With certain assumptions (convex $f_t$, bounded gradients, and a bounded feasible set), Adam achieves

$$R(T) = \mathcal{O}(\sqrt{T}), \qquad \text{so that} \qquad \frac{R(T)}{T} \to 0 \ \text{as}\ T\to\infty.$$
Exercise

Consider the least squares problem

$$\min_{x\in\mathbb{R}^n} \frac{1}{2}\|Ax - b\|^2 = \min_{x\in\mathbb{R}^n} \frac{1}{2}\sum_{i=1}^{m}\bigl(a_i^{\mathsf T}x - b_i\bigr)^2,$$

where $A\in\mathbb{R}^{m\times n}$ is a tall matrix, i.e., $m \gg n$, and $a_i^{\mathsf T}$ denotes the $i$-th row of $A$.

Apply SGD to solve the least squares problem and present a convergence proof.
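A minimal sketch of the numerical part of this exercise (the data below is synthetic, and the step size and batch size are arbitrary illustrative choices; the convergence proof is left to the reader):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 20                       # tall matrix: m >> n
A = rng.normal(size=(m, n))           # synthetic data, illustrative only
x_true = rng.normal(size=n)
b = A @ x_true + 0.01 * rng.normal(size=m)

x = np.zeros(n)
alpha, batch_size = 0.01, 50          # illustrative hyperparameters
for k in range(5000):
    idx = rng.choice(m, size=batch_size, replace=False)
    # mini-batch gradient: unbiased estimate of (1/m) * A.T @ (A x - b),
    # i.e. the gradient of the rescaled objective (1/(2m)) ||Ax - b||^2
    grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size
    x -= alpha * grad

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # reference least-squares solution
print("distance to least-squares solution:", np.linalg.norm(x - x_ls))
```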
