Newton Method and Quasi-Newton Method
Machine Learning
Newton Method
Consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x)$$
where $x^*$ denotes the minimizer of the objective function.
Assuming $f(x)$ is twice differentiable and $x^{(k)}$ is the value of $x$ at the $k$-th iteration, the second-order Taylor expansion of $f(x)$ around $x^{(k)}$ is
$$f(x) = f(x^{(k)}) + g_k^T (x - x^{(k)}) + \frac{1}{2} (x - x^{(k)})^T H(x^{(k)}) (x - x^{(k)})$$
where $g_k = g(x^{(k)}) = \nabla f(x^{(k)})$, and $H(x^{(k)})$ is the Hessian matrix of $f(x)$:
$$H(x) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]_{n \times n}$$
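Setting the gradient of this quadratic model with respect to $x$ to zero gives the Newton update used in the algorithm below (a standard derivation, stated here for completeness):
$$g_k + H(x^{(k)})(x - x^{(k)}) = 0 \quad\Longrightarrow\quad x^{(k+1)} = x^{(k)} - H_k^{-1} g_k = x^{(k)} + p_k, \qquad \text{where } H_k p_k = -g_k.$$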
Algorithm
Input: objective function $f(x)$, gradient $g(x) = \nabla f(x)$, Hessian matrix $H(x)$, precision $\epsilon$;
Output: minimizer $x^*$.
1. Pick an initial point $x^{(0)}$, set $k = 0$.
2. Compute $g_k = g(x^{(k)})$.
3. Stop if $\|g_k\| < \epsilon$ and set $x^* = x^{(k)}$.
4. Compute $H_k = H(x^{(k)})$ and solve $H_k p_k = -g_k$ for $p_k$.
5. Set $x^{(k+1)} = x^{(k)} + p_k$.
6. Set $k = k + 1$ and go to step (2).
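A minimal NumPy sketch of this iteration, assuming the gradient and Hessian are available as callables; the function names, parameters, and the test problem are illustrative choices, not from the text:

```python
import numpy as np

def newton(grad, hess, x0, eps=1e-8, max_iter=100):
    """Newton iteration following the steps above.

    grad and hess are callables returning g(x) and H(x); the names and the
    eps / max_iter defaults are illustrative assumptions.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:      # step (3): stop when the gradient is small enough
            break
        H = hess(x)
        p = np.linalg.solve(H, -g)       # step (4): solve H_k p_k = -g_k
        x = x + p                        # step (5): x^{(k+1)} = x^{(k)} + p_k
    return x

# Usage on a simple quadratic f(x) = 1/2 x^T A x - b^T x (illustrative test case):
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = newton(grad=lambda x: A @ x - b, hess=lambda x: A, x0=np.zeros(2))
```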
Quasi-Newton Method
Computing $H^{-1}$ is rather expensive, so quasi-Newton methods approximate $H_k^{-1}$ with a matrix $G_k = G(x^{(k)})$.
In Newton's method we have
$$g_{k+1} - g_k = H_k (x^{(k+1)} - x^{(k)})$$
Letting $y_k = g_{k+1} - g_k$ and $\delta_k = x^{(k+1)} - x^{(k)}$, this becomes
$$y_k = H_k \delta_k$$
The equation above is called the quasi-Newton condition. Algorithms that choose $G_k$ to approximate $H_k^{-1}$, or $B_k$ to approximate $H_k$, are called quasi-Newton methods.
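Equivalently, the condition is imposed on the next approximation rather than on $H_k$ itself; this is the form the update below is built to satisfy (a standard restatement of the same condition):
$$B_{k+1} \delta_k = y_k \qquad \text{or} \qquad G_{k+1} y_k = \delta_k.$$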
BFGS
Approximate $H$ with
$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T \delta_k} - \frac{B_k \delta_k \delta_k^T B_k}{\delta_k^T B_k \delta_k}$$
where $y_k = g_{k+1} - g_k$ and $\delta_k = x^{(k+1)} - x^{(k)}$.
Algorithm
Input: objective function $f(x)$, gradient $g(x) = \nabla f(x)$, precision $\epsilon$;
Output: minimizer $x^*$ of $f(x)$.
1. Pick an initial point $x^{(0)}$ and a positive-definite matrix $B_0$, set $k = 0$.
2. Compute $g_k = g(x^{(k)})$; stop if $\|g_k\| < \epsilon$ and set $x^* = x^{(k)}$.
3. Solve $B_k p_k = -g_k$ for $p_k$.
4. 1D search: find $\lambda_k$ such that $f(x^{(k)} + \lambda_k p_k) = \min_{\lambda \ge 0} f(x^{(k)} + \lambda p_k)$.
5. Set $x^{(k+1)} = x^{(k)} + \lambda_k p_k$.
6. Compute $g_{k+1} = g(x^{(k+1)})$ and update $B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T \delta_k} - \frac{B_k \delta_k \delta_k^T B_k}{\delta_k^T B_k \delta_k}$, where $y_k = g_{k+1} - g_k$ and $\delta_k = x^{(k+1)} - x^{(k)}$.
7. Set $k = k + 1$ and go to step (2).
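A minimal NumPy sketch of this procedure, with a simple backtracking search standing in for the exact 1D search of step (4); function and parameter names are illustrative assumptions, not from the text:

```python
import numpy as np

def bfgs(f, grad, x0, eps=1e-8, max_iter=200):
    """BFGS iteration following the steps above.

    The exact 1D search of step (4) is replaced here by crude backtracking;
    names and defaults are illustrative assumptions.
    """
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                        # B_0: any positive-definite matrix
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps:           # step (2): stop when the gradient is small enough
            break
        p = np.linalg.solve(B, -g)            # step (3): solve B_k p_k = -g_k
        lam = 1.0                             # step (4): backtracking instead of an exact search
        while f(x + lam * p) > f(x) and lam > 1e-12:
            lam *= 0.5
        x_new = x + lam * p                   # step (5): x^{(k+1)} = x^{(k)} + lambda_k p_k
        g_new = grad(x_new)
        delta, y = x_new - x, g_new - g       # delta_k and y_k
        if y @ delta > 1e-12:                 # step (6): rank-two update (skipped if y^T delta <= 0)
            B = (B + np.outer(y, y) / (y @ delta)
                   - B @ np.outer(delta, delta) @ B / (delta @ B @ delta))
        x, g = x_new, g_new
    return x
```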