@fsfzp888 2019-01-23T13:27:22.000000Z

# Conjugate Gradient

Numerical Optimization, Linear Algebra

## Background

$f(x)=\frac{1}{2}x^TAx-b^Tx+c\tag{1}$

$f^{'}(x)=Ax-b=0 \implies Ax=b \tag{2}$
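Equation (2) says the quadratic of equation (1) is minimized exactly where $Ax=b$. A quick numerical sanity check (a sketch with NumPy; the random test matrix and the perturbation test are illustrative choices, not part of the original text):

```python
import numpy as np

# Sanity check: for a symmetric positive definite A, the minimizer of
# f(x) = 0.5*x^T A x - b^T x + c is exactly the solution of Ax = b.
rng = np.random.default_rng(0)
N = 5
Ar = rng.random((N, N))
A = Ar @ Ar.T + N * np.eye(N)   # symmetric positive definite
b = rng.random(N)

def f(x, c=0.0):
    return 0.5 * x @ A @ x - b @ x + c

x_star = np.linalg.solve(A, b)  # stationary point: f'(x_star) = 0

# Perturbing x_star in random directions never decreases f.
for _ in range(100):
    d = rng.standard_normal(N)
    assert f(x_star + 1e-3 * d) >= f(x_star)
print("the solution of Ax = b minimizes f")
```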

## Geometric Intuition and the Conjugate Gradient Procedure

### Steepest Descent

```python
import numpy as np


def steepest_descent(A, b, x_initial, max_step, thresh=0.00001):
    assert isinstance(A, np.matrix)
    assert isinstance(b, np.matrix)
    assert isinstance(x_initial, np.matrix)
    x = x_initial
    for _ in range(max_step):
        r = b - A * x
        alpha = (r.transpose() * r) / (r.transpose() * A * r)
        x = x + r * alpha
        dist = np.sqrt(np.sum(np.square(b - A * x)))
        if dist < thresh:
            break
    return x


if __name__ == '__main__':
    N = 100
    Ar = np.mat(np.random.rand(N, N))
    As = Ar * Ar.transpose()  # get a positive definite matrix
    bn = np.mat(np.random.rand(N, 1))
    xi = np.mat(np.zeros((N, 1)))
    xr = steepest_descent(As, bn, xi, 1000)
    print('1000:', np.sqrt(np.sum(np.square(bn - As * xr))))
    xr = steepest_descent(As, bn, xi, 10000)
    print('10000:', np.sqrt(np.sum(np.square(bn - As * xr))))
    xr = steepest_descent(As, bn, xi, 20000)
    print('20000:', np.sqrt(np.sum(np.square(bn - As * xr))))
```

```
1000: 1.9453821317304896
10000: 0.9116628408830175
20000: 0.6950563935826073
```

Steepest descent (SD) moves along the gradient direction, and each new direction is orthogonal to the previous one. The outcome therefore depends on the starting point and on $A$: with an unfavorable starting point, the iterates zigzag toward the minimizer and may still fail to reach it even after many iterations.

#### Derivation of CG

$d_0=r_0=b-Ax_0$
for loop until reach max_step or cost_function below threshold:
$\qquad \alpha_k=\frac{r_k^Tr_k}{d_k^TAd_k}$
$\qquad x_{k+1}=x_k+\alpha_kd_k$
$\qquad r_{k+1}=r_k-\alpha_kAd_k \quad \text{or} \quad r_{k+1}=b-Ax_{k+1}$
$\qquad \beta_{k+1}=\frac{r_{k+1}^Tr_{k+1}}{r_k^Tr_k}$
$\qquad d_{k+1}=r_{k+1}+\beta_{k+1}d_k$
return $x_n$

```python
import numpy as np


def conjugate_gradient(A, b, x_initial, max_step, threshold=0.00001):
    assert isinstance(A, np.matrix)
    assert isinstance(b, np.matrix)
    assert isinstance(x_initial, np.matrix)
    r_old = b - A * x_initial
    d = r_old
    x = x_initial
    for i in range(max_step):
        alpha = (r_old.transpose() * r_old) / (d.transpose() * A * d)
        x = x + d * alpha
        # r_new = b - A * x
        r_new = r_old - A * d * alpha
        beta = (r_new.transpose() * r_new) / (r_old.transpose() * r_old)
        d = r_new + d * beta
        r_old = r_new
        cf = np.sqrt(np.sum(np.square(b - A * x)))
        if cf < threshold:
            print("Using step: ", i)
            break
    return x


if __name__ == '__main__':
    N = 200
    Ar = np.mat(np.random.rand(N, N))
    As = Ar * Ar.transpose()
    bn = np.mat(np.random.rand(N, 1))
    xi = np.mat(np.random.rand(N, 1))
    xr = conjugate_gradient(As, bn, xi, 1000)
    print('1000:', np.sqrt(np.sum(np.square(bn - As * xr))))
    xr = conjugate_gradient(As, bn, xi, 10000)
    print('10000:', np.sqrt(np.sum(np.square(bn - As * xr))))
```

```
Using step:  410
1000: 6.43707958798e-06
Using step:  410
10000: 6.43707958798e-06
```

### Conjugate Gradient with an SSOR Preconditioner

#### Preconditioning

$r_0=b-Ax_0,\ z_0=M^{-1}r_0,\ d_0=z_0$
for loop until reach max_step or cost_function below threshold:
$\qquad \alpha_k=\frac{z_k^Tr_k}{d_k^TAd_k}$
$\qquad x_{k+1}=x_k+\alpha_kd_k$
$\qquad r_{k+1}=r_k - \alpha_kAd_k$
$\qquad z_{k+1}=M^{-1}r_{k+1}$
$\qquad \beta_{k+1}=\frac{z_{k+1}^Tr_{k+1}}{z_k^Tr_k}$
$\qquad d_{k+1}=z_{k+1}+\beta_{k+1}d_k$
return $x_n$

```python
import numpy as np


def get_ssor_precondition_matrix(A, w):
    UD = np.triu(A)
    LD = np.tril(A)
    dim = A.shape[0]
    D = np.mat(np.zeros((dim, dim)))
    for i in range(dim):
        D[i, i] = A[i, i]
    for i in range(dim):
        for j in range(i + 1, dim):
            UD[i, j] = w * UD[i, j]
    for i in range(dim):
        for j in range(0, i):
            LD[i, j] = w * LD[i, j]
    # Inverting the triangular factors does not really require a
    # general-purpose inverse; repeated back-substitution would suffice.
    return np.mat(np.linalg.inv(UD)) * D * np.mat(np.linalg.inv(LD))


def precondition_conjugate_gradient(A, b, x_initial, max_step,
                                    threshold=0.00001, w=0.2):
    assert isinstance(A, np.matrix)
    assert isinstance(b, np.matrix)
    assert isinstance(x_initial, np.matrix)
    r_old = b - A * x_initial
    M_inv = get_ssor_precondition_matrix(A, w)
    z_old = M_inv * r_old
    d = z_old
    x = x_initial
    for i in range(max_step):
        alpha = (z_old.transpose() * r_old) / (d.transpose() * A * d)
        x = x + d * alpha
        # r_new = b - A * x
        r_new = r_old - A * d * alpha
        z_new = M_inv * r_new
        beta = (z_new.transpose() * r_new) / (z_old.transpose() * r_old)
        d = z_new + d * beta
        r_old = r_new
        z_old = z_new
        cf = np.sqrt(np.sum(np.square(b - A * x)))
        if cf < threshold:
            print("Using step: ", i)
            break
    return x


if __name__ == '__main__':
    N = 200
    Ar = np.mat(np.random.rand(N, N))
    As = Ar * Ar.transpose()
    bn = np.mat(np.random.rand(N, 1))
    xi = np.mat(np.random.rand(N, 1))
    xr = precondition_conjugate_gradient(As, bn, xi, 1000, 0.00001, 0.05)
    print('w=0.05:', np.sqrt(np.sum(np.square(bn - As * xr))))
    xr = precondition_conjugate_gradient(As, bn, xi, 1000, 0.00001, 0.5)
    print('w=0.5:', np.sqrt(np.sum(np.square(bn - As * xr))))
    xr = precondition_conjugate_gradient(As, bn, xi, 1000, 0.00001, 1)
    print('w=1:', np.sqrt(np.sum(np.square(bn - As * xr))))
```

Output of three runs:

```
Using step:  378
w=0.05: 6.71578034331e-06
Using step:  405
w=0.5: 9.67223926338e-06
Using step:  573
w=1: 8.09438315554e-06

Using step:  371
w=0.05: 6.1508910381e-06
Using step:  401
w=0.5: 8.51261715479e-06
Using step:  602
w=1: 7.57385104633e-06

Using step:  373
w=0.05: 6.44798434081e-06
Using step:  406
w=0.5: 6.55294163482e-06
Using step:  580
w=1: 8.13783429322e-06
```

### Nonlinear Conjugate Gradient

#### Algorithm Outline

1. Update the solution vector using the step size and the search direction: $x_{k+1}=x_k+\alpha_kd_k$
2. Update the search direction using the gradient: $d_{k+1}=r_{k+1}+\beta_{k+1}d_{k}$

For $\alpha_k$, the aim is to find a step size such that $f^{'}(x_k+\alpha_kd_k)^Td_k=0$: the gradient at the next point should be orthogonal to the current search direction, which gives the optimal point along that line, just like the line search in SD. In practice, however, an implementation may simply use a fixed step scaled by a ratio that shrinks as the iteration proceeds. Hence:

$d_0=r_0=-f^{'}(x_0)$
for loop until max step or $\|f^{'}(x)\| \lt thresh$:
$\qquad \alpha_k=\arg\min_{\alpha} f(x_k+\alpha d_k)$
$\qquad x_{k+1}=x_k+\alpha_kd_k$
$\qquad r_{k+1}=-f^{'}(x_{k+1})$
$\qquad \beta_{k+1}^{PR}=max(0, \frac{r^T_{k+1}(r_{k+1}-r_{k})}{r_k^Tr_k})$
$\qquad d_{k+1}=r_{k+1}+\beta_{k+1}d_k$
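The loop above can be sketched as follows. The quartic test function and the simple step-halving line search are illustrative assumptions (the pseudocode only requires some approximate minimization of $f(x_k+\alpha d_k)$), while $\beta$ is the Polak–Ribière formula clipped at zero, as above:

```python
import numpy as np

def f(x):
    # illustrative non-quadratic test function, minimum at (3, -1)
    return (x[0] - 3) ** 4 + (x[1] + 1) ** 2

def grad_f(x):
    return np.array([4 * (x[0] - 3) ** 3, 2 * (x[1] + 1)])

def nonlinear_cg(f, grad, x0, max_step=5000, thresh=1e-4):
    x = np.asarray(x0, dtype=float)
    r = -grad(x)                    # r_k = -f'(x_k)
    d = r
    for _ in range(max_step):
        # crude line search: halve alpha until the step decreases f
        alpha = 1.0
        while f(x + alpha * d) >= f(x) and alpha > 1e-14:
            alpha *= 0.5
        if alpha <= 1e-14:
            d = r                   # line search failed: restart with steepest descent
            continue
        x = x + alpha * d
        r_new = -grad(x)
        # Polak-Ribiere beta, clipped at zero (automatic restart)
        beta = max(0.0, float(r_new @ (r_new - r)) / float(r @ r))
        d = r_new + beta * d
        r = r_new
        if np.linalg.norm(r) < thresh:
            break
    return x

x_min = nonlinear_cg(f, grad_f, np.array([0.0, 0.0]))
print(x_min)  # should be near [3, -1]
```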

The gradient $f^{'}(x)$ needed above can be obtained in several ways:

1. Closed-form formulas
2. Symbolic differentiation
3. Automatic differentiation
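As a minimal instance of the first option, the quadratic of equation (1) has the closed-form gradient $f^{'}(x)=Ax-b$. The sketch below compares it with a central-difference approximation (the finite-difference check is only a verification aid, not one of the three methods above):

```python
import numpy as np

# Option 1 for the quadratic of equation (1): the closed-form gradient
# f'(x) = Ax - b, checked against a central finite difference.
rng = np.random.default_rng(1)
N = 4
Ar = rng.random((N, N))
A = Ar @ Ar.T + np.eye(N)        # symmetric positive definite
b = rng.random(N)

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b             # analytic formula

x = rng.random(N)
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(N)])
print(np.max(np.abs(fd - grad(x))))  # tiny: the two gradients agree
```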

