在数值线性代数中,共轭梯度法是一种求解对称正定线性方程组 A x = b {\displaystyle {\boldsymbol {Ax}}={\boldsymbol {b}}} 的迭代方法。共轭梯度法可以从不同的角度推导而得,包括作为求解最优化问题的共轭方向法的特例,以及作为求解特征值问题的Arnoldi/Lanczos迭代的变种。 本条目记述这些推导方法中的重要步骤。 从共轭方向法推导 此章节需要扩充。 (2010年6月) 共轭梯度法可以看作是应用于二次函数最小化的共轭方向法的特例 f ( x ) = x T A x − 2 b T x . {\displaystyle f({\boldsymbol {x}})={\boldsymbol {x}}^{\mathrm {T} }{\boldsymbol {A}}{\boldsymbol {x}}-2{\boldsymbol {b}}^{\mathrm {T} }{\boldsymbol {x}}{\text{.}}} Remove ads共轭方向法 在共轭方向上求最小化: f ( x ) = x T A x − 2 b T x . {\displaystyle f({\boldsymbol {x}})={\boldsymbol {x}}^{\mathrm {T} }{\boldsymbol {A}}{\boldsymbol {x}}-2{\boldsymbol {b}}^{\mathrm {T} }{\boldsymbol {x}}{\text{.}}} 从最初的猜测 x 0 {\displaystyle {\boldsymbol {x}}_{0}} 开始,以及相应的残差 r 0 = b − A x 0 {\displaystyle {\boldsymbol {r}}_{0}={\boldsymbol {b}}-{\boldsymbol {Ax}}_{0}} , 并通过公式计算迭代和残差 α i = p i T r i p i T A p i , x i + 1 = x i + α i p i , r i + 1 = r i − α i A p i {\displaystyle {\begin{aligned}\alpha _{i}&={\frac {{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}{{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}{\text{,}}\\{\boldsymbol {x}}_{i+1}&={\boldsymbol {x}}_{i}+\alpha _{i}{\boldsymbol {p}}_{i}{\text{,}}\\{\boldsymbol {r}}_{i+1}&={\boldsymbol {r}}_{i}-\alpha _{i}{\boldsymbol {Ap}}_{i}\end{aligned}}} 其中, p 0 , p 1 , p 2 , … {\displaystyle {\boldsymbol {p}}_{0},{\boldsymbol {p}}_{1},{\boldsymbol {p}}_{2},\ldots } 为一系列相互共轭的方向,例如: p i T A p j = 0 {\displaystyle {\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{j}=0} ,对于任何 i ≠ j {\displaystyle i\neq j} 由于共轭方向法没有给出用于选择方向的公式,因此该方法具有误差 p 0 , p 1 , p 2 , … {\displaystyle {\boldsymbol {p}}_{0},{\boldsymbol {p}}_{1},{\boldsymbol {p}}_{2},\ldots } 而共轭方向法也有多种方法分支,包括共轭梯度法和高斯消元法。 Remove ads从Arnoldi/Lanczos迭代推导 共轭梯度法可以看作Arnoldi/Lanczos迭代应用于求解线性方程组时的一个变种。 一般Arnoldi方法 Arnoldi迭代从一个向量 r 0 {\displaystyle {\boldsymbol {r}}_{0}} 开始,通过定义 v i = w i / ‖ w i ‖ 2 {\displaystyle {\boldsymbol {v}}_{i}={\boldsymbol {w}}_{i}/\lVert {\boldsymbol {w}}_{i}\rVert _{2}} ,其中 w i = { r 0 if i = 1 , A v i − 1 − ∑ j = 1 i − 1 ( v j T A v i − 1 ) v j if i > 1 , {\displaystyle {\boldsymbol {w}}_{i}={\begin{cases}{\boldsymbol {r}}_{0}&{\text{if }}i=1{\text{,}}\\{\boldsymbol {Av}}_{i-1}-\sum _{j=1}^{i-1}({\boldsymbol {v}}_{j}^{\mathrm {T} }{\boldsymbol {Av}}_{i-1}){\boldsymbol {v}}_{j}&{\text{if }}i>1{\text{,}}\end{cases}}} 逐步建立Krylov子空间 K ( A , r 0 ) = { r 0 , A r 0 , A 2 r 0 , … } {\displaystyle {\mathcal {K}}({\boldsymbol {A}},{\boldsymbol {r}}_{0})=\{{\boldsymbol {r}}_{0},{\boldsymbol {Ar}}_{0},{\boldsymbol {A}}^{2}{\boldsymbol {r}}_{0},\ldots \}} 的一组标准正交基 { v 1 , v 2 , v 3 , … } {\displaystyle \{{\boldsymbol {v}}_{1},{\boldsymbol {v}}_{2},{\boldsymbol {v}}_{3},\ldots \}} 。 换言之,对于 i > 1 {\displaystyle i>1} , v i {\displaystyle {\boldsymbol {v}}_{i}} 由将 A v i − 1 {\displaystyle {\boldsymbol {Av}}_{i-1}} 相对于 { v 1 , v 2 , … , v i − 1 } {\displaystyle \{{\boldsymbol {v}}_{1},{\boldsymbol {v}}_{2},\ldots ,{\boldsymbol {v}}_{i-1}\}} 进行Gram-Schmidt正交化然后归一化得到。 写成矩阵形式,迭代过程可以表示为 A V i = V i + 1 H ~ i , {\displaystyle {\boldsymbol {AV}}_{i}={\boldsymbol {V}}_{i+1}{\boldsymbol {\tilde {H}}}_{i}{\text{,}}} 其中 V i = [ v 1 v 2 ⋯ v i ] , H ~ i = [ h 11 h 12 h 13 ⋯ h 1 , i h 21 h 22 h 23 ⋯ h 2 , i h 32 h 33 ⋯ h 3 , i ⋱ ⋱ ⋮ h i , i − 1 h i , i h i + 1 , i ] = [ H i h i + 1 , i e i T ] , h j i = { v j T A v i if j ≤ i , ‖ w i + 1 ‖ 2 if j = i + 1 , 0 if j > i + 1 . {\displaystyle {\begin{aligned}{\boldsymbol {V}}_{i}&={\begin{bmatrix}{\boldsymbol {v}}_{1}&{\boldsymbol {v}}_{2}&\cdots &{\boldsymbol {v}}_{i}\end{bmatrix}}{\text{,}}\\{\boldsymbol {\tilde {H}}}_{i}&={\begin{bmatrix}h_{11}&h_{12}&h_{13}&\cdots &h_{1,i}\\h_{21}&h_{22}&h_{23}&\cdots &h_{2,i}\\&h_{32}&h_{33}&\cdots &h_{3,i}\\&&\ddots &\ddots &\vdots \\&&&h_{i,i-1}&h_{i,i}\\&&&&h_{i+1,i}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {H}}_{i}\\h_{i+1,i}{\boldsymbol {e}}_{i}^{\mathrm {T} }\end{bmatrix}}{\text{,}}\\h_{ji}&={\begin{cases}{\boldsymbol {v}}_{j}^{\mathrm {T} }{\boldsymbol {Av}}_{i}&{\text{if }}j\leq i{\text{,}}\\\lVert {\boldsymbol {w}}_{i+1}\rVert _{2}&{\text{if }}j=i+1{\text{,}}\\0&{\text{if }}j>i+1{\text{.}}\end{cases}}\end{aligned}}} 当应用于求解线性方程组时,Arnoldi迭代对应于初始解 x 0 {\displaystyle {\boldsymbol {x}}_{0}} 的残量 r 0 = b − A x 0 {\displaystyle {\boldsymbol {r}}_{0}={\boldsymbol {b}}-{\boldsymbol {Ax}}_{0}} 开始迭代,在每一步迭代之后计算 y i = H i − 1 ( ‖ r 0 ‖ 2 e 1 ) {\displaystyle {\boldsymbol {y}}_{i}={\boldsymbol {H}}_{i}^{-1}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1})} 和新的近似解 x i = x 0 + V i y i {\displaystyle {\boldsymbol {x}}_{i}={\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {y}}_{i}} . Remove ads直接Lanzcos方法 在余下的讨论中,我们假定 A {\displaystyle {\boldsymbol {A}}} 是对称正定矩阵。由于 A {\displaystyle {\boldsymbol {A}}} 的对称性, 上Hessenberg矩阵 H i = V i T A V i {\displaystyle {\boldsymbol {H}}_{i}={\boldsymbol {V}}_{i}^{\mathrm {T} }{\boldsymbol {AV}}_{i}} 变成对阵三对角矩阵。于是它可以被更明确地表示为 H i = [ a 1 b 2 b 2 a 2 b 3 ⋱ ⋱ ⋱ b i − 1 a i − 1 b i b i a i ] . {\displaystyle {\boldsymbol {H}}_{i}={\begin{bmatrix}a_{1}&b_{2}\\b_{2}&a_{2}&b_{3}\\&\ddots &\ddots &\ddots \\&&b_{i-1}&a_{i-1}&b_{i}\\&&&b_{i}&a_{i}\end{bmatrix}}{\text{.}}} 这使得迭代中的 v i {\displaystyle {\boldsymbol {v}}_{i}} 有一个简短的三项递推式。Arnoldi迭代从而简化为Lanczos迭代。 由于 A {\displaystyle {\boldsymbol {A}}} 对称正定, H i {\displaystyle {\boldsymbol {H}}_{i}} 同样也对称正定。因此, H i {\displaystyle {\boldsymbol {H}}_{i}} 可以通过不选主元的LU分解分解为 H i = L i U i = [ 1 c 2 1 ⋱ ⋱ c i − 1 1 c i 1 ] [ d 1 b 2 d 2 b 3 ⋱ ⋱ d i − 1 b i d i ] , {\displaystyle {\boldsymbol {H}}_{i}={\boldsymbol {L}}_{i}{\boldsymbol {U}}_{i}={\begin{bmatrix}1\\c_{2}&1\\&\ddots &\ddots \\&&c_{i-1}&1\\&&&c_{i}&1\end{bmatrix}}{\begin{bmatrix}d_{1}&b_{2}\\&d_{2}&b_{3}\\&&\ddots &\ddots \\&&&d_{i-1}&b_{i}\\&&&&d_{i}\end{bmatrix}}{\text{,}}} 其中 c i {\displaystyle c_{i}} 和 d i {\displaystyle d_{i}} 有简单的递推式: c i = b i / d i − 1 , d i = { a 1 if i = 1 , a i − c i b i if i > 1 . {\displaystyle {\begin{aligned}c_{i}&=b_{i}/d_{i-1}{\text{,}}\\d_{i}&={\begin{cases}a_{1}&{\text{if }}i=1{\text{,}}\\a_{i}-c_{i}b_{i}&{\text{if }}i>1{\text{.}}\end{cases}}\end{aligned}}} 改写 x i = x 0 + V i y i {\displaystyle {\boldsymbol {x}}_{i}={\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {y}}_{i}} 为 x i = x 0 + V i H i − 1 ( ‖ r 0 ‖ 2 e 1 ) = x 0 + V i U i − 1 L i − 1 ( ‖ r 0 ‖ 2 e 1 ) = x 0 + P i z i , {\displaystyle {\begin{aligned}{\boldsymbol {x}}_{i}&={\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {H}}_{i}^{-1}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1})\\&={\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {U}}_{i}^{-1}{\boldsymbol {L}}_{i}^{-1}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1})\\&={\boldsymbol {x}}_{0}+{\boldsymbol {P}}_{i}{\boldsymbol {z}}_{i}{\text{,}}\end{aligned}}} 其中 P i = V i U i − 1 , z i = L i − 1 ( ‖ r 0 ‖ 2 e 1 ) . {\displaystyle {\begin{aligned}{\boldsymbol {P}}_{i}&={\boldsymbol {V}}_{i}{\boldsymbol {U}}_{i}^{-1}{\text{,}}\\{\boldsymbol {z}}_{i}&={\boldsymbol {L}}_{i}^{-1}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1}){\text{.}}\end{aligned}}} 此时需要观察到 P i = [ P i − 1 p i ] , z i = [ z i − 1 ζ i ] . {\displaystyle {\begin{aligned}{\boldsymbol {P}}_{i}&={\begin{bmatrix}{\boldsymbol {P}}_{i-1}&{\boldsymbol {p}}_{i}\end{bmatrix}}{\text{,}}\\{\boldsymbol {z}}_{i}&={\begin{bmatrix}{\boldsymbol {z}}_{i-1}\\\zeta _{i}\end{bmatrix}}{\text{.}}\end{aligned}}} 实际上, p i {\displaystyle {\boldsymbol {p}}_{i}} 和 ζ i {\displaystyle \zeta _{i}} 同样有简短的递推式: p i = 1 d i ( v i − b i p i − 1 ) , α i = − c i ζ i − 1 . {\displaystyle {\begin{aligned}{\boldsymbol {p}}_{i}&={\frac {1}{d_{i}}}({\boldsymbol {v}}_{i}-b_{i}{\boldsymbol {p}}_{i-1}){\text{,}}\\\alpha _{i}&=-c_{i}\zeta _{i-1}{\text{.}}\end{aligned}}} 通过这个形式,我们得到 x i {\displaystyle {\boldsymbol {x}}_{i}} 的一个简单的递推式: x i = x 0 + P i z i = x 0 + P i − 1 z i − 1 + ζ i p i = x i − 1 + ζ i p i . {\displaystyle {\begin{aligned}{\boldsymbol {x}}_{i}&={\boldsymbol {x}}_{0}+{\boldsymbol {P}}_{i}{\boldsymbol {z}}_{i}\\&={\boldsymbol {x}}_{0}+{\boldsymbol {P}}_{i-1}{\boldsymbol {z}}_{i-1}+\zeta _{i}{\boldsymbol {p}}_{i}\\&={\boldsymbol {x}}_{i-1}+\zeta _{i}{\boldsymbol {p}}_{i}{\text{.}}\end{aligned}}} 以上的递推关系立即导出比共轭梯度法稍微更复杂的直接Lanczos方法。 Remove ads从正交性和共轭性导出共轭梯度法 如果我们允许缩放 p i {\displaystyle {\boldsymbol {p}}_{i}} 并通过常数因子补偿缩放的系数,我们可能可以得到以下形式的更简单的递推式: x i = x i − 1 + α i − 1 p i − 1 , r i = r i − 1 − α i − 1 A p i − 1 , p i = r i + β i − 1 p i − 1 . {\displaystyle {\begin{aligned}{\boldsymbol {x}}_{i}&={\boldsymbol {x}}_{i-1}+\alpha _{i-1}{\boldsymbol {p}}_{i-1}{\text{,}}\\{\boldsymbol {r}}_{i}&={\boldsymbol {r}}_{i-1}-\alpha _{i-1}{\boldsymbol {Ap}}_{i-1}{\text{,}}\\{\boldsymbol {p}}_{i}&={\boldsymbol {r}}_{i}+\beta _{i-1}{\boldsymbol {p}}_{i-1}{\text{.}}\end{aligned}}} 作为简化的前提,我们现在推导 r i {\displaystyle {\boldsymbol {r}}_{i}} 的正交性和 p i {\displaystyle {\boldsymbol {p}}_{i}} 的共轭性,即对于 i ≠ j {\displaystyle i\neq j} , r i T r j = 0 , p i T A p j = 0 . {\displaystyle {\begin{aligned}{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{j}&=0{\text{,}}\\{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{j}&=0{\text{.}}\end{aligned}}} 各个残量之间满足正交性的原因是 r i {\displaystyle {\boldsymbol {r}}_{i}} 实际上可由 v i + 1 {\displaystyle {\boldsymbol {v}}_{i+1}} 乘上一个系数而得。这是因为对于 i = 0 {\displaystyle i=0} , r 0 = ‖ r 0 ‖ 2 v 1 {\displaystyle {\boldsymbol {r}}_{0}=\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {v}}_{1}} ,对于 i > 0 {\displaystyle i>0} , r i = b − A x i = b − A ( x 0 + V i y i ) = r 0 − A V i y i = r 0 − V i + 1 H ~ i y i = r 0 − V i H i y i − h i + 1 , i ( e i T y i ) v i + 1 = ‖ r 0 ‖ 2 v 1 − V i ( ‖ r 0 ‖ 2 e 1 ) − h i + 1 , i ( e i T y i ) v i + 1 = − h i + 1 , i ( e i T y i ) v i + 1 . {\displaystyle {\begin{aligned}{\boldsymbol {r}}_{i}&={\boldsymbol {b}}-{\boldsymbol {Ax}}_{i}\\&={\boldsymbol {b}}-{\boldsymbol {A}}({\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {y}}_{i})\\&={\boldsymbol {r}}_{0}-{\boldsymbol {AV}}_{i}{\boldsymbol {y}}_{i}\\&={\boldsymbol {r}}_{0}-{\boldsymbol {V}}_{i+1}{\boldsymbol {\tilde {H}}}_{i}{\boldsymbol {y}}_{i}\\&={\boldsymbol {r}}_{0}-{\boldsymbol {V}}_{i}{\boldsymbol {H}}_{i}{\boldsymbol {y}}_{i}-h_{i+1,i}({\boldsymbol {e}}_{i}^{\mathrm {T} }{\boldsymbol {y}}_{i}){\boldsymbol {v}}_{i+1}\\&=\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {v}}_{1}-{\boldsymbol {V}}_{i}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1})-h_{i+1,i}({\boldsymbol {e}}_{i}^{\mathrm {T} }{\boldsymbol {y}}_{i}){\boldsymbol {v}}_{i+1}\\&=-h_{i+1,i}({\boldsymbol {e}}_{i}^{\mathrm {T} }{\boldsymbol {y}}_{i}){\boldsymbol {v}}_{i+1}{\text{.}}\end{aligned}}} 要得到 p i {\displaystyle {\boldsymbol {p}}_{i}} 的共轭性,只需证明 P i T A P i {\displaystyle {\boldsymbol {P}}_{i}^{\mathrm {T} }{\boldsymbol {AP}}_{i}} 是对角矩阵: P i T A P i = U i − T V i T A V i U i − 1 = U i − T H i U i − 1 = U i − T L i U i U i − 1 = U i − T L i {\displaystyle {\begin{aligned}{\boldsymbol {P}}_{i}^{\mathrm {T} }{\boldsymbol {AP}}_{i}&={\boldsymbol {U}}_{i}^{-\mathrm {T} }{\boldsymbol {V}}_{i}^{\mathrm {T} }{\boldsymbol {AV}}_{i}{\boldsymbol {U}}_{i}^{-1}\\&={\boldsymbol {U}}_{i}^{-\mathrm {T} }{\boldsymbol {H}}_{i}{\boldsymbol {U}}_{i}^{-1}\\&={\boldsymbol {U}}_{i}^{-\mathrm {T} }{\boldsymbol {L}}_{i}{\boldsymbol {U}}_{i}{\boldsymbol {U}}_{i}^{-1}\\&={\boldsymbol {U}}_{i}^{-\mathrm {T} }{\boldsymbol {L}}_{i}\end{aligned}}} 是对称的下三角矩阵,因而必然是对角矩阵。 现在我们可以单纯由 r i {\displaystyle {\boldsymbol {r}}_{i}} 的正交性和 p i {\displaystyle {\boldsymbol {p}}_{i}} 的共轭性推导相对于缩放后的 p i {\displaystyle {\boldsymbol {p}}_{i}} 的常数因子 α i {\displaystyle \alpha _{i}} 和 β i {\displaystyle \beta _{i}} 。 由于 r i {\displaystyle {\boldsymbol {r}}_{i}} 的正交性,必然有 r i + 1 T r i = ( r i − α i A p i ) T r i = 0 {\displaystyle {\boldsymbol {r}}_{i+1}^{\mathrm {T} }{\boldsymbol {r}}_{i}=({\boldsymbol {r}}_{i}-\alpha _{i}{\boldsymbol {Ap}}_{i})^{\mathrm {T} }{\boldsymbol {r}}_{i}=0} 。于是 α i = r i T r i r i T A p i = r i T r i ( p i − β i − 1 p i − 1 ) T A p i = r i T r i p i T A p i . {\displaystyle {\begin{aligned}\alpha _{i}&={\frac {{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}{{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}\\&={\frac {{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}{({\boldsymbol {p}}_{i}-\beta _{i-1}{\boldsymbol {p}}_{i-1})^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}\\&={\frac {{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}{{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}{\text{.}}\end{aligned}}} 类似地,由于 p i {\displaystyle {\boldsymbol {p}}_{i}} 的共轭性,必然有 p i + 1 T A p i = ( r i + 1 + β i p i ) T A p i = 0 {\displaystyle {\boldsymbol {p}}_{i+1}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}=({\boldsymbol {r}}_{i+1}+\beta _{i}{\boldsymbol {p}}_{i})^{\mathrm {T} }{\boldsymbol {Ap}}_{i}=0} 。于是 β i = − r i + 1 T A p i p i T A p i = − r i + 1 T ( r i − r i + 1 ) α i p i T A p i = r i + 1 T r i + 1 r i T r i . {\displaystyle {\begin{aligned}\beta _{i}&=-{\frac {{\boldsymbol {r}}_{i+1}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}{{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}\\&=-{\frac {{\boldsymbol {r}}_{i+1}^{\mathrm {T} }({\boldsymbol {r}}_{i}-{\boldsymbol {r}}_{i+1})}{\alpha _{i}{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}\\&={\frac {{\boldsymbol {r}}_{i+1}^{\mathrm {T} }{\boldsymbol {r}}_{i+1}}{{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}}{\text{.}}\end{aligned}}} 推导至此完成。 Remove ads参考文献 Hestenes, M. R.; Stiefel, E. Methods of conjugate gradients for solving linear systems (PDF). Journal of Research of the National Bureau of Standards. 1952年12月, 49 (6) [2010-06-20]. (原始内容 (PDF)存档于2010-05-05). Saad, Y. Chapter 6: Krylov Subspace Methods, Part I. Iterative methods for sparse linear systems 2nd. SIAM. 2003. ISBN 978-0898715347. Loading related searches...Wikiwand - on Seamless Wikipedia browsing. On steroids.Remove ads