In numerical linear algebra, the conjugate gradient method is an iterative method for solving symmetric positive-definite systems of linear equations $Ax = b$. The conjugate gradient method can be derived from several different perspectives, including as a specialization of the conjugate direction method for optimization and as a variation of the Arnoldi/Lanczos iteration for eigenvalue problems. This article documents the important steps in these derivations.

Derivation from the conjugate direction method

The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function

$$f(x) = x^T A x - 2 b^T x.$$

The conjugate direction method

In the conjugate direction method for minimizing

$$f(x) = x^T A x - 2 b^T x,$$

one starts with an initial guess $x_0$ and the corresponding residual $r_0 = b - A x_0$, and computes the iterate and residual by the formulae

$$\begin{aligned}
\alpha_i &= \frac{p_i^T r_i}{p_i^T A p_i},\\
x_{i+1} &= x_i + \alpha_i p_i,\\
r_{i+1} &= r_i - \alpha_i A p_i,
\end{aligned}$$

where $p_0, p_1, p_2, \ldots$ is a series of mutually conjugate directions, i.e.,

$$p_i^T A p_j = 0$$

for any $i \neq j$.

The conjugate direction method is imprecise in the sense that it gives no formula for selecting the directions $p_0, p_1, p_2, \ldots$; specific choices lead to various methods, including the conjugate gradient method and Gaussian elimination.

Derivation from the Arnoldi/Lanczos iteration

The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.

The general Arnoldi method

The Arnoldi iteration starts with a vector $r_0$ and defines $v_i = w_i / \lVert w_i \rVert_2$, where
$$w_i = \begin{cases}
r_0 & \text{if } i = 1,\\
A v_{i-1} - \sum_{j=1}^{i-1} (v_j^T A v_{i-1})\, v_j & \text{if } i > 1,
\end{cases}$$

gradually building an orthonormal basis $\{v_1, v_2, v_3, \ldots\}$ of the Krylov subspace

$$\mathcal{K}(A, r_0) = \operatorname{span}\{r_0, A r_0, A^2 r_0, \ldots\}.$$

In other words, for $i > 1$, $v_i$ is found by Gram-Schmidt orthogonalizing $A v_{i-1}$ against $\{v_1, v_2, \ldots, v_{i-1}\}$, followed by normalization.

Written in matrix form, the iteration is captured by the equation

$$A V_i = V_{i+1} \tilde{H}_i,$$

where

$$\begin{aligned}
V_i &= \begin{bmatrix} v_1 & v_2 & \cdots & v_i \end{bmatrix},\\
\tilde{H}_i &= \begin{bmatrix}
h_{11} & h_{12} & h_{13} & \cdots & h_{1,i}\\
h_{21} & h_{22} & h_{23} & \cdots & h_{2,i}\\
 & h_{32} & h_{33} & \cdots & h_{3,i}\\
 & & \ddots & \ddots & \vdots\\
 & & & h_{i,i-1} & h_{i,i}\\
 & & & & h_{i+1,i}
\end{bmatrix}
= \begin{bmatrix} H_i \\ h_{i+1,i} e_i^T \end{bmatrix},\\
h_{ji} &= \begin{cases}
v_j^T A v_i & \text{if } j \leq i,\\
\lVert w_{i+1} \rVert_2 & \text{if } j = i+1,\\
0 & \text{if } j > i+1.
\end{cases}
\end{aligned}$$
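As an illustration, the iteration above can be sketched in a few lines of NumPy. This is a minimal sketch under my own naming and demo choices (the function name `arnoldi`, the random test matrix, and the seed are not from the original), and it assumes no breakdown, i.e., every $\lVert w_i \rVert_2 > 0$:

```python
import numpy as np

def arnoldi(A, r0, m):
    # Build an orthonormal Krylov basis V (n x (m+1)) and the (m+1) x m
    # upper Hessenberg matrix H_tilde satisfying A V[:, :m] = V @ H_tilde.
    # Minimal sketch: no breakdown handling (assumes ||w_i||_2 > 0 throughout).
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)      # v_1 = r0 / ||r0||_2
    for i in range(m):
        w = A @ V[:, i]                    # expand the subspace with A v_i
        for j in range(i + 1):             # Gram-Schmidt against v_1 .. v_i
            H[j, i] = V[:, j] @ w
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)    # h_{i+1,i} = ||w_{i+1}||_2
        V[:, i + 1] = w / H[i + 1, i]      # normalize to get v_{i+1}
    return V, H

# Small demo on an (arbitrary) symmetric positive-definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
r0 = rng.standard_normal(5)
V, H = arnoldi(A, r0, 3)
```

Because the demo matrix is symmetric, the square part of `H` comes out numerically tridiagonal, which is exactly the observation that reduces Arnoldi to Lanczos in the symmetric case.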
When applied to solving linear systems, the Arnoldi iteration corresponds to starting with $r_0 = b - A x_0$, the residual of an initial solution $x_0$, and, after each step of iteration, computing

$$y_i = H_i^{-1}(\lVert r_0 \rVert_2 e_1)$$

and the new approximate solution $x_i = x_0 + V_i y_i$.

The direct Lanczos method

For the rest of the discussion, we assume that $A$ is symmetric positive-definite. Due to the symmetry of $A$, the upper Hessenberg matrix $H_i = V_i^T A V_i$ becomes symmetric tridiagonal. It can then be written more explicitly as

$$H_i = \begin{bmatrix}
a_1 & b_2\\
b_2 & a_2 & b_3\\
 & \ddots & \ddots & \ddots\\
 & & b_{i-1} & a_{i-1} & b_i\\
 & & & b_i & a_i
\end{bmatrix}.$$
This gives rise to a short three-term recurrence for the $v_i$ in the iteration, and the Arnoldi iteration reduces to the Lanczos iteration.

Since $A$ is symmetric positive-definite, $H_i$ is symmetric positive-definite as well. Hence $H_i$ can be factorized, by an LU decomposition without pivoting, into

$$H_i = L_i U_i = \begin{bmatrix}
1\\
c_2 & 1\\
 & \ddots & \ddots\\
 & & c_{i-1} & 1\\
 & & & c_i & 1
\end{bmatrix}
\begin{bmatrix}
d_1 & b_2\\
 & d_2 & b_3\\
 & & \ddots & \ddots\\
 & & & d_{i-1} & b_i\\
 & & & & d_i
\end{bmatrix},$$

where $c_i$ and $d_i$ obey the simple recurrences

$$\begin{aligned}
c_i &= b_i / d_{i-1},\\
d_i &= \begin{cases} a_1 & \text{if } i = 1,\\ a_i - c_i b_i & \text{if } i > 1. \end{cases}
\end{aligned}$$

Rewrite $x_i = x_0 + V_i y_i$ as

$$\begin{aligned}
x_i &= x_0 + V_i H_i^{-1}(\lVert r_0 \rVert_2 e_1)\\
&= x_0 + V_i U_i^{-1} L_i^{-1}(\lVert r_0 \rVert_2 e_1)\\
&= x_0 + P_i z_i,
\end{aligned}$$

where

$$\begin{aligned}
P_i &= V_i U_i^{-1},\\
z_i &= L_i^{-1}(\lVert r_0 \rVert_2 e_1).
\end{aligned}$$
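Putting these pieces together yields the direct Lanczos method. Below is a hedged NumPy sketch of my own assembly, not a production solver: it assumes no breakdown of the pivots $d_i$ or the subdiagonal norms, and it uses the short recurrences $p_i = (v_i - b_i p_{i-1})/d_i$ and $\zeta_i = -c_i \zeta_{i-1}$, which follow from $P_i U_i = V_i$ and $L_i z_i = \lVert r_0 \rVert_2 e_1$:

```python
import numpy as np

def direct_lanczos(A, b, x0, m):
    # Direct Lanczos solve of A x = b for symmetric positive-definite A,
    # using the recurrences for c_i, d_i, p_i and zeta_i described above.
    # Sketch only: no breakdown handling, no convergence test.
    x = x0.astype(float)
    r0 = b - A @ x
    v = r0 / np.linalg.norm(r0)              # v_1
    v_prev = np.zeros_like(v)
    p_prev = np.zeros_like(v)
    beta = 0.0                               # b_i, subdiagonal of H_i
    d = zeta = 0.0
    for i in range(1, m + 1):
        a = v @ A @ v                        # a_i = v_i^T A v_i
        if i == 1:
            d = a                            # d_1 = a_1
            zeta = np.linalg.norm(r0)        # zeta_1 = ||r0||_2
        else:
            c = beta / d                     # c_i = b_i / d_{i-1}
            d = a - c * beta                 # d_i = a_i - c_i b_i
            zeta = -c * zeta                 # zeta_i = -c_i zeta_{i-1}
        p = (v - beta * p_prev) / d          # p_i = (v_i - b_i p_{i-1}) / d_i
        x = x + zeta * p                     # x_i = x_{i-1} + zeta_i p_i
        if i == m:
            break
        w = A @ v - a * v - beta * v_prev    # three-term Lanczos recurrence
        beta = np.linalg.norm(w)             # b_{i+1} = ||w||_2
        v_prev, v, p_prev = v, w / beta, p
    return x

# n steps on an n x n system reproduce the exact solution (in exact arithmetic).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = direct_lanczos(A, b, np.zeros(2), 2)
```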
It is important to observe at this point that

$$\begin{aligned}
P_i &= \begin{bmatrix} P_{i-1} & p_i \end{bmatrix},\\
z_i &= \begin{bmatrix} z_{i-1} \\ \zeta_i \end{bmatrix}.
\end{aligned}$$

In fact, there are short recurrences for $p_i$ and $\zeta_i$ as well:

$$\begin{aligned}
p_i &= \frac{1}{d_i}(v_i - b_i p_{i-1}),\\
\zeta_i &= -c_i \zeta_{i-1}.
\end{aligned}$$

With this formulation, we arrive at a simple recurrence for $x_i$:

$$\begin{aligned}
x_i &= x_0 + P_i z_i\\
&= x_0 + P_{i-1} z_{i-1} + \zeta_i p_i\\
&= x_{i-1} + \zeta_i p_i.
\end{aligned}$$

The recurrences above immediately give the direct Lanczos method, which turns out to be slightly more complicated than the conjugate gradient method.

Derivation of the conjugate gradient method from orthogonality and conjugacy

If we allow the $p_i$ to be scaled and compensate for the scaling in the constant factors, we may be able to obtain simpler recurrences of the form:
$$\begin{aligned}
x_i &= x_{i-1} + \alpha_{i-1} p_{i-1},\\
r_i &= r_{i-1} - \alpha_{i-1} A p_{i-1},\\
p_i &= r_i + \beta_{i-1} p_{i-1}.
\end{aligned}$$

As a premise for the simplification, we now derive the orthogonality of the $r_i$ and the conjugacy of the $p_i$, i.e., for $i \neq j$,

$$\begin{aligned}
r_i^T r_j &= 0,\\
p_i^T A p_j &= 0.
\end{aligned}$$

The residuals are mutually orthogonal because each $r_i$ is in fact a scalar multiple of $v_{i+1}$. For $i = 0$, $r_0 = \lVert r_0 \rVert_2 v_1$; for $i > 0$,

$$\begin{aligned}
r_i &= b - A x_i\\
&= b - A(x_0 + V_i y_i)\\
&= r_0 - A V_i y_i\\
&= r_0 - V_{i+1} \tilde{H}_i y_i\\
&= r_0 - V_i H_i y_i - h_{i+1,i} (e_i^T y_i)\, v_{i+1}\\
&= \lVert r_0 \rVert_2 v_1 - V_i (\lVert r_0 \rVert_2 e_1) - h_{i+1,i} (e_i^T y_i)\, v_{i+1}\\
&= -h_{i+1,i} (e_i^T y_i)\, v_{i+1}.
\end{aligned}$$
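The orthogonality of the residuals, and the conjugacy of the directions established next, can be sanity-checked numerically by running the simplified recurrences with the usual conjugate gradient coefficients and inspecting the Gram matrices. This is a small illustrative check of my own; the test matrix and seed are arbitrary choices:

```python
import numpy as np

# Run the recurrences x_i, r_i, p_i with the standard CG coefficients on a
# small symmetric positive-definite system, recording every r_i and p_i,
# then inspect the Gram matrices: R R^T should be diagonal (orthogonality)
# and P A P^T should be diagonal (A-conjugacy).
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)        # symmetric positive-definite
b = rng.standard_normal(6)

x = np.zeros(6)
r = b - A @ x
p = r.copy()                       # p_0 = r_0
R, P = [r], [p]
for _ in range(5):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new
    R.append(r)
    P.append(p)

R, P = np.array(R), np.array(P)
gram_r = R @ R.T                   # entry (i, j) is r_i^T r_j
gram_p = P @ A @ P.T               # entry (i, j) is p_i^T A p_j
off_r = gram_r - np.diag(np.diag(gram_r))   # should vanish
off_p = gram_p - np.diag(np.diag(gram_p))   # should vanish
```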
To see the conjugacy of the $p_i$, it suffices to show that $P_i^T A P_i$ is a diagonal matrix:

$$\begin{aligned}
P_i^T A P_i &= U_i^{-T} V_i^T A V_i U_i^{-1}\\
&= U_i^{-T} H_i U_i^{-1}\\
&= U_i^{-T} L_i U_i U_i^{-1}\\
&= U_i^{-T} L_i
\end{aligned}$$

is both symmetric and lower triangular, and therefore must be diagonal.

We can now derive the constant factors $\alpha_i$ and $\beta_i$ for the scaled $p_i$ solely from the orthogonality of the $r_i$ and the conjugacy of the $p_i$.

Due to the orthogonality of the $r_i$, it is necessary that
$r_{i+1}^T r_i = (r_i - \alpha_i A p_i)^T r_i = 0$. Hence

$$\begin{aligned}
\alpha_i &= \frac{r_i^T r_i}{r_i^T A p_i}\\
&= \frac{r_i^T r_i}{(p_i - \beta_{i-1} p_{i-1})^T A p_i}\\
&= \frac{r_i^T r_i}{p_i^T A p_i}.
\end{aligned}$$

Similarly, due to the conjugacy of the $p_i$, it is necessary that $p_{i+1}^T A p_i = (r_{i+1} + \beta_i p_i)^T A p_i = 0$. Hence

$$\begin{aligned}
\beta_i &= -\frac{r_{i+1}^T A p_i}{p_i^T A p_i}\\
&= -\frac{r_{i+1}^T (r_i - r_{i+1})}{\alpha_i\, p_i^T A p_i}\\
&= \frac{r_{i+1}^T r_{i+1}}{r_i^T r_i}.
\end{aligned}$$

This concludes the derivation.
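The recurrences and coefficient formulae derived above assemble into the familiar conjugate gradient algorithm. For concreteness, here is a minimal NumPy sketch; the function name, tolerance, and demo system are my own choices, and production codes would add preconditioning and more careful stopping tests:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    # Conjugate gradient for symmetric positive-definite A, using
    # alpha_i = r_i^T r_i / p_i^T A p_i and
    # beta_i  = r_{i+1}^T r_{i+1} / r_i^T r_i  as derived above.
    x = x0.astype(float)
    r = b - A @ x
    p = r.copy()                    # p_0 = r_0
    rs = r @ r
    for _ in range(max_iter or len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)       # step length along p_i
        x = x + alpha * p           # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap          # r_{i+1} = r_i - alpha_i A p_i
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:   # stop once the residual is tiny
            break
        beta = rs_new / rs
        p = r + beta * p            # p_{i+1} = r_{i+1} + beta_i p_i
        rs = rs_new
    return x

# Demo on an (arbitrary) well-conditioned symmetric positive-definite system.
rng = np.random.default_rng(1)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20 * np.eye(20)
b = rng.standard_normal(20)
x = conjugate_gradient(A, b, np.zeros(20))
```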