000198666 001__ 198666
000198666 005__ 20181203023504.0
000198666 0247_ $$2doi$$a10.1145/2559986
000198666 022__ $$a0098-3500
000198666 02470 $$2ISI$$a000333653400004
000198666 037__ $$aARTICLE
000198666 245__ $$aOptimally Packed Chains of Bulges in Multishift QR Algorithms
000198666 260__ $$bAssociation for Computing Machinery$$c2014$$aNew York
000198666 269__ $$a2014
000198666 300__ $$a15
000198666 336__ $$aJournal Articles
000198666 520__ $$aThe QR algorithm is the method of choice for computing all eigenvalues of a dense nonsymmetric matrix A. After an initial reduction to Hessenberg form, a QR iteration can be viewed as chasing a small bulge from the top left to the bottom right corner along the subdiagonal of A. To increase data locality and create potential for parallelism, modern variants of the QR algorithm perform several iterations simultaneously, which amounts to chasing a chain of several bulges instead of a single bulge. To make effective use of level 3 BLAS, it is important to pack these bulges as tightly as possible within the chain. In this work, we show that the tightness of the packing in existing approaches is not optimal and can be increased. This directly translates into a reduced chain length by 33% compared to the state-of-the-art LAPACK implementation of the QR algorithm. To demonstrate the impact of our idea, we have modified the LAPACK implementation to make use of the optimal packing. Numerical experiments reveal a uniform reduction of the execution time, without affecting stability or robustness.
000198666 6531_ $$aAlgorithms
000198666 6531_ $$aPerformance
000198666 6531_ $$aMultishift QR algorithms
000198666 6531_ $$aimplicit shifts
000198666 6531_ $$alevel 3 BLAS
000198666 700__ $$uUmea Univ, Dept Comp Sci, S-90187 Umea, Sweden$$aKarlsson, Lars
000198666 700__ $$0246441$$g213191$$uEPF Lausanne, ANCHP, MATHICSE, Lausanne, Switzerland$$aKressner, Daniel
000198666 700__ $$aLang, Bruno
000198666 773__ $$j40$$tACM Transactions on Mathematical Software$$k2
000198666 909C0 $$xU12478$$0252494$$pANCHP
000198666 909CO $$pSB$$particle$$ooai:infoscience.tind.io:198666
000198666 917Z8 $$x213191
000198666 937__ $$aEPFL-ARTICLE-198666
000198666 973__ $$rREVIEWED$$sPUBLISHED$$aEPFL
000198666 980__ $$aARTICLE