Routine: PSSYTTRD()  File: SRC\pssyttrd.f

 
 
# lines: 1198
  # code: 1198
  # comment: 0
  # blank:0
# Variables:113
# Callers:1
# Callings:3
# Words:445
# Keywords:295
 

 

..
     .. Array Arguments ..
     ..
     Purpose
     =======
     PSSYTTRD reduces a complex Hermitian matrix sub( A ) to Hermitian
     tridiagonal form T by an unitary similarity transformation:
     Q' * sub( A ) * Q = T, where sub( A ) = A(IA:IA+N-1,JA:JA+N-1).
     Notes
     =====
     Each global data object is described by an associated description
     vector.  This vector stores the information required to establish
     the mapping between an object element and its corresponding
     process and memory location.
     Let A be a generic term for any 2D block cyclicly distributed
     array.
     Such a global array has an associated description vector DESCA.
     In the following comments, the character _ should be read as
     "of the global array".
     NOTATION        STORED IN      EXPLANATION
     --------------- -------------- -----------------------------------
     DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
     DTYPE_A = 1.
     CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle,
     indicating the BLACS process grid A is distribu-
     ted over. The context itself is glo-
     bal, but the handle (the integer
     value) may vary.
     M_A    (global) DESCA( M_ )    The number of rows in the global
     array A.
     N_A    (global) DESCA( N_ )    The number of columns in the global
     array A.
     MB_A   (global) DESCA( MB_ )   The blocking factor used to
     distribute the rows of the array.
     NB_A   (global) DESCA( NB_ )   The blocking factor used to
     distribute the columns of the array.
     RSRC_A (global) DESCA( RSRC_ ) The process row over which the
     first row of the array A is distributed.
     CSRC_A (global) DESCA( CSRC_ ) The process column over which the
     first column of the array A is
     distributed.
     LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
     array.  LLD_A >= MAX(1,LOCp(M_A)).
     Let K be the number of rows or columns of a distributed matrix,
     and assume that its process grid has dimension p x q.
     LOCp( K ) denotes the number of elements of K that a process
     would receive if K were distributed over the p processes of its
     process column.
     Similarly, LOCq( K ) denotes the number of elements of K that a
     process would receive if K were distributed over the q processes
     of its process row.
     The values of LOCp() and LOCq() may be determined via a call to
     the ScaLAPACK tool function, NUMROC:
     LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
     LOCq( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
     An upper bound for these quantities may be computed by:
     LOCp( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
     LOCq( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
     Arguments
     =========
     UPLO    (global input) CHARACTER
     Specifies whether the upper or lower triangular part of the
     Hermitian matrix sub( A ) is stored:
     = 'U':  Upper triangular
     = 'L':  Lower triangular
     N       (global input) INTEGER
     The number of rows and columns to be operated on, i.e. the
     order of the distributed submatrix sub( A ). N >= 0.
     A       (local input/local output) REAL pointer into the
     local memory to an array of dimension (LLD_A,LOCq(JA+N-1)).
     On entry, this array contains the local pieces of the
     Hermitian distributed matrix sub( A ).  If UPLO = 'U', the
     leading N-by-N upper triangular part of sub( A ) contains
     the upper triangular part of the matrix, and its strictly
     lower triangular part is not referenced. If UPLO = 'L', the
     leading N-by-N lower triangular part of sub( A ) contains the
     lower triangular part of the matrix, and its strictly upper
     triangular part is not referenced. On exit, if UPLO = 'U',
     the diagonal and first superdiagonal of sub( A ) are over-
     written by the corresponding elements of the tridiagonal
     matrix T, and the elements above the first superdiagonal,
     with the array TAU, represent the unitary matrix Q as a
     product of elementary reflectors; if UPLO = 'L', the diagonal
     and first subdiagonal of sub( A ) are overwritten by the
     corresponding elements of the tridiagonal matrix T, and the
     elements below the first subdiagonal, with the array TAU,
     represent the unitary matrix Q as a product of elementary
     reflectors. See Further Details.
     IA      (global input) INTEGER
     The row index in the global array A indicating the first
     row of sub( A ).
     JA      (global input) INTEGER
     The column index in the global array A indicating the
     first column of sub( A ).
     DESCA   (global and local input) INTEGER array of dimension DLEN_.
     The array descriptor for the distributed matrix A.
     D       (local output) REAL array, dim LOCq(JA+N-1)
     The diagonal elements of the tridiagonal matrix T:
     D(i) = A(i,i). D is tied to the distributed matrix A.
     E       (local output) REAL array, dim LOCq(JA+N-1)
     if UPLO = 'U', LOCq(JA+N-2) otherwise. The off-diagonal
     elements of the tridiagonal matrix T: E(i) = A(i,i+1) if
     UPLO = 'U', E(i) = A(i+1,i) if UPLO = 'L'. E is tied to the
     distributed matrix A.
     TAU     (local output) REAL, array, dimension
     LOCq(JA+N-1). This array contains the scalar factors TAU of
     the elementary reflectors. TAU is tied to the distributed
     matrix A.
     WORK    (local workspace) REAL array, dimension (LWORK)
     On exit, WORK( 1 ) returns the minimal and optimal workspace
     LWORK   (local input) INTEGER
     The dimension of the array WORK.
     LWORK >= 2*( ANB+1 )*( 4*NPS+2 ) + NPS
     Where:
         NPS = MAX( NUMROC( N, 1, 0, 0, NPROW ), 2*ANB )
         ANB = PJLAENV( DESCA( CTXT_ ), 3, 'PSSYTTRD', 'L', 0, 0,
           0, 0 )
         NUMROC is a ScaLAPACK tool function;
         PJLAENV is a ScaLAPACK envionmental inquiry function
         MYROW, MYCOL, NPROW and NPCOL can be determined by calling
         the subroutine BLACS_GRIDINFO.
     INFO    (global output) INTEGER
     = 0:  successful exit
     < 0:  If the i-th argument is an array and the j-entry had
     an illegal value, then INFO = -(i*100+j), if the i-th
     argument is a scalar and had an illegal value, then
     INFO = -i.
     Further Details
     ===============
     If UPLO = 'U', the matrix Q is represented as a product of
     elementary reflectors
     Q = H(n-1) . . . H(2) H(1).
     Each H(i) has the form
     H(i) = I - tau * v * v'
     where tau is a complex scalar, and v is a complex vector with
     v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in
     A(ia:ia+i-2,ja+i), and tau in TAU(ja+i-1).
     If UPLO = 'L', the matrix Q is represented as a product of
     elementary reflectors
     Q = H(1) H(2) . . . H(n-1).
     Each H(i) has the form
     H(i) = I - tau * v * v'
     where tau is a complex scalar, and v is a complex vector with
     v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in
     A(ia+i+1:ia+n-1,ja+i-1), and tau in TAU(ja+i-1).
     The contents of sub( A ) on exit are illustrated by the following
     examples with n = 5:
     if UPLO = 'U':                       if UPLO = 'L':
     (  d   e   v2  v3  v4 )              (  d                  )
     (      d   e   v3  v4 )              (  e   d              )
     (          d   e   v4 )              (  v1  e   d          )
     (              d   e  )              (  v1  v2  e   d      )
     (                  d  )              (  v1  v2  v3  e   d  )
     where d and e denote diagonal and off-diagonal elements of T, and
     vi denotes an element of the vector defining H(i).
     Data storage requirements
     =========================
     PSSYTTRD is not intended to be called directly.  All users are
     encourage to call PSSYTRD which will then call PSHETTRD if
     appropriate.  A must be in cyclic format (i.e. MB = NB = 1),
     the process grid must be square ( i.e. NPROW = NPCOL ) and
     only lower triangular storage is supported.
     Local variables
     ===============
     PSSYTTRD uses five local arrays:
       WORK ( InV ) dimension ( NP, ANB+1): array V
       WORK ( InH ) dimension ( NP, ANB+1): array H
       WORK ( InVT ) dimension ( NQ, ANB+1): transpose of the array V
       WORK ( InHT ) dimension ( NQ, ANB+1): transpose of the array H
       WORK ( InVTT ) dimension ( NQ, 1): transpose of the array VT
     Arrays V and H are replicated across all processor columns.
     Arrays V^T and H^T are replicated across all processor rows.
         WORK ( InVT ), or V^T, is stored as a tall skinny
         array ( NQ x ANB-1 ) for efficiency.  Since only the lower
         triangular portion of A is updated, Av is computed as:
         tril(A) * v + v^T * tril(A,-1).  This is performed as
         two local triangular matrix-vector multiplications (both in
         MVR2) followed by a transpose and a sum across the columns.
         In the local computation, WORK( InVT ) is used to compute
         tril(A) * v and WORK( InV ) is used to compute
         v^T * tril(A,-1)
     The following variables are global indices into A:
       INDEX:  The current global row and column number.
       MAXINDEX:  The global row and column for the first row and
       column in the trailing block of A.
       LIIB, LIJB:  The first row, column in
     The following variables point into the arrays A, V, H, V^T, H^T:
       BINDEX  =INDEX-MININDEX: The column index in V, H, V^T, H^T.
       LII:  local index I:  The local row number for row INDEX
       LIJ:  local index J:  The local column number for column INDEX
       LIIP1:  local index I+1:  The local row number for row INDEX+1
       LIJP1:  local index J+1:  The local col number for col INDEX+1
       LTLI: lower triangular local index I:  The local row for the
         upper left entry in tril( A(INDEX, INDEX) )
       LTLIP1: lower triangular local index I+1:  The local row for the
         upper left entry in tril( A(INDEX+1, INDEX+1) )
         Details:  The distinction between LII and LTLI (and between
         LIIP1 and LTLIP1) is subtle.  Within the current processor
         column (i.e. MYCOL .eq. CURCOL) they are the same.  However,
         on some processors, A( LII, LIJ ) points to an element
         above the diagonal, on these processors, LTLI = LII+1.
     The following variables give the number of rows and/or columns
     in various matrices:
       NP:  The number of local rows in A( 1:N, 1:N )
       NQ:  The number of local columns in A( 1:N, 1:N )
       NPM0:  The number of local rows in A( INDEX:N, INDEX:N )
       NQM0:  The number of local columns in A( INDEX:N, INDEX:N )
       NPM1:  The number of local rows in A( INDEX+1:N, INDEX:N )
       NQM1:  The number of local columns in A( INDEX+1:N, INDEX:N )
       LTNM0:  The number of local rows & columns in
         tril( A( INDEX:N, INDEX:N ) )
       LTNM1:  The number of local rows & columns in
         tril( A( INDEX+1:N, INDEX+1:N ) )
         NOTE:  LTNM0 == LTNM1 on all processors except the diagonal
         processors, i.e. those where MYCOL == MYROW.
         Invariants:
           NP = NPM0 + LII - 1
           NQ = NQM0 + LIJ - 1
           NP = NPM1 + LIIP1 - 1
           NQ = NQM1 + LIJP1 - 1
           NP = LTLI + LTNM0 - 1
           NP = LTLIP1 + LTNM1 - 1
       Temporary variables.  The following variables are used within
       a few lines after they are set and do hold state from one loop
       iteration to the next:
     The matrix A:
       The matrix A does not hold the same values that it would
       in an unblocked code nor the values that it would hold in
       in a blocked code.
       The value of A is confusing.  It is easiest to state the
       difference between trueA and A at the point that MVR2 is called,
       so we will start there.
       Let trueA be the value that A would
       have at a given point in an unblocked code and A
       be the value that A has in this code at the same point.
       At the time of the call to MVR2,
       trueA = A + V' * H + H' * V
       where H = H( MAXINDEX:N, 1:BINDEX ) and
       V = V( MAXINDEX:N, 1:BINDEX ).
       At the bottom of the inner loop,
       trueA = A +  V' * H + H' * V + v' * h + h' * v
       where H = H( MAXINDEX:N, 1:BINDEX ) and
       V = V( MAXINDEX:N, 1:BINDEX ) and
       v = V( liip1:N, BINDEX+1 ) and
       h = H( liip1:N, BINDEX+1 )
       At the top of the loop, BINDEX gets incremented, hence:
       trueA = A +  V' * H + H' * V + v' * h + h' * v
       where H = H( MAXINDEX:N, 1:BINDEX-1 ) and
       V = V( MAXINDEX:N, 1:BINDEX-1 ) and
       v = V( liip1:N, BINDEX ) and
       h = H( liip1:N, BINDEX )
       A gets updated at the bottom of the outer loop
       After this update, trueA = A + v' * h + h' * v
       where v = V( liip1:N, BINDEX ) and
       h = H( liip1:N, BINDEX ) and BINDEX = 0
       Indeed, the previous loop invariant as stated above for the
       top of the loop still holds, but with BINDEX = 0, H and V
       are null matrices.
       After the current column of A is updated,
         trueA( INDEX, INDEX:N ) = A( INDEX, INDEX:N )
       the rest of A is untouched.
       After the current block column of A is updated,
       trueA = A + V' * H + H' * V
       where H = H( MAXINDEX:N, 1:BINDEX ) and
       V = V( MAXINDEX:N, 1:BINDEX )
       This brings us back to the point at which mvr2 is called.
     Details of the parallelization:
       We delay spreading v across to all processor columns (which
       would naturally happen at the bottom of the loop) in order to
       combine the spread of v( : , i-1 ) with the spread of h( : , i )
       In order to compute h( :, i ), we must update A( :, i )
       which means that the processor column owning A( :, i ) must
       have: c, tau, v( i, i ) and h( i, i ).
       The traditional
       way of computing v (and the one used in pzlatrd.f and
       zlatrd.f) is:
         v = tau * v
         c = v' * h
         alpha = - tau * c / 2
         v = v + alpha * h
       However, the traditional way of computing v requires that tau
       be broadcast to all processors in the current column (to compute
       v = tau * v) and then a sum-to-all is required (to
       compute v' * h ).  We use the following formula instead:
         c = v' * h
         v = tau * ( v - c * tau' * h / 2 )
       The above formula allows tau to be spread down in the
       same call to SGSUM2D which performs the sum-to-all of c.
       The computation of v, which could be performed in any processor
       column (or other procesor subsets), is performed in the
       processor column that owns A( :, i+1 ) so that A( :, i+1 )
       can be updated prior to spreading v across.
       We keep the block column of A up-to-date to minimize the
       work required in updating the current column of A.  Updating
       the block column of A is reasonably load balanced whereas
       updating the current column of A is not (only the current
       processor column is involved).
     In the following overview of the steps performed, M in the
     margin indicates message traffic and C indicates O(n^2 nb/sqrt(p))
     or more flops per processor.
     Inner loop:
       A( index:n, index ) -= ( v * ht(bindex) + h * vt( bindex) )
M      h = house( A(index:n, index) )
M      Spread v, h across
M      vt = v^T; ht = h^T
       A( index+1:n, index+1:maxindex ) -=
         ( v * ht(index+1:maxindex) + h *vt(index+1:maxindex) )
C      v = tril(A) * h; vt = ht * tril(A,-1)
MorC   v = v - H*V*h - V*H*h
M      v = v + vt^T
M      c = v' * h
       v = tau * ( v - c * tau' * h / 2 )
C    A = A - H*V - V*H
     =================================================================
     .. Parameters ..

 
Display dynamic version Find AutoScroll Reload FontSize: - + Hide Comments Hide Blanks Frame FullScreen MailPrint

 
001        SUBROUTINE PSSYTTRD( UPLO , N , A , IA , JA , DESCA , D , E , TAU , WORK ,
002       $LWORK , INFO )
003  
004  *     -- ScaLAPACK routine(version 1.7) --
005  *     University of Tennessee , Knoxville , Oak Ridge National Laboratory ,
006  *     and University of California , Berkeley.
007  *     October 15 , 1999
008  
009  *     .. Scalar Arguments ..
010        CHARACTER UPLO
011        INTEGER IA , INFO , JA , LWORK , N
012        INTEGER BLOCK_CYCLIC_2D , DLEN_ , DTYPE_ , CTXT_ , M_ , N_ ,
013       $MB_ , NB_ , RSRC_ , CSRC_ , LLD_
014        PARAMETER( BLOCK_CYCLIC_2D = 1 , DLEN_ = 9 , DTYPE_ = 1 ,
015       $CTXT_ = 2 , M_ = 3 , N_ = 4 , MB_ = 5 , NB_ = 6 ,
016       $RSRC_ = 7 , CSRC_ = 8 , LLD_ = 9 )
017        REAL ONE
018        PARAMETER( ONE = 1.0E0 )
019        REAL Z_ONE , Z_NEGONE , Z_ZERO
020        PARAMETER( Z_ONE = 1.0E0 , Z_NEGONE = - 1.0E0 ,
021       $Z_ZERO = 0.0E0 )
022        REAL ZERO
023        PARAMETER( ZERO = 0.0E + 0 )
024  *     ..
025  
026  *     .. Local Scalars ..
027  
028        LOGICAL BALANCED , INTERLEAVE , TWOGEMMS , UPPER
029        INTEGER ANB , BINDEX , CURCOL , CURROW , I , ICTXT , INDEX ,
030       $INDEXA , INDEXINH , INDEXINV , INH , INHB , INHT ,
031       $INHTB , INTMP , INV , INVB , INVT , INVTB , J , LDA ,
032       $LDV , LDZG , LII , LIIB , LIIP1 , LIJ , LIJB , LIJP1 ,
033       $LTLIP1 , LTNM1 , LWMIN , MAXINDEX , MININDEX ,
034       $MYCOL , MYFIRSTROW , MYROW , MYSETNUM , NBZG , NP ,
035       $NPB , NPCOL , NPM0 , NPM1 , NPROW , NPS , NPSET , NQ ,
036       $NQB , NQM1 , NUMROWS , NXTCOL , NXTROW , PBMAX ,
037       $PBMIN , PBSIZE , PNB , ROWSPERPROC
038        REAL ALPHA , BETA , C , NORM , ONEOVERBETA , SAFMAX ,
039       $SAFMIN , TOPH , TOPNV , TOPTAU , TOPV , TTOPH , TTOPV
040  *     ..
041  *     .. Local Arrays ..
042  
043        INTEGER IDUM1( 1 ) , IDUM2( 1 )
044        REAL CC( 3 ) , DTMP( 5 )
045  *     ..
046  *     .. External Subroutines ..
047        EXTERNAL BLACS_GRIDINFO , CHK1MAT , PCHK1MAT , PSTREECOMB ,
048       $PXERBLA , SCOMBNRM2 , SGEBR2D , SGEBS2D , SGEMM ,
049       $SGEMV , SGERV2D , SGESD2D , SGSUM2D , SLACPY ,
050       $SSCAL , STRMVT  
051  *     ..
052  *     .. External Functions ..
053  
054        LOGICAL LSAME
055        INTEGER ICEIL , NUMROC , PJLAENV
056        REAL PSLAMCH , SNRM2
057        EXTERNAL LSAME , ICEIL , NUMROC , PJLAENV , PSLAMCH , SNRM2
058  *     ..
059  *     .. Intrinsic Functions ..
060        INTRINSIC ICHAR , MAX , MIN , MOD , REAL , SIGN , SQRT
061  *     ..
062  
063  *     .. Executable Statements ..
064  *     This is just to keep ftnchek and toolpack / 1 happy
065        IF( BLOCK_CYCLIC_2D*CSRC_*CTXT_*DLEN_*DTYPE_*LLD_*MB_*M_*NB_*N_*
066       $    RSRC_.LT.0 )RETURN
067  
068  *         Further details
069  *         ===============
070  
071  *         At the top of the loop , v and nh have been computed but not
072  *         spread across. Hence , A is out - of - date even after the
073  *         rank 2k update. Furthermore , we compute the next v before
074  *         nh is spread across.
075  
076  *         I claim that if we used a sum - to - all on NV , by summing CC within
077  *         each column , that we could compute NV locally and could avoid
078  *         spreading V across. Bruce claims that sum - to - all can be made
079  *         to cost no more than sum - to - one on the Paragon. If that is
080  *         true , this would be a win. But ,
081  *         the BLACS sum - to - all is just a sum - to - one followed by a broadcast ,
082  *         and hence the present scheme is better for now.
083  
084  *         Get grid parameters
085  
086            ICTXT = DESCA( CTXT_ )
087            CALL BLACS_GRIDINFO( ICTXT , NPROW , NPCOL , MYROW , MYCOL )
088  
089            SAFMAX = SQRT( PSLAMCH( ICTXT , 'O' ) ) / N
090            SAFMIN = SQRT( PSLAMCH( ICTXT , 'S' ) )
091  
092  *         Test the input parameters
093  
094            INFO = 0
095            IF( NPROW.EQ. - 1 ) THEN
096                INFO = - ( 600 + CTXT_ )
097            ELSE
098  
099  *             Here we set execution options for PSSYTTRD
100  
101                PNB = PJLAENV( ICTXT , 2 , 'PSSYTTRD' , 'L' , 0 , 0 , 0 , 0 )
102                ANB = PJLAENV( ICTXT , 3 , 'PSSYTTRD' , 'L' , 0 , 0 , 0 , 0 )
103  
104                INTERLEAVE =( PJLAENV( ICTXT , 4 , 'PSSYTTRD' , 'L' , 1 , 0 , 0 ,
105       $        0 ).EQ.1 )
106                TWOGEMMS =( PJLAENV( ICTXT , 4 , 'PSSYTTRD' , 'L' , 2 , 0 , 0 ,
107       $        0 ).EQ.1 )
108                BALANCED =( PJLAENV( ICTXT , 4 , 'PSSYTTRD' , 'L' , 3 , 0 , 0 ,
109       $        0 ).EQ.1 )
110  
111                CALL CHK1MAT( N , 2 , N , 2 , IA , JA , DESCA , 6 , INFO )
112  
113                UPPER = LSAME( UPLO , 'U' )
114                IF( INFO.EQ.0 .AND. DESCA( NB_ ).NE.1 )
115       $            INFO = 600 + NB_
116                    IF( INFO.EQ.0 ) THEN
117  
118  *                     Here is the arithmetic :
119  *                     Let maxnpq = max( np , nq , 2 * ANB )
120  *                     LDV = 4 * max( np , nq ) + 2
121  *                     LWMIN = 2 * ( ANB + 1 ) * LDV + MAX( np , 2 * ANB )
122  *                     = 2 * ( ANB + 1 ) * ( 4 * NPS + 2 ) + NPS
123  
124  *                     This overestimates memory requirements when ANB > NP / 2
125  *                     Memory requirements are lower when interleave = .false.
126  *                     Hence , we could have two sets of memory requirements ,
127  *                     one for interleave and one for
128  
129                        NPS = MAX( NUMROC( N , 1 , 0 , 0 , NPROW ) , 2*ANB )
130                        LWMIN = 2*( ANB + 1 )*( 4*NPS + 2 ) + NPS
131  
132                        WORK( 1 ) = REAL( LWMIN )
133                        IF( .NOT.LSAME( UPLO , 'L' ) ) THEN
134                            INFO = - 1
135                        ELSE IF( IA.NE.1 ) THEN
136                            INFO = - 4
137                        ELSE IF( JA.NE.1 ) THEN
138                            INFO = - 5
139                        ELSE IF( NPROW.NE.NPCOL ) THEN
140                            INFO = - ( 600 + CTXT_ )
141                        ELSE IF( DESCA( DTYPE_ ).NE.1 ) THEN
142                            INFO = - ( 600 + DTYPE_ )
143                        ELSE IF( DESCA( MB_ ).NE.1 ) THEN
144                            INFO = - ( 600 + MB_ )
145                        ELSE IF( DESCA( NB_ ).NE.1 ) THEN
146                            INFO = - ( 600 + NB_ )
147                        ELSE IF( DESCA( RSRC_ ).NE.0 ) THEN
148                            INFO = - ( 600 + RSRC_ )
149                        ELSE IF( DESCA( CSRC_ ).NE.0 ) THEN
150                            INFO = - ( 600 + CSRC_ )
151                        ELSE IF( LWORK.LT.LWMIN ) THEN
152                            INFO = - 11
153                        END IF
154                    END IF
155                    IF( UPPER ) THEN
156                        IDUM1( 1 ) = ICHAR( 'U' )
157                    ELSE
158                        IDUM1( 1 ) = ICHAR( 'L' )
159                    END IF
160                    IDUM2( 1 ) = 1
161  
162                    CALL PCHK1MAT( N , 2 , N , 2 , IA , JA , DESCA , 6 , 1 , IDUM1 , IDUM2 ,
163       $            INFO )
164                END IF
165  
166                IF( INFO.NE.0 ) THEN
167                    CALL PXERBLA( ICTXT , 'PSSYTTRD' , - INFO )
168                    RETURN
169                END IF
170  
171  *             Quick return if possible
172  
173                IF( N.EQ.0 )
174       $            RETURN
175  
176  *                 Reduce the lower triangle of sub( A )
177                    NP = NUMROC( N , 1 , MYROW , 0 , NPROW )
178                    NQ = NUMROC( N , 1 , MYCOL , 0 , NPCOL )
179  
180                    NXTROW = 0
181                    NXTCOL = 0
182  
183                    LIIP1 = 1
184                    LIJP1 = 1
185                    NPM1 = NP
186                    NQM1 = NQ
187  
188                    LDA = DESCA( LLD_ )
189                    ICTXT = DESCA( CTXT_ )
190  
191  *                 Miscellaneous details :
192  *                 Put tau , D and E in the right places
193  *                 Check signs
194  *                 Place all the arrays in WORK , control their placement
195  *                 in memory.
196  
197  *                 Loop invariants
198  *                 A(LIIP1 , LIJ) points to the first element of A(I + 1 , J)
199  *                 NPM1 , NQM1 = the number of rows , cols in A( LII + 1 : N , LIJ + 1 : N )
200  *                 A(LII : N , LIJ : N) is one step out of date.
201  *                 proc( CURROW , CURCOL ) owns A(LII , LIJ)
202  *                 proc( NXTROW , CURCOL ) owns A(LIIP1 , LIJ)
203  
204                    INH = 1
205  
206                    IF( INTERLEAVE ) THEN
207  
208  *                     H and V are interleaved to minimize memory movement
209  *                     LDV has to be twice as large to accomodate interleaving.
210  *                     In addition , LDV is doubled again to allow v , h and
211  *                     toptau to be spreaad across and transposed in a
212  *                     single communication operation with minimum memory
213  *                     movement.
214  
215  *                     We could reduce LDV back to 2*MAX(NPM1 , NQM1)
216  *                     by increasing the memory movement required in
217  *                     the spread and transpose of v , h and toptau.
218  *                     However , since the non - interleaved path already
219  *                     provides a mear minimum memory requirement option ,
220  *                     we did not provide this additional path.
221  
222                        LDV = 4*( MAX( NPM1 , NQM1 ) ) + 2
223  
224                        INH = 1
225  
226                        INV = INH + LDV / 2
227                        INVT = INH + ( ANB + 1 )*LDV
228  
229                        INHT = INVT + LDV / 2
230                        INTMP = INVT + LDV*( ANB + 1 )
231  
232                    ELSE
233                        LDV = MAX( NPM1 , NQM1 )
234  
235                        INHT = INH + LDV*( ANB + 1 )
236                        INV = INHT + LDV*( ANB + 1 )
237  
238  *                     The code works without this + 1 , but only because of a
239  *                     coincidence. Without the + 1 , WORK(INVT) gets trashed , but
240  *                     WORK(INVT) is only used once and when it is used , it is
241  *                     multiplied by WORK( INH ) which is zero. Hence , the fact
242  *                     that WORK(INVT) is trashed has no effect.
243  
244                        INVT = INV + LDV*( ANB + 1 ) + 1
245                        INTMP = INVT + LDV*( 2*ANB )
246  
247                    END IF
248  
249                    IF( INFO.NE.0 ) THEN
250                        CALL PXERBLA( ICTXT , 'PSSYTTRD' , - INFO )
251                        WORK( 1 ) = REAL( LWMIN )
252                        RETURN
253                    END IF
254  
255  *                 The satisfies the loop invariant : trueA = A - V * HT - H * VT ,
256  *                 (where V , H , VT and HT all have BINDEX + 1 rows / columns)
257  *                 the first ANB times through the loop.
258  
259  *                 Setting either( InH and InHT ) or InV to Z_ZERO
260  *                 is adequate except in the face of NaNs.
261  
262                    DO 10 I = 1 , NP
263                        WORK( INH + I - 1 ) = Z_ZERO
264                        WORK( INV + I - 1 ) = Z_ZERO
265     10             CONTINUE
266                    DO 20 I = 1 , NQ
267                        WORK( INHT + I - 1 ) = Z_ZERO
268     20             CONTINUE
269  
270                    TOPNV = Z_ZERO
271  
272                    LTLIP1 = LIJP1
273                    LTNM1 = NPM1
274                    IF( MYCOL.GT.MYROW ) THEN
275                        LTLIP1 = LTLIP1 + 1
276                        LTNM1 = LTNM1 - 1
277                    END IF
278  
279                    DO 210 MININDEX = 1 , N - 1 , ANB
280  
281                        MAXINDEX = MIN( MININDEX + ANB - 1 , N )
282                        LIJB = NUMROC( MAXINDEX , 1 , MYCOL , 0 , NPCOL ) + 1
283                        LIIB = NUMROC( MAXINDEX , 1 , MYROW , 0 , NPROW ) + 1
284  
285                        NQB = NQ - LIJB + 1
286                        NPB = NP - LIIB + 1
287                        INHTB = INHT + LIJB - 1
288                        INVTB = INVT + LIJB - 1
289                        INHB = INH + LIIB - 1
290                        INVB = INV + LIIB - 1
291  
292                        DO 160 INDEX = MININDEX , MIN( MAXINDEX , N - 1 )
293  
294                            BINDEX = INDEX - MININDEX
295  
296                            CURROW = NXTROW
297                            CURCOL = NXTCOL
298  
299                            NXTROW = MOD( CURROW + 1 , NPROW )
300                            NXTCOL = MOD( CURCOL + 1 , NPCOL )
301  
302                            LII = LIIP1
303                            LIJ = LIJP1
304                            NPM0 = NPM1
305  
306                            IF( MYROW.EQ.CURROW ) THEN
307                                NPM1 = NPM1 - 1
308                                LIIP1 = LIIP1 + 1
309                            END IF
310                            IF( MYCOL.EQ.CURCOL ) THEN
311                                NQM1 = NQM1 - 1
312                                LIJP1 = LIJP1 + 1
313                                LTLIP1 = LTLIP1 + 1
314                                LTNM1 = LTNM1 - 1
315                            END IF
316  
317  *                         V = NV , VT = NVT , H = NH , HT = NHT
318  
319  *                         Update the current column of A
320  
321                            IF( MYCOL.EQ.CURCOL ) THEN
322  
323                                INDEXA = LII + ( LIJ - 1 )*LDA
324                                INDEXINV = INV + LII - 1 + ( BINDEX - 1 )*LDV
325                                INDEXINH = INH + LII - 1 + ( BINDEX - 1 )*LDV
326                                TTOPH = WORK( INHT + LIJ - 1 + BINDEX*LDV )
327                                TTOPV = TOPNV
328  
329                                IF( INDEX.GT.1 ) THEN
330                                    DO 30 I = 0 , NPM0 - 1
331  *                                     A( INDEXA + I ) = A( INDEXA + I )
332                                        A( INDEXA + I ) = A( INDEXA + I ) -
333       $                                WORK( INDEXINV + LDV + I )*TTOPH -
334       $                                WORK( INDEXINH + LDV + I )*TTOPV
335     30                             CONTINUE
336                                END IF
337  
338                            END IF
339  
340                            IF( MYCOL.EQ.CURCOL ) THEN
341  
342  *                             Compute the householder vector
343  
344                                IF( MYROW.EQ.CURROW ) THEN
345                                    DTMP( 2 ) = A( LII + ( LIJ - 1 )*LDA )
346                                ELSE
347                                    DTMP( 2 ) = ZERO
348                                END IF
349                                IF( MYROW.EQ.NXTROW ) THEN
350                                    DTMP( 3 ) = A( LIIP1 + ( LIJ - 1 )*LDA )
351                                    DTMP( 4 ) = ZERO
352                                ELSE
353                                    DTMP( 3 ) = ZERO
354                                    DTMP( 4 ) = ZERO
355                                END IF
356  
357                                NORM = SNRM2( NPM1 , A( LIIP1 + ( LIJ - 1 )*LDA ) , 1 )
358                                DTMP( 1 ) = NORM
359  
360  *                             IF DTMP(5) = 1.0 , NORM is too large and might cause
361  *                             overflow , hence PSTREECOMB must be called. IF DTMP(5)
362  *                             is zero on output , DTMP(1) can be trusted.
363  
364                                DTMP( 5 ) = ZERO
365                                IF( DTMP( 1 ).GE.SAFMAX .OR. DTMP( 1 ).LT.SAFMIN ) THEN
366                                    DTMP( 5 ) = ONE
367                                    DTMP( 1 ) = ZERO
368                                END IF
369  
370                                DTMP( 1 ) = DTMP( 1 )*DTMP( 1 )
371                                CALL SGSUM2D( ICTXT , 'C' , ' ' , 5 , 1 , DTMP , 5 , - 1 ,
372       $                        CURCOL )
373                                IF( DTMP( 5 ).EQ.ZERO ) THEN
374                                    DTMP( 1 ) = SQRT( DTMP( 1 ) )
375                                ELSE
376                                    DTMP( 1 ) = NORM
377                                    CALL PSTREECOMB( ICTXT , 'C' , 1 , DTMP , - 1 , MYCOL ,
378       $                            SCOMBNRM2 )
379                                END IF
380  
381                                NORM = DTMP( 1 )
382  
383                                D( LIJ ) = DTMP( 2 )
384                                IF( MYROW.EQ.CURROW .AND. MYCOL.EQ.CURCOL ) THEN
385                                    A( LII + ( LIJ - 1 )*LDA ) = D( LIJ )
386                                END IF
387  
388                                ALPHA = DTMP( 3 )
389  
390                                NORM = SIGN( NORM , ALPHA )
391  
392                                IF( NORM.EQ.ZERO ) THEN
393                                    TOPTAU = ZERO
394                                ELSE
395                                    BETA = NORM + ALPHA
396                                    TOPTAU = BETA / NORM
397                                    ONEOVERBETA = 1.0E0 / BETA
398  
399                                    CALL SSCAL( NPM1 , ONEOVERBETA ,
400       $                            A( LIIP1 + ( LIJ - 1 )*LDA ) , 1 )
401                                END IF
402  
403                                IF( MYROW.EQ.NXTROW ) THEN
404                                    A( LIIP1 + ( LIJ - 1 )*LDA ) = Z_ONE
405                                END IF
406  
407                                TAU( LIJ ) = TOPTAU
408                                E( LIJ ) = - NORM
409  
410                            END IF
411  
412  *                         Spread v , nh , toptau across
413  
414                            DO 40 I = 0 , NPM1 - 1
415                                WORK( INV + LIIP1 - 1 + BINDEX*LDV + NPM1 + I ) = A( LIIP1 + I +
416       $( LIJ - 1 )*LDA )
417     40                     CONTINUE
418  
419                            IF( MYCOL.EQ.CURCOL ) THEN
420                                WORK( INV + LIIP1 - 1 + BINDEX*LDV + NPM1 + NPM1 ) = TOPTAU
421                                CALL SGEBS2D( ICTXT , 'R' , ' ' , NPM1 + NPM1 + 1 , 1 ,
422       $                        WORK( INV + LIIP1 - 1 + BINDEX*LDV ) ,
423       $                        NPM1 + NPM1 + 1 )
424                            ELSE
425                                CALL SGEBR2D( ICTXT , 'R' , ' ' , NPM1 + NPM1 + 1 , 1 ,
426       $                        WORK( INV + LIIP1 - 1 + BINDEX*LDV ) ,
427       $                        NPM1 + NPM1 + 1 , MYROW , CURCOL )
428                                TOPTAU = WORK( INV + LIIP1 - 1 + BINDEX*LDV + NPM1 + NPM1 )
429                            END IF
430                            DO 50 I = 0 , NPM1 - 1
431                                WORK( INH + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) = WORK( INV + LIIP1 -
432       $                        1 + BINDEX*LDV + NPM1 + I )
433     50                     CONTINUE
434  
435                            IF( INDEX.LT.N ) THEN
436                                IF( MYROW.EQ.NXTROW .AND. MYCOL.EQ.CURCOL )
437       $                            A( LIIP1 + ( LIJ - 1 )*LDA ) = E( LIJ )
438                                END IF
439  
440  *                             Transpose v , nh
441  
442                                IF( MYROW.EQ.MYCOL ) THEN
443                                    DO 60 I = 0 , NPM1 + NPM1
444                                        WORK( INVT + LIJP1 - 1 + BINDEX*LDV + I ) = WORK( INV + LIIP1 - 1 +
445       $                                BINDEX*LDV + I )
446     60                 CONTINUE
447                    ELSE
448                        CALL SGESD2D( ICTXT , NPM1 + NPM1 , 1 ,
449       $                WORK( INV + LIIP1 - 1 + BINDEX*LDV ) , NPM1 + NPM1 ,
450       $                MYCOL , MYROW )
451                        CALL SGERV2D( ICTXT , NQM1 + NQM1 , 1 ,
452       $                WORK( INVT + LIJP1 - 1 + BINDEX*LDV ) , NQM1 + NQM1 ,
453       $                MYCOL , MYROW )
454                    END IF
455  
456                    DO 70 I = 0 , NQM1 - 1
457                        WORK( INHT + LIJP1 - 1 + ( BINDEX + 1 )*LDV + I ) = WORK( INVT +
458       $                LIJP1 - 1 + BINDEX*LDV + NQM1 + I )
459     70             CONTINUE
460  
461  *                 Update the current block column of A
462  
463                    IF( INDEX.GT.1 ) THEN
464                        DO 90 J = LIJP1 , LIJB - 1
465                            DO 80 I = 0 , NPM1 - 1
466  
467                                A( LIIP1 + I + ( J - 1 )*LDA ) = A( LIIP1 + I + ( J - 1 )*LDA )
468       $                        - WORK( INV + LIIP1 - 1 + BINDEX*LDV + I )*
469       $                        WORK( INHT + J - 1 + BINDEX*LDV ) -
470       $                        WORK( INH + LIIP1 - 1 + BINDEX*LDV + I )*
471       $                        WORK( INVT + J - 1 + BINDEX*LDV )
472     80                     CONTINUE
473     90                 CONTINUE
474                    END IF
475  
476  *                 Compute NV = A * NHT ; NVT = A * NH
477  
478  *                 These two lines are necessary because these elements
479  *                 are not always involved in the calls to STRMVT
480  *                 for two reasons :
481  *                 1) On diagonal processors , the call to TRMVT
482  *                 involves only LTNM1 - 1 elements
483  *                 2) On some processes , NQM1 < LTM1 or LIIP1 < LTLIP1
484  *                 and when the results are combined across all processes ,
485  *                 uninitialized values may be included.
486                    WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV ) = Z_ZERO
487                    WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*LDV + NQM1 - 1 ) = Z_ZERO
488  
489                    IF( MYROW.EQ.MYCOL ) THEN
490                        IF( LTNM1.GT.1 ) THEN
491                            CALL STRMVT ( 'L' , LTNM1 - 1 ,
492       $                    A( LTLIP1 + 1 + ( LIJP1 - 1 )*LDA ) , LDA ,
493       $                    WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*LDV ) , 1 ,
494       $                    WORK( INH + LTLIP1 + 1 - 1 + ( BINDEX + 1 )*LDV ) ,
495       $                    1 , WORK( INV + LTLIP1 + 1 - 1 + ( BINDEX + 1 )*
496       $                    LDV ) , 1 , WORK( INHT + LIJP1 - 1 + ( BINDEX +
497       $                    1 )*LDV ) , 1 )
498                        END IF
499                        DO 100 I = 1 , LTNM1
500                            WORK( INVT + LIJP1 + I - 1 - 1 + ( BINDEX + 1 )*LDV )
501       $                    = WORK( INVT + LIJP1 + I - 1 - 1 + ( BINDEX + 1 )*LDV ) +
502       $                    A( LTLIP1 + I - 1 + ( LIJP1 + I - 1 - 1 )*LDA )*
503       $                    WORK( INH + LTLIP1 + I - 1 - 1 + ( BINDEX + 1 )*LDV )
504    100                 CONTINUE
505                    ELSE
506                        IF( LTNM1.GT.0 )
507       $                    CALL STRMVT ( 'L' , LTNM1 , A( LTLIP1 + ( LIJP1 - 1 )*LDA ) ,
508       $                    LDA , WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*
509       $                    LDV ) , 1 , WORK( INH + LTLIP1 - 1 + ( BINDEX +
510       $                    1 )*LDV ) , 1 , WORK( INV + LTLIP1 - 1 +
511       $( BINDEX + 1 )*LDV ) , 1 ,
512       $                    WORK( INHT + LIJP1 - 1 + ( BINDEX + 1 )*LDV ) ,
513       $                    1 )
514  
515                        END IF
516  
517  *                     We take advantage of the fact that :
518  *                     A * sum( B ) = sum( A * B ) for matrices A , B
519  
520  *                     trueA = A + V * HT + H * VT
521  *                     hence :(trueA)v = Av' + V * HT * v + H * VT * v
522  *                     VT * v = sum_p_in_NPROW( VTp * v )
523  *                     H * VT * v = H * sum(VTp * v) = sum( H * VTp * v )
524  
525  *                     v = v + V * HT * h + H * VT * h
526  
527  *                     tmp = HT * nh1
528                        DO 110 I = 1 , 2*( BINDEX + 1 )
529                            WORK( INTMP - 1 + I ) = 0
530    110                 CONTINUE
531  
532                        IF( BALANCED ) THEN
533                            NPSET = NPROW
534                            MYSETNUM = MYROW
535                            ROWSPERPROC = ICEIL( NQB , NPSET )
536                            MYFIRSTROW = MIN( NQB + 1 , 1 + ROWSPERPROC*MYSETNUM )
537                            NUMROWS = MIN( ROWSPERPROC , NQB - MYFIRSTROW + 1 )
538  
539  *                         tmp = HT * v
540  
541                            CALL SGEMV( 'C' , NUMROWS , BINDEX + 1 , Z_ONE ,
542       $                    WORK( INHTB + MYFIRSTROW - 1 ) , LDV ,
543       $                    WORK( INHTB + MYFIRSTROW - 1 + ( BINDEX + 1 )*LDV ) ,
544       $                    1 , Z_ZERO , WORK( INTMP ) , 1 )
545  *                         tmp2 = VT * v
546                            CALL SGEMV( 'C' , NUMROWS , BINDEX + 1 , Z_ONE ,
547       $                    WORK( INVTB + MYFIRSTROW - 1 ) , LDV ,
548       $                    WORK( INHTB + MYFIRSTROW - 1 + ( BINDEX + 1 )*LDV ) ,
549       $                    1 , Z_ZERO , WORK( INTMP + BINDEX + 1 ) , 1 )
550  
551                            CALL SGSUM2D( ICTXT , 'C' , ' ' , 2*( BINDEX + 1 ) , 1 ,
552       $                    WORK( INTMP ) , 2*( BINDEX + 1 ) , - 1 , - 1 )
553                        ELSE
554  *                         tmp = HT * v
555  
556                            CALL SGEMV( 'C' , NQB , BINDEX + 1 , Z_ONE , WORK( INHTB ) ,
557       $                    LDV , WORK( INHTB + ( BINDEX + 1 )*LDV ) , 1 ,
558       $                    Z_ZERO , WORK( INTMP ) , 1 )
559  *                         tmp2 = VT * v
560                            CALL SGEMV( 'C' , NQB , BINDEX + 1 , Z_ONE , WORK( INVTB ) ,
561       $                    LDV , WORK( INHTB + ( BINDEX + 1 )*LDV ) , 1 ,
562       $                    Z_ZERO , WORK( INTMP + BINDEX + 1 ) , 1 )
563  
564                        END IF
565  
566                        IF( BALANCED ) THEN
567                            MYSETNUM = MYCOL
568  
569                            ROWSPERPROC = ICEIL( NPB , NPSET )
570                            MYFIRSTROW = MIN( NPB + 1 , 1 + ROWSPERPROC*MYSETNUM )
571                            NUMROWS = MIN( ROWSPERPROC , NPB - MYFIRSTROW + 1 )
572  
573                            CALL SGSUM2D( ICTXT , 'R' , ' ' , 2*( BINDEX + 1 ) , 1 ,
574       $                    WORK( INTMP ) , 2*( BINDEX + 1 ) , - 1 , - 1 )
575  
576  *                         v = v + V * tmp
577                            IF( INDEX.GT.1. ) THEN
578                                CALL SGEMV( 'N' , NUMROWS , BINDEX + 1 , Z_NEGONE ,
579       $                        WORK( INVB + MYFIRSTROW - 1 ) , LDV ,
580       $                        WORK( INTMP ) , 1 , Z_ONE ,
581       $                        WORK( INVB + MYFIRSTROW - 1 + ( BINDEX + 1 )*
582       $                        LDV ) , 1 )
583  
584  *                             v = v + H * tmp2
585                                CALL SGEMV( 'N' , NUMROWS , BINDEX + 1 , Z_NEGONE ,
586       $                        WORK( INHB + MYFIRSTROW - 1 ) , LDV ,
587       $                        WORK( INTMP + BINDEX + 1 ) , 1 , Z_ONE ,
588       $                        WORK( INVB + MYFIRSTROW - 1 + ( BINDEX + 1 )*
589       $                        LDV ) , 1 )
590                            END IF
591  
592                        ELSE
593  *                         v = v + V * tmp
594                            CALL SGEMV( 'N' , NPB , BINDEX + 1 , Z_NEGONE , WORK( INVB ) ,
595       $                    LDV , WORK( INTMP ) , 1 , Z_ONE ,
596       $                    WORK( INVB + ( BINDEX + 1 )*LDV ) , 1 )
597  
598  *                         v = v + H * tmp2
599                            CALL SGEMV( 'N' , NPB , BINDEX + 1 , Z_NEGONE , WORK( INHB ) ,
600       $                    LDV , WORK( INTMP + BINDEX + 1 ) , 1 , Z_ONE ,
601       $                    WORK( INVB + ( BINDEX + 1 )*LDV ) , 1 )
602  
603                        END IF
604  
605  *                     Transpose NV and add it back into NVT
606  
607                        IF( MYROW.EQ.MYCOL ) THEN
608                            DO 120 I = 0 , NQM1 - 1
609                                WORK( INTMP + I ) = WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*LDV +
610       $                        I )
611    120                     CONTINUE
612                        ELSE
613                            CALL SGESD2D( ICTXT , NQM1 , 1 ,
614       $                    WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*LDV ) ,
615       $                    NQM1 , MYCOL , MYROW )
616                            CALL SGERV2D( ICTXT , NPM1 , 1 , WORK( INTMP ) , NPM1 , MYCOL ,
617       $                    MYROW )
618  
619                        END IF
620                        DO 130 I = 0 , NPM1 - 1
621                            WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) = WORK( INV + LIIP1 -
622       $                    1 + ( BINDEX + 1 )*LDV + I ) + WORK( INTMP + I )
623    130                 CONTINUE
624  
625  *                     Sum - to - one NV rowwise(within a row)
626  
627                        CALL SGSUM2D( ICTXT , 'R' , ' ' , NPM1 , 1 ,
628       $                WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV ) , NPM1 ,
629       $                MYROW , NXTCOL )
630  
631  *                     Dot product c = NV * NH
632  *                     Sum - to - all c within next processor column
633  
634                        IF( MYCOL.EQ.NXTCOL ) THEN
635                            CC( 1 ) = Z_ZERO
636                            DO 140 I = 0 , NPM1 - 1
637                                CC( 1 ) = CC( 1 ) + WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*
638       $                        LDV + I )*WORK( INH + LIIP1 - 1 + ( BINDEX + 1 )*LDV +
639       $                        I )
640    140                     CONTINUE
641                            IF( MYROW.EQ.NXTROW ) THEN
642                                CC( 2 ) = WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV )
643                                CC( 3 ) = WORK( INH + LIIP1 - 1 + ( BINDEX + 1 )*LDV )
644                            ELSE
645                                CC( 2 ) = Z_ZERO
646                                CC( 3 ) = Z_ZERO
647                            END IF
648                            CALL SGSUM2D( ICTXT , 'C' , ' ' , 3 , 1 , CC , 3 , - 1 , NXTCOL )
649  
650                            TOPV = CC( 2 )
651                            C = CC( 1 )
652                            TOPH = CC( 3 )
653  
654                            TOPNV = TOPTAU*( TOPV - C*TOPTAU / 2*TOPH )
655  
656  *                         Compute V = Tau * (V - C * Tau' / 2 * H )
657  
658                            DO 150 I = 0 , NPM1 - 1
659                                WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) = TOPTAU*
660       $( WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) - C*TOPTAU /
661       $                        2*WORK( INH + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) )
662    150                     CONTINUE
663  
664                        END IF
665  
666    160                 CONTINUE
667  
668  *                     Perform the rank2k update
669  
670                        IF( MAXINDEX.LT.N ) THEN
671  
672                            DO 170 I = 0 , NPM1 - 1
673                                WORK( INTMP + I ) = WORK( INH + LIIP1 - 1 + ANB*LDV + I )
674    170                     CONTINUE
675  
676                            IF( .NOT.TWOGEMMS ) THEN
677                                IF( INTERLEAVE ) THEN
678                                    LDZG = LDV / 2
679                                ELSE
680                                    CALL SLACPY( 'A' , LTNM1 , ANB , WORK( INHT + LIJP1 - 1 ) ,
681       $                            LDV , WORK( INVT + LIJP1 - 1 + ANB*LDV ) , LDV )
682  
683                                    CALL SLACPY( 'A' , LTNM1 , ANB , WORK( INV + LTLIP1 - 1 ) ,
684       $                            LDV , WORK( INH + LTLIP1 - 1 + ANB*LDV ) , LDV )
685                                    LDZG = LDV
686                                END IF
687                                NBZG = ANB*2
688                            ELSE
689                                LDZG = LDV
690                                NBZG = ANB
691                            END IF
692  
693                            DO 180 PBMIN = 1 , LTNM1 , PNB
694  
695                                PBSIZE = MIN( PNB , LTNM1 - PBMIN + 1 )
696                                PBMAX = MIN( LTNM1 , PBMIN + PNB - 1 )
697                                CALL SGEMM( 'N' , 'C' , PBSIZE , PBMAX , NBZG , Z_NEGONE ,
698       $                        WORK( INH + LTLIP1 - 1 + PBMIN - 1 ) , LDZG ,
699       $                        WORK( INVT + LIJP1 - 1 ) , LDZG , Z_ONE ,
700       $                        A( LTLIP1 + PBMIN - 1 + ( LIJP1 - 1 )*LDA ) , LDA )
701                                IF( TWOGEMMS ) THEN
702                                    CALL SGEMM( 'N' , 'C' , PBSIZE , PBMAX , ANB , Z_NEGONE ,
703       $                            WORK( INV + LTLIP1 - 1 + PBMIN - 1 ) , LDZG ,
704       $                            WORK( INHT + LIJP1 - 1 ) , LDZG , Z_ONE ,
705       $                            A( LTLIP1 + PBMIN - 1 + ( LIJP1 - 1 )*LDA ) , LDA )
706                                END IF
707    180                     CONTINUE
708  
709                            DO 190 I = 0 , NPM1 - 1
710                                WORK( INV + LIIP1 - 1 + I ) = WORK( INV + LIIP1 - 1 + ANB*LDV + I )
711                                WORK( INH + LIIP1 - 1 + I ) = WORK( INTMP + I )
712    190                     CONTINUE
713                            DO 200 I = 0 , NQM1 - 1
714                                WORK( INHT + LIJP1 - 1 + I ) = WORK( INHT + LIJP1 - 1 + ANB*LDV + I )
715    200                     CONTINUE
716  
717                        END IF
718  
719  *                     End of the update A code
720  
721    210             CONTINUE
722  
723                    IF( MYCOL.EQ.NXTCOL ) THEN
724                        IF( MYROW.EQ.NXTROW ) THEN
725  
726                            D( NQ ) = A( NP + ( NQ - 1 )*LDA )
727  
728                            CALL SGEBS2D( ICTXT , 'C' , ' ' , 1 , 1 , D( NQ ) , 1 )
729                        ELSE
730                            CALL SGEBR2D( ICTXT , 'C' , ' ' , 1 , 1 , D( NQ ) , 1 , NXTROW ,
731       $                    NXTCOL )
732                        END IF
733                    END IF
734  
735                    WORK( 1 ) = REAL( LWMIN )
736                    RETURN
737  
738  *                 End of PSSYTTRD
739  
740                END