Routine: PCHETTRD()  File: SRC\pchettrd.f

 
 
# lines: 1204
  # code: 1204
  # comment: 0
  # blank:0
# Variables:113
# Callers:1
# Callings:3
# Words:467
# Keywords:297
 

 

..
     .. Array Arguments ..
     ..
     Purpose
     =======
     PCHETTRD reduces a complex Hermitian matrix sub( A ) to Hermitian
     tridiagonal form T by an unitary similarity transformation:
     Q' * sub( A ) * Q = T, where sub( A ) = A(IA:IA+N-1,JA:JA+N-1).
     Notes
     =====
     Each global data object is described by an associated description
     vector.  This vector stores the information required to establish
     the mapping between an object element and its corresponding
     process and memory location.
     Let A be a generic term for any 2D block cyclicly distributed
     array.
     Such a global array has an associated description vector DESCA.
     In the following comments, the character _ should be read as
     "of the global array".
     NOTATION        STORED IN      EXPLANATION
     --------------- -------------- -----------------------------------
     DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
     DTYPE_A = 1.
     CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle,
     indicating the BLACS process grid A is distribu-
     ted over. The context itself is glo-
     bal, but the handle (the integer
     value) may vary.
     M_A    (global) DESCA( M_ )    The number of rows in the global
     array A.
     N_A    (global) DESCA( N_ )    The number of columns in the global
     array A.
     MB_A   (global) DESCA( MB_ )   The blocking factor used to
     distribute the rows of the array.
     NB_A   (global) DESCA( NB_ )   The blocking factor used to
     distribute the columns of the array.
     RSRC_A (global) DESCA( RSRC_ ) The process row over which the
     first row of the array A is distributed.
     CSRC_A (global) DESCA( CSRC_ ) The process column over which the
     first column of the array A is
     distributed.
     LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
     array.  LLD_A >= MAX(1,LOCp(M_A)).
     Let K be the number of rows or columns of a distributed matrix,
     and assume that its process grid has dimension p x q.
     LOCp( K ) denotes the number of elements of K that a process
     would receive if K were distributed over the p processes of its
     process column.
     Similarly, LOCq( K ) denotes the number of elements of K that a
     process would receive if K were distributed over the q processes
     of its process row.
     The values of LOCp() and LOCq() may be determined via a call to
     the ScaLAPACK tool function, NUMROC:
     LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
     LOCq( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
     An upper bound for these quantities may be computed by:
     LOCp( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
     LOCq( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
     Arguments
     =========
     UPLO    (global input) CHARACTER
     Specifies whether the upper or lower triangular part of the
     Hermitian matrix sub( A ) is stored:
     = 'U':  Upper triangular
     = 'L':  Lower triangular
     N       (global input) INTEGER
     The number of rows and columns to be operated on, i.e. the
     order of the distributed submatrix sub( A ). N >= 0.
     A       (local input/local output) COMPLEX pointer into the
     local memory to an array of dimension (LLD_A,LOCq(JA+N-1)).
     On entry, this array contains the local pieces of the
     Hermitian distributed matrix sub( A ).  If UPLO = 'U', the
     leading N-by-N upper triangular part of sub( A ) contains
     the upper triangular part of the matrix, and its strictly
     lower triangular part is not referenced. If UPLO = 'L', the
     leading N-by-N lower triangular part of sub( A ) contains the
     lower triangular part of the matrix, and its strictly upper
     triangular part is not referenced. On exit, if UPLO = 'U',
     the diagonal and first superdiagonal of sub( A ) are over-
     written by the corresponding elements of the tridiagonal
     matrix T, and the elements above the first superdiagonal,
     with the array TAU, represent the unitary matrix Q as a
     product of elementary reflectors; if UPLO = 'L', the diagonal
     and first subdiagonal of sub( A ) are overwritten by the
     corresponding elements of the tridiagonal matrix T, and the
     elements below the first subdiagonal, with the array TAU,
     represent the unitary matrix Q as a product of elementary
     reflectors. See Further Details.
     IA      (global input) INTEGER
     The row index in the global array A indicating the first
     row of sub( A ).
     JA      (global input) INTEGER
     The column index in the global array A indicating the
     first column of sub( A ).
     DESCA   (global and local input) INTEGER array of dimension DLEN_.
     The array descriptor for the distributed matrix A.
     D       (local output) REAL array, dim LOCq(JA+N-1)
     The diagonal elements of the tridiagonal matrix T:
     D(i) = A(i,i). D is tied to the distributed matrix A.
     E       (local output) REAL array, dim LOCq(JA+N-1)
     if UPLO = 'U', LOCq(JA+N-2) otherwise. The off-diagonal
     elements of the tridiagonal matrix T: E(i) = A(i,i+1) if
     UPLO = 'U', E(i) = A(i+1,i) if UPLO = 'L'. E is tied to the
     distributed matrix A.
     TAU     (local output) COMPLEX, array, dimension
     LOCq(JA+N-1). This array contains the scalar factors TAU of
     the elementary reflectors. TAU is tied to the distributed
     matrix A.
     WORK    (local workspace) COMPLEX array, dimension (LWORK)
     On exit, WORK( 1 ) returns the minimal and optimal workspace
     LWORK   (local input) INTEGER
     The dimension of the array WORK.
     LWORK >= 2*( ANB+1 )*( 4*NPS+2 ) + NPS
     Where:
         NPS = MAX( NUMROC( N, 1, 0, 0, NPROW ), 2*ANB )
         ANB = PJLAENV( DESCA( CTXT_ ), 3, 'PCHETTRD', 'L', 0, 0,
           0, 0 )
         NUMROC is a ScaLAPACK tool function;
         PJLAENV is a ScaLAPACK envionmental inquiry function
         MYROW, MYCOL, NPROW and NPCOL can be determined by calling
         the subroutine BLACS_GRIDINFO.
     INFO    (global output) INTEGER
     = 0:  successful exit
     < 0:  If the i-th argument is an array and the j-entry had
     an illegal value, then INFO = -(i*100+j), if the i-th
     argument is a scalar and had an illegal value, then
     INFO = -i.
     Further Details
     ===============
     If UPLO = 'U', the matrix Q is represented as a product of
     elementary reflectors
     Q = H(n-1) . . . H(2) H(1).
     Each H(i) has the form
     H(i) = I - tau * v * v'
     where tau is a complex scalar, and v is a complex vector with
     v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in
     A(ia:ia+i-2,ja+i), and tau in TAU(ja+i-1).
     If UPLO = 'L', the matrix Q is represented as a product of
     elementary reflectors
     Q = H(1) H(2) . . . H(n-1).
     Each H(i) has the form
     H(i) = I - tau * v * v'
     where tau is a complex scalar, and v is a complex vector with
     v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in
     A(ia+i+1:ia+n-1,ja+i-1), and tau in TAU(ja+i-1).
     The contents of sub( A ) on exit are illustrated by the following
     examples with n = 5:
     if UPLO = 'U':                       if UPLO = 'L':
     (  d   e   v2  v3  v4 )              (  d                  )
     (      d   e   v3  v4 )              (  e   d              )
     (          d   e   v4 )              (  v1  e   d          )
     (              d   e  )              (  v1  v2  e   d      )
     (                  d  )              (  v1  v2  v3  e   d  )
     where d and e denote diagonal and off-diagonal elements of T, and
     vi denotes an element of the vector defining H(i).
     Data storage requirements
     =========================
     PCHETTRD is not intended to be called directly.  All users are
     encourage to call PCHETRD which will then call PCHETTRD if
     appropriate.  A must be in cyclic format (i.e. MB = NB = 1),
     the process grid must be square ( i.e. NPROW = NPCOL ) and
     only lower triangular storage is supported.
     Local variables
     ===============
     PCHETTRD uses five local arrays:
       WORK ( InV ) dimension ( NP, ANB+1): array V
       WORK ( InH ) dimension ( NP, ANB+1): array H
       WORK ( InVT ) dimension ( NQ, ANB+1): transpose of the array V
       WORK ( InHT ) dimension ( NQ, ANB+1): transpose of the array H
       WORK ( InVTT ) dimension ( NQ, 1): transpose of the array VT
     Arrays V and H are replicated across all processor columns.
     Arrays V^T and H^T are replicated across all processor rows.
         WORK ( InVT ), or V^T, is stored as a tall skinny
         array ( NQ x ANB-1 ) for efficiency.  Since only the lower
         triangular portion of A is updated, Av is computed as:
         tril(A) * v + v^T * tril(A,-1).  This is performed as
         two local triangular matrix-vector multiplications (both in
         MVR2) followed by a transpose and a sum across the columns.
         In the local computation, WORK( InVT ) is used to compute
         tril(A) * v and WORK( InV ) is used to compute
         v^T * tril(A,-1)
     The following variables are global indices into A:
       INDEX:  The current global row and column number.
       MAXINDEX:  The global row and column for the first row and
       column in the trailing block of A.
       LIIB, LIJB:  The first row, column in
     The following variables point into the arrays A, V, H, V^T, H^T:
       BINDEX  =INDEX-MININDEX: The column index in V, H, V^T, H^T.
       LII:  local index I:  The local row number for row INDEX
       LIJ:  local index J:  The local column number for column INDEX
       LIIP1:  local index I+1:  The local row number for row INDEX+1
       LIJP1:  local index J+1:  The local col number for col INDEX+1
       LTLI: lower triangular local index I:  The local row for the
         upper left entry in tril( A(INDEX, INDEX) )
       LTLIP1: lower triangular local index I+1:  The local row for the
         upper left entry in tril( A(INDEX+1, INDEX+1) )
         Details:  The distinction between LII and LTLI (and between
         LIIP1 and LTLIP1) is subtle.  Within the current processor
         column (i.e. MYCOL .eq. CURCOL) they are the same.  However,
         on some processors, A( LII, LIJ ) points to an element
         above the diagonal, on these processors, LTLI = LII+1.
     The following variables give the number of rows and/or columns
     in various matrices:
       NP:  The number of local rows in A( 1:N, 1:N )
       NQ:  The number of local columns in A( 1:N, 1:N )
       NPM0:  The number of local rows in A( INDEX:N, INDEX:N )
       NQM0:  The number of local columns in A( INDEX:N, INDEX:N )
       NPM1:  The number of local rows in A( INDEX+1:N, INDEX:N )
       NQM1:  The number of local columns in A( INDEX+1:N, INDEX:N )
       LTNM0:  The number of local rows & columns in
         tril( A( INDEX:N, INDEX:N ) )
       LTNM1:  The number of local rows & columns in
         tril( A( INDEX+1:N, INDEX+1:N ) )
         NOTE:  LTNM0 == LTNM1 on all processors except the diagonal
         processors, i.e. those where MYCOL == MYROW.
         Invariants:
           NP = NPM0 + LII - 1
           NQ = NQM0 + LIJ - 1
           NP = NPM1 + LIIP1 - 1
           NQ = NQM1 + LIJP1 - 1
           NP = LTLI + LTNM0 - 1
           NP = LTLIP1 + LTNM1 - 1
       Temporary variables.  The following variables are used within
       a few lines after they are set and do hold state from one loop
       iteration to the next:
     The matrix A:
       The matrix A does not hold the same values that it would
       in an unblocked code nor the values that it would hold in
       in a blocked code.
       The value of A is confusing.  It is easiest to state the
       difference between trueA and A at the point that MVR2 is called,
       so we will start there.
       Let trueA be the value that A would
       have at a given point in an unblocked code and A
       be the value that A has in this code at the same point.
       At the time of the call to MVR2,
       trueA = A + V' * H + H' * V
       where H = H( MAXINDEX:N, 1:BINDEX ) and
       V = V( MAXINDEX:N, 1:BINDEX ).
       At the bottom of the inner loop,
       trueA = A +  V' * H + H' * V + v' * h + h' * v
       where H = H( MAXINDEX:N, 1:BINDEX ) and
       V = V( MAXINDEX:N, 1:BINDEX ) and
       v = V( liip1:N, BINDEX+1 ) and
       h = H( liip1:N, BINDEX+1 )
       At the top of the loop, BINDEX gets incremented, hence:
       trueA = A +  V' * H + H' * V + v' * h + h' * v
       where H = H( MAXINDEX:N, 1:BINDEX-1 ) and
       V = V( MAXINDEX:N, 1:BINDEX-1 ) and
       v = V( liip1:N, BINDEX ) and
       h = H( liip1:N, BINDEX )
       A gets updated at the bottom of the outer loop
       After this update, trueA = A + v' * h + h' * v
       where v = V( liip1:N, BINDEX ) and
       h = H( liip1:N, BINDEX ) and BINDEX = 0
       Indeed, the previous loop invariant as stated above for the
       top of the loop still holds, but with BINDEX = 0, H and V
       are null matrices.
       After the current column of A is updated,
         trueA( INDEX, INDEX:N ) = A( INDEX, INDEX:N )
       the rest of A is untouched.
       After the current block column of A is updated,
       trueA = A + V' * H + H' * V
       where H = H( MAXINDEX:N, 1:BINDEX ) and
       V = V( MAXINDEX:N, 1:BINDEX )
       This brings us back to the point at which mvr2 is called.
     Details of the parallelization:
       We delay spreading v across to all processor columns (which
       would naturally happen at the bottom of the loop) in order to
       combine the spread of v( : , i-1 ) with the spread of h( : , i )
       In order to compute h( :, i ), we must update A( :, i )
       which means that the processor column owning A( :, i ) must
       have: c, tau, v( i, i ) and h( i, i ).
       The traditional
       way of computing v (and the one used in pzlatrd.f and
       zlatrd.f) is:
         v = tau * v
         c = v' * h
         alpha = - tau * c / 2
         v = v + alpha * h
       However, the traditional way of computing v requires that tau
       be broadcast to all processors in the current column (to compute
       v = tau * v) and then a sum-to-all is required (to
       compute v' * h ).  We use the following formula instead:
         c = v' * h
         v = tau * ( v - c * tau' * h / 2 )
       The above formula allows tau to be spread down in the
       same call to SGSUM2D which performs the sum-to-all of c.
       The computation of v, which could be performed in any processor
       column (or other procesor subsets), is performed in the
       processor column that owns A( :, i+1 ) so that A( :, i+1 )
       can be updated prior to spreading v across.
       We keep the block column of A up-to-date to minimize the
       work required in updating the current column of A.  Updating
       the block column of A is reasonably load balanced whereas
       updating the current column of A is not (only the current
       processor column is involved).
     In the following overview of the steps performed, M in the
     margin indicates message traffic and C indicates O(n^2 nb/sqrt(p))
     or more flops per processor.
     Inner loop:
       A( index:n, index ) -= ( v * ht(bindex) + h * vt( bindex) )
M      h = house( A(index:n, index) )
M      Spread v, h across
M      vt = v^T; ht = h^T
       A( index+1:n, index+1:maxindex ) -=
         ( v * ht(index+1:maxindex) + h *vt(index+1:maxindex) )
C      v = tril(A) * h; vt = ht * tril(A,-1)
MorC   v = v - H*V*h - V*H*h
M      v = v + vt^T
M      c = v' * h
       v = tau * ( v - c * tau' * h / 2 )
C    A = A - H*V - V*H
     =================================================================
     .. Parameters ..

 
Display dynamic version Find AutoScroll Reload FontSize: - + Hide Comments Hide Blanks Frame FullScreen MailPrint

 
001        SUBROUTINE PCHETTRD( UPLO , N , A , IA , JA , DESCA , D , E , TAU , WORK ,
002       $LWORK , INFO )
003  
004  *     -- ScaLAPACK routine(version 1.7) --
005  *     University of Tennessee , Knoxville , Oak Ridge National Laboratory ,
006  *     and University of California , Berkeley.
007  *     October 15 , 1999
008  
009  *     .. Scalar Arguments ..
010        CHARACTER UPLO
011        INTEGER IA , INFO , JA , LWORK , N
012        INTEGER BLOCK_CYCLIC_2D , DLEN_ , DTYPE_ , CTXT_ , M_ , N_ ,
013       $MB_ , NB_ , RSRC_ , CSRC_ , LLD_
014        PARAMETER( BLOCK_CYCLIC_2D = 1 , DLEN_ = 9 , DTYPE_ = 1 ,
015       $CTXT_ = 2 , M_ = 3 , N_ = 4 , MB_ = 5 , NB_ = 6 ,
016       $RSRC_ = 7 , CSRC_ = 8 , LLD_ = 9 )
017        REAL ONE
018        PARAMETER( ONE = 1.0E0 )
019        COMPLEX Z_ONE , Z_NEGONE , Z_ZERO
020        PARAMETER( Z_ONE = 1.0E0 , Z_NEGONE = - 1.0E0 ,
021       $Z_ZERO = 0.0E0 )
022        REAL ZERO
023        PARAMETER( ZERO = 0.0E + 0 )
024  *     ..
025  
026  *     .. Local Scalars ..
027  
028        LOGICAL BALANCED , INTERLEAVE , TWOGEMMS , UPPER
029        INTEGER ANB , BINDEX , CURCOL , CURROW , I , ICTXT , INDEX ,
030       $INDEXA , INDEXINH , INDEXINV , INH , INHB , INHT ,
031       $INHTB , INTMP , INV , INVB , INVT , INVTB , J , LDA ,
032       $LDV , LDZG , LII , LIIB , LIIP1 , LIJ , LIJB , LIJP1 ,
033       $LTLIP1 , LTNM1 , LWMIN , MAXINDEX , MININDEX ,
034       $MYCOL , MYFIRSTROW , MYROW , MYSETNUM , NBZG , NP ,
035       $NPB , NPCOL , NPM0 , NPM1 , NPROW , NPS , NPSET , NQ ,
036       $NQB , NQM1 , NUMROWS , NXTCOL , NXTROW , PBMAX ,
037       $PBMIN , PBSIZE , PNB , ROWSPERPROC
038        REAL NORM , SAFMAX , SAFMIN
039        COMPLEX ALPHA , BETA , C , ONEOVERBETA , TOPH , TOPNV ,
040       $TOPTAU , TOPV , TTOPH , TTOPV
041  *     ..
042  *     .. Local Arrays ..
043  
044        INTEGER IDUM1( 1 ) , IDUM2( 1 )
045        REAL DTMP( 5 )
046        COMPLEX CC( 3 )
047  *     ..
048  *     .. External Subroutines ..
049        EXTERNAL BLACS_GRIDINFO , CGEBR2D , CGEBS2D , CGEMM , CGEMV ,
050       $CGERV2D , CGESD2D , CGSUM2D , CHK1MAT , CLACPY ,
051       $CSCAL , CTRMVT , PCHK1MAT , PSTREECOMB , PXERBLA ,
052       $SCOMBNRM2 , SGEBR2D , SGEBS2D , SGSUM2D
053  *     ..
054  *     .. External Functions ..
055  
056        LOGICAL LSAME
057        INTEGER ICEIL , NUMROC , PJLAENV
058        REAL PSLAMCH , SCNRM2
059        EXTERNAL LSAME , ICEIL , NUMROC , PJLAENV , PSLAMCH , SCNRM2
060  *     ..
061  *     .. Intrinsic Functions ..
062        INTRINSIC AIMAG , CMPLX , CONJG , ICHAR , MAX , MIN , MOD ,
063       $REAL , SIGN , SQRT
064  *     ..
065  
066  *     .. Executable Statements ..
067  *     This is just to keep ftnchek and toolpack / 1 happy
068        IF( BLOCK_CYCLIC_2D*CSRC_*CTXT_*DLEN_*DTYPE_*LLD_*MB_*M_*NB_*N_*
069       $    RSRC_.LT.0 )RETURN
070  
071  *         Further details
072  *         ===============
073  
074  *         At the top of the loop , v and nh have been computed but not
075  *         spread across. Hence , A is out - of - date even after the
076  *         rank 2k update. Furthermore , we compute the next v before
077  *         nh is spread across.
078  
079  *         I claim that if we used a sum - to - all on NV , by summing CC within
080  *         each column , that we could compute NV locally and could avoid
081  *         spreading V across. Bruce claims that sum - to - all can be made
082  *         to cost no more than sum - to - one on the Paragon. If that is
083  *         true , this would be a win. But ,
084  *         the BLACS sum - to - all is just a sum - to - one followed by a broadcast ,
085  *         and hence the present scheme is better for now.
086  
087  *         Get grid parameters
088  
089            ICTXT = DESCA( CTXT_ )
090            CALL BLACS_GRIDINFO( ICTXT , NPROW , NPCOL , MYROW , MYCOL )
091  
092            SAFMAX = SQRT( PSLAMCH( ICTXT , 'O' ) ) / N
093            SAFMIN = SQRT( PSLAMCH( ICTXT , 'S' ) )
094  
095  *         Test the input parameters
096  
097            INFO = 0
098            IF( NPROW.EQ. - 1 ) THEN
099                INFO = - ( 600 + CTXT_ )
100            ELSE
101  
102  *             Here we set execution options for PCHETTRD
103  
104                PNB = PJLAENV( ICTXT , 2 , 'PCHETTRD' , 'L' , 0 , 0 , 0 , 0 )
105                ANB = PJLAENV( ICTXT , 3 , 'PCHETTRD' , 'L' , 0 , 0 , 0 , 0 )
106  
107                INTERLEAVE =( PJLAENV( ICTXT , 4 , 'PCHETTRD' , 'L' , 1 , 0 , 0 ,
108       $        0 ).EQ.1 )
109                TWOGEMMS =( PJLAENV( ICTXT , 4 , 'PCHETTRD' , 'L' , 2 , 0 , 0 ,
110       $        0 ).EQ.1 )
111                BALANCED =( PJLAENV( ICTXT , 4 , 'PCHETTRD' , 'L' , 3 , 0 , 0 ,
112       $        0 ).EQ.1 )
113  
114                CALL CHK1MAT( N , 2 , N , 2 , IA , JA , DESCA , 6 , INFO )
115  
116                UPPER = LSAME( UPLO , 'U' )
117                IF( INFO.EQ.0 .AND. DESCA( NB_ ).NE.1 )
118       $            INFO = 600 + NB_
119                    IF( INFO.EQ.0 ) THEN
120  
121  *                     Here is the arithmetic :
122  *                     Let maxnpq = max( np , nq , 2 * ANB )
123  *                     LDV = 4 * max( np , nq ) + 2
124  *                     LWMIN = 2 * ( ANB + 1 ) * LDV + MAX( np , 2 * ANB )
125  *                     = 2 * ( ANB + 1 ) * ( 4 * NPS + 2 ) + NPS
126  
127  *                     This overestimates memory requirements when ANB > NP / 2
128  *                     Memory requirements are lower when interleave = .false.
129  *                     Hence , we could have two sets of memory requirements ,
130  *                     one for interleave and one for
131  
132                        NPS = MAX( NUMROC( N , 1 , 0 , 0 , NPROW ) , 2*ANB )
133                        LWMIN = 2*( ANB + 1 )*( 4*NPS + 2 ) + NPS
134  
135                        WORK( 1 ) = CMPLX( LWMIN )
136                        IF( .NOT.LSAME( UPLO , 'L' ) ) THEN
137                            INFO = - 1
138                        ELSE IF( IA.NE.1 ) THEN
139                            INFO = - 4
140                        ELSE IF( JA.NE.1 ) THEN
141                            INFO = - 5
142                        ELSE IF( NPROW.NE.NPCOL ) THEN
143                            INFO = - ( 600 + CTXT_ )
144                        ELSE IF( DESCA( DTYPE_ ).NE.1 ) THEN
145                            INFO = - ( 600 + DTYPE_ )
146                        ELSE IF( DESCA( MB_ ).NE.1 ) THEN
147                            INFO = - ( 600 + MB_ )
148                        ELSE IF( DESCA( NB_ ).NE.1 ) THEN
149                            INFO = - ( 600 + NB_ )
150                        ELSE IF( DESCA( RSRC_ ).NE.0 ) THEN
151                            INFO = - ( 600 + RSRC_ )
152                        ELSE IF( DESCA( CSRC_ ).NE.0 ) THEN
153                            INFO = - ( 600 + CSRC_ )
154                        ELSE IF( LWORK.LT.LWMIN ) THEN
155                            INFO = - 11
156                        END IF
157                    END IF
158                    IF( UPPER ) THEN
159                        IDUM1( 1 ) = ICHAR( 'U' )
160                    ELSE
161                        IDUM1( 1 ) = ICHAR( 'L' )
162                    END IF
163                    IDUM2( 1 ) = 1
164  
165                    CALL PCHK1MAT( N , 2 , N , 2 , IA , JA , DESCA , 6 , 1 , IDUM1 , IDUM2 ,
166       $            INFO )
167                END IF
168  
169                IF( INFO.NE.0 ) THEN
170                    CALL PXERBLA( ICTXT , 'PCHETTRD' , - INFO )
171                    RETURN
172                END IF
173  
174  *             Quick return if possible
175  
176                IF( N.EQ.0 )
177       $            RETURN
178  
179  *                 Reduce the lower triangle of sub( A )
180                    NP = NUMROC( N , 1 , MYROW , 0 , NPROW )
181                    NQ = NUMROC( N , 1 , MYCOL , 0 , NPCOL )
182  
183                    NXTROW = 0
184                    NXTCOL = 0
185  
186                    LIIP1 = 1
187                    LIJP1 = 1
188                    NPM1 = NP
189                    NQM1 = NQ
190  
191                    LDA = DESCA( LLD_ )
192                    ICTXT = DESCA( CTXT_ )
193  
194  *                 Miscellaneous details :
195  *                 Put tau , D and E in the right places
196  *                 Check signs
197  *                 Place all the arrays in WORK , control their placement
198  *                 in memory.
199  
200  *                 Loop invariants
201  *                 A(LIIP1 , LIJ) points to the first element of A(I + 1 , J)
202  *                 NPM1 , NQM1 = the number of rows , cols in A( LII + 1 : N , LIJ + 1 : N )
203  *                 A(LII : N , LIJ : N) is one step out of date.
204  *                 proc( CURROW , CURCOL ) owns A(LII , LIJ)
205  *                 proc( NXTROW , CURCOL ) owns A(LIIP1 , LIJ)
206  
207                    INH = 1
208  
209                    IF( INTERLEAVE ) THEN
210  
211  *                     H and V are interleaved to minimize memory movement
212  *                     LDV has to be twice as large to accomodate interleaving.
213  *                     In addition , LDV is doubled again to allow v , h and
214  *                     toptau to be spreaad across and transposed in a
215  *                     single communication operation with minimum memory
216  *                     movement.
217  
218  *                     We could reduce LDV back to 2*MAX(NPM1 , NQM1)
219  *                     by increasing the memory movement required in
220  *                     the spread and transpose of v , h and toptau.
221  *                     However , since the non - interleaved path already
222  *                     provides a mear minimum memory requirement option ,
223  *                     we did not provide this additional path.
224  
225                        LDV = 4*( MAX( NPM1 , NQM1 ) ) + 2
226  
227                        INH = 1
228  
229                        INV = INH + LDV / 2
230                        INVT = INH + ( ANB + 1 )*LDV
231  
232                        INHT = INVT + LDV / 2
233                        INTMP = INVT + LDV*( ANB + 1 )
234  
235                    ELSE
236                        LDV = MAX( NPM1 , NQM1 )
237  
238                        INHT = INH + LDV*( ANB + 1 )
239                        INV = INHT + LDV*( ANB + 1 )
240  
241  *                     The code works without this + 1 , but only because of a
242  *                     coincidence. Without the + 1 , WORK(INVT) gets trashed , but
243  *                     WORK(INVT) is only used once and when it is used , it is
244  *                     multiplied by WORK( INH ) which is zero. Hence , the fact
245  *                     that WORK(INVT) is trashed has no effect.
246  
247                        INVT = INV + LDV*( ANB + 1 ) + 1
248                        INTMP = INVT + LDV*( 2*ANB )
249  
250                    END IF
251  
252                    IF( INFO.NE.0 ) THEN
253                        CALL PXERBLA( ICTXT , 'PCHETTRD' , - INFO )
254                        WORK( 1 ) = CMPLX( LWMIN )
255                        RETURN
256                    END IF
257  
258  *                 The satisfies the loop invariant : trueA = A - V * HT - H * VT ,
259  *                 (where V , H , VT and HT all have BINDEX + 1 rows / columns)
260  *                 the first ANB times through the loop.
261  
262  *                 Setting either( InH and InHT ) or InV to Z_ZERO
263  *                 is adequate except in the face of NaNs.
264  
265                    DO 10 I = 1 , NP
266                        WORK( INH + I - 1 ) = Z_ZERO
267                        WORK( INV + I - 1 ) = Z_ZERO
268     10             CONTINUE
269                    DO 20 I = 1 , NQ
270                        WORK( INHT + I - 1 ) = Z_ZERO
271     20             CONTINUE
272  
273                    TOPNV = Z_ZERO
274  
275                    LTLIP1 = LIJP1
276                    LTNM1 = NPM1
277                    IF( MYCOL.GT.MYROW ) THEN
278                        LTLIP1 = LTLIP1 + 1
279                        LTNM1 = LTNM1 - 1
280                    END IF
281  
282                    DO 210 MININDEX = 1 , N - 1 , ANB
283  
284                        MAXINDEX = MIN( MININDEX + ANB - 1 , N )
285                        LIJB = NUMROC( MAXINDEX , 1 , MYCOL , 0 , NPCOL ) + 1
286                        LIIB = NUMROC( MAXINDEX , 1 , MYROW , 0 , NPROW ) + 1
287  
288                        NQB = NQ - LIJB + 1
289                        NPB = NP - LIIB + 1
290                        INHTB = INHT + LIJB - 1
291                        INVTB = INVT + LIJB - 1
292                        INHB = INH + LIIB - 1
293                        INVB = INV + LIIB - 1
294  
295                        DO 160 INDEX = MININDEX , MIN( MAXINDEX , N - 1 )
296  
297                            BINDEX = INDEX - MININDEX
298  
299                            CURROW = NXTROW
300                            CURCOL = NXTCOL
301  
302                            NXTROW = MOD( CURROW + 1 , NPROW )
303                            NXTCOL = MOD( CURCOL + 1 , NPCOL )
304  
305                            LII = LIIP1
306                            LIJ = LIJP1
307                            NPM0 = NPM1
308  
309                            IF( MYROW.EQ.CURROW ) THEN
310                                NPM1 = NPM1 - 1
311                                LIIP1 = LIIP1 + 1
312                            END IF
313                            IF( MYCOL.EQ.CURCOL ) THEN
314                                NQM1 = NQM1 - 1
315                                LIJP1 = LIJP1 + 1
316                                LTLIP1 = LTLIP1 + 1
317                                LTNM1 = LTNM1 - 1
318                            END IF
319  
320  *                         V = NV , VT = NVT , H = NH , HT = NHT
321  
322  *                         Update the current column of A
323  
324                            IF( MYCOL.EQ.CURCOL ) THEN
325  
326                                INDEXA = LII + ( LIJ - 1 )*LDA
327                                INDEXINV = INV + LII - 1 + ( BINDEX - 1 )*LDV
328                                INDEXINH = INH + LII - 1 + ( BINDEX - 1 )*LDV
329                                TTOPH = CONJG( WORK( INHT + LIJ - 1 + BINDEX*LDV ) )
330                                TTOPV = CONJG( TOPNV )
331  
332                                IF( INDEX.GT.1 ) THEN
333                                    DO 30 I = 0 , NPM0 - 1
334  *                                     A( INDEXA + I ) = A( INDEXA + I )
335                                        A( INDEXA + I ) = A( INDEXA + I ) -
336       $                                WORK( INDEXINV + LDV + I )*TTOPH -
337       $                                WORK( INDEXINH + LDV + I )*TTOPV
338     30                             CONTINUE
339                                END IF
340  
341                            END IF
342  
343                            IF( MYCOL.EQ.CURCOL ) THEN
344  
345  *                             Compute the householder vector
346  
347                                IF( MYROW.EQ.CURROW ) THEN
348                                    DTMP( 2 ) = REAL( A( LII + ( LIJ - 1 )*LDA ) )
349                                ELSE
350                                    DTMP( 2 ) = ZERO
351                                END IF
352                                IF( MYROW.EQ.NXTROW ) THEN
353                                    DTMP( 3 ) = REAL( A( LIIP1 + ( LIJ - 1 )*LDA ) )
354                                    DTMP( 4 ) = AIMAG( A( LIIP1 + ( LIJ - 1 )*LDA ) )
355                                ELSE
356                                    DTMP( 3 ) = ZERO
357                                    DTMP( 4 ) = ZERO
358                                END IF
359  
360                                NORM = SCNRM2( NPM1 , A( LIIP1 + ( LIJ - 1 )*LDA ) , 1 )
361                                DTMP( 1 ) = NORM
362  
363  *                             IF DTMP(5) = 1.0 , NORM is too large and might cause
364  *                             overflow , hence PSTREECOMB must be called. IF DTMP(5)
365  *                             is zero on output , DTMP(1) can be trusted.
366  
367                                DTMP( 5 ) = ZERO
368                                IF( DTMP( 1 ).GE.SAFMAX .OR. DTMP( 1 ).LT.SAFMIN ) THEN
369                                    DTMP( 5 ) = ONE
370                                    DTMP( 1 ) = ZERO
371                                END IF
372  
373                                DTMP( 1 ) = DTMP( 1 )*DTMP( 1 )
374                                CALL SGSUM2D( ICTXT , 'C' , ' ' , 5 , 1 , DTMP , 5 , - 1 ,
375       $                        CURCOL )
376                                IF( DTMP( 5 ).EQ.ZERO ) THEN
377                                    DTMP( 1 ) = SQRT( DTMP( 1 ) )
378                                ELSE
379                                    DTMP( 1 ) = NORM
380                                    CALL PSTREECOMB( ICTXT , 'C' , 1 , DTMP , - 1 , MYCOL ,
381       $                            SCOMBNRM2 )
382                                END IF
383  
384                                NORM = DTMP( 1 )
385  
386                                D( LIJ ) = DTMP( 2 )
387                                IF( MYROW.EQ.CURROW .AND. MYCOL.EQ.CURCOL ) THEN
388                                    A( LII + ( LIJ - 1 )*LDA ) = CMPLX( D( LIJ ) , ZERO )
389                                END IF
390  
391                                ALPHA = CMPLX( DTMP( 3 ) , DTMP( 4 ) )
392  
393                                NORM = SIGN( NORM , REAL( ALPHA ) )
394  
395                                IF( NORM.EQ.ZERO ) THEN
396                                    TOPTAU = ZERO
397                                ELSE
398                                    BETA = NORM + ALPHA
399                                    TOPTAU = BETA / NORM
400                                    ONEOVERBETA = 1.0E0 / BETA
401  
402                                    CALL CSCAL( NPM1 , ONEOVERBETA ,
403       $                            A( LIIP1 + ( LIJ - 1 )*LDA ) , 1 )
404                                END IF
405  
406                                IF( MYROW.EQ.NXTROW ) THEN
407                                    A( LIIP1 + ( LIJ - 1 )*LDA ) = Z_ONE
408                                END IF
409  
410                                TAU( LIJ ) = TOPTAU
411                                E( LIJ ) = - NORM
412  
413                            END IF
414  
415  *                         Spread v , nh , toptau across
416  
417                            DO 40 I = 0 , NPM1 - 1
418                                WORK( INV + LIIP1 - 1 + BINDEX*LDV + NPM1 + I ) = A( LIIP1 + I +
419       $( LIJ - 1 )*LDA )
420     40                     CONTINUE
421  
422                            IF( MYCOL.EQ.CURCOL ) THEN
423                                WORK( INV + LIIP1 - 1 + BINDEX*LDV + NPM1 + NPM1 ) = TOPTAU
424                                CALL CGEBS2D( ICTXT , 'R' , ' ' , NPM1 + NPM1 + 1 , 1 ,
425       $                        WORK( INV + LIIP1 - 1 + BINDEX*LDV ) ,
426       $                        NPM1 + NPM1 + 1 )
427                            ELSE
428                                CALL CGEBR2D( ICTXT , 'R' , ' ' , NPM1 + NPM1 + 1 , 1 ,
429       $                        WORK( INV + LIIP1 - 1 + BINDEX*LDV ) ,
430       $                        NPM1 + NPM1 + 1 , MYROW , CURCOL )
431                                TOPTAU = WORK( INV + LIIP1 - 1 + BINDEX*LDV + NPM1 + NPM1 )
432                            END IF
433                            DO 50 I = 0 , NPM1 - 1
434                                WORK( INH + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) = WORK( INV + LIIP1 -
435       $                        1 + BINDEX*LDV + NPM1 + I )
436     50                     CONTINUE
437  
438                            IF( INDEX.LT.N ) THEN
439                                IF( MYROW.EQ.NXTROW .AND. MYCOL.EQ.CURCOL )
440       $                            A( LIIP1 + ( LIJ - 1 )*LDA ) = E( LIJ )
441                                END IF
442  
443  *                             Transpose v , nh
444  
445                                IF( MYROW.EQ.MYCOL ) THEN
446                                    DO 60 I = 0 , NPM1 + NPM1
447                                        WORK( INVT + LIJP1 - 1 + BINDEX*LDV + I ) = WORK( INV + LIIP1 - 1 +
448       $                                BINDEX*LDV + I )
449     60                 CONTINUE
450                    ELSE
451                        CALL CGESD2D( ICTXT , NPM1 + NPM1 , 1 ,
452       $                WORK( INV + LIIP1 - 1 + BINDEX*LDV ) , NPM1 + NPM1 ,
453       $                MYCOL , MYROW )
454                        CALL CGERV2D( ICTXT , NQM1 + NQM1 , 1 ,
455       $                WORK( INVT + LIJP1 - 1 + BINDEX*LDV ) , NQM1 + NQM1 ,
456       $                MYCOL , MYROW )
457                    END IF
458  
459                    DO 70 I = 0 , NQM1 - 1
460                        WORK( INHT + LIJP1 - 1 + ( BINDEX + 1 )*LDV + I ) = WORK( INVT +
461       $                LIJP1 - 1 + BINDEX*LDV + NQM1 + I )
462     70             CONTINUE
463  
464  *                 Update the current block column of A
465  
466                    IF( INDEX.GT.1 ) THEN
467                        DO 90 J = LIJP1 , LIJB - 1
468                            DO 80 I = 0 , NPM1 - 1
469  
470                                A( LIIP1 + I + ( J - 1 )*LDA ) = A( LIIP1 + I + ( J - 1 )*LDA )
471       $                        - WORK( INV + LIIP1 - 1 + BINDEX*LDV + I )*
472       $                        CONJG( WORK( INHT + J - 1 + BINDEX*LDV ) ) -
473       $                        WORK( INH + LIIP1 - 1 + BINDEX*LDV + I )*
474       $                        CONJG( WORK( INVT + J - 1 + BINDEX*LDV ) )
475     80                     CONTINUE
476     90                 CONTINUE
477                    END IF
478  
479  *                 Compute NV = A * NHT ; NVT = A * NH
480  
481  *                 These two lines are necessary because these elements
482  *                 are not always involved in the calls to CTRMVT
483  *                 for two reasons :
484  *                 1) On diagonal processors , the call to TRMVT
485  *                 involves only LTNM1 - 1 elements
486  *                 2) On some processes , NQM1 < LTM1 or LIIP1 < LTLIP1
487  *                 and when the results are combined across all processes ,
488  *                 uninitialized values may be included.
489                    WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV ) = Z_ZERO
490                    WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*LDV + NQM1 - 1 ) = Z_ZERO
491  
492                    IF( MYROW.EQ.MYCOL ) THEN
493                        IF( LTNM1.GT.1 ) THEN
494                            CALL CTRMVT ( 'L' , LTNM1 - 1 ,
495       $                    A( LTLIP1 + 1 + ( LIJP1 - 1 )*LDA ) , LDA ,
496       $                    WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*LDV ) , 1 ,
497       $                    WORK( INH + LTLIP1 + 1 - 1 + ( BINDEX + 1 )*LDV ) ,
498       $                    1 , WORK( INV + LTLIP1 + 1 - 1 + ( BINDEX + 1 )*
499       $                    LDV ) , 1 , WORK( INHT + LIJP1 - 1 + ( BINDEX +
500       $                    1 )*LDV ) , 1 )
501                        END IF
502                        DO 100 I = 1 , LTNM1
503                            WORK( INVT + LIJP1 + I - 1 - 1 + ( BINDEX + 1 )*LDV )
504       $                    = WORK( INVT + LIJP1 + I - 1 - 1 + ( BINDEX + 1 )*LDV ) +
505       $                    A( LTLIP1 + I - 1 + ( LIJP1 + I - 1 - 1 )*LDA )*
506       $                    WORK( INH + LTLIP1 + I - 1 - 1 + ( BINDEX + 1 )*LDV )
507    100                 CONTINUE
508                    ELSE
509                        IF( LTNM1.GT.0 )
510       $                    CALL CTRMVT ( 'L' , LTNM1 , A( LTLIP1 + ( LIJP1 - 1 )*LDA ) ,
511       $                    LDA , WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*
512       $                    LDV ) , 1 , WORK( INH + LTLIP1 - 1 + ( BINDEX +
513       $                    1 )*LDV ) , 1 , WORK( INV + LTLIP1 - 1 +
514       $( BINDEX + 1 )*LDV ) , 1 ,
515       $                    WORK( INHT + LIJP1 - 1 + ( BINDEX + 1 )*LDV ) ,
516       $                    1 )
517  
518                        END IF
519  
520  *                     We take advantage of the fact that :
521  *                     A * sum( B ) = sum( A * B ) for matrices A , B
522  
523  *                     trueA = A + V * HT + H * VT
524  *                     hence :(trueA)v = Av' + V * HT * v + H * VT * v
525  *                     VT * v = sum_p_in_NPROW( VTp * v )
526  *                     H * VT * v = H * sum(VTp * v) = sum( H * VTp * v )
527  
528  *                     v = v + V * HT * h + H * VT * h
529  
530  *                     tmp = HT * nh1
531                        DO 110 I = 1 , 2*( BINDEX + 1 )
532                            WORK( INTMP - 1 + I ) = 0
533    110                 CONTINUE
534  
535                        IF( BALANCED ) THEN
536                            NPSET = NPROW
537                            MYSETNUM = MYROW
538                            ROWSPERPROC = ICEIL( NQB , NPSET )
539                            MYFIRSTROW = MIN( NQB + 1 , 1 + ROWSPERPROC*MYSETNUM )
540                            NUMROWS = MIN( ROWSPERPROC , NQB - MYFIRSTROW + 1 )
541  
542  *                         tmp = HT * v
543  
544                            CALL CGEMV( 'C' , NUMROWS , BINDEX + 1 , Z_ONE ,
545       $                    WORK( INHTB + MYFIRSTROW - 1 ) , LDV ,
546       $                    WORK( INHTB + MYFIRSTROW - 1 + ( BINDEX + 1 )*LDV ) ,
547       $                    1 , Z_ZERO , WORK( INTMP ) , 1 )
548  *                         tmp2 = VT * v
549                            CALL CGEMV( 'C' , NUMROWS , BINDEX + 1 , Z_ONE ,
550       $                    WORK( INVTB + MYFIRSTROW - 1 ) , LDV ,
551       $                    WORK( INHTB + MYFIRSTROW - 1 + ( BINDEX + 1 )*LDV ) ,
552       $                    1 , Z_ZERO , WORK( INTMP + BINDEX + 1 ) , 1 )
553  
554                            CALL CGSUM2D( ICTXT , 'C' , ' ' , 2*( BINDEX + 1 ) , 1 ,
555       $                    WORK( INTMP ) , 2*( BINDEX + 1 ) , - 1 , - 1 )
556                        ELSE
557  *                         tmp = HT * v
558  
559                            CALL CGEMV( 'C' , NQB , BINDEX + 1 , Z_ONE , WORK( INHTB ) ,
560       $                    LDV , WORK( INHTB + ( BINDEX + 1 )*LDV ) , 1 ,
561       $                    Z_ZERO , WORK( INTMP ) , 1 )
562  *                         tmp2 = VT * v
563                            CALL CGEMV( 'C' , NQB , BINDEX + 1 , Z_ONE , WORK( INVTB ) ,
564       $                    LDV , WORK( INHTB + ( BINDEX + 1 )*LDV ) , 1 ,
565       $                    Z_ZERO , WORK( INTMP + BINDEX + 1 ) , 1 )
566  
567                        END IF
568  
569                        IF( BALANCED ) THEN
570                            MYSETNUM = MYCOL
571  
572                            ROWSPERPROC = ICEIL( NPB , NPSET )
573                            MYFIRSTROW = MIN( NPB + 1 , 1 + ROWSPERPROC*MYSETNUM )
574                            NUMROWS = MIN( ROWSPERPROC , NPB - MYFIRSTROW + 1 )
575  
576                            CALL CGSUM2D( ICTXT , 'R' , ' ' , 2*( BINDEX + 1 ) , 1 ,
577       $                    WORK( INTMP ) , 2*( BINDEX + 1 ) , - 1 , - 1 )
578  
579  *                         v = v + V * tmp
580                            IF( INDEX.GT.1. ) THEN
581                                CALL CGEMV( 'N' , NUMROWS , BINDEX + 1 , Z_NEGONE ,
582       $                        WORK( INVB + MYFIRSTROW - 1 ) , LDV ,
583       $                        WORK( INTMP ) , 1 , Z_ONE ,
584       $                        WORK( INVB + MYFIRSTROW - 1 + ( BINDEX + 1 )*
585       $                        LDV ) , 1 )
586  
587  *                             v = v + H * tmp2
588                                CALL CGEMV( 'N' , NUMROWS , BINDEX + 1 , Z_NEGONE ,
589       $                        WORK( INHB + MYFIRSTROW - 1 ) , LDV ,
590       $                        WORK( INTMP + BINDEX + 1 ) , 1 , Z_ONE ,
591       $                        WORK( INVB + MYFIRSTROW - 1 + ( BINDEX + 1 )*
592       $                        LDV ) , 1 )
593                            END IF
594  
595                        ELSE
596  *                         v = v + V * tmp
597                            CALL CGEMV( 'N' , NPB , BINDEX + 1 , Z_NEGONE , WORK( INVB ) ,
598       $                    LDV , WORK( INTMP ) , 1 , Z_ONE ,
599       $                    WORK( INVB + ( BINDEX + 1 )*LDV ) , 1 )
600  
601  *                         v = v + H * tmp2
602                            CALL CGEMV( 'N' , NPB , BINDEX + 1 , Z_NEGONE , WORK( INHB ) ,
603       $                    LDV , WORK( INTMP + BINDEX + 1 ) , 1 , Z_ONE ,
604       $                    WORK( INVB + ( BINDEX + 1 )*LDV ) , 1 )
605  
606                        END IF
607  
608  *                     Transpose NV and add it back into NVT
609  
610                        IF( MYROW.EQ.MYCOL ) THEN
611                            DO 120 I = 0 , NQM1 - 1
612                                WORK( INTMP + I ) = WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*LDV +
613       $                        I )
614    120                     CONTINUE
615                        ELSE
616                            CALL CGESD2D( ICTXT , NQM1 , 1 ,
617       $                    WORK( INVT + LIJP1 - 1 + ( BINDEX + 1 )*LDV ) ,
618       $                    NQM1 , MYCOL , MYROW )
619                            CALL CGERV2D( ICTXT , NPM1 , 1 , WORK( INTMP ) , NPM1 , MYCOL ,
620       $                    MYROW )
621  
622                        END IF
623                        DO 130 I = 0 , NPM1 - 1
624                            WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) = WORK( INV + LIIP1 -
625       $                    1 + ( BINDEX + 1 )*LDV + I ) + WORK( INTMP + I )
626    130                 CONTINUE
627  
628  *                     Sum - to - one NV rowwise(within a row)
629  
630                        CALL CGSUM2D( ICTXT , 'R' , ' ' , NPM1 , 1 ,
631       $                WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV ) , NPM1 ,
632       $                MYROW , NXTCOL )
633  
634  *                     Dot product c = NV * NH
635  *                     Sum - to - all c within next processor column
636  
637                        IF( MYCOL.EQ.NXTCOL ) THEN
638                            CC( 1 ) = Z_ZERO
639                            DO 140 I = 0 , NPM1 - 1
640                                CC( 1 ) = CC( 1 ) + CONJG( WORK( INV + LIIP1 - 1 + ( BINDEX +
641       $                        1 )*LDV + I ) )*WORK( INH + LIIP1 - 1 +
642       $( BINDEX + 1 )*LDV + I )
643    140                     CONTINUE
644                            IF( MYROW.EQ.NXTROW ) THEN
645                                CC( 2 ) = WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV )
646                                CC( 3 ) = WORK( INH + LIIP1 - 1 + ( BINDEX + 1 )*LDV )
647                            ELSE
648                                CC( 2 ) = Z_ZERO
649                                CC( 3 ) = Z_ZERO
650                            END IF
651                            CALL CGSUM2D( ICTXT , 'C' , ' ' , 3 , 1 , CC , 3 , - 1 , NXTCOL )
652  
653                            TOPV = CC( 2 )
654                            C = CC( 1 )
655                            TOPH = CC( 3 )
656  
657                            TOPNV = TOPTAU*( TOPV - C*CONJG( TOPTAU ) / 2*TOPH )
658  
659  *                         Compute V = Tau * (V - C * Tau' / 2 * H )
660  
661                            DO 150 I = 0 , NPM1 - 1
662                                WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) = TOPTAU*
663       $( WORK( INV + LIIP1 - 1 + ( BINDEX + 1 )*LDV + I ) - C*
664       $                        CONJG( TOPTAU ) / 2*WORK( INH + LIIP1 - 1 + ( BINDEX + 1 )*
665       $                        LDV + I ) )
666    150                     CONTINUE
667  
668                        END IF
669  
670    160                 CONTINUE
671  
672  *                     Perform the rank2k update
673  
674                        IF( MAXINDEX.LT.N ) THEN
675  
676                            DO 170 I = 0 , NPM1 - 1
677                                WORK( INTMP + I ) = WORK( INH + LIIP1 - 1 + ANB*LDV + I )
678    170                     CONTINUE
679  
680                            IF( .NOT.TWOGEMMS ) THEN
681                                IF( INTERLEAVE ) THEN
682                                    LDZG = LDV / 2
683                                ELSE
684                                    CALL CLACPY( 'A' , LTNM1 , ANB , WORK( INHT + LIJP1 - 1 ) ,
685       $                            LDV , WORK( INVT + LIJP1 - 1 + ANB*LDV ) , LDV )
686  
687                                    CALL CLACPY( 'A' , LTNM1 , ANB , WORK( INV + LTLIP1 - 1 ) ,
688       $                            LDV , WORK( INH + LTLIP1 - 1 + ANB*LDV ) , LDV )
689                                    LDZG = LDV
690                                END IF
691                                NBZG = ANB*2
692                            ELSE
693                                LDZG = LDV
694                                NBZG = ANB
695                            END IF
696  
697                            DO 180 PBMIN = 1 , LTNM1 , PNB
698  
699                                PBSIZE = MIN( PNB , LTNM1 - PBMIN + 1 )
700                                PBMAX = MIN( LTNM1 , PBMIN + PNB - 1 )
701                                CALL CGEMM( 'N' , 'C' , PBSIZE , PBMAX , NBZG , Z_NEGONE ,
702       $                        WORK( INH + LTLIP1 - 1 + PBMIN - 1 ) , LDZG ,
703       $                        WORK( INVT + LIJP1 - 1 ) , LDZG , Z_ONE ,
704       $                        A( LTLIP1 + PBMIN - 1 + ( LIJP1 - 1 )*LDA ) , LDA )
705                                IF( TWOGEMMS ) THEN
706                                    CALL CGEMM( 'N' , 'C' , PBSIZE , PBMAX , ANB , Z_NEGONE ,
707       $                            WORK( INV + LTLIP1 - 1 + PBMIN - 1 ) , LDZG ,
708       $                            WORK( INHT + LIJP1 - 1 ) , LDZG , Z_ONE ,
709       $                            A( LTLIP1 + PBMIN - 1 + ( LIJP1 - 1 )*LDA ) , LDA )
710                                END IF
711    180                     CONTINUE
712  
713                            DO 190 I = 0 , NPM1 - 1
714                                WORK( INV + LIIP1 - 1 + I ) = WORK( INV + LIIP1 - 1 + ANB*LDV + I )
715                                WORK( INH + LIIP1 - 1 + I ) = WORK( INTMP + I )
716    190                     CONTINUE
717                            DO 200 I = 0 , NQM1 - 1
718                                WORK( INHT + LIJP1 - 1 + I ) = WORK( INHT + LIJP1 - 1 + ANB*LDV + I )
719    200                     CONTINUE
720  
721                        END IF
722  
723  *                     End of the update A code
724  
725    210             CONTINUE
726  
727                    IF( MYCOL.EQ.NXTCOL ) THEN
728                        IF( MYROW.EQ.NXTROW ) THEN
729  
730                            D( NQ ) = REAL( A( NP + ( NQ - 1 )*LDA ) )
731                            A( NP + ( NQ - 1 )*LDA ) = D( NQ )
732  
733                            CALL SGEBS2D( ICTXT , 'C' , ' ' , 1 , 1 , D( NQ ) , 1 )
734                        ELSE
735                            CALL SGEBR2D( ICTXT , 'C' , ' ' , 1 , 1 , D( NQ ) , 1 , NXTROW ,
736       $                    NXTCOL )
737                        END IF
738                    END IF
739  
740                    WORK( 1 ) = CMPLX( LWMIN )
741                    RETURN
742  
743  *                 End of PCHETTRD
744  
745                END