|
|
| |
| # lines: |
1511 | | # code: |
1511 | | # comment: | 0 | |
# blank: | 0 |
| # Variables: | 37 |
| # Callers: | 1 |
| # Callings: | 1 |
| # Words: | 304 |
| # Keywords: | 120 |
|
|
|
|
|
..
.. Local Scalars ..
..
.. Local Arrays ..
..
.. External Subroutines ..
..
.. External Functions ..
..
.. Intrinsic Functions ..
..
.. Executable Statements ..
Test the input parameters
Convert descriptor into standard form for easy access to
parameters, check that grid is of right shape.
Temporarily set the descriptor type to 1xP type
Consistency checks for DESCA and DESCB.
Context must be the same
These are alignment restrictions that may or may not be removed
in future releases. -Andy Cleary, April 14, 1996.
Block sizes must be the same
Source processor must be the same
Get values out of descriptor for use in code.
Get grid parameters
Current alignment restriction
Argument checking that is specific to Divide & Conquer routine
Pack params and positions into arrays for global consistency check
Want to find errors with MIN( ), so if no error, set it to a big
number. If there already is an error, multiply by the the
descriptor multiplier.
Check consistency across processors
Prepare output: set info = 0 if no error, and divide by DESCMULT
if error is not in a descriptor entry.
Quick return if possible
Adjust addressing into matrix space to properly get into
the beginning part of the relevant data
Form a new BLACS grid (the "standard form" grid) with only procs
holding part of the matrix, of size 1xNP where NP is adjusted,
starting at csrc=0, with JA modified to reflect dropped procs.
First processor to hold part of the matrix:
Calculate new JA one while dropping off unused processors.
Save and compute new value of NP
Call utility routine that forms "standard-form" grid
Use new context from standard grid as context.
Get information about new grid.
Drop out processors that do not have part of the matrix.
********************************
Values reused throughout routine
User-input value of partition size
Number of columns in each processor
Offset in columns to beginning of main partition in each proc
Size of main (or odd) partition in each processor
Begin main code
Frontsolve
*****************************************
Local computation phase
*****************************************
Use main partition in each processor to solve locally
Use factorization of odd-even connection block to modify
locally stored portion of right hand side(s)
Use the "spike" fillin to calculate contribution to previous
processor's righthand-side.
***********************************************
Formation and solution of reduced system
***********************************************
Send modifications to prior processor's right hand sides
Receive modifications to processor's right hand sides
Combine contribution to locally stored right hand sides
The last processor does not participate in the solution of the
reduced system, having sent its contribution already.
*************************************
Modification Loop
The distance for sending and receiving for each level starts
at 1 for the first level.
Do until this proc is needed to modify other procs' equations
Receive and add contribution to righthand sides from left
Receive and add contribution to righthand sides from right
[End of GOTO Loop]
*********************************
Calculate and use this proc's blocks to modify other procs
Solve with diagonal block
*********
Calculate contribution from this block to next diagonal block
Send contribution to diagonal block's owning processor.
End of "if( mycol/level_dist .le. (npcol-1)/level_dist-2 )..."
************
Use offdiagonal block to calculate modification to diag block
of processor to the left
Send contribution to diagonal block's owning processor.
End of "if( mycol/level_dist.le. (npcol-1)/level_dist -1 )..."
******************* BACKSOLVE *************************************
*******************************************************************
.. Begin reduced system phase of algorithm ..
*******************************************************************
The last processor does not participate in the solution of the
reduced system and just waits to receive its solution.
Determine number of steps in tree loop
Receive solution from processor to left
Use offdiagonal block to calculate modification to RHS stored
on this processor
End of "if( mycol/level_dist.le. (npcol-1)/level_dist -1 )..."
Receive solution from processor to right
Calculate contribution from this block to next diagonal block
End of "if( mycol/level_dist .le. (npcol-1)/level_dist-2 )..."
Solve with diagonal block
**Modification Loop *******
Send solution to the right
Send solution to left
[End of GOTO Loop]
[Processor npcol - 1 jumped to here to await next stage]
******************************
Reduced system has been solved, communicate solutions to nearest
neighbors in preparation for local computation phase.
Send elements of solution to next proc
Receive modifications to processor's right hand sides
*********************************************
Local computation phase
*********************************************
Use the "spike" fillin to calculate contribution from previous
processor's solution.
Use factorization of odd-even connection block to modify
locally stored portion of right hand side(s)
Use main partition in each processor to solve locally
End of "IF( LSAME( TRANS, 'N' ) )"...
**************************************************************
CASE UPLO = 'U' *
**************************************************************
Frontsolve
*****************************************
Local computation phase
*****************************************
Use main partition in each processor to solve locally
Use factorization of odd-even connection block to modify
locally stored portion of right hand side(s)
|
|
|
|
001 SUBROUTINE PZPTTRSV( UPLO , TRANS , N , NRHS , D , E , JA , DESCA , B , IB ,
002 $DESCB , AF , LAF , WORK , LWORK , INFO )
003
004 * -- ScaLAPACK routine(version 1.7) --
005 * University of Tennessee , Knoxville , Oak Ridge National Laboratory ,
006 * and University of California , Berkeley.
007 * November 15 , 1997
008
009 * .. Scalar Arguments ..
010 CHARACTER TRANS , UPLO
011 INTEGER IB , INFO , JA , LAF , LWORK , N , NRHS
012 * ..
013 * .. Array Arguments ..
014 INTEGER DESCA( * ) , DESCB( * )
015 COMPLEX*16 AF( * ) , B( * ) , E( * ) , WORK( * )
016 DOUBLE PRECISION D( * )
017 * ..
018
019 * Purpose
020 * === ====
021
022 * PZPTTRSV solves a tridiagonal triangular system of linear equations
023
024 * A(1 : N , JA : JA + N - 1) * X = B(IB : IB + N - 1 , 1 : NRHS)
025 * or
026 * A(1 : N , JA : JA + N - 1)^H * X = B(IB : IB + N - 1 , 1 : NRHS)
027
028 * where A(1 : N , JA : JA + N - 1) is a tridiagonal
029 * triangular matrix factor produced by the
030 * Cholesky factorization code PZPTTRF
031 * and is stored in A(1 : N , JA : JA + N - 1) and AF.
032 * The matrix stored in A(1 : N , JA : JA + N - 1) is either
033 * upper or lower triangular according to UPLO ,
034 * and the choice of solving A(1 : N , JA : JA + N - 1) or A(1 : N , JA : JA + N - 1)^H
035 * is dictated by the user by the parameter TRANS.
036
037 * Routine PZPTTRF MUST be called first.
038
039 * === ==================================================================
040
041 * Arguments
042 * === ======
043
044 * UPLO(global input) CHARACTER
045 * = 'U' : Upper triangle of A(1 : N , JA : JA + N - 1) is stored ;
046 * = 'L' : Lower triangle of A(1 : N , JA : JA + N - 1) is stored.
047
048 * TRANS(global input) CHARACTER
049 * = 'N' : Solve with A(1 : N , JA : JA + N - 1) ;
050 * = 'C' : Solve with conjugate_transpose( A(1 : N , JA : JA + N - 1) ) ;
051
052 * N(global input) INTEGER
053 * The number of rows and columns to be operated on , i.e. the
054 * order of the distributed submatrix A(1 : N , JA : JA + N - 1). N >= 0.
055
056 * NRHS(global input) INTEGER
057 * The number of right hand sides , i.e. , the number of columns
058 * of the distributed submatrix B(IB : IB + N - 1 , 1 : NRHS).
059 * NRHS >= 0.
060
061 * D(local input / local output) COMPLEX*16 pointer to local
062 * part of global vector storing the main diagonal of the
063 * matrix.
064 * On exit , this array contains information containing the
065 * factors of the matrix.
066 * Must be of size >= DESCA( NB_ ).
067
068 * E(local input / local output) COMPLEX*16 pointer to local
069 * part of global vector storing the upper diagonal of the
070 * matrix. Globally , DU(n) is not referenced , and DU must be
071 * aligned with D.
072 * On exit , this array contains information containing the
073 * factors of the matrix.
074 * Must be of size >= DESCA( NB_ ).
075
076 * JA(global input) INTEGER
077 * The index in the global array A that points to the start of
078 * the matrix to be operated on(which may be either all of A
079 * or a submatrix of A).
080
081 * DESCA(global and local input) INTEGER array of dimension DLEN.
082 * if 1D type(DTYPE_A = 501 or 502) , DLEN >= 7 ;
083 * if 2D type(DTYPE_A = 1) , DLEN >= 9.
084 * The array descriptor for the distributed matrix A.
085 * Contains information of mapping of A to memory. Please
086 * see NOTES below for full description and options.
087
088 * B(local input / local output) COMPLEX*16 pointer into
089 * local memory to an array of local lead dimension lld_b >= NB.
090 * On entry , this array contains the
091 * the local pieces of the right hand sides
092 * B(IB : IB + N - 1 , 1 : NRHS).
093 * On exit , this contains the local piece of the solutions
094 * distributed matrix X.
095
096 * IB(global input) INTEGER
097 * The row index in the global array B that points to the first
098 * row of the matrix to be operated on(which may be either
099 * all of B or a submatrix of B).
100
101 * DESCB(global and local input) INTEGER array of dimension DLEN.
102 * if 1D type(DTYPE_B = 502) , DLEN >= 7 ;
103 * if 2D type(DTYPE_B = 1) , DLEN >= 9.
104 * The array descriptor for the distributed matrix B.
105 * Contains information of mapping of B to memory. Please
106 * see NOTES below for full description and options.
107
108 * AF(local output) COMPLEX*16 array , dimension LAF.
109 * Auxiliary Fillin Space.
110 * Fillin is created during the factorization routine
111 * PZPTTRF and this is stored in AF. If a linear system
112 * is to be solved using PZPTTRS after the factorization
113 * routine , AF *must not be altered* after the factorization.
114
115 * LAF(local input) INTEGER
116 * Size of user - input Auxiliary Fillin space AF. Must be >=
117 * (NB + 2)
118 * If LAF is not large enough , an error code will be returned
119 * and the minimum acceptable size will be returned in AF( 1 )
120
121 * WORK(local workspace / local output)
122 * COMPLEX*16 temporary workspace. This space may
123 * be overwritten in between calls to routines. WORK must be
124 * the size given in LWORK.
125 * On exit , WORK( 1 ) contains the minimal LWORK.
126
127 * LWORK(local input or global input) INTEGER
128 * Size of user - input workspace WORK.
129 * If LWORK is too small , the minimal acceptable size will be
130 * returned in WORK(1) and an error code is returned. LWORK >=
131 * (10 + 2*min(100 , NRHS))*NPCOL + 4*NRHS
132
133 * INFO(local output) INTEGER
134 * = 0 : successful exit
135 * < 0 : If the i - th argument is an array and the j - entry had
136 * an illegal value , then INFO = - (i*100 + j) , if the i - th
137 * argument is a scalar and had an illegal value , then
138 * INFO = - i.
139
140 * === ==================================================================
141
142 * Restrictions
143 * === =========
144
145 * The following are restrictions on the input parameters. Some of these
146 * are temporary and will be removed in future releases , while others
147 * may reflect fundamental technical limitations.
148
149 * Non - cyclic restriction : VERY IMPORTANT !
150 * P*NB >= mod(JA - 1 , NB) + N.
151 * The mapping for matrices must be blocked , reflecting the nature
152 * of the divide and conquer algorithm as a task - parallel algorithm.
153 * This formula in words is : no processor may have more than one
154 * chunk of the matrix.
155
156 * Blocksize cannot be too small :
157 * If the matrix spans more than one processor , the following
158 * restriction on NB , the size of each block on each processor ,
159 * must hold :
160 * NB >= 2
161 * The bulk of parallel computation is done on the matrix of size
162 * O(NB) on each processor. If this is too small , divide and conquer
163 * is a poor choice of algorithm.
164
165 * Submatrix reference :
166 * JA = IB
167 * Alignment restriction that prevents unnecessary communication.
168
169 * === ==================================================================
170
171 * Notes
172 * === ==
173
174 * If the factorization routine and the solve routine are to be called
175 * separately(to solve various sets of righthand sides using the same
176 * coefficient matrix) , the auxiliary space AF *must not be altered*
177 * between calls to the factorization routine and the solve routine.
178
179 * The best algorithm for solving banded and tridiagonal linear systems
180 * depends on a variety of parameters , especially the bandwidth.
181 * Currently , only algorithms designed for the case N / P >> bw are
182 * implemented. These go by many names , including Divide and Conquer ,
183 * Partitioning , domain decomposition - type , etc.
184 * For tridiagonal matrices , it is obvious : N / P >> bw( = 1) , and so D&C
185 * algorithms are the appropriate choice.
186
187 * Algorithm description : Divide and Conquer
188
189 * The Divide and Conqer algorithm assumes the matrix is narrowly
190 * banded compared with the number of equations. In this situation ,
191 * it is best to distribute the input matrix A one - dimensionally ,
192 * with columns atomic and rows divided amongst the processes.
193 * The basic algorithm divides the tridiagonal matrix up into
194 * P pieces with one stored on each processor ,
195 * and then proceeds in 2 phases for the factorization or 3 for the
196 * solution of a linear system.
197 * 1) Local Phase :
198 * The individual pieces are factored independently and in
199 * parallel. These factors are applied to the matrix creating
200 * fillin , which is stored in a non - inspectable way in auxiliary
201 * space AF. Mathematically , this is equivalent to reordering
202 * the matrix A as P A P^T and then factoring the principal
203 * leading submatrix of size equal to the sum of the sizes of
204 * the matrices factored on each processor. The factors of
205 * these submatrices overwrite the corresponding parts of A
206 * in memory.
207 * 2) Reduced System Phase :
208 * A small((P - 1)) system is formed representing
209 * interaction of the larger blocks , and is stored(as are its
210 * factors) in the space AF. A parallel Block Cyclic Reduction
211 * algorithm is used. For a linear system , a parallel front solve
212 * followed by an analagous backsolve , both using the structure
213 * of the factored matrix , are performed.
214 * 3) Backsubsitution Phase :
215 * For a linear system , a local backsubstitution is performed on
216 * each processor in parallel.
217
218 * Descriptors
219 * === ========
220
221 * Descriptors now have *types* and differ from ScaLAPACK 1.0.
222
223 * Note : tridiagonal codes can use either the old two dimensional
224 * or new one - dimensional descriptors , though the processor grid in
225 * both cases *must be one - dimensional*. We describe both types below.
226
227 * Each global data object is described by an associated description
228 * vector. This vector stores the information required to establish
229 * the mapping between an object element and its corresponding process
230 * and memory location.
231
232 * Let A be a generic term for any 2D block cyclicly distributed array.
233 * Such a global array has an associated description vector DESCA.
234 * In the following comments , the character _ should be read as
235 * "of the global array".
236
237 * NOTATION STORED IN EXPLANATION
238 * --- ------------ -------------- --------------------------------------
239 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case ,
240 * DTYPE_A = 1.
241 * CTXT_A(global) DESCA( CTXT_ ) The BLACS context handle , indicating
242 * the BLACS process grid A is distribu -
243 * ted over. The context itself is glo -
244 * bal , but the handle(the integer
245 * value) may vary.
246 * M_A(global) DESCA( M_ ) The number of rows in the global
247 * array A.
248 * N_A(global) DESCA( N_ ) The number of columns in the global
249 * array A.
250 * MB_A(global) DESCA( MB_ ) The blocking factor used to distribute
251 * the rows of the array.
252 * NB_A(global) DESCA( NB_ ) The blocking factor used to distribute
253 * the columns of the array.
254 * RSRC_A(global) DESCA( RSRC_ ) The process row over which the first
255 * row of the array A is distributed.
256 * CSRC_A(global) DESCA( CSRC_ ) The process column over which the
257 * first column of the array A is
258 * distributed.
259 * LLD_A(local) DESCA( LLD_ ) The leading dimension of the local
260 * array. LLD_A >= MAX(1 , LOCr(M_A)).
261
262 * Let K be the number of rows or columns of a distributed matrix ,
263 * and assume that its process grid has dimension p x q.
264 * LOCr( K ) denotes the number of elements of K that a process
265 * would receive if K were distributed over the p processes of its
266 * process column.
267 * Similarly , LOCc( K ) denotes the number of elements of K that a
268 * process would receive if K were distributed over the q processes of
269 * its process row.
270 * The values of LOCr() and LOCc() may be determined via a call to the
271 * ScaLAPACK tool function , NUMROC :
272 * LOCr( M ) = NUMROC( M , MB_A , MYROW , RSRC_A , NPROW ) ,
273 * LOCc( N ) = NUMROC( N , NB_A , MYCOL , CSRC_A , NPCOL ).
274 * An upper bound for these quantities may be computed by :
275 * LOCr( M ) <= ceil( ceil(M / MB_A) / NPROW )*MB_A
276 * LOCc( N ) <= ceil( ceil(N / NB_A) / NPCOL )*NB_A
277
278 * One - dimensional descriptors :
279
280 * One - dimensional descriptors are a new addition to ScaLAPACK since
281 * version 1.0. They simplify and shorten the descriptor for 1D
282 * arrays.
283
284 * Since ScaLAPACK supports two - dimensional arrays as the fundamental
285 * object , we allow 1D arrays to be distributed either over the
286 * first dimension of the array(as if the grid were P - by - 1) or the
287 * 2nd dimension(as if the grid were 1 - by - P). This choice is
288 * indicated by the descriptor type(501 or 502)
289 * as described below.
290 * However , for tridiagonal matrices , since the objects being
291 * distributed are the individual vectors storing the diagonals , we
292 * have adopted the convention that both the P - by - 1 descriptor and
293 * the 1 - by - P descriptor are allowed and are equivalent for
294 * tridiagonal matrices. Thus , for tridiagonal matrices ,
295 * DTYPE_A = 501 or 502 can be used interchangeably
296 * without any other change.
297 * We require that the distributed vectors storing the diagonals of a
298 * tridiagonal matrix be aligned with each other. Because of this , a
299 * single descriptor , DESCA , serves to describe the distribution of
300 * of all diagonals simultaneously.
301
302 * IMPORTANT NOTE : the actual BLACS grid represented by the
303 * CTXT entry in the descriptor may be *either* P - by - 1 or 1 - by - P
304 * irrespective of which one - dimensional descriptor type
305 * (501 or 502) is input.
306 * This routine will interpret the grid properly either way.
307 * ScaLAPACK routines *do not support intercontext operations* so that
308 * the grid passed to a single ScaLAPACK routine *must be the same*
309 * for all array descriptors passed to that routine.
310
311 * NOTE : In all cases where 1D descriptors are used , 2D descriptors
312 * may also be used , since a one - dimensional array is a special case
313 * of a two - dimensional array with one dimension of size unity.
314 * The two - dimensional array used in this case *must* be of the
315 * proper orientation :
316 * If the appropriate one - dimensional descriptor is DTYPEA = 501
317 * (1 by P type) , then the two dimensional descriptor must
318 * have a CTXT value that refers to a 1 by P BLACS grid ;
319 * If the appropriate one - dimensional descriptor is DTYPEA = 502
320 * (P by 1 type) , then the two dimensional descriptor must
321 * have a CTXT value that refers to a P by 1 BLACS grid.
322
323 * Summary of allowed descriptors , types , and BLACS grids :
324 * DTYPE 501 502 1 1
325 * BLACS grid 1xP or Px1 1xP or Px1 1xP Px1
326 * --- --------------------------------------------------
327 * A OK OK OK NO
328 * B NO OK NO OK
329
330 * Note that a consequence of this chart is that it is not possible
331 * for *both* DTYPE_A and DTYPE_B to be 2D_type(1) , as these lead
332 * to opposite requirements for the orientation of the BLACS grid ,
333 * and as noted before , the *same* BLACS context must be used in
334 * all descriptors in a single ScaLAPACK subroutine call.
335
336 * Let A be a generic term for any 1D block cyclicly distributed array.
337 * Such a global array has an associated description vector DESCA.
338 * In the following comments , the character _ should be read as
339 * "of the global array".
340
341 * NOTATION STORED IN EXPLANATION
342 * --- ------------ ---------- ------------------------------------------
343 * DTYPE_A(global) DESCA( 1 ) The descriptor type. For 1D grids ,
344 * TYPE_A = 501 : 1 - by - P grid.
345 * TYPE_A = 502 : P - by - 1 grid.
346 * CTXT_A(global) DESCA( 2 ) The BLACS context handle , indicating
347 * the BLACS process grid A is distribu -
348 * ted over. The context itself is glo -
349 * bal , but the handle(the integer
350 * value) may vary.
351 * N_A(global) DESCA( 3 ) The size of the array dimension being
352 * distributed.
353 * NB_A(global) DESCA( 4 ) The blocking factor used to distribute
354 * the distributed dimension of the array.
355 * SRC_A(global) DESCA( 5 ) The process row or column over which the
356 * first row or column of the array
357 * is distributed.
358 * Ignored DESCA( 6 ) Ignored for tridiagonal matrices.
359 * Reserved DESCA( 7 ) Reserved for future use.
360
361 * === ==================================================================
362
363 * Code Developer : Andrew J. Cleary , University of Tennessee.
364 * Current address : Lawrence Livermore National Labs.
365 * This version released : August , 2001.
366
367 * === ==================================================================
368
369 * ..
370 * .. Parameters ..
371 DOUBLE PRECISION ONE , ZERO
372 PARAMETER( ONE = 1.0D + 0 )
373 PARAMETER( ZERO = 0.0D + 0 )
374 COMPLEX*16 CONE , CZERO
375 PARAMETER( CONE =( 1.0D + 0 , 0.0D + 0 ) )
376 PARAMETER( CZERO =( 0.0D + 0 , 0.0D + 0 ) )
377 INTEGER INT_ONE
378 PARAMETER( INT_ONE = 1 )
379 INTEGER DESCMULT , BIGNUM
380 PARAMETER(DESCMULT = 100 , BIGNUM = DESCMULT * DESCMULT)
381 INTEGER BLOCK_CYCLIC_2D , CSRC_ , CTXT_ , DLEN_ , DTYPE_ ,
382 $LLD_ , MB_ , M_ , NB_ , N_ , RSRC_
383 PARAMETER( BLOCK_CYCLIC_2D = 1 , DLEN_ = 9 , DTYPE_ = 1 ,
384 $CTXT_ = 2 , M_ = 3 , N_ = 4 , MB_ = 5 , NB_ = 6 ,
385 $RSRC_ = 7 , CSRC_ = 8 , LLD_ = 9 )
386 IF( MYCOL .NE. 0 ) THEN
387 * Use the "spike" fillin to calculate contribution to previous
388 * processor's righthand - side.
389
389
390 CALL ZGEMM( 'T' , 'N' , 1 , NRHS , ODD_SIZE , - CONE , AF( 1 ) ,
391 $ ODD_SIZE , B( PART_OFFSET + 1 ) , LLDB , CZERO ,
392 $ WORK( 1 + INT_ONE - 1 ) , INT_ONE )
393 ENDIF
394
395 * ***********************************************
396 * Formation and solution of reduced system
397 * ***********************************************
398
399 * Send modifications to prior processor's right hand sides
400
401 IF( MYCOL .GT. 0) THEN
402
402
403 CALL ZGESD2D( ICTXT , INT_ONE , NRHS ,
404 $ WORK( 1 ) , INT_ONE ,
405 $ 0 , MYCOL - 1 )
406
407 ENDIF
408
409 * Receive modifications to processor's right hand sides
410
411 IF( MYCOL .LT. NPCOL - 1) THEN
412
412
413 CALL ZGERV2D( ICTXT , INT_ONE , NRHS ,
414 $ WORK( 1 ) , INT_ONE ,
415 $ 0 , MYCOL + 1 )
416
417 * Combine contribution to locally stored right hand sides
418
419 CALL ZMATADD( INT_ONE , NRHS , CONE ,
420 $ WORK( 1 ) , INT_ONE , CONE ,
421 $ B( PART_OFFSET + ODD_SIZE + 1 ) , LLDB )
422
423 ENDIF
424
425 * The last processor does not participate in the solution of the
426 * reduced system , having sent its contribution already.
427 IF( MYCOL .EQ. NPCOL - 1 ) THEN
427
428 GOTO 44
429 ENDIF
430
431 * *************************************
432 * Modification Loop
433
434 * The distance for sending and receiving for each level starts
435 * at 1 for the first level.
436 LEVEL_DIST = 1
437
438 * Do until this proc is needed to modify other procs' equations
439
440 42 CONTINUE
441 IF( MOD((MYCOL + 1) / LEVEL_DIST , 2) .NE. 0 ) GOTO 41
442
443 * Receive and add contribution to righthand sides from left
444
445 IF( MYCOL - LEVEL_DIST .GE. 0 ) THEN
446
446
447 CALL ZGERV2D( ICTXT , INT_ONE , NRHS ,
448 $ WORK( 1 ) ,
449 $ INT_ONE , 0 , MYCOL - LEVEL_DIST )
450
451 CALL ZMATADD( INT_ONE , NRHS , CONE ,
452 $ WORK( 1 ) , INT_ONE , CONE ,
453 $ B( PART_OFFSET + ODD_SIZE + 1 ) , LLDB )
454
455 ENDIF
456
457 * Receive and add contribution to righthand sides from right
458
459 IF( MYCOL + LEVEL_DIST .LT. NPCOL - 1 ) THEN
460
460
461 CALL ZGERV2D( ICTXT , INT_ONE , NRHS ,
462 $ WORK( 1 ) ,
463 $ INT_ONE , 0 , MYCOL + LEVEL_DIST )
464
465 CALL ZMATADD( INT_ONE , NRHS , CONE ,
466 $ WORK( 1 ) , INT_ONE , CONE ,
467 $ B( PART_OFFSET + ODD_SIZE + 1 ) , LLDB )
468
469 ENDIF
470
471 LEVEL_DIST = LEVEL_DIST*2
472
473 GOTO 42
474 41 CONTINUE
475 * [End of GOTO Loop]
476
477 * *********************************
478 * Calculate and use this proc's blocks to modify other procs
479
480 * Solve with diagonal block
481
482 CALL ZTRTRS( 'L' , 'N' , 'U' , INT_ONE , NRHS , AF( ODD_SIZE + 2 ) ,
483 $INT_ONE , B( PART_OFFSET + ODD_SIZE + 1 ) , LLDB , INFO )
484
485 IF( INFO.NE.0 ) THEN
485
486 GO TO 1000
487 ENDIF
488
489 * *********
490 IF( MYCOL / LEVEL_DIST .LE.(NPCOL - 1) / LEVEL_DIST - 2 )THEN
491
492 * Calculate contribution from this block to next diagonal block
493
493
494 CALL ZGEMM( 'T' , 'N' , INT_ONE , NRHS , INT_ONE , - CONE ,
495 $ AF((ODD_SIZE)*1 + 1 ) ,
496 $ INT_ONE ,
497 $ B( PART_OFFSET + ODD_SIZE + 1 ) ,
498 $ LLDB , CZERO ,
499 $ WORK( 1 ) ,
500 $ INT_ONE )
501
502 * Send contribution to diagonal block's owning processor.
503
504 CALL ZGESD2D( ICTXT , INT_ONE , NRHS ,
505 $ WORK( 1 ) ,
506 $ INT_ONE , 0 , MYCOL + LEVEL_DIST )
507
508 ENDIF
509 * End of "if( mycol/level_dist .le.(npcol-1)/level_dist-2 )..."
510
511 * ************
512 IF((MYCOL / LEVEL_DIST .GT. 0 ).AND.
513 $( MYCOL / LEVEL_DIST .LE.(NPCOL - 1) / LEVEL_DIST - 1 ) ) THEN
514
515 * Use offdiagonal block to calculate modification to diag block
516 * of processor to the left
517
518 CALL ZGEMM( 'C' , 'N' , INT_ONE , NRHS , INT_ONE , - CONE ,
519 $AF( ODD_SIZE*1 + 2 + 1 ) ,
520 $INT_ONE ,
521 $B( PART_OFFSET + ODD_SIZE + 1 ) ,
522 $LLDB , CZERO ,
523 $WORK( 1 ) ,
524 $INT_ONE )
525
526 * Send contribution to diagonal block's owning processor.
527
528 CALL ZGESD2D( ICTXT , INT_ONE , NRHS ,
529 $WORK( 1 ) ,
530 $INT_ONE , 0 , MYCOL - LEVEL_DIST )
531
532 ENDIF
533 * End of "if( mycol/level_dist.le.(npcol-1)/level_dist -1 )..."
534
535 44 CONTINUE
536
537 ELSE
538
539 * ******************* BACKSOLVE *************************************
540
541 * *******************************************************************
542 * .. Begin reduced system phase of algorithm ..
543 * *******************************************************************
544
545 * The last processor does not participate in the solution of the
546 * reduced system and just waits to receive its solution.
546
547 IF( MYCOL .EQ. NPCOL - 1 ) THEN
547
548 GOTO 54
549 ENDIF
550
551 * Determine number of steps in tree loop
552
553 LEVEL_DIST = 1
554 57 CONTINUE
555 IF( MOD((MYCOL + 1) / LEVEL_DIST , 2) .NE. 0 ) GOTO 56
556
557 LEVEL_DIST = LEVEL_DIST*2
558
559 GOTO 57
560 56 CONTINUE
561
562 IF((MYCOL / LEVEL_DIST .GT. 0 ).AND.
563 $( MYCOL / LEVEL_DIST .LE.(NPCOL - 1) / LEVEL_DIST - 1 ) ) THEN
564
565 * Receive solution from processor to left
566
567 CALL ZGERV2D( ICTXT , INT_ONE , NRHS ,
568 $WORK( 1 ) ,
569 $INT_ONE , 0 , MYCOL - LEVEL_DIST )
570
571 * Use offdiagonal block to calculate modification to RHS stored
572 * on this processor
573
574 CALL ZGEMM( 'T' , 'N' , INT_ONE , NRHS , INT_ONE , - CONE ,
575 $AF( ODD_SIZE*1 + 2 + 1 ) ,
576 $INT_ONE ,
577 $WORK( 1 ) ,
578 $INT_ONE , CONE ,
579 $B( PART_OFFSET + ODD_SIZE + 1 ) ,
580 $LLDB )
581 ENDIF
582 * End of "if( mycol/level_dist.le.(npcol-1)/level_dist -1 )..."
583
584 IF( MYCOL / LEVEL_DIST .LE.(NPCOL - 1) / LEVEL_DIST - 2 )THEN
585
586 * Receive solution from processor to right
587
587
588 CALL ZGERV2D( ICTXT , INT_ONE , NRHS ,
589 $ WORK( 1 ) ,
590 $ INT_ONE , 0 , MYCOL + LEVEL_DIST )
591
592 * Calculate contribution from this block to next diagonal block
593
594 CALL ZGEMM( 'C' , 'N' , INT_ONE , NRHS , INT_ONE , - CONE ,
595 $ AF((ODD_SIZE)*1 + 1 ) ,
596 $ INT_ONE ,
597 $ WORK( 1 ) ,
598 $ INT_ONE , CONE ,
599 $ B( PART_OFFSET + ODD_SIZE + 1 ) ,
600 $ LLDB )
601
602 ENDIF
603 * End of "if( mycol/level_dist .le.(npcol-1)/level_dist-2 )..."
604
605 * Solve with diagonal block
606
607 CALL ZTRTRS( 'L' , 'C' , 'U' , INT_ONE , NRHS , AF( ODD_SIZE + 2 ) ,
608 $INT_ONE , B( PART_OFFSET + ODD_SIZE + 1 ) , LLDB , INFO )
609
610 IF( INFO.NE.0 ) THEN
610
611 GO TO 1000
612 ENDIF
613
614 * **Modification Loop *******
615
616 52 CONTINUE
617 IF( LEVEL_DIST .EQ. 1 ) GOTO 51
618
619 LEVEL_DIST = LEVEL_DIST / 2
620
621 * Send solution to the right
622
623 IF( MYCOL + LEVEL_DIST .LT. NPCOL - 1 ) THEN
624
624
625 CALL ZGESD2D( ICTXT , INT_ONE , NRHS ,
626 $ B( PART_OFFSET + ODD_SIZE + 1 ) ,
627 $ LLDB , 0 , MYCOL + LEVEL_DIST )
628
629 ENDIF
630
631 * Send solution to left
632
633 IF( MYCOL - LEVEL_DIST .GE. 0 ) THEN
634
634
635 CALL ZGESD2D( ICTXT , INT_ONE , NRHS ,
636 $ B( PART_OFFSET + ODD_SIZE + 1 ) ,
637 $ LLDB , 0 , MYCOL - LEVEL_DIST )
638
639 ENDIF
640
641 GOTO 52
642 51 CONTINUE
643 * [End of GOTO Loop]
644
645 54 CONTINUE
646 * [Processor npcol - 1 jumped to here to await next stage]
647
648 * ******************************
649 * Reduced system has been solved , communicate solutions to nearest
650 * neighbors in preparation for local computation phase.
651
652 * Send elements of solution to next proc
653
654 IF( MYCOL .LT. NPCOL - 1) THEN
655
655
656 CALL ZGESD2D( ICTXT , INT_ONE , NRHS ,
657 $ B( PART_OFFSET + ODD_SIZE + 1 ) , LLDB ,
658 $ 0 , MYCOL + 1 )
659
660 ENDIF
661
662 * Receive modifications to processor's right hand sides
663
664 IF( MYCOL .GT. 0) THEN
665
665
666 CALL ZGERV2D( ICTXT , INT_ONE , NRHS ,
667 $ WORK( 1 ) , INT_ONE ,
668 $ 0 , MYCOL - 1 )
669
670 ENDIF
671
672 * *********************************************
673 * Local computation phase
674 * *********************************************
675
676 IF( MYCOL .NE. 0 ) THEN
677 * Use the "spike" fillin to calculate contribution from previous
678 * processor's solution.
679
679
680 CALL ZGEMM( 'C' , 'N' , ODD_SIZE , NRHS , 1 , - CONE , AF( 1 ) ,
681 $ INT_ONE , WORK( 1 ) , INT_ONE , CONE ,
682 $ B( PART_OFFSET + 1 ) , LLDB )
683
684 ENDIF
685
686 IF( MYCOL .LT. NP - 1 ) THEN
687 * Use factorization of odd - even connection block to modify
688 * locally stored portion of right hand side(s)
689
689
690 CALL ZAXPY( NRHS , - E( PART_OFFSET + ODD_SIZE ) ,
691 $ B( PART_OFFSET + ODD_SIZE + 1 ) , LLDB ,
692 $ B( PART_OFFSET + ODD_SIZE ) , LLDB )
693
694 ENDIF
695
696 * Use main partition in each processor to solve locally
697
698 CALL ZPTTRSV ( UPLO , 'N' , ODD_SIZE , NRHS , D( PART_OFFSET + 1 ) ,
699 $E( PART_OFFSET + 1 ) , B( PART_OFFSET + 1 ) , LLDB ,
700 $INFO )
701
702 ENDIF
703 * End of "IF( LSAME( TRANS, 'N' ) )"...
704
705 ENDIF
706 * End of "IF( LSAME( UPLO, 'L' ) )"...
707 1000 CONTINUE
708
709 * Free BLACS space used to hold standard - form grid.
710
711 IF( ICTXT_SAVE .NE. ICTXT_NEW ) THEN
711
712 CALL BLACS_GRIDEXIT( ICTXT_NEW )
713 ENDIF
714
715 1234 CONTINUE
716
717 * Restore saved input parameters
718
719 ICTXT = ICTXT_SAVE
720 NP = NP_SAVE
721
722 * Output minimum worksize
723
724 WORK( 1 ) = WORK_SIZE_MIN
725
726 RETURN
727
728 * End of PZPTTRSV
729
730 END168
19
|
|
Variables in Routine PZPTTRSV()
| Summary Report |
| Data Type | Quantity | Size(byte) |
| CHARACTER | 2 | 2 |
| COMPLEX*16 | 6 | ? |
| DOUBLE PRECISION | 3 | 12 |
| INTEGER | 26 | 104 |
| TOTAL | 37 | 118 |
List of Variables
CHARACTER
COMPLEX*16
| AF( * ) | B( * ) | CONE | CZERO | E( * ) |
| WORK( * ) | | | | |
DOUBLE PRECISION
INTEGER
| BIGNUM | BLOCK_CYCLIC_2D | CSRC_ | CTXT_ | DESCA( * ) |
| DESCB( * ) | DESCMULT | DLEN_ | DTYPE_ | IB |
| ICTXT | INFO | INT_ONE | JA | LAF |
| LEVEL_DIST | LLD_ | LWORK | M_ | MB_ |
| N | N_ | NB_ | NP | NRHS |
| RSRC_ | | | | |
Variables Dependence Graph Put the mouse over a right hand side variable to display the corresponding line of the dependence | | - | | - | - | | LEVEL_DIST | <--- | LEVEL_DISTLEVEL_DIST = LEVEL_DIST*2{2LEVEL_DIST = LEVEL_DIST*2, 3LEVEL_DIST = LEVEL_DIST/2} |
|
|
Analysis elements of the routine PZPTTRSV() Put the mouse over each element to display detailed matching information
Assigned variables |
| | | BIGNUM , BLOCK_CYCLIC_2D , CONE , CSRC_ , CTXT_ , CZERO , DESCMULT , DLEN_ , DTYPE_ , ICTXT , INFO , INT_ONE , JA , LEVEL_DIST , LLD_ , LWORK , M_ , MB_ , N , N_ , NB_ , NP , NRHS , ONE , RSRC_ , WORK , ZERO |
|
Active variables |
| | | AF , B , BIGNUM , BLOCK_CYCLIC_2D , CONE , CSRC_ , CTXT_ , CZERO , D , DESCA , DESCB , DESCMULT , DLEN_ , DTYPE_ , E , IB , ICTXT , INFO , INT_ONE , JA , LAF , LEVEL_DIST , LLD_ , LWORK , M_ , MB_ , N , N_ , NB_ , NP , NRHS , one , RSRC_ , TRANS , UPLO , WORK , ZERO |
|
Allocated variables [ statement : associated variable ] |
| | new | : a, or |
|
Desallocated variables [ statement : associated variable ] |
| | free | : BLACS |
|
Accessed arrays [ array name : associated index ] |
| | AF | : (ODD_SIZE)*1+1 , (ODD_SIZE)*1+1 , * , 1 , 1 , 1 , ODD_SIZE*1+2+1 , ODD_SIZE*1+2+1 , ODD_SIZE+2 , ODD_SIZE+2 |
| | B | : * , IB:IB+N-1, 1:NRHS , IB:IB+N-1, 1:NRHS , IB:IB+N-1, 1:NRHS , IB:IB+N-1, 1:NRHS , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+ODD_SIZE + 1 , PART_OFFSET+ODD_SIZE + 1 , PART_OFFSET+ODD_SIZE + 1 , PART_OFFSET+ODD_SIZE , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 , PART_OFFSET+ODD_SIZE+1 |
| | D | : * , PART_OFFSET+1 |
| | DESCA | : * , 1 , 2 , 3 , 4 , 5 , 6 , 7 , CSRC_ , CTXT_ , DTYPE_ , LLD_ , M_ , MB_ , N_ , NB_ , NB_ , NB_ , RSRC_ |
| | DESCB | : * |
| | E | : * , PART_OFFSET+1 , PART_OFFSET+ODD_SIZE |
| | WORK | : * , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1+INT_ONE-1 |
|
Conditional statements [ statement : associated predicate ] |
| | do | : ( not support intercontext operations* so that ) , ( until this proc is needed to modify other procs' equations ) |
| | for | : ( the distributed matrix A. ) , ( full description and options. ) , ( the distributed matrix B. ) , ( full description and options. ) , ( matrices must be blocked , reflecting the nature ) , ( solving banded and tridiagonal linear systems ) , ( the case N / P >> bw are ) , ( tridiagonal matrices , it is obvious : N / P >> bw( = 1) , and so D&C ) , ( the factorization or 3 for the ) , ( the ) , ( a linear system , a parallel front solve ) , ( a linear system , a local backsubstitution is performed on ) , ( any 2D block cyclicly distributed array. ) , ( these quantities may be computed by : ) , ( 1D ) , ( tridiagonal matrices , since the objects being ) , ( tridiagonal matrices , ) , ( all array descriptors passed to that routine. ) , ( *both* DTYPE_A and DTYPE_B to be 2D_type(1) , as these lead ) , ( the orientation of the BLACS grid , ) , ( any 1D block cyclicly distributed array. ) , ( 1D grids , ) , ( tridiagonal matrices. ) , ( future use. ) , ( sending and receiving for each level starts ) , ( each level starts ) , ( the first level. ) , ( local computation phase. ) |
| | if | : ( 1D type (DTYPE_A = 501 or 502) , DLEN >= 7 ; ) , ( 2D type (DTYPE_A = 1) , DLEN >= 9. ) , ( 1D type (DTYPE_B = 502) , DLEN >= 7 ; ) , ( 2D type (DTYPE_B = 1) , DLEN >= 9. ) , ( a linear system ) , ( LAF is not large enough , an error code will be returned ) , ( LWORK is too small , the minimal acceptable size will be ) , ( the i-th argument is an array and the j - entry had ) , ( the i-th ) , ( the matrix spans more than one processor , the following ) , ( this is too small , divide and conquer ) , ( the factorization routine and the solve routine are to be called ) , ( K were distributed over the p processes of its ) , ( K were distributed over the q processes of ) , ( the grid were P - by - 1) or the ) , ( the grid were 1 - by - P). This choice is ) , ( the appropriate one - dimensional descriptor is DTYPEA = 501 ) , ( the appropriate one - dimensional descriptor is DTYPEA = 502 ) , ( MYCOL .NE. 0 ) , ( MYCOL .GT. 0 ) , ( MYCOL .LT. NPCOL - 1 ) , ( MYCOL .EQ. NPCOL - 1 ) , ( (MOD( (MYCOL + 1) / LEVEL_DIST , 2) .NE. 0 ) GOTO 41 ) , ( MYCOL-LEVEL_DIST .GE. 0 ) , ( MYCOL+LEVEL_DIST .LT. NPCOL - 1 ) , ( INFO.NE.0 ) , ( (MYCOL / LEVEL_DIST .LE. (NPCOL - 1) / LEVEL_DIST - 2 )THEN ) , ( ((MYCOL / LEVEL_DIST .GT. 0 ).AND. ) , ( MYCOL .EQ. NPCOL - 1 ) , ( (MOD( (MYCOL + 1) / LEVEL_DIST , 2) .NE. 0 ) GOTO 56 ) , ( ((MYCOL / LEVEL_DIST .GT. 0 ).AND. ) , ( (MYCOL / LEVEL_DIST .LE. (NPCOL - 1) / LEVEL_DIST - 2 )THEN ) , ( INFO.NE.0 ) , ( LEVEL_DIST .EQ. 1 ) , ( MYCOL+LEVEL_DIST .LT. NPCOL - 1 ) , ( MYCOL-LEVEL_DIST .GE. 0 ) , ( MYCOL .LT. NPCOL - 1 ) , ( MYCOL .GT. 0 ) , ( MYCOL .NE. 0 ) , ( MYCOL .LT. NP - 1 ) , ( ICTXT_SAVE .NE. ICTXT_NEW ) |
| | until | : ( this proc is needed to modify other procs' equations ) |
| | while | : ( others ) |
|
| List of variables | AF( * ) B( * ) BIGNUM BLOCK_CYCLIC_2D CONE CSRC_ CTXT_
| CZERO D( * ) DESCA( * ) DESCB( * ) DESCMULT DLEN_ DTYPE_ E( * )
| IB ICTXT INFO INT_ONE JA LAF LEVEL_DIST LLD_
| LWORK M_ MB_ N N_ NB_ NP NRHS
| ONE RSRC_ TRANS UPLO WORK( * ) ZERO | | close
| |
AF( * )
B( * )
BIGNUM
BLOCK_CYCLIC_2D
CONE
CSRC_
CTXT_
CZERO
D( * )
DESCA( * )
DESCB( * )
DESCMULT
DLEN_
DTYPE_
E( * )
IB
ICTXT
INFO
INT_ONE
JA
LAF
LEVEL_DIST
LLD_
LWORK
M_
MB_
N
N_
NB_
NP
NRHS
ONE
RSRC_
TRANS
UPLO
WORK( * )
ZERO
623
| |