|
|
| |
| # lines: |
1110 | | # code: |
1110 | | # comment: | 0 | |
# blank: | 0 |
| # Variables: | 30 |
| # Callers: | 1 |
| # Callings: | 0 |
| # Words: | 37 |
| # Keywords: | 18 |
|
|
|
|
|
..
.. Local Scalars ..
..
.. Local Arrays ..
..
.. External Subroutines ..
..
.. External Functions ..
..
.. Intrinsic Functions ..
..
.. Executable Statements ..
Test the input parameters
Convert descriptor into standard form for easy access to
parameters, check that grid is of right shape.
Get values out of descriptor for use in code.
Get grid parameters
Argument checking that is specific to Divide & Conquer routine
Check auxiliary storage size
put minimum value of laf into AF( 1 )
Check worksize
put minimum value of work into work( 1 )
Pack params and positions into arrays for global consistency check
Want to find errors with MIN( ), so if no error, set it to a big
number. If there already is an error, multiply by the the
descriptor multiplier.
Check consistency across processors
Prepare output: set info = 0 if no error, and divide by DESCMULT
if error is not in a descriptor entry.
Quick return if possible
Adjust addressing into matrix space to properly get into
the beginning part of the relevant data
Form a new BLACS grid (the "standard form" grid) with only procs
holding part of the matrix, of size 1xNP where NP is adjusted,
starting at csrc=0, with JA modified to reflect dropped procs.
First processor to hold part of the matrix:
Calculate new JA one while dropping off unused processors.
Save and compute new value of NP
Call utility routine that forms "standard-form" grid
Use new context from standard grid as context.
Get information about new grid.
Drop out processors that do not have part of the matrix.
********************************
Values reused throughout routine
User-input value of partition size
Number of columns in each processor
Offset in columns to beginning of main partition in each proc
Offset in elements
Size of main (or odd) partition in each processor
Zero out space for fillin
Begin main code
*******************************************************************
PHASE 1: Local computation phase.
*******************************************************************
Transfer triangle B_i of local matrix to next processor
for fillin. Overlap the send with the factorization of A_i.
The last processor does not need to send anything.
BIPTR = location of triangle B_i in memory
Factor main partition P_i A_i = L_i U_i on each processor
LBWL, LBWU: lower and upper bandwidth of local solver
Note that for MYCOL > 0 one has lower triangular blocks!
LM is the number of rows which is usually NB except for
MYCOL = 0 where it is BWU less and MYCOL=NPCOL-1 where it
is NR+BWU where NR is the number of columns on the last processor
Finally APTR is the pointer to the first element of A. As LAPACK
has a slightly different matrix format than Scalapack the pointer
has to be adjusted on processor MYCOL=0.
Update the last BW columns of A_i (code modified from ZGBTRS)
Only the eliminations of unknowns > LN-BW have an effect on
the last BW columns. Loop over them...
Element (L,LN+1) is swapped with element (J,LN+1) etc
Furthermore, the elements in the same row are LDB=LLDA-1 apart
The complicated formulas are to cope with the banded
data format:
LPTR is the pointer to the beginning of the
coefficients of L
Compute spike fill-in, L_i F_i = P_i B_{i-1}
Receive triangle B_{i-1} from previous processor
Permutation and forward elimination (triang. solve)
*******************************************************************
PHASE 2: Formation and factorization of Reduced System.
*******************************************************************
Define the initial dimensions of the diagonal blocks
The offdiagonal blocks (for MYCOL > 0) are of size BM by BW
Pointer to first element of block bidiagonal matrix in AF
Leading dimension of block bidiagonal system
Copy from A and AF into block bidiagonal matrix (tail of AF)
DBPTR = Pointer to diagonal blocks in A
Zero out any junk entries that were copied
ODPTR = Pointer to offdiagonal blocks in A
In this case the loop over the levels will not be
performed.
Loop over levels ... only occurs if npcol > 1
The two integers NPACT (nu. of active processors) and NPSTR
(stride between active processors) are used to control the
loop.
Begin loop over levels
Test if processor is active
Send/Receive blocks
This node will potentially do more work later
Last processor skips to next level
BM1 = M for 1st block on proc pair, BM2 2nd block
Copy diagonal block to align whole system
This node stops work after this stage -- an extra copy
is required to make the odd and even frontal matrices
look identical
Copy diagonal block to align whole system
LU factorization with partial pivoting
Backsolve left side
Use partial factors to update remainder
Backsolve right side
Use partial factors to update remainder
Test if processor is active in next round
Reset BM
Local copying in the block bidiagonal area
Zero out space that held original copy
Factor the final 2 by 2 block matrix
Last processor in an odd-sized NPACT skips to here
End loop over levels
If error was found in Phase 1, processors jump here.
Free BLACS space used to hold standard-form grid.
If this processor did not hold part of the grid it
jumps here.
Restore saved input parameters
Output worksize
|
|
|
|
001 SUBROUTINE PZGBTRF( N , BWL , BWU , A , JA , DESCA , IPIV , AF , LAF ,
002 $WORK , LWORK , INFO )
003
004 * -- ScaLAPACK routine(version 1.7) --
005 * University of Tennessee , Knoxville , Oak Ridge National Laboratory ,
006 * and University of California , Berkeley.
007 * November 15 , 1997
008
009 * .. Scalar Arguments ..
010 INTEGER BWL , BWU , INFO , JA , LAF , LWORK , N
011 * ..
012 * .. Array Arguments ..
013 INTEGER DESCA( * ) , IPIV( * )
014 COMPLEX*16 A( * ) , AF( * ) , WORK( * )
015 * ..
016
017 * Purpose
018 * === ====
019
020 * PZGBTRF computes a LU factorization
021 * of an N - by - N complex banded
022 * distributed matrix
023 * with bandwidth BWL , BWU : A(1 : N , JA : JA + N - 1).
024 * Reordering is used to increase parallelism in the factorization.
025 * This reordering results in factors that are DIFFERENT from those
026 * produced by equivalent sequential codes. These factors cannot
027 * be used directly by users ; however , they can be used in
028 * subsequent calls to PZGBTRS to solve linear systems.
029
030 * The factorization has the form
031
032 * P A(1 : N , JA : JA + N - 1) Q = L U
033
034 * where U is a banded upper triangular matrix and L is banded
035 * lower triangular , and P and Q are permutation matrices.
036 * The matrix Q represents reordering of columns
037 * for parallelism's sake , while P represents
038 * reordering of rows for numerical stability using
039 * classic partial pivoting.
040
041 * === ==================================================================
042
043 * Arguments
044 * === ======
045
046 * N(global input) INTEGER
047 * The number of rows and columns to be operated on , i.e. the
048 * order of the distributed submatrix A(1 : N , JA : JA + N - 1). N >= 0.
049
050 * BWL(global input) INTEGER
051 * Number of subdiagonals. 0 <= BWL <= N - 1
052
053 * BWU(global input) INTEGER
054 * Number of superdiagonals. 0 <= BWU <= N - 1
055
056 * A(local input / local output) COMPLEX*16 pointer into
057 * local memory to an array with first dimension
058 * LLD_A >=(2*bwl + 2*bwu + 1)(stored in DESCA).
059 * On entry , this array contains the local pieces of the
060 * N - by - N unsymmetric banded distributed matrix
061 * A(1 : N , JA : JA + N - 1) to be factored.
062 * This local portion is stored in the packed banded format
063 * used in LAPACK. Please see the Notes below and the
064 * ScaLAPACK manual for more detail on the format of
065 * distributed matrices.
066 * On exit , this array contains information containing details
067 * of the factorization.
068 * Note that permutations are performed on the matrix , so that
069 * the factors returned are different from those returned
070 * by LAPACK.
071
072 * JA(global input) INTEGER
073 * The index in the global array A that points to the start of
074 * the matrix to be operated on(which may be either all of A
075 * or a submatrix of A).
076
077 * DESCA(global and local input) INTEGER array of dimension DLEN.
078 * if 1D type(DTYPE_A = 501) , DLEN >= 7 ;
079 * if 2D type(DTYPE_A = 1) , DLEN >= 9 .
080 * The array descriptor for the distributed matrix A.
081 * Contains information of mapping of A to memory. Please
082 * see NOTES below for full description and options.
083
084 * IPIV(local output) INTEGER array , dimension >= DESCA( NB ).
085 * Pivot indices for local factorizations.
086 * Users *should not* alter the contents between
087 * factorization and solve.
088
089 * AF(local output) COMPLEX*16 array , dimension LAF.
090 * Auxiliary Fillin Space.
091 * Fillin is created during the factorization routine
092 * PZGBTRF and this is stored in AF. If a linear system
093 * is to be solved using PZGBTRS after the factorization
094 * routine , AF *must not be altered* after the factorization.
095
096 * LAF(local input) INTEGER
097 * Size of user - input Auxiliary Fillin space AF. Must be >=
098 * (NB + bwu)*(bwl + bwu) + 6*(bwl + bwu)*(bwl + 2*bwu)
099 * If LAF is not large enough , an error code will be returned
100 * and the minimum acceptable size will be returned in AF( 1 )
101
102 * WORK(local workspace / local output)
103 * COMPLEX*16 temporary workspace. This space may
104 * be overwritten in between calls to routines. WORK must be
105 * the size given in LWORK.
106 * On exit , WORK( 1 ) contains the minimal LWORK.
107
108 * LWORK(local input or global input) INTEGER
109 * Size of user - input workspace WORK.
110 * If LWORK is too small , the minimal acceptable size will be
111 * returned in WORK(1) and an error code is returned. LWORK >=
112 * 1
113
114 * INFO(global output) INTEGER
115 * = 0 : successful exit
116 * < 0 : If the i - th argument is an array and the j - entry had
117 * an illegal value , then INFO = - (i*100 + j) , if the i - th
118 * argument is a scalar and had an illegal value , then
119 * INFO = - i.
120 * > 0 : If INFO = K <= NPROCS , the submatrix stored on processor
121 * INFO and factored locally was not
122 * nonsingular , and
123 * the factorization was not completed.
124 * If INFO = K > NPROCS , the submatrix stored on processor
125 * INFO - NPROCS representing interactions with other
126 * processors was not
127 * nonsingular ,
128 * and the factorization was not completed.
129
130 * === ==================================================================
131
132 * Restrictions
133 * === =========
134
135 * The following are restrictions on the input parameters. Some of these
136 * are temporary and will be removed in future releases , while others
137 * may reflect fundamental technical limitations.
138
139 * Non - cyclic restriction : VERY IMPORTANT !
140 * P*NB >= mod(JA - 1 , NB) + N.
141 * The mapping for matrices must be blocked , reflecting the nature
142 * of the divide and conquer algorithm as a task - parallel algorithm.
143 * This formula in words is : no processor may have more than one
144 * chunk of the matrix.
145
146 * Blocksize cannot be too small :
147 * If the matrix spans more than one processor , the following
148 * restriction on NB , the size of each block on each processor ,
149 * must hold :
150 * NB >=(BWL + BWU) + 1
151 * The bulk of parallel computation is done on the matrix of size
152 * O(NB) on each processor. If this is too small , divide and conquer
153 * is a poor choice of algorithm.
154
155 * Submatrix reference :
156 * JA = IB
157 * Alignment restriction that prevents unnecessary communication.
158
159 * === ==================================================================
160
161 * Notes
162 * === ==
163
164 * If the factorization routine and the solve routine are to be called
165 * separately(to solve various sets of righthand sides using the same
166 * coefficient matrix) , the auxiliary space AF *must not be altered*
167 * between calls to the factorization routine and the solve routine.
168
169 * The best algorithm for solving banded and tridiagonal linear systems
170 * depends on a variety of parameters , especially the bandwidth.
171 * Currently , only algorithms designed for the case N / P >> bw are
172 * implemented. These go by many names , including Divide and Conquer ,
173 * Partitioning , domain decomposition - type , etc.
174
175 * Algorithm description : Divide and Conquer
176
177 * The Divide and Conqer algorithm assumes the matrix is narrowly
178 * banded compared with the number of equations. In this situation ,
179 * it is best to distribute the input matrix A one - dimensionally ,
180 * with columns atomic and rows divided amongst the processes.
181 * The basic algorithm divides the banded matrix up into
182 * P pieces with one stored on each processor ,
183 * and then proceeds in 2 phases for the factorization or 3 for the
184 * solution of a linear system.
185 * 1) Local Phase :
186 * The individual pieces are factored independently and in
187 * parallel. These factors are applied to the matrix creating
188 * fillin , which is stored in a non - inspectable way in auxiliary
189 * space AF. Mathematically , this is equivalent to reordering
190 * the matrix A as P A P^T and then factoring the principal
191 * leading submatrix of size equal to the sum of the sizes of
192 * the matrices factored on each processor. The factors of
193 * these submatrices overwrite the corresponding parts of A
194 * in memory.
195 * 2) Reduced System Phase :
196 * A small(max(bwl , bwu)* (P - 1)) system is formed representing
197 * interaction of the larger blocks , and is stored(as are its
198 * factors) in the space AF. A parallel Block Cyclic Reduction
199 * algorithm is used. For a linear system , a parallel front solve
200 * followed by an analagous backsolve , both using the structure
201 * of the factored matrix , are performed.
202 * 3) Backsubsitution Phase :
203 * For a linear system , a local backsubstitution is performed on
204 * each processor in parallel.
205
206 * Descriptors
207 * === ========
208
209 * Descriptors now have *types* and differ from ScaLAPACK 1.0.
210
211 * Note : banded codes can use either the old two dimensional
212 * or new one - dimensional descriptors , though the processor grid in
213 * both cases *must be one - dimensional*. We describe both types below.
214
215 * Each global data object is described by an associated description
216 * vector. This vector stores the information required to establish
217 * the mapping between an object element and its corresponding process
218 * and memory location.
219
220 * Let A be a generic term for any 2D block cyclicly distributed array.
221 * Such a global array has an associated description vector DESCA.
222 * In the following comments , the character _ should be read as
223 * "of the global array".
224
225 * NOTATION STORED IN EXPLANATION
226 * --- ------------ -------------- --------------------------------------
227 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case ,
228 * DTYPE_A = 1.
229 * CTXT_A(global) DESCA( CTXT_ ) The BLACS context handle , indicating
230 * the BLACS process grid A is distribu -
231 * ted over. The context itself is glo -
232 * bal , but the handle(the integer
233 * value) may vary.
234 * M_A(global) DESCA( M_ ) The number of rows in the global
235 * array A.
236 * N_A(global) DESCA( N_ ) The number of columns in the global
237 * array A.
238 * MB_A(global) DESCA( MB_ ) The blocking factor used to distribute
239 * the rows of the array.
240 * NB_A(global) DESCA( NB_ ) The blocking factor used to distribute
241 * the columns of the array.
242 * RSRC_A(global) DESCA( RSRC_ ) The process row over which the first
243 * row of the array A is distributed.
244 * CSRC_A(global) DESCA( CSRC_ ) The process column over which the
245 * first column of the array A is
246 * distributed.
247 * LLD_A(local) DESCA( LLD_ ) The leading dimension of the local
248 * array. LLD_A >= MAX(1 , LOCr(M_A)).
249
250 * Let K be the number of rows or columns of a distributed matrix ,
251 * and assume that its process grid has dimension p x q.
252 * LOCr( K ) denotes the number of elements of K that a process
253 * would receive if K were distributed over the p processes of its
254 * process column.
255 * Similarly , LOCc( K ) denotes the number of elements of K that a
256 * process would receive if K were distributed over the q processes of
257 * its process row.
258 * The values of LOCr() and LOCc() may be determined via a call to the
259 * ScaLAPACK tool function , NUMROC :
260 * LOCr( M ) = NUMROC( M , MB_A , MYROW , RSRC_A , NPROW ) ,
261 * LOCc( N ) = NUMROC( N , NB_A , MYCOL , CSRC_A , NPCOL ).
262 * An upper bound for these quantities may be computed by :
263 * LOCr( M ) <= ceil( ceil(M / MB_A) / NPROW )*MB_A
264 * LOCc( N ) <= ceil( ceil(N / NB_A) / NPCOL )*NB_A
265
266 * One - dimensional descriptors :
267
268 * One - dimensional descriptors are a new addition to ScaLAPACK since
269 * version 1.0. They simplify and shorten the descriptor for 1D
270 * arrays.
271
272 * Since ScaLAPACK supports two - dimensional arrays as the fundamental
273 * object , we allow 1D arrays to be distributed either over the
274 * first dimension of the array(as if the grid were P - by - 1) or the
275 * 2nd dimension(as if the grid were 1 - by - P). This choice is
276 * indicated by the descriptor type(501 or 502)
277 * as described below.
278
279 * IMPORTANT NOTE : the actual BLACS grid represented by the
280 * CTXT entry in the descriptor may be *either* P - by - 1 or 1 - by - P
281 * irrespective of which one - dimensional descriptor type
282 * (501 or 502) is input.
283 * This routine will interpret the grid properly either way.
284 * ScaLAPACK routines *do not support intercontext operations* so that
285 * the grid passed to a single ScaLAPACK routine *must be the same*
286 * for all array descriptors passed to that routine.
287
288 * NOTE : In all cases where 1D descriptors are used , 2D descriptors
289 * may also be used , since a one - dimensional array is a special case
290 * of a two - dimensional array with one dimension of size unity.
291 * The two - dimensional array used in this case *must* be of the
292 * proper orientation :
293 * If the appropriate one - dimensional descriptor is DTYPEA = 501
294 * (1 by P type) , then the two dimensional descriptor must
295 * have a CTXT value that refers to a 1 by P BLACS grid ;
296 * If the appropriate one - dimensional descriptor is DTYPEA = 502
297 * (P by 1 type) , then the two dimensional descriptor must
298 * have a CTXT value that refers to a P by 1 BLACS grid.
299
300 * Summary of allowed descriptors , types , and BLACS grids :
301 * DTYPE 501 502 1 1
302 * BLACS grid 1xP or Px1 1xP or Px1 1xP Px1
303 * --- --------------------------------------------------
304 * A OK NO OK NO
305 * B NO OK NO OK
306
307 * Let A be a generic term for any 1D block cyclicly distributed array.
308 * Such a global array has an associated description vector DESCA.
309 * In the following comments , the character _ should be read as
310 * "of the global array".
311
312 * NOTATION STORED IN EXPLANATION
313 * --- ------------ ---------- ------------------------------------------
314 * DTYPE_A(global) DESCA( 1 ) The descriptor type. For 1D grids ,
315 * TYPE_A = 501 : 1 - by - P grid.
316 * TYPE_A = 502 : P - by - 1 grid.
317 * CTXT_A(global) DESCA( 2 ) The BLACS context handle , indicating
318 * the BLACS process grid A is distribu -
319 * ted over. The context itself is glo -
320 * bal , but the handle(the integer
321 * value) may vary.
322 * N_A(global) DESCA( 3 ) The size of the array dimension being
323 * distributed.
324 * NB_A(global) DESCA( 4 ) The blocking factor used to distribute
325 * the distributed dimension of the array.
326 * SRC_A(global) DESCA( 5 ) The process row or column over which the
327 * first row or column of the array
328 * is distributed.
329 * LLD_A(local) DESCA( 6 ) The leading dimension of the local array
330 * storing the local blocks of the distri -
331 * buted array A. Minimum value of LLD_A
332 * depends on TYPE_A.
333 * TYPE_A = 501 : LLD_A >=
334 * size of undistributed dimension , 1.
335 * TYPE_A = 502 : LLD_A >= NB_A , 1.
336 * Reserved DESCA( 7 ) Reserved for future use.
337
338 * === ==================================================================
339
340 * Implemented for ScaLAPACK by :
341 * Andrew J. Cleary , Livermore National Lab and University of Tenn. ,
342 * and Markus Hegland , Australian Natonal University. Feb. , 1997.
343 * Based on code written by : Peter Arbenz , ETH Zurich , 1996.
344
345 * === ==================================================================
346
347 * .. Parameters ..
348 DOUBLE PRECISION ONE , ZERO
349 PARAMETER( ONE = 1.0D + 0 )
350 PARAMETER( ZERO = 0.0D + 0 )
351 COMPLEX*16 CONE , CZERO
352 PARAMETER( CONE =( 1.0D + 0 , 0.0D + 0 ) )
353 PARAMETER( CZERO =( 0.0D + 0 , 0.0D + 0 ) )
354 INTEGER INT_ONE
355 PARAMETER( INT_ONE = 1 )
356 INTEGER DESCMULT , BIGNUM
357 PARAMETER( DESCMULT = 100 , BIGNUM = DESCMULT*DESCMULT )
358 INTEGER BLOCK_CYCLIC_2D , CSRC_ , CTXT_ , DLEN_ , DTYPE_ ,
359 $LLD_ , MB_ , M_ , NB_ , N_ , RSRC_
360 PARAMETER( BLOCK_CYCLIC_2D = 1 , DLEN_ = 9 , DTYPE_ = 1 ,
361 $CTXT_ = 2 , M_ = 3 , N_ = 4 , MB_ = 5 , NB_ = 6 ,
362 $RSRC_ = 7 , CSRC_ = 8 , LLD_ = 9 )
363 WORK( 1 ) = WORK_SIZE_MIN
364
365 * Make INFO consistent across processors
366
367 CALL IGAMX2D( ICTXT , 'A' , ' ' , 1 , 1 , INFO , 1 , INFO , INFO ,
368 $- 1 , 0 , 0 )
369
370 IF( MYCOL.EQ.0 ) THEN
370
371 CALL IGEBS2D( ICTXT , 'A' , ' ' , 1 , 1 , INFO , 1 )
372 ELSE
372
373 CALL IGEBR2D( ICTXT , 'A' , ' ' , 1 , 1 , INFO , 1 , 0 , 0 )
374 ENDIF
375
376 RETURN
377
378 * End of PZGBTRF
379
380 END
381 58
2
|
|
Variables in Routine PZGBTRF()
| Summary Report |
| Data Type | Quantity | Size(byte) |
| COMPLEX*16 | 5 | ? |
| DOUBLE PRECISION | 2 | 8 |
| INTEGER | 23 | 92 |
| TOTAL | 30 | 100 |
List of Variables
COMPLEX*16
| A( * ) | AF( * ) | CONE | CZERO | WORK( * ) |
DOUBLE PRECISION
INTEGER
| BIGNUM | BLOCK_CYCLIC_2D | BWL | BWU | CSRC_ |
| CTXT_ | DESCA( * ) | DESCMULT | DLEN_ | DTYPE_ |
| INFO | INT_ONE | IPIV( * ) | JA | LAF |
| LLD_ | LWORK | M_ | MB_ | N |
| N_ | NB_ | RSRC_ | | |
|
Analysis elements of the routine PZGBTRF() Put the mouse over each element to display detailed matching information
Assigned variables |
| | | BIGNUM , BLOCK_CYCLIC_2D , BWL , BWU , CONE , CSRC_ , CTXT_ , CZERO , DESCMULT , DLEN_ , DTYPE_ , INFO , INT_ONE , JA , LLD_ , LWORK , M_ , MB_ , N , N_ , NB_ , ONE , RSRC_ , WORK , ZERO |
|
Active variables |
| | | A , AF , BIGNUM , BLOCK_CYCLIC_2D , BWL , BWU , CONE , CSRC_ , CTXT_ , CZERO , DESCA , DESCMULT , DLEN_ , DTYPE_ , INFO , INT_ONE , IPIV , JA , LAF , LLD_ , LWORK , M_ , MB_ , N , N_ , NB_ , one , RSRC_ , WORK , ZERO |
|
Allocated variables [ statement : associated variable ] |
| | new | : a, or |
|
Accessed arrays [ array name : associated index ] |
| | A | : * , 1:N, JA:JA+N-1 , 1:N, JA:JA+N-1 , 1:N, JA:JA+N-1 , 1:N, JA:JA+N-1 |
| | AF | : * , 1 |
| | DESCA | : * , 1 , 2 , 3 , 4 , 5 , 6 , 7 , CSRC_ , CTXT_ , DTYPE_ , LLD_ , M_ , MB_ , N_ , NB , NB_ , RSRC_ |
| | IPIV | : * |
| | WORK | : * , 1 , 1 , 1 |
|
Conditional statements [ statement : associated predicate ] |
| | do | : ( not support intercontext operations* so that ) |
| | for | : ( parallelism's sake , while P represents ) , ( numerical stability using ) , ( more detail on the format of ) , ( the distributed matrix A. ) , ( full description and options. ) , ( local factorizations. ) , ( matrices must be blocked , reflecting the nature ) , ( solving banded and tridiagonal linear systems ) , ( the case N / P >> bw are ) , ( the factorization or 3 for the ) , ( the ) , ( a linear system , a parallel front solve ) , ( a linear system , a local backsubstitution is performed on ) , ( any 2D block cyclicly distributed array. ) , ( these quantities may be computed by : ) , ( 1D ) , ( all array descriptors passed to that routine. ) , ( any 1D block cyclicly distributed array. ) , ( 1D grids , ) , ( future use. ) , ( ScaLAPACK by : ) |
| | if | : ( 1D type (DTYPE_A = 501) , DLEN >= 7 ; ) , ( 2D type (DTYPE_A = 1) , DLEN >= 9 . ) , ( a linear system ) , ( LAF is not large enough , an error code will be returned ) , ( LWORK is too small , the minimal acceptable size will be ) , ( the i-th argument is an array and the j - entry had ) , ( the i-th ) , ( INFO = K <= NPROCS , the submatrix stored on processor ) , ( INFO = K > NPROCS , the submatrix stored on processor ) , ( the matrix spans more than one processor , the following ) , ( this is too small , divide and conquer ) , ( the factorization routine and the solve routine are to be called ) , ( K were distributed over the p processes of its ) , ( K were distributed over the q processes of ) , ( the grid were P - by - 1) or the ) , ( the grid were 1 - by - P). This choice is ) , ( the appropriate one - dimensional descriptor is DTYPEA = 501 ) , ( the appropriate one - dimensional descriptor is DTYPEA = 502 ) , ( MYCOL.EQ.0 ) |
| | while | : ( P represents ) , ( others ) |
|
| List of variables | A( * ) AF( * ) BIGNUM BLOCK_CYCLIC_2D BWL BWU CONE
| CSRC_ CTXT_ CZERO DESCA( * ) DESCMULT DLEN_ DTYPE_ INFO
| INT_ONE IPIV( * ) JA LAF LLD_ LWORK M_ MB_
| N N_ NB_ ONE RSRC_ WORK( * ) ZERO | | close
| |
A( * )
AF( * )
BIGNUM
BLOCK_CYCLIC_2D
BWL
BWU
CONE
CSRC_
CTXT_
CZERO
DESCA( * )
DESCMULT
DLEN_
DTYPE_
INFO
INT_ONE
IPIV( * )
JA
LAF
LLD_
LWORK
M_
MB_
N
N_
NB_
ONE
RSRC_
WORK( * )
ZERO
| |