|
|
| |
| # lines: |
779 | | # code: |
779 | | # comment: | 0 | |
# blank: | 0 |
| # Variables: | 55 |
| # Callers: | 1 |
| # Callings: | 1 |
| # Words: | 256 |
| # Keywords: | 164 |
|
|
|
|
|
..
.. Array Arguments ..
..
Purpose
=======
PSDTTRS solves a system of linear equations
A(1:N, JA:JA+N-1) * X = B(IB:IB+N-1, 1:NRHS)
or
A(1:N, JA:JA+N-1)' * X = B(IB:IB+N-1, 1:NRHS)
where A(1:N, JA:JA+N-1) is the matrix used to produce the factors
stored in A(1:N,JA:JA+N-1) and AF by PSDTTRF.
A(1:N, JA:JA+N-1) is an N-by-N real
tridiagonal diagonally dominant-like distributed
matrix.
Routine PSDTTRF MUST be called first.
=====================================================================
Arguments
=========
TRANS (global input) CHARACTER
= 'N': Solve with A(1:N, JA:JA+N-1);
= 'T' or 'C': Solve with A(1:N, JA:JA+N-1)^T;
N (global input) INTEGER
The number of rows and columns to be operated on, i.e. the
order of the distributed submatrix A(1:N, JA:JA+N-1). N >= 0.
NRHS (global input) INTEGER
The number of right hand sides, i.e., the number of columns
of the distributed submatrix B(IB:IB+N-1, 1:NRHS).
NRHS >= 0.
DL (local input/local output) REAL pointer to local
part of global vector storing the lower diagonal of the
matrix. Globally, DL(1) is not referenced, and DL must be
aligned with D.
Must be of size >= DESCA( NB_ ).
On exit, this array contains information containing the
factors of the matrix.
D (local input/local output) REAL pointer to local
part of global vector storing the main diagonal of the
matrix.
On exit, this array contains information containing the
factors of the matrix.
Must be of size >= DESCA( NB_ ).
DU (local input/local output) REAL pointer to local
part of global vector storing the upper diagonal of the
matrix. Globally, DU(n) is not referenced, and DU must be
aligned with D.
On exit, this array contains information containing the
factors of the matrix.
Must be of size >= DESCA( NB_ ).
JA (global input) INTEGER
The index in the global array A that points to the start of
the matrix to be operated on (which may be either all of A
or a submatrix of A).
DESCA (global and local input) INTEGER array of dimension DLEN.
if 1D type (DTYPE_A=501 or 502), DLEN >= 7;
if 2D type (DTYPE_A=1), DLEN >= 9.
The array descriptor for the distributed matrix A.
Contains information of mapping of A to memory. Please
see NOTES below for full description and options.
B (local input/local output) REAL pointer into
local memory to an array of local lead dimension lld_b>=NB.
On entry, this array contains the
the local pieces of the right hand sides
B(IB:IB+N-1, 1:NRHS).
On exit, this contains the local piece of the solutions
distributed matrix X.
IB (global input) INTEGER
The row index in the global array B that points to the first
row of the matrix to be operated on (which may be either
all of B or a submatrix of B).
DESCB (global and local input) INTEGER array of dimension DLEN.
if 1D type (DTYPE_B=502), DLEN >=7;
if 2D type (DTYPE_B=1), DLEN >= 9.
The array descriptor for the distributed matrix B.
Contains information of mapping of B to memory. Please
see NOTES below for full description and options.
AF (local output) REAL array, dimension LAF.
Auxiliary Fillin Space.
Fillin is created during the factorization routine
PSDTTRF and this is stored in AF. If a linear system
is to be solved using PSDTTRS after the factorization
routine, AF *must not be altered* after the factorization.
LAF (local input) INTEGER
Size of user-input Auxiliary Fillin space AF. Must be >=
2*(NB+2)
If LAF is not large enough, an error code will be returned
and the minimum acceptable size will be returned in AF( 1 )
WORK (local workspace/local output)
REAL temporary workspace. This space may
be overwritten in between calls to routines. WORK must be
the size given in LWORK.
On exit, WORK( 1 ) contains the minimal LWORK.
LWORK (local input or global input) INTEGER
Size of user-input workspace WORK.
If LWORK is too small, the minimal acceptable size will be
returned in WORK(1) and an error code is returned. LWORK>=
10*NPCOL+4*NRHS
INFO (local output) INTEGER
= 0: successful exit
< 0: If the i-th argument is an array and the j-entry had
an illegal value, then INFO = -(i*100+j), if the i-th
argument is a scalar and had an illegal value, then
INFO = -i.
=====================================================================
Restrictions
============
The following are restrictions on the input parameters. Some of these
are temporary and will be removed in future releases, while others
may reflect fundamental technical limitations.
Non-cyclic restriction: VERY IMPORTANT!
P*NB>= mod(JA-1,NB)+N.
The mapping for matrices must be blocked, reflecting the nature
of the divide and conquer algorithm as a task-parallel algorithm.
This formula in words is: no processor may have more than one
chunk of the matrix.
Blocksize cannot be too small:
If the matrix spans more than one processor, the following
restriction on NB, the size of each block on each processor,
must hold:
NB >= 2
The bulk of parallel computation is done on the matrix of size
O(NB) on each processor. If this is too small, divide and conquer
is a poor choice of algorithm.
Submatrix reference:
JA = IB
Alignment restriction that prevents unnecessary communication.
=====================================================================
Notes
=====
If the factorization routine and the solve routine are to be called
separately (to solve various sets of righthand sides using the same
coefficient matrix), the auxiliary space AF *must not be altered*
between calls to the factorization routine and the solve routine.
The best algorithm for solving banded and tridiagonal linear systems
depends on a variety of parameters, especially the bandwidth.
Currently, only algorithms designed for the case N/P >> bw are
implemented. These go by many names, including Divide and Conquer,
Partitioning, domain decomposition-type, etc.
For tridiagonal matrices, it is obvious: N/P >> bw(=1), and so D&C
algorithms are the appropriate choice.
Algorithm description: Divide and Conquer
The Divide and Conqer algorithm assumes the matrix is narrowly
banded compared with the number of equations. In this situation,
it is best to distribute the input matrix A one-dimensionally,
with columns atomic and rows divided amongst the processes.
The basic algorithm divides the tridiagonal matrix up into
P pieces with one stored on each processor,
and then proceeds in 2 phases for the factorization or 3 for the
solution of a linear system.
1) Local Phase:
The individual pieces are factored independently and in
parallel. These factors are applied to the matrix creating
fillin, which is stored in a non-inspectable way in auxiliary
space AF. Mathematically, this is equivalent to reordering
the matrix A as P A P^T and then factoring the principal
leading submatrix of size equal to the sum of the sizes of
the matrices factored on each processor. The factors of
these submatrices overwrite the corresponding parts of A
in memory.
2) Reduced System Phase:
A small ((P-1)) system is formed representing
interaction of the larger blocks, and is stored (as are its
factors) in the space AF. A parallel Block Cyclic Reduction
algorithm is used. For a linear system, a parallel front solve
followed by an analagous backsolve, both using the structure
of the factored matrix, are performed.
3) Backsubsitution Phase:
For a linear system, a local backsubstitution is performed on
each processor in parallel.
Descriptors
===========
Descriptors now have *types* and differ from ScaLAPACK 1.0.
Note: tridiagonal codes can use either the old two dimensional
or new one-dimensional descriptors, though the processor grid in
both cases *must be one-dimensional*. We describe both types below.
Each global data object is described by an associated description
vector. This vector stores the information required to establish
the mapping between an object element and its corresponding process
and memory location.
Let A be a generic term for any 2D block cyclicly distributed array.
Such a global array has an associated description vector DESCA.
In the following comments, the character _ should be read as
"of the global array".
NOTATION STORED IN EXPLANATION
--------------- -------------- --------------------------------------
DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
DTYPE_A = 1.
CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
the BLACS process grid A is distribu-
ted over. The context itself is glo-
bal, but the handle (the integer
value) may vary.
M_A (global) DESCA( M_ ) The number of rows in the global
array A.
N_A (global) DESCA( N_ ) The number of columns in the global
array A.
MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
the rows of the array.
NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
the columns of the array.
RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
row of the array A is distributed.
CSRC_A (global) DESCA( CSRC_ ) The process column over which the
first column of the array A is
distributed.
LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
array. LLD_A >= MAX(1,LOCr(M_A)).
Let K be the number of rows or columns of a distributed matrix,
and assume that its process grid has dimension p x q.
LOCr( K ) denotes the number of elements of K that a process
would receive if K were distributed over the p processes of its
process column.
Similarly, LOCc( K ) denotes the number of elements of K that a
process would receive if K were distributed over the q processes of
its process row.
The values of LOCr() and LOCc() may be determined via a call to the
ScaLAPACK tool function, NUMROC:
LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
An upper bound for these quantities may be computed by:
LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
One-dimensional descriptors:
One-dimensional descriptors are a new addition to ScaLAPACK since
version 1.0. They simplify and shorten the descriptor for 1D
arrays.
Since ScaLAPACK supports two-dimensional arrays as the fundamental
object, we allow 1D arrays to be distributed either over the
first dimension of the array (as if the grid were P-by-1) or the
2nd dimension (as if the grid were 1-by-P). This choice is
indicated by the descriptor type (501 or 502)
as described below.
However, for tridiagonal matrices, since the objects being
distributed are the individual vectors storing the diagonals, we
have adopted the convention that both the P-by-1 descriptor and
the 1-by-P descriptor are allowed and are equivalent for
tridiagonal matrices. Thus, for tridiagonal matrices,
DTYPE_A = 501 or 502 can be used interchangeably
without any other change.
We require that the distributed vectors storing the diagonals of a
tridiagonal matrix be aligned with each other. Because of this, a
single descriptor, DESCA, serves to describe the distribution of
of all diagonals simultaneously.
IMPORTANT NOTE: the actual BLACS grid represented by the
CTXT entry in the descriptor may be *either* P-by-1 or 1-by-P
irrespective of which one-dimensional descriptor type
(501 or 502) is input.
This routine will interpret the grid properly either way.
ScaLAPACK routines *do not support intercontext operations* so that
the grid passed to a single ScaLAPACK routine *must be the same*
for all array descriptors passed to that routine.
NOTE: In all cases where 1D descriptors are used, 2D descriptors
may also be used, since a one-dimensional array is a special case
of a two-dimensional array with one dimension of size unity.
The two-dimensional array used in this case *must* be of the
proper orientation:
If the appropriate one-dimensional descriptor is DTYPEA=501
(1 by P type), then the two dimensional descriptor must
have a CTXT value that refers to a 1 by P BLACS grid;
If the appropriate one-dimensional descriptor is DTYPEA=502
(P by 1 type), then the two dimensional descriptor must
have a CTXT value that refers to a P by 1 BLACS grid.
Summary of allowed descriptors, types, and BLACS grids:
DTYPE 501 502 1 1
BLACS grid 1xP or Px1 1xP or Px1 1xP Px1
-----------------------------------------------------
A OK OK OK NO
B NO OK NO OK
Note that a consequence of this chart is that it is not possible
for *both* DTYPE_A and DTYPE_B to be 2D_type(1), as these lead
to opposite requirements for the orientation of the BLACS grid,
and as noted before, the *same* BLACS context must be used in
all descriptors in a single ScaLAPACK subroutine call.
Let A be a generic term for any 1D block cyclicly distributed array.
Such a global array has an associated description vector DESCA.
In the following comments, the character _ should be read as
"of the global array".
NOTATION STORED IN EXPLANATION
--------------- ---------- ------------------------------------------
DTYPE_A(global) DESCA( 1 ) The descriptor type. For 1D grids,
TYPE_A = 501: 1-by-P grid.
TYPE_A = 502: P-by-1 grid.
CTXT_A (global) DESCA( 2 ) The BLACS context handle, indicating
the BLACS process grid A is distribu-
ted over. The context itself is glo-
bal, but the handle (the integer
value) may vary.
N_A (global) DESCA( 3 ) The size of the array dimension being
distributed.
NB_A (global) DESCA( 4 ) The blocking factor used to distribute
the distributed dimension of the array.
SRC_A (global) DESCA( 5 ) The process row or column over which the
first row or column of the array
is distributed.
Ignored DESCA( 6 ) Ignored for tridiagonal matrices.
Reserved DESCA( 7 ) Reserved for future use.
=====================================================================
Code Developer: Andrew J. Cleary, University of Tennessee.
Current address: Lawrence Livermore National Labs.
=====================================================================
.. Parameters ..
|
|
|
|
001 SUBROUTINE PSDTTRS( TRANS , N , NRHS , DL , D , DU , JA , DESCA , B , IB ,
002 $DESCB , AF , LAF , WORK , LWORK , INFO )
003
004 * -- ScaLAPACK routine(version 1.7) --
005 * University of Tennessee , Knoxville , Oak Ridge National Laboratory ,
006 * and University of California , Berkeley.
007 * April 3 , 2000
008
009 * .. Scalar Arguments ..
010 CHARACTER TRANS
011 INTEGER IB , INFO , JA , LAF , LWORK , N , NRHS
012 INTEGER INT_ONE
013 PARAMETER( INT_ONE = 1 )
014 INTEGER DESCMULT , BIGNUM
015 PARAMETER( DESCMULT = 100 , BIGNUM = DESCMULT*DESCMULT )
016 INTEGER BLOCK_CYCLIC_2D , CSRC_ , CTXT_ , DLEN_ , DTYPE_ ,
017 $LLD_ , MB_ , M_ , NB_ , N_ , RSRC_
018 PARAMETER( BLOCK_CYCLIC_2D = 1 , DLEN_ = 9 , DTYPE_ = 1 ,
019 $CTXT_ = 2 , M_ = 3 , N_ = 4 , MB_ = 5 , NB_ = 6 ,
020 $RSRC_ = 7 , CSRC_ = 8 , LLD_ = 9 )
021 * ..
022 * .. Local Scalars ..
023 INTEGER CSRC , FIRST_PROC , ICTXT , ICTXT_NEW , ICTXT_SAVE ,
024 $IDUM2 , IDUM3 , JA_NEW , LLDA , LLDB , MYCOL , MYROW ,
025 $MY_NUM_COLS , NB , NP , NPCOL , NPROW , NP_SAVE ,
026 $ODD_SIZE , PART_OFFSET , PART_SIZE , RETURN_CODE ,
027 $STORE_M_B , STORE_N_A , TEMP , WORK_SIZE_MIN
028 * ..
029 * .. Local Arrays ..
030 INTEGER DESCA_1XP( 7 ) , DESCB_PX1( 7 ) ,
031 $PARAM_CHECK( 15 , 3 )
032 * ..
033 * .. External Subroutines ..
034 EXTERNAL BLACS_GRIDEXIT , BLACS_GRIDINFO , DESC_CONVERT ,
035 $GLOBCHK , PSDTTRSV , PXERBLA , RESHAPE
036 * ..
037 * .. External Functions ..
038 LOGICAL LSAME
039 INTEGER NUMROC
040 EXTERNAL LSAME , NUMROC
041 * ..
042 * .. Intrinsic Functions ..
043 INTRINSIC ICHAR , MOD
044 * ..
045 * .. Executable Statements ..
046
047 * Test the input parameters
048
049 INFO = 0
050
051 * Convert descriptor into standard form for easy access to
052 * parameters , check that grid is of right shape.
053
054 DESCA_1XP( 1 ) = 501
055 DESCB_PX1( 1 ) = 502
056
057 TEMP = DESCA( DTYPE_ )
058 IF( TEMP.EQ.502 ) THEN
059 * Temporarily set the descriptor type to 1xP type
059
060 DESCA( DTYPE_ ) = 501
061 END IF
062
063 CALL DESC_CONVERT( DESCA , DESCA_1XP , RETURN_CODE )
064
065 DESCA( DTYPE_ ) = TEMP
066
067 IF( RETURN_CODE.NE.0 ) THEN
067
068 INFO = - ( 8*100 + 2 )
069 END IF
070
071 CALL DESC_CONVERT( DESCB , DESCB_PX1 , RETURN_CODE )
072
073 IF( RETURN_CODE.NE.0 ) THEN
073
074 INFO = - ( 11*100 + 2 )
075 END IF
076
077 * Consistency checks for DESCA and DESCB.
078
079 * Context must be the same
080 IF( DESCA_1XP( 2 ).NE.DESCB_PX1( 2 ) ) THEN
080
081 INFO = - ( 11*100 + 2 )
082 END IF
083
084 * These are alignment restrictions that may or may not be removed
085 * in future releases. - Andy Cleary , April 14 , 1996.
086
087 * Block sizes must be the same
088 IF( DESCA_1XP( 4 ).NE.DESCB_PX1( 4 ) ) THEN
088
089 INFO = - ( 11*100 + 4 )
090 END IF
091
092 * Source processor must be the same
093
094 IF( DESCA_1XP( 5 ).NE.DESCB_PX1( 5 ) ) THEN
094
095 INFO = - ( 11*100 + 5 )
096 END IF
097
098 * Get values out of descriptor for use in code.
099
100 ICTXT = DESCA_1XP( 2 )
101 CSRC = DESCA_1XP( 5 )
102 NB = DESCA_1XP( 4 )
103 LLDA = DESCA_1XP( 6 )
104 STORE_N_A = DESCA_1XP( 3 )
105 LLDB = DESCB_PX1( 6 )
106 STORE_M_B = DESCB_PX1( 3 )
107
108 * Get grid parameters
109
110 CALL BLACS_GRIDINFO( ICTXT , NPROW , NPCOL , MYROW , MYCOL )
111 NP = NPROW*NPCOL
112
113 IF( LSAME( TRANS , 'N' ) ) THEN
113
114 IDUM2 = ICHAR( 'N' )
115 ELSE IF( LSAME( TRANS , 'T' ) ) THEN
115
116 IDUM2 = ICHAR( 'T' )
117 ELSE IF( LSAME( TRANS , 'C' ) ) THEN
117
118 IDUM2 = ICHAR( 'T' )
119 ELSE
119
120 INFO = - 1
121 END IF
122
123 IF( LWORK.LT. - 1 ) THEN
123
124 INFO = - 15
125 ELSE IF( LWORK.EQ. - 1 ) THEN
125
126 IDUM3 = - 1
127 ELSE
127
128 IDUM3 = 1
129 END IF
130
131 IF( N.LT.0 ) THEN
131
132 INFO = - 2
133 END IF
134
135 IF( N + JA - 1.GT.STORE_N_A ) THEN
135
136 INFO = - ( 8*100 + 6 )
137 END IF
138
139 IF( N + IB - 1.GT.STORE_M_B ) THEN
139
140 INFO = - ( 11*100 + 3 )
141 END IF
142
143 IF( LLDB.LT.NB ) THEN
143
144 INFO = - ( 11*100 + 6 )
145 END IF
146
147 IF( NRHS.LT.0 ) THEN
147
148 INFO = - 3
149 END IF
150
151 * Current alignment restriction
152
153 IF( JA.NE.IB ) THEN
153
154 INFO = - 7
155 END IF
156
157 * Argument checking that is specific to Divide & Conquer routine
158
159 IF( NPROW.NE.1 ) THEN
159
160 INFO = - ( 8*100 + 2 )
161 END IF
162
163 IF( N.GT.NP*NB - MOD( JA - 1 , NB ) ) THEN
163
164 INFO = - ( 2 )
165 CALL PXERBLA( ICTXT , 'PSDTTRS , D&C alg. : only 1 block per proc'
166 $ , - INFO )
167 RETURN
168 END IF
169
170 IF(( JA + N - 1.GT.NB ) .AND.( NB.LT.2*INT_ONE ) ) THEN
170
171 INFO = - ( 8*100 + 4 )
172 CALL PXERBLA( ICTXT , 'PSDTTRS , D&C alg. : NB too small' , - INFO )
173 RETURN
174 END IF
175
176 WORK_SIZE_MIN = 10*NPCOL + 4*NRHS
177
178 WORK( 1 ) = WORK_SIZE_MIN
179
180 IF( LWORK.LT.WORK_SIZE_MIN ) THEN
180
181 IF( LWORK.NE. - 1 ) THEN
181
182 INFO = - 15
183 CALL PXERBLA( ICTXT , 'PSDTTRS : worksize error' , - INFO )
184 END IF
185 RETURN
186 END IF
187
188 * Pack params and positions into arrays for global consistency check
189
190 PARAM_CHECK( 15 , 1 ) = DESCB( 5 )
191 PARAM_CHECK( 14 , 1 ) = DESCB( 4 )
192 PARAM_CHECK( 13 , 1 ) = DESCB( 3 )
193 PARAM_CHECK( 12 , 1 ) = DESCB( 2 )
194 PARAM_CHECK( 11 , 1 ) = DESCB( 1 )
195 PARAM_CHECK( 10 , 1 ) = IB
196 PARAM_CHECK( 9 , 1 ) = DESCA( 5 )
197 PARAM_CHECK( 8 , 1 ) = DESCA( 4 )
198 PARAM_CHECK( 7 , 1 ) = DESCA( 3 )
199 PARAM_CHECK( 6 , 1 ) = DESCA( 1 )
200 PARAM_CHECK( 5 , 1 ) = JA
201 PARAM_CHECK( 4 , 1 ) = NRHS
202 PARAM_CHECK( 3 , 1 ) = N
203 PARAM_CHECK( 2 , 1 ) = IDUM3
204 PARAM_CHECK( 1 , 1 ) = IDUM2
205
206 PARAM_CHECK( 15 , 2 ) = 1105
207 PARAM_CHECK( 14 , 2 ) = 1104
208 PARAM_CHECK( 13 , 2 ) = 1103
209 PARAM_CHECK( 12 , 2 ) = 1102
210 PARAM_CHECK( 11 , 2 ) = 1101
211 PARAM_CHECK( 10 , 2 ) = 10
212 PARAM_CHECK( 9 , 2 ) = 805
213 PARAM_CHECK( 8 , 2 ) = 804
214 PARAM_CHECK( 7 , 2 ) = 803
215 PARAM_CHECK( 6 , 2 ) = 801
216 PARAM_CHECK( 5 , 2 ) = 7
217 PARAM_CHECK( 4 , 2 ) = 3
218 PARAM_CHECK( 3 , 2 ) = 2
219 PARAM_CHECK( 2 , 2 ) = 15
220 PARAM_CHECK( 1 , 2 ) = 1
221
222 * Want to find errors with MIN( ) , so if no error , set it to a big
223 * number. If there already is an error , multiply by the the
224 * descriptor multiplier.
225
226 IF( INFO.GE.0 ) THEN
226
227 INFO = BIGNUM
228 ELSE IF( INFO.LT. - DESCMULT ) THEN
228
229 INFO = - INFO
230 ELSE
230
231 INFO = - INFO*DESCMULT
232 END IF
233
234 * Check consistency across processors
235
236 CALL GLOBCHK( ICTXT , 15 , PARAM_CHECK , 15 , PARAM_CHECK( 1 , 3 ) ,
237 $INFO )
238
239 * Prepare output : set info = 0 if no error , and divide by DESCMULT
240 * if error is not in a descriptor entry.
241
242 IF( INFO.EQ.BIGNUM ) THEN
242
243 INFO = 0
244 ELSE IF( MOD( INFO , DESCMULT ).EQ.0 ) THEN
244
245 INFO = - INFO / DESCMULT
246 ELSE
246
247 INFO = - INFO
248 END IF
249
250 IF( INFO.LT.0 ) THEN
250
251 CALL PXERBLA( ICTXT , 'PSDTTRS' , - INFO )
252 RETURN
253 END IF
254
255 * Quick return if possible
256
257 IF( N.EQ.0 )
257
258 $ RETURN
259
260 IF( NRHS.EQ.0 )
260
261 $ RETURN
262
263 * Adjust addressing into matrix space to properly get into
264 * the beginning part of the relevant data
265
266 PART_OFFSET = NB*(( JA - 1 ) / ( NPCOL*NB ) )
267
268 IF(( MYCOL - CSRC ).LT.( JA - PART_OFFSET - 1 ) / NB ) THEN
269 PART_OFFSET = PART_OFFSET + NB
270 END IF
271
272 IF( MYCOL.LT.CSRC ) THEN
272
273 PART_OFFSET = PART_OFFSET - NB
274 END IF
275
276 * Form a new BLACS grid(the "standard form" grid) with only procs
277 * holding part of the matrix , of size 1xNP where NP is adjusted ,
278 * starting at csrc = 0 , with JA modified to reflect dropped procs.
279
280 * First processor to hold part of the matrix :
281
282 FIRST_PROC = MOD(( JA - 1 ) / NB + CSRC , NPCOL )
283
284 * Calculate new JA one while dropping off unused processors.
285
286 JA_NEW = MOD( JA - 1 , NB ) + 1
287
288 * Save and compute new value of NP
289
290 NP_SAVE = NP
291 NP =( JA_NEW + N - 2 ) / NB + 1
292
293 * Call utility routine that forms "standard-form" grid
294
295 CALL RESHAPE( ICTXT , INT_ONE , ICTXT_NEW , INT_ONE , FIRST_PROC ,
296 $ INT_ONE , NP )
297
298 * Use new context from standard grid as context.
299
300 ICTXT_SAVE = ICTXT
301 ICTXT = ICTXT_NEW
302 DESCA_1XP( 2 ) = ICTXT_NEW
303 DESCB_PX1( 2 ) = ICTXT_NEW
304
305 * Get information about new grid.
306
307 CALL BLACS_GRIDINFO( ICTXT , NPROW , NPCOL , MYROW , MYCOL )
308
309 * Drop out processors that do not have part of the matrix.
310
311 IF( MYROW.LT.0 ) THEN
311
312 GO TO 20
313 END IF
314
315 * ********************************
316 * Values reused throughout routine
317
318 * User - input value of partition size
319
320 PART_SIZE = NB
321
322 * Number of columns in each processor
323
324 MY_NUM_COLS = NUMROC( N , PART_SIZE , MYCOL , 0 , NPCOL )
325
326 * Offset in columns to beginning of main partition in each proc
327
328 IF( MYCOL.EQ.0 ) THEN
328
329 PART_OFFSET = PART_OFFSET + MOD( JA_NEW - 1 , PART_SIZE )
330 MY_NUM_COLS = MY_NUM_COLS - MOD( JA_NEW - 1 , PART_SIZE )
331 END IF
332
333 * Size of main(or odd) partition in each processor
334
335 ODD_SIZE = MY_NUM_COLS
336 IF( MYCOL.LT.NP - 1 ) THEN
336
337 ODD_SIZE = ODD_SIZE - INT_ONE
338 END IF
339
340 * Begin main code
341
342 INFO = 0
343
344 * Call frontsolve routine
345
346 IF( LSAME( TRANS , 'N' ) ) THEN
347
347
348 CALL PSDTTRSV ( 'L' , 'N' , N , NRHS , DL( PART_OFFSET + 1 ) ,
349 $ D( PART_OFFSET + 1 ) , DU( PART_OFFSET + 1 ) , JA_NEW ,
350 $ DESCA_1XP , B , IB , DESCB_PX1 , AF , LAF , WORK ,
351 $ LWORK , INFO )
352
353 ELSE
354
354
355 CALL PSDTTRSV ( 'U' , 'T' , N , NRHS , DL( PART_OFFSET + 1 ) ,
356 $ D( PART_OFFSET + 1 ) , DU( PART_OFFSET + 1 ) , JA_NEW ,
357 $ DESCA_1XP , B , IB , DESCB_PX1 , AF , LAF , WORK ,
358 $ LWORK , INFO )
359
360 END IF
361
362 * Call backsolve routine
363
364 IF(( LSAME( TRANS , 'C' ) ) .OR.( LSAME( TRANS , 'T' ) ) ) THEN
365
365
366 CALL PSDTTRSV ( 'L' , 'T' , N , NRHS , DL( PART_OFFSET + 1 ) ,
367 $ D( PART_OFFSET + 1 ) , DU( PART_OFFSET + 1 ) , JA_NEW ,
368 $ DESCA_1XP , B , IB , DESCB_PX1 , AF , LAF , WORK ,
369 $ LWORK , INFO )
370
371 ELSE
372
372
373 CALL PSDTTRSV ( 'U' , 'N' , N , NRHS , DL( PART_OFFSET + 1 ) ,
374 $ D( PART_OFFSET + 1 ) , DU( PART_OFFSET + 1 ) , JA_NEW ,
375 $ DESCA_1XP , B , IB , DESCB_PX1 , AF , LAF , WORK ,
376 $ LWORK , INFO )
377
378 END IF
379 10 CONTINUE
380
381 * Free BLACS space used to hold standard - form grid.
382
383 IF( ICTXT_SAVE.NE.ICTXT_NEW ) THEN
383
384 CALL BLACS_GRIDEXIT( ICTXT_NEW )
385 END IF
386
387 20 CONTINUE
388
389 * Restore saved input parameters
390
391 ICTXT = ICTXT_SAVE
392 NP = NP_SAVE
393
394 * Output minimum worksize
395
396 WORK( 1 ) = WORK_SIZE_MIN
397
398 RETURN
399
400 * End of PSDTTRS
401
402 END103
42
|
|
Variables in Routine PSDTTRS()
| Summary Report |
| Data Type | Quantity | Size(byte) |
| CHARACTER | 1 | 1 |
| INTEGER | 51 | 316 |
| LOGICAL | 1 | 1 |
| REAL | 2 | 8 |
| TOTAL | 55 | 326 |
List of Variables
CHARACTER
INTEGER
| BIGNUM | BLOCK_CYCLIC_2D | CSRC | CSRC_ | CTXT_ |
| DESCA_1XP( 7 ) | DESCB_PX1( 7 ) | DESCMULT | DLEN_ | DTYPE_ |
| FIRST_PROC | IB | ICTXT | ICTXT_NEW | ICTXT_SAVE |
| IDUM2 | IDUM3 | INFO | INT_ONE | JA |
| JA_NEW | LAF | LLD_ | LLDA | LLDB |
| LWORK | M_ | MB_ | MY_NUM_COLS | MYCOL |
| MYROW | N | N_ | NB | NB_ |
| NP | NP_SAVE | NPCOL | NPROW | NRHS |
| NUMROC | ODD_SIZE | PARAM_CHECK( 15, 3 ) | PART_OFFSET | PART_SIZE |
| RETURN_CODE | RSRC_ | STORE_M_B | STORE_N_A | TEMP |
| WORK_SIZE_MIN | | | | |
LOGICAL
REAL
Variables Dependence Graph Put the mouse over a right hand side variable to display the corresponding line of the dependence | | - | | - | - | | CSRC | <--- | DESCA_1XPCSRC = DESCA_1XP( 5 ) |
| DESCA | <--- | TEMPDESCA( DTYPE_ ) = TEMP |
| DESCA_1XP | <--- | ICTXT_NEWDESCA_1XP( 2 ) = ICTXT_NEW |
| DESCB_PX1 | <--- | ICTXT_NEWDESCB_PX1( 2 ) = ICTXT_NEW |
| FIRST_PROC | <--- | JAFIRST_PROC = MOD( ( JA-1 ) / NB+CSRC, NPCOL ), CSRCFIRST_PROC = MOD( ( JA-1 ) / NB+CSRC, NPCOL ), NBFIRST_PROC = MOD( ( JA-1 ) / NB+CSRC, NPCOL ), NPCOLFIRST_PROC = MOD( ( JA-1 ) / NB+CSRC, NPCOL ) |
| ICTXT | <--- | ICTXT_NEWICTXT = ICTXT_NEW, ICTXT_SAVEICTXT = ICTXT_SAVE, DESCA_1XPICTXT = DESCA_1XP( 2 ) |
| ICTXT_SAVE | <--- | ICTXTICTXT_SAVE = ICTXT |
| IDUM2 | <--- | NIDUM2 = ICHAR( 'N' ) |
| INFO | <--- | BIGNUMINFO = BIGNUM, INFOINFO = -INFO{2INFO = -INFO*DESCMULT, 3INFO = -INFO / DESCMULT, 4INFO = -INFO}, DESCMULTINFO = -INFO*DESCMULT{2INFO = -INFO / DESCMULT} |
| JA_NEW | <--- | JAJA_NEW = MOD( JA-1, NB ) + 1, NBJA_NEW = MOD( JA-1, NB ) + 1 |
| LLDA | <--- | DESCA_1XPLLDA = DESCA_1XP( 6 ) |
| LLDB | <--- | DESCB_PX1LLDB = DESCB_PX1( 6 ) |
| MY_NUM_COLS | <--- | JA_NEWMY_NUM_COLS = MY_NUM_COLS - MOD( JA_NEW-1, PART_SIZE ), MY_NUM_COLSMY_NUM_COLS = MY_NUM_COLS - MOD( JA_NEW-1, PART_SIZE ), MYCOLMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ), NMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ), NPCOLMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ), NUMROCMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ), PART_SIZEMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ){2MY_NUM_COLS = MY_NUM_COLS - MOD( JA_NEW-1, PART_SIZE )} |
| NB | <--- | DESCA_1XPNB = DESCA_1XP( 4 ) |
| NP | <--- | JA_NEWNP = ( JA_NEW+N-2 ) / NB + 1, NNP = ( JA_NEW+N-2 ) / NB + 1, NBNP = ( JA_NEW+N-2 ) / NB + 1, NP_SAVENP = NP_SAVE, NPCOLNP = NPROW*NPCOL, NPROWNP = NPROW*NPCOL |
| NP_SAVE | <--- | NPNP_SAVE = NP |
| ODD_SIZE | <--- | INT_ONEODD_SIZE = ODD_SIZE - INT_ONE, MY_NUM_COLSODD_SIZE = MY_NUM_COLS, ODD_SIZEODD_SIZE = ODD_SIZE - INT_ONE |
| PARAM_CHECK | <--- | IBPARAM_CHECK( 10, 1 ) = IB, IDUM2PARAM_CHECK( 1, 1 ) = IDUM2, IDUM3PARAM_CHECK( 2, 1 ) = IDUM3, JAPARAM_CHECK( 5, 1 ) = JA, NPARAM_CHECK( 3, 1 ) = N, NRHSPARAM_CHECK( 4, 1 ) = NRHS, DESCAPARAM_CHECK( 9, 1 ) = DESCA( 5 ){2PARAM_CHECK( 8, 1 ) = DESCA( 4 ), 3PARAM_CHECK( 7, 1 ) = DESCA( 3 ), 4PARAM_CHECK( 6, 1 ) = DESCA( 1 )} |
| PART_OFFSET | <--- | JAPART_OFFSET = NB*( ( JA-1 ) / ( NPCOL*NB ) ), JA_NEWPART_OFFSET = PART_OFFSET + MOD( JA_NEW-1, PART_SIZE ), NBPART_OFFSET = NB*( ( JA-1 ) / ( NPCOL*NB ) ){2PART_OFFSET = PART_OFFSET + NB, 3PART_OFFSET = PART_OFFSET - NB}, NPCOLPART_OFFSET = NB*( ( JA-1 ) / ( NPCOL*NB ) ), PART_OFFSETPART_OFFSET = PART_OFFSET + NB{2PART_OFFSET = PART_OFFSET - NB, 3PART_OFFSET = PART_OFFSET + MOD( JA_NEW-1, PART_SIZE )}, PART_SIZEPART_OFFSET = PART_OFFSET + MOD( JA_NEW-1, PART_SIZE ) |
| PART_SIZE | <--- | NBPART_SIZE = NB |
| STORE_M_B | <--- | DESCB_PX1STORE_M_B = DESCB_PX1( 3 ) |
| STORE_N_A | <--- | DESCA_1XPSTORE_N_A = DESCA_1XP( 3 ) |
| TEMP | <--- | DTYPE_TEMP = DESCA( DTYPE_ ), DESCATEMP = DESCA( DTYPE_ ) |
| WORK | <--- | WORK_SIZE_MINWORK( 1 ) = WORK_SIZE_MIN{2WORK( 1 ) = WORK_SIZE_MIN} |
| WORK_SIZE_MIN | <--- | NPCOLWORK_SIZE_MIN = 10*NPCOL + 4*NRHS, NRHSWORK_SIZE_MIN = 10*NPCOL + 4*NRHS |
|
|
Analysis elements of the routine PSDTTRS() Put the mouse over each element to display detailed matching information
Assigned variables |
| | | BIGNUM , BLOCK_CYCLIC_2D , CSRC , CSRC_ , CTXT_ , DESCA_1XP , DESCB_PX1 , DESCMULT , DLEN_ , DTYPE_ , FIRST_PROC , ICTXT , ICTXT_SAVE , IDUM2 , IDUM3 , INFO , INT_ONE , JA_NEW , LLD_ , LLDA , LLDB , M_ , MB_ , MY_NUM_COLS , N_ , NB , NB_ , NP , NP_SAVE , ODD_SIZE , PARAM_CHECK , PART_OFFSET , PART_SIZE , RSRC_ , STORE_M_B , STORE_N_A , TEMP , WORK , WORK_SIZE_MIN |
|
Active variables |
| | | AF , B , BIGNUM , BLOCK_CYCLIC_2D , CSRC , CSRC_ , CTXT_ , D , DESCA , DESCA_1XP , DESCB , DESCB_PX1 , DESCMULT , DL , DLEN_ , DTYPE_ , DU , FIRST_PROC , IB , ICTXT , ICTXT_NEW , ICTXT_SAVE , IDUM2 , IDUM3 , INFO , INT_ONE , JA , JA_NEW , LAF , LLD_ , LLDA , LLDB , LSAME , LWORK , M_ , MB_ , MY_NUM_COLS , MYCOL , MYROW , N , N_ , NB , NB_ , NP , NP_SAVE , NPCOL , NPROW , NRHS , NUMROC , ODD_SIZE , PARAM_CHECK , PART_OFFSET , PART_SIZE , RETURN_CODE , RSRC_ , STORE_M_B , STORE_N_A , TEMP , TRANS , WORK , WORK_SIZE_MIN |
|
Allocated variables [ statement : associated variable ] |
| | new | : a, about, Calculate, compute, Use |
|
Desallocated variables [ statement : associated variable ] |
| | free | : BLACS |
|
Accessed arrays [ array name : associated index ] |
| | D | : PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 |
| | DESCA | : 1 , 3 , 4 , 5 , DTYPE_ , DTYPE_ , DTYPE_ |
| | DESCA_1XP | : 1 , 2 , 2 , 2 , 3 , 4 , 4 , 5 , 5 , 6 , 7 |
| | DESCB | : 1 , 2 , 3 , 4 , 5 |
| | DESCB_PX1 | : 1 , 2 , 2 , 3 , 4 , 5 , 6 , 7 |
| | DL | : PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 |
| | DU | : PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 |
| | LSAME | : TRANS, 'C' , TRANS, 'C' , TRANS, 'N' , TRANS, 'N' , TRANS, 'T' , TRANS, 'T' |
| | NUMROC | : N, PART_SIZE, MYCOL, 0, NPCOL |
| | PARAM_CHECK | : 1, 1 , 1, 2 , 1, 3 , 10, 1 , 10, 2 , 11, 1 , 11, 2 , 12, 1 , 12, 2 , 13, 1 , 13, 2 , 14, 1 , 14, 2 , 15, 1 , 15, 2 , 15, 3 , 2, 1 , 2, 2 , 3, 1 , 3, 2 , 4, 1 , 4, 2 , 5, 1 , 5, 2 , 6, 1 , 6, 2 , 7, 1 , 7, 2 , 8, 1 , 8, 2 , 9, 1 , 9, 2 |
| | WORK | : 1 , 1 |
|
Conditional statements [ statement : associated predicate ] |
| | do | : ( not have part of the matrix. ) |
| | for | : ( easy access to ) , ( DESCA and DESCB. ) , ( use in code. ) , ( global consistency check ) |
| | if | : ( TEMP.EQ.502 ) , ( RETURN_CODE.NE.0 ) , ( RETURN_CODE.NE.0 ) , ( (DESCA_1XP( 2 ).NE.DESCB_PX1( 2 ) ) ) , ( (DESCA_1XP( 4 ).NE.DESCB_PX1( 4 ) ) ) , ( (DESCA_1XP( 5 ).NE.DESCB_PX1( 5 ) ) ) , ( (LSAME( TRANS , 'N' ) ) ) , ( (LSAME( TRANS , 'T' ) ) ) , ( (LSAME( TRANS , 'C' ) ) ) , ( LWORK.LT. - 1 ) , ( LWORK.EQ. - 1 ) , ( N.LT.0 ) , ( N+JA-1.GT.STORE_N_A ) , ( N+IB-1.GT.STORE_M_B ) , ( LLDB.LT.NB ) , ( NRHS.LT.0 ) , ( JA.NE.IB ) , ( NPROW.NE.1 ) , ( (N.GT.NP*NB - MOD( JA - 1 , NB ) ) ) , ( (( JA+N - 1.GT.NB ) .AND. ( NB.LT.2*INT_ONE ) ) ) , ( LWORK.LT.WORK_SIZE_MIN ) , ( LWORK.NE. - 1 ) , ( no error , set it to a big ) , ( there already is an error , multiply by the the ) , ( INFO.GE.0 ) , ( INFO.LT. - DESCMULT ) , ( no error , and divide by DESCMULT ) , ( error is not in a descriptor entry. ) , ( INFO.EQ.BIGNUM ) , ( (MOD( INFO , DESCMULT ).EQ.0 ) ) , ( INFO.LT.0 ) , ( possible ) , ( N.EQ.0 ) , ( NRHS.EQ.0 ) , ( (( MYCOL - CSRC ).LT.( JA - PART_OFFSET - 1 ) / NB ) ) , ( MYCOL.LT.CSRC ) , ( MYROW.LT.0 ) , ( MYCOL.EQ.0 ) , ( MYCOL.LT.NP - 1 ) , ( (LSAME( TRANS , 'N' ) ) ) , ( (( LSAME( TRANS , 'C' ) ) .OR. ( LSAME( TRANS , 'T' ) ) ) ) , ( ICTXT_SAVE.NE.ICTXT_NEW ) |
| | while | : ( dropping off unused processors. ) |
|
| List of variables | BIGNUM BLOCK_CYCLIC_2D CSRC CSRC_ CTXT_ DESCA DESCA_1XP( 7 )
| DESCB_PX1( 7 ) DESCMULT DLEN_ DTYPE_ FIRST_PROC IB ICTXT ICTXT_NEW
| ICTXT_SAVE IDUM2 IDUM3 INFO INT_ONE JA JA_NEW LAF
| LLD_ LLDA LLDB LSAME LWORK M_ MB_ MY_NUM_COLS
| MYCOL MYROW N N_ NB NB_ NP NP_SAVE
| NPCOL NPROW NRHS NUMROC ODD_SIZE PARAM_CHECK( 15, 3 ) PART_OFFSET PART_SIZE
| RETURN_CODE RSRC_ STORE_M_B STORE_N_A TEMP TRANS WORK WORK_SIZE_MIN | | close
| |
BIGNUM
BLOCK_CYCLIC_2D
CSRC
CSRC_
CTXT_
DESCA
DESCA_1XP( 7 )
DESCB_PX1( 7 )
DESCMULT
DLEN_
DTYPE_
FIRST_PROC
IB
ICTXT
ICTXT_NEW
ICTXT_SAVE
IDUM2
IDUM3
INFO
INT_ONE
JA
JA_NEW
LAF
LLD_
LLDA
LLDB
LSAME
LWORK
M_
MB_
MY_NUM_COLS
MYCOL
MYROW
N
N_
NB
NB_
NP
NP_SAVE
NPCOL
NPROW
NRHS
NUMROC
ODD_SIZE
PARAM_CHECK( 15, 3 )
PART_OFFSET
PART_SIZE
RETURN_CODE
RSRC_
STORE_M_B
STORE_N_A
TEMP
TRANS
WORK
WORK_SIZE_MIN
322
| |