| # lines: | 735 | # code: | 735 | # comment: | 0 | # blank: | 0 |
| # Variables: | 50 |
| # Callers: | 1 |
| # Callings: | 1 |
| # Words: | 250 |
| # Keywords: | 160 |
Purpose
=======
PDPBTRS solves a system of linear equations
A(1:N, JA:JA+N-1) * X = B(IB:IB+N-1, 1:NRHS)
where A(1:N, JA:JA+N-1) is the matrix used to produce the factors
stored in A(1:N,JA:JA+N-1) and AF by PDPBTRF.
A(1:N, JA:JA+N-1) is an N-by-N real
banded symmetric positive definite distributed
matrix with bandwidth BW.
Depending on the value of UPLO, A stores either U or L in the equation
A(1:N, JA:JA+N-1) = U'*U or L*L' as computed by PDPBTRF.
Routine PDPBTRF MUST be called first.
=====================================================================
Arguments
=========
UPLO (global input) CHARACTER
= 'U': Upper triangle of A(1:N, JA:JA+N-1) is stored;
= 'L': Lower triangle of A(1:N, JA:JA+N-1) is stored.
N (global input) INTEGER
The number of rows and columns to be operated on, i.e. the
order of the distributed submatrix A(1:N, JA:JA+N-1). N >= 0.
BW (global input) INTEGER
Number of subdiagonals in L or U. 0 <= BW <= N-1
NRHS (global input) INTEGER
The number of right hand sides, i.e., the number of columns
of the distributed submatrix B(IB:IB+N-1, 1:NRHS).
NRHS >= 0.
A (local input/local output) DOUBLE PRECISION pointer into
local memory to an array with first dimension
LLD_A >=(bw+1) (stored in DESCA).
On entry, this array contains the local pieces of the
Cholesky factor (L or U = L^T) of the N-by-N symmetric banded
distributed matrix A(1:N, JA:JA+N-1), as computed by PDPBTRF.
This local portion is stored in the packed banded format
used in LAPACK. Please see the Notes below and the
ScaLAPACK manual for more detail on the format of
distributed matrices.
JA (global input) INTEGER
The index in the global array A that points to the start of
the matrix to be operated on (which may be either all of A
or a submatrix of A).
DESCA (global and local input) INTEGER array of dimension DLEN.
if 1D type (DTYPE_A=501), DLEN >= 7;
if 2D type (DTYPE_A=1), DLEN >= 9 .
The array descriptor for the distributed matrix A.
Contains information of mapping of A to memory. Please
see NOTES below for full description and options.
B (local input/local output) DOUBLE PRECISION pointer into
local memory to an array of local lead dimension lld_b>=NB.
On entry, this array contains
the local pieces of the right hand sides
B(IB:IB+N-1, 1:NRHS).
On exit, this contains the local piece of the solutions
distributed matrix X.
IB (global input) INTEGER
The row index in the global array B that points to the first
row of the matrix to be operated on (which may be either
all of B or a submatrix of B).
DESCB (global and local input) INTEGER array of dimension DLEN.
if 1D type (DTYPE_B=502), DLEN >=7;
if 2D type (DTYPE_B=1), DLEN >= 9.
The array descriptor for the distributed matrix B.
Contains information of mapping of B to memory. Please
see NOTES below for full description and options.
AF (local output) DOUBLE PRECISION array, dimension LAF.
Auxiliary Fillin Space.
Fillin is created during the factorization routine
PDPBTRF and this is stored in AF. If a linear system
is to be solved using PDPBTRS after the factorization
routine, AF *must not be altered* after the factorization.
LAF (local input) INTEGER
Size of user-input Auxiliary Fillin space AF. Must be >=
(NB+2*bw)*bw.
If LAF is not large enough, an error code will be returned
and the minimum acceptable size will be returned in AF( 1 ).
WORK (local workspace/local output)
DOUBLE PRECISION temporary workspace. This space may
be overwritten in between calls to routines. WORK must be
the size given in LWORK.
On exit, WORK( 1 ) contains the minimal LWORK.
LWORK (local input or global input) INTEGER
Size of user-input workspace WORK.
If LWORK is too small, the minimal acceptable size will be
returned in WORK(1) and an error code is returned. LWORK>=
(bw*NRHS)
INFO (global output) INTEGER
= 0: successful exit
< 0: If the i-th argument is an array and the j-entry had
an illegal value, then INFO = -(i*100+j), if the i-th
argument is a scalar and had an illegal value, then
INFO = -i.
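The INFO convention above packs an argument position and, for array arguments, an entry index into a single integer. A minimal Python sketch of the decoding follows. The helper name `decode_info` and the `ARG_NAMES` tuple are illustrative and not part of ScaLAPACK; the argument order is copied from the PDPBTRS calling sequence.

```python
# Argument names in PDPBTRS calling order (position 1 is UPLO).
ARG_NAMES = ("UPLO", "N", "BW", "NRHS", "A", "JA", "DESCA", "B", "IB",
             "DESCB", "AF", "LAF", "WORK", "LWORK", "INFO")


def decode_info(info):
    """Return (argument name, entry index or None) for a negative INFO.

    INFO = -i        -> scalar argument i had an illegal value.
    INFO = -(100i+j) -> entry j of array argument i had an illegal value.
    Returns None for INFO = 0 (successful exit).
    """
    if info == 0:
        return None
    if info > 0:
        # Only 0 and negative values are documented for this routine.
        raise ValueError("positive INFO is not documented for PDPBTRS")
    code = -info
    if code >= 100:
        position, entry = divmod(code, 100)
    else:
        position, entry = code, None
    if not 1 <= position <= len(ARG_NAMES):
        raise ValueError("INFO does not map to a PDPBTRS argument")
    return ARG_NAMES[position - 1], entry


print(decode_info(-702))   # entry 2 of DESCA
print(decode_info(-14))    # scalar LWORK
```

For example, INFO = -702 points at entry 2 of DESCA (the BLACS context), and INFO = -14 points at the scalar LWORK.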
=====================================================================
Restrictions
============
The following are restrictions on the input parameters. Some of these
are temporary and will be removed in future releases, while others
may reflect fundamental technical limitations.
Non-cyclic restriction: VERY IMPORTANT!
P*NB>= mod(JA-1,NB)+N.
The mapping for matrices must be blocked, reflecting the nature
of the divide and conquer algorithm as a task-parallel algorithm.
This formula in words is: no processor may have more than one
chunk of the matrix.
Blocksize cannot be too small:
If the matrix spans more than one processor, the following
restriction on NB, the size of each block on each processor,
must hold:
NB >= 2*BW
The bulk of parallel computation is done on the matrix of size
O(NB) on each processor. If this is too small, divide and conquer
is a poor choice of algorithm.
Submatrix reference:
JA = IB
Alignment restriction that prevents unnecessary communication.
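The three restrictions above, together with the minimum LAF and LWORK sizes from the Arguments section, can be checked before calling the routine. The following Python sketch is only an illustration: the function name `check_restrictions` and its return layout are hypothetical, and the test for "spans more than one processor" uses the condition JA+N-1 > NB that appears in the code listing.

```python
def check_restrictions(n, bw, nrhs, ja, ib, nb, p):
    """Collect violated PDPBTRS restrictions and the minimum workspace sizes.

    n, bw, nrhs, ja, ib follow the argument descriptions; nb is the block
    size and p the number of processes in the one-dimensional grid.
    """
    problems = []
    # Non-cyclic restriction: P*NB >= mod(JA-1, NB) + N.
    if p * nb < (ja - 1) % nb + n:
        problems.append("non-cyclic: more than one block per process")
    # Block size restriction, applied only when the matrix leaves block 1.
    if ja + n - 1 > nb and nb < 2 * bw:
        problems.append("block size: NB < 2*BW")
    # Submatrix alignment restriction: JA = IB.
    if ja != ib:
        problems.append("alignment: JA != IB")
    return {
        "problems": problems,
        "laf_min": (nb + 2 * bw) * bw,   # LAF >= (NB+2*bw)*bw
        "lwork_min": bw * nrhs,          # LWORK >= bw*NRHS
    }


report = check_restrictions(n=100, bw=2, nrhs=3, ja=1, ib=1, nb=25, p=4)
print(report)
```

With N = 100, BW = 2, NRHS = 3, NB = 25 and P = 4 no restriction is violated, and the sketch reports LAF >= 58 and LWORK >= 6.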
=====================================================================
Notes
=====
If the factorization routine and the solve routine are to be called
separately (to solve various sets of right-hand sides using the same
coefficient matrix), the auxiliary space AF *must not be altered*
between calls to the factorization routine and the solve routine.
The best algorithm for solving banded and tridiagonal linear systems
depends on a variety of parameters, especially the bandwidth.
Currently, only algorithms designed for the case N/P >> bw are
implemented. These go by many names, including Divide and Conquer,
Partitioning, domain decomposition-type, etc.
Algorithm description: Divide and Conquer
The Divide and Conquer algorithm assumes the matrix is narrowly
banded compared with the number of equations. In this situation,
it is best to distribute the input matrix A one-dimensionally,
with columns atomic and rows divided amongst the processes.
The basic algorithm divides the banded matrix up into
P pieces with one stored on each processor,
and then proceeds in 2 phases for the factorization or 3 for the
solution of a linear system.
1) Local Phase:
The individual pieces are factored independently and in
parallel. These factors are applied to the matrix creating
fillin, which is stored in a non-inspectable way in auxiliary
space AF. Mathematically, this is equivalent to reordering
the matrix A as P A P^T and then factoring the principal
leading submatrix of size equal to the sum of the sizes of
the matrices factored on each processor. The factors of
these submatrices overwrite the corresponding parts of A
in memory.
2) Reduced System Phase:
A small (BW* (P-1)) system is formed representing
interaction of the larger blocks, and is stored (as are its
factors) in the space AF. A parallel Block Cyclic Reduction
algorithm is used. For a linear system, a parallel front solve
followed by an analogous backsolve, both using the structure
of the factored matrix, are performed.
3) Backsubstitution Phase:
For a linear system, a local backsubstitution is performed on
each processor in parallel.
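The front solve and backsolve mentioned in the phases above have a simple serial analogue: once A = L*L' is known, solve L*y = b and then L'*x = y. The pure-Python sketch below shows that serial analogue for a banded matrix held in dense form. It is not the distributed divide and conquer algorithm, and the function names are hypothetical.

```python
import math


def banded_cholesky(a, bw):
    """Lower Cholesky factor L (A = L*L') of an SPD matrix with bandwidth bw."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(max(0, j - bw), j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        for i in range(j + 1, min(n, j + bw + 1)):
            t = sum(l[i][k] * l[j][k] for k in range(max(0, i - bw), j))
            l[i][j] = (a[i][j] - t) / l[j][j]
    return l


def front_solve(l, bw, b):
    """Solve L*y = b by forward substitution, touching only the band."""
    y = list(b)
    for i in range(len(l)):
        for k in range(max(0, i - bw), i):
            y[i] -= l[i][k] * y[k]
        y[i] /= l[i][i]
    return y


def back_solve(l, bw, y):
    """Solve L'*x = y by back substitution, touching only the band."""
    n = len(l)
    x = list(y)
    for i in range(n - 1, -1, -1):
        for k in range(i + 1, min(n, i + bw + 1)):
            x[i] -= l[k][i] * x[k]
        x[i] /= l[i][i]
    return x


# Tridiagonal example (bw = 1) with known solution x = (1, 2, 3).
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]
L = banded_cholesky(A, 1)
x = back_solve(L, 1, front_solve(L, 1, b))
print(x)
```

The two substitution passes correspond to the two successive PDPBTRSV calls in the routine, one with the factor and one with its transpose.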
Descriptors
===========
Descriptors now have *types* and differ from ScaLAPACK 1.0.
Note: banded codes can use either the old two dimensional
or new one-dimensional descriptors, though the processor grid in
both cases *must be one-dimensional*. We describe both types below.
Each global data object is described by an associated description
vector. This vector stores the information required to establish
the mapping between an object element and its corresponding process
and memory location.
Let A be a generic term for any 2D block cyclically distributed array.
Such a global array has an associated description vector DESCA.
In the following comments, the character _ should be read as
"of the global array".
NOTATION        STORED IN      EXPLANATION
--------------- -------------- --------------------------------------
DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                               DTYPE_A = 1.
CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                               the BLACS process grid A is distributed
                               over. The context itself is global, but
                               the handle (the integer value) may vary.
M_A    (global) DESCA( M_ )    The number of rows in the global
                               array A.
N_A    (global) DESCA( N_ )    The number of columns in the global
                               array A.
MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                               the rows of the array.
NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                               the columns of the array.
RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                               row of the array A is distributed.
CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                               first column of the array A is
                               distributed.
LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                               array.  LLD_A >= MAX(1,LOCr(M_A)).
Let K be the number of rows or columns of a distributed matrix,
and assume that its process grid has dimension p x q.
LOCr( K ) denotes the number of elements of K that a process
would receive if K were distributed over the p processes of its
process column.
Similarly, LOCc( K ) denotes the number of elements of K that a
process would receive if K were distributed over the q processes of
its process row.
The values of LOCr() and LOCc() may be determined via a call to the
ScaLAPACK tool function, NUMROC:
LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
An upper bound for these quantities may be computed by:
LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
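The local counts LOCr() and LOCc() follow directly from the block-cyclic layout. The Python sketch below is written from that definition and is intended to mirror what the tool function NUMROC computes; it is not the library routine, and `numroc_upper_bound` is a hypothetical helper for the bound quoted above.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of an n-long dimension owned by process iproc.

    nb is the block size, isrcproc the process owning the first block,
    and nprocs the number of processes along this grid dimension.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs   # distance from the source
    nblocks = n // nb                               # number of full blocks
    count = (nblocks // nprocs) * nb                # full rounds of blocks
    extra = nblocks % nprocs                        # leftover full blocks
    if mydist < extra:
        count += nb                                 # one more full block
    elif mydist == extra:
        count += n % nb                             # the trailing partial block
    return count


def numroc_upper_bound(n, nb, nprocs):
    """ceil( ceil(n/nb) / nprocs ) * nb, using integer arithmetic."""
    blocks = -(-n // nb)
    return -(-blocks // nprocs) * nb


counts = [numroc(10, 3, p, 0, 2) for p in range(2)]
print(counts, numroc_upper_bound(10, 3, 2))
```

For N = 10, NB = 3 on two processes, process 0 holds blocks 1 and 3 (6 elements) and process 1 holds block 2 plus the one-element remainder (4 elements); the upper bound evaluates to 6.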
One-dimensional descriptors:
One-dimensional descriptors are a new addition to ScaLAPACK since
version 1.0. They simplify and shorten the descriptor for 1D
arrays.
Since ScaLAPACK supports two-dimensional arrays as the fundamental
object, we allow 1D arrays to be distributed either over the
first dimension of the array (as if the grid were P-by-1) or the
2nd dimension (as if the grid were 1-by-P). This choice is
indicated by the descriptor type (501 or 502)
as described below.
IMPORTANT NOTE: the actual BLACS grid represented by the
CTXT entry in the descriptor may be *either* P-by-1 or 1-by-P
irrespective of which one-dimensional descriptor type
(501 or 502) is input.
This routine will interpret the grid properly either way.
ScaLAPACK routines *do not support intercontext operations* so that
the grid passed to a single ScaLAPACK routine *must be the same*
for all array descriptors passed to that routine.
NOTE: In all cases where 1D descriptors are used, 2D descriptors
may also be used, since a one-dimensional array is a special case
of a two-dimensional array with one dimension of size unity.
The two-dimensional array used in this case *must* be of the
proper orientation:
If the appropriate one-dimensional descriptor is DTYPE_A=501
(1 by P type), then the two dimensional descriptor must
have a CTXT value that refers to a 1 by P BLACS grid;
If the appropriate one-dimensional descriptor is DTYPE_A=502
(P by 1 type), then the two dimensional descriptor must
have a CTXT value that refers to a P by 1 BLACS grid.
Summary of allowed descriptors, types, and BLACS grids:
DTYPE           501         502         1         1
BLACS grid      1xP or Px1  1xP or Px1  1xP       Px1
-----------------------------------------------------
A               OK          NO          OK        NO
B               NO          OK          NO        OK
Note that a consequence of this chart is that it is not possible
for *both* DTYPE_A and DTYPE_B to be 2D_type(1), as these lead
to opposite requirements for the orientation of the BLACS grid,
and as noted before, the *same* BLACS context must be used in
all descriptors in a single ScaLAPACK subroutine call.
Let A be a generic term for any 1D block cyclically distributed array.
Such a global array has an associated description vector DESCA.
In the following comments, the character _ should be read as
"of the global array".
NOTATION        STORED IN  EXPLANATION
--------------- ---------- ------------------------------------------
DTYPE_A(global) DESCA( 1 ) The descriptor type. For 1D grids,
                           DTYPE_A = 501: 1-by-P grid.
                           DTYPE_A = 502: P-by-1 grid.
CTXT_A (global) DESCA( 2 ) The BLACS context handle, indicating
                           the BLACS process grid A is distributed
                           over. The context itself is global, but
                           the handle (the integer value) may vary.
N_A    (global) DESCA( 3 ) The size of the array dimension being
                           distributed.
NB_A   (global) DESCA( 4 ) The blocking factor used to distribute
                           the distributed dimension of the array.
SRC_A  (global) DESCA( 5 ) The process row or column over which the
                           first row or column of the array
                           is distributed.
LLD_A  (local)  DESCA( 6 ) The leading dimension of the local array
                           storing the local blocks of the
                           distributed array A. Minimum value of
                           LLD_A depends on DTYPE_A.
                           DTYPE_A = 501: LLD_A >=
                              size of undistributed dimension, 1.
                           DTYPE_A = 502: LLD_A >=NB_A, 1.
Reserved        DESCA( 7 ) Reserved for future use.
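The seven-entry layout of the one-dimensional descriptor can be written down as a small constructor. This Python sketch is only an illustration of the table: `make_desc_1d` is a hypothetical helper, the context value 0 in the example is a placeholder rather than a real BLACS handle, and filling the reserved entry with 0 is an assumption.

```python
def make_desc_1d(dtype, ctxt, n, nb, src, lld):
    """Build a 7-entry 1D descriptor as a Python list.

    Python index k holds Fortran entry DESCA( k+1 ).
    """
    if dtype not in (501, 502):
        raise ValueError("1D descriptor type must be 501 or 502")
    if nb <= 0:
        raise ValueError("blocking factor NB must be positive")
    return [
        dtype,  # DESCA( 1 ): descriptor type
        ctxt,   # DESCA( 2 ): BLACS context handle
        n,      # DESCA( 3 ): size of the distributed dimension
        nb,     # DESCA( 4 ): blocking factor
        src,    # DESCA( 5 ): process holding the first row or column
        lld,    # DESCA( 6 ): local leading dimension
        0,      # DESCA( 7 ): reserved (0 is an assumption)
    ]


# Descriptor for A with N = 100, NB = 25, BW = 2, so LLD_A >= BW+1 = 3.
desca = make_desc_1d(501, 0, 100, 25, 0, 3)
print(desca)
```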
=====================================================================
Code Developer: Andrew J. Cleary, University of Tennessee.
Current address: Lawrence Livermore National Labs.
=====================================================================
      SUBROUTINE PDPBTRS( UPLO, N, BW, NRHS, A, JA, DESCA, B, IB, DESCB,
     $                    AF, LAF, WORK, LWORK, INFO )
*
*  -- ScaLAPACK routine (version 1.7) --
*     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
*     and University of California, Berkeley.
*     April 3, 2000
*
*     .. Scalar Arguments ..
      CHARACTER          UPLO
      INTEGER            BW, IB, INFO, JA, LAF, LWORK, N, NRHS
*     ..
*     .. Array Arguments ..
      INTEGER            DESCA( * ), DESCB( * )
      DOUBLE PRECISION   A( * ), AF( * ), B( * ), WORK( * )
*     ..
*     .. Parameters ..
      INTEGER            INT_ONE
      PARAMETER          ( INT_ONE = 1 )
      INTEGER            DESCMULT, BIGNUM
      PARAMETER          ( DESCMULT = 100, BIGNUM = DESCMULT*DESCMULT )
      INTEGER            BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
     $                   LLD_, MB_, M_, NB_, N_, RSRC_
      PARAMETER          ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
     $                   CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
     $                   RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
*     ..
*     .. Local Scalars ..
      INTEGER            CSRC, FIRST_PROC, ICTXT, ICTXT_NEW, ICTXT_SAVE,
     $                   IDUM1, IDUM3, JA_NEW, LLDA, LLDB, MYCOL, MYROW,
     $                   NB, NP, NPCOL, NPROW, NP_SAVE, PART_OFFSET,
     $                   RETURN_CODE, STORE_M_B, STORE_N_A,
     $                   WORK_SIZE_MIN
*     ..
*     .. Local Arrays ..
      INTEGER            DESCA_1XP( 7 ), DESCB_PX1( 7 ),
     $                   PARAM_CHECK( 16, 3 )
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_GRIDEXIT, BLACS_GRIDINFO, DESC_CONVERT,
     $                   GLOBCHK, PDPBTRSV, PXERBLA, RESHAPE
*     ..
*     .. External Functions ..
      LOGICAL            LSAME
      EXTERNAL           LSAME
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          ICHAR, MOD
*     ..
*     .. Executable Statements ..
*
*     Test the input parameters
*
      INFO = 0
*
*     Convert descriptor into standard form for easy access to
*     parameters, check that grid is of right shape.
*
      DESCA_1XP( 1 ) = 501
      DESCB_PX1( 1 ) = 502
*
      CALL DESC_CONVERT( DESCA, DESCA_1XP, RETURN_CODE )
*
      IF( RETURN_CODE.NE.0 ) THEN
         INFO = -( 7*100+2 )
      END IF
*
      CALL DESC_CONVERT( DESCB, DESCB_PX1, RETURN_CODE )
*
      IF( RETURN_CODE.NE.0 ) THEN
         INFO = -( 10*100+2 )
      END IF
*
*     Consistency checks for DESCA and DESCB.
*
*     Context must be the same
      IF( DESCA_1XP( 2 ).NE.DESCB_PX1( 2 ) ) THEN
         INFO = -( 10*100+2 )
      END IF
*
*     These are alignment restrictions that may or may not be removed
*     in future releases. -Andy Cleary, April 14, 1996.
*
*     Block sizes must be the same
      IF( DESCA_1XP( 4 ).NE.DESCB_PX1( 4 ) ) THEN
         INFO = -( 10*100+4 )
      END IF
*
*     Source processor must be the same
*
      IF( DESCA_1XP( 5 ).NE.DESCB_PX1( 5 ) ) THEN
         INFO = -( 10*100+5 )
      END IF
*
*     Get values out of descriptor for use in code.
*
      ICTXT = DESCA_1XP( 2 )
      CSRC = DESCA_1XP( 5 )
      NB = DESCA_1XP( 4 )
      LLDA = DESCA_1XP( 6 )
      STORE_N_A = DESCA_1XP( 3 )
      LLDB = DESCB_PX1( 6 )
      STORE_M_B = DESCB_PX1( 3 )
*
*     Get grid parameters
*
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
      NP = NPROW*NPCOL
*
      IF( LSAME( UPLO, 'U' ) ) THEN
         IDUM1 = ICHAR( 'U' )
      ELSE IF( LSAME( UPLO, 'L' ) ) THEN
         IDUM1 = ICHAR( 'L' )
      ELSE
         INFO = -1
      END IF
*
      IF( LWORK.LT.-1 ) THEN
         INFO = -14
      ELSE IF( LWORK.EQ.-1 ) THEN
         IDUM3 = -1
      ELSE
         IDUM3 = 1
      END IF
*
      IF( N.LT.0 ) THEN
         INFO = -2
      END IF
*
      IF( N+JA-1.GT.STORE_N_A ) THEN
         INFO = -( 7*100+6 )
      END IF
*
      IF( ( BW.GT.N-1 ) .OR. ( BW.LT.0 ) ) THEN
         INFO = -3
      END IF
*
      IF( LLDA.LT.( BW+1 ) ) THEN
         INFO = -( 7*100+6 )
      END IF
*
      IF( NB.LE.0 ) THEN
         INFO = -( 7*100+4 )
      END IF
*
      IF( N+IB-1.GT.STORE_M_B ) THEN
         INFO = -( 10*100+3 )
      END IF
*
      IF( LLDB.LT.NB ) THEN
         INFO = -( 10*100+6 )
      END IF
*
*     NRHS is the fourth argument
*
      IF( NRHS.LT.0 ) THEN
         INFO = -4
      END IF
*
*     Current alignment restriction
*
      IF( JA.NE.IB ) THEN
         INFO = -6
      END IF
*
*     Argument checking that is specific to Divide & Conquer routine
*
      IF( NPROW.NE.1 ) THEN
         INFO = -( 7*100+2 )
      END IF
*
      IF( N.GT.NP*NB-MOD( JA-1, NB ) ) THEN
         INFO = -( 2 )
         CALL PXERBLA( ICTXT,
     $                 'PDPBTRS, D&C alg.: only 1 block per proc',
     $                 -INFO )
         RETURN
      END IF
*
      IF( ( JA+N-1.GT.NB ) .AND. ( NB.LT.2*BW ) ) THEN
         INFO = -( 7*100+4 )
         CALL PXERBLA( ICTXT, 'PDPBTRS, D&C alg.: NB too small',
     $                 -INFO )
         RETURN
      END IF
*
      WORK_SIZE_MIN = ( BW*NRHS )
*
      WORK( 1 ) = WORK_SIZE_MIN
*
      IF( LWORK.LT.WORK_SIZE_MIN ) THEN
         IF( LWORK.NE.-1 ) THEN
            INFO = -14
            CALL PXERBLA( ICTXT, 'PDPBTRS: worksize error', -INFO )
         END IF
         RETURN
      END IF
*
*     Pack params and positions into arrays for global consistency check
*
      PARAM_CHECK( 16, 1 ) = DESCB( 5 )
      PARAM_CHECK( 15, 1 ) = DESCB( 4 )
      PARAM_CHECK( 14, 1 ) = DESCB( 3 )
      PARAM_CHECK( 13, 1 ) = DESCB( 2 )
      PARAM_CHECK( 12, 1 ) = DESCB( 1 )
      PARAM_CHECK( 11, 1 ) = IB
      PARAM_CHECK( 10, 1 ) = DESCA( 5 )
      PARAM_CHECK( 9, 1 ) = DESCA( 4 )
      PARAM_CHECK( 8, 1 ) = DESCA( 3 )
      PARAM_CHECK( 7, 1 ) = DESCA( 1 )
      PARAM_CHECK( 6, 1 ) = JA
      PARAM_CHECK( 5, 1 ) = NRHS
      PARAM_CHECK( 4, 1 ) = BW
      PARAM_CHECK( 3, 1 ) = N
      PARAM_CHECK( 2, 1 ) = IDUM3
      PARAM_CHECK( 1, 1 ) = IDUM1
*
      PARAM_CHECK( 16, 2 ) = 1005
      PARAM_CHECK( 15, 2 ) = 1004
      PARAM_CHECK( 14, 2 ) = 1003
      PARAM_CHECK( 13, 2 ) = 1002
      PARAM_CHECK( 12, 2 ) = 1001
      PARAM_CHECK( 11, 2 ) = 9
      PARAM_CHECK( 10, 2 ) = 705
      PARAM_CHECK( 9, 2 ) = 704
      PARAM_CHECK( 8, 2 ) = 703
      PARAM_CHECK( 7, 2 ) = 701
      PARAM_CHECK( 6, 2 ) = 6
      PARAM_CHECK( 5, 2 ) = 4
      PARAM_CHECK( 4, 2 ) = 3
      PARAM_CHECK( 3, 2 ) = 2
      PARAM_CHECK( 2, 2 ) = 14
      PARAM_CHECK( 1, 2 ) = 1
*
*     Want to find errors with MIN( ), so if no error, set it to a big
*     number. If there already is an error, multiply by the
*     descriptor multiplier.
*
      IF( INFO.GE.0 ) THEN
         INFO = BIGNUM
      ELSE IF( INFO.LT.-DESCMULT ) THEN
         INFO = -INFO
      ELSE
         INFO = -INFO*DESCMULT
      END IF
*
*     Check consistency across processors
*
      CALL GLOBCHK( ICTXT, 16, PARAM_CHECK, 16, PARAM_CHECK( 1, 3 ),
     $              INFO )
*
*     Prepare output: set info = 0 if no error, and divide by DESCMULT
*     if error is not in a descriptor entry.
*
      IF( INFO.EQ.BIGNUM ) THEN
         INFO = 0
      ELSE IF( MOD( INFO, DESCMULT ).EQ.0 ) THEN
         INFO = -INFO / DESCMULT
      ELSE
         INFO = -INFO
      END IF
*
      IF( INFO.LT.0 ) THEN
         CALL PXERBLA( ICTXT, 'PDPBTRS', -INFO )
         RETURN
      END IF
*
*     Quick return if possible
*
      IF( N.EQ.0 )
     $   RETURN
*
      IF( NRHS.EQ.0 )
     $   RETURN
*
*     Adjust addressing into matrix space to properly get into
*     the beginning part of the relevant data
*
      PART_OFFSET = NB*( ( JA-1 ) / ( NPCOL*NB ) )
*
      IF( ( MYCOL-CSRC ).LT.( JA-PART_OFFSET-1 ) / NB ) THEN
         PART_OFFSET = PART_OFFSET + NB
      END IF
*
      IF( MYCOL.LT.CSRC ) THEN
         PART_OFFSET = PART_OFFSET - NB
      END IF
*
*     Form a new BLACS grid (the "standard form" grid) with only procs
*     holding part of the matrix, of size 1xNP where NP is adjusted,
*     starting at csrc=0, with JA modified to reflect dropped procs.
*
*     First processor to hold part of the matrix:
*
      FIRST_PROC = MOD( ( JA-1 ) / NB+CSRC, NPCOL )
*
*     Calculate new JA one while dropping off unused processors.
*
      JA_NEW = MOD( JA-1, NB ) + 1
*
*     Save and compute new value of NP
*
      NP_SAVE = NP
      NP = ( JA_NEW+N-2 ) / NB + 1
*
*     Call utility routine that forms "standard-form" grid
*
      CALL RESHAPE( ICTXT, INT_ONE, ICTXT_NEW, INT_ONE, FIRST_PROC,
     $              INT_ONE, NP )
*
*     Use new context from standard grid as context.
*
      ICTXT_SAVE = ICTXT
      ICTXT = ICTXT_NEW
      DESCA_1XP( 2 ) = ICTXT_NEW
      DESCB_PX1( 2 ) = ICTXT_NEW
*
*     Get information about new grid.
*
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     Drop out processors that do not have part of the matrix.
*
      IF( MYROW.LT.0 ) THEN
         GO TO 20
      END IF
*
*     Begin main code
*
      INFO = 0
*
*     Call frontsolve routine
*
      IF( LSAME( UPLO, 'L' ) ) THEN
*
         CALL PDPBTRSV( 'L', 'N', N, BW, NRHS, A( PART_OFFSET+1 ),
     $                  JA_NEW, DESCA_1XP, B, IB, DESCB_PX1, AF, LAF,
     $                  WORK, LWORK, INFO )
*
      ELSE
*
         CALL PDPBTRSV( 'U', 'T', N, BW, NRHS, A( PART_OFFSET+1 ),
     $                  JA_NEW, DESCA_1XP, B, IB, DESCB_PX1, AF, LAF,
     $                  WORK, LWORK, INFO )
*
      END IF
*
*     Call backsolve routine
*
      IF( LSAME( UPLO, 'L' ) ) THEN
*
         CALL PDPBTRSV( 'L', 'T', N, BW, NRHS, A( PART_OFFSET+1 ),
     $                  JA_NEW, DESCA_1XP, B, IB, DESCB_PX1, AF, LAF,
     $                  WORK, LWORK, INFO )
*
      ELSE
*
         CALL PDPBTRSV( 'U', 'N', N, BW, NRHS, A( PART_OFFSET+1 ),
     $                  JA_NEW, DESCA_1XP, B, IB, DESCB_PX1, AF, LAF,
     $                  WORK, LWORK, INFO )
*
      END IF
   10 CONTINUE
*
*     Free BLACS space used to hold standard-form grid.
*
      IF( ICTXT_SAVE.NE.ICTXT_NEW ) THEN
         CALL BLACS_GRIDEXIT( ICTXT_NEW )
      END IF
*
   20 CONTINUE
*
*     Restore saved input parameters
*
      ICTXT = ICTXT_SAVE
      NP = NP_SAVE
*
*     Output minimum worksize
*
      WORK( 1 ) = WORK_SIZE_MIN
*
      RETURN
*
*     End of PDPBTRS
*
      END
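The index arithmetic that the routine uses to build its "standard form" grid (PART_OFFSET, FIRST_PROC, JA_NEW and the adjusted NP) can be traced with a few lines of Python. This is a sketch for experimenting with the formulas in the listing, not part of the library; the function name and dictionary keys are hypothetical, and it assumes N >= 1 and JA >= 1 so that Python's floor division agrees with Fortran integer division.

```python
def standard_form(ja, n, nb, npcol, csrc, mycol):
    """Reproduce the offset and grid arithmetic of the listing for one process.

    ja, n  : global start column and order of the submatrix
    nb     : block size; npcol: processes in the 1-by-P grid
    csrc   : process column holding the first block; mycol: this process
    """
    # Skip whole rounds of blocks that lie before column JA.
    part_offset = nb * ((ja - 1) // (npcol * nb))
    # Processes before the one holding column JA skip one more block.
    if (mycol - csrc) < (ja - part_offset - 1) // nb:
        part_offset += nb
    if mycol < csrc:
        part_offset -= nb
    # First process that owns part of the submatrix.
    first_proc = ((ja - 1) // nb + csrc) % npcol
    # JA relative to the start of that process's block.
    ja_new = (ja - 1) % nb + 1
    # Number of processes that actually hold part of the submatrix.
    np_new = (ja_new + n - 2) // nb + 1
    return {"part_offset": part_offset, "first_proc": first_proc,
            "ja_new": ja_new, "np": np_new}


print(standard_form(ja=6, n=6, nb=4, npcol=3, csrc=0, mycol=0))
```

With JA = 6, N = 6, NB = 4 on a 1-by-3 grid, the submatrix starts in the second column of the block owned by process 1 and spans two processes, so process 0 is dropped from the new grid.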
Variables in Routine PDPBTRS()
| Summary Report |
| Data Type | Quantity | Size(byte) |
| CHARACTER | 1 | 1 |
| INTEGER | 47 | 304 |
| LOGICAL | 1 | 1 |
| REAL | 1 | 4 |
| TOTAL | 50 | 310 |
List of Variables
CHARACTER
INTEGER
| BIGNUM | BLOCK_CYCLIC_2D | BW | CSRC | CSRC_ |
| CTXT_ | DESCA_1XP( 7 ) | DESCB_PX1( 7 ) | DESCMULT | DLEN_ |
| DTYPE_ | FIRST_PROC | IB | ICTXT | ICTXT_NEW |
| ICTXT_SAVE | IDUM1 | IDUM3 | INFO | INT_ONE |
| JA | JA_NEW | LAF | LLD_ | LLDA |
| LLDB | LWORK | M_ | MB_ | MYCOL |
| MYROW | N | N_ | NB | NB_ |
| NP | NP_SAVE | NPCOL | NPROW | NRHS |
| PARAM_CHECK( 16, 3 ) | PART_OFFSET | RETURN_CODE | RSRC_ | STORE_M_B |
| STORE_N_A | WORK_SIZE_MIN | | | |
LOGICAL
REAL
Variables Dependence Graph
| Variable | | Depends on [defining statement(s)] |
| CSRC | <--- | DESCA_1XP [CSRC = DESCA_1XP( 5 )] |
| DESCA_1XP | <--- | ICTXT_NEW [DESCA_1XP( 2 ) = ICTXT_NEW] |
| DESCB_PX1 | <--- | ICTXT_NEW [DESCB_PX1( 2 ) = ICTXT_NEW] |
| FIRST_PROC | <--- | JA, NB, NPCOL, CSRC [FIRST_PROC = MOD( ( JA-1 ) / NB+CSRC, NPCOL )] |
| ICTXT | <--- | ICTXT_NEW [ICTXT = ICTXT_NEW], ICTXT_SAVE [ICTXT = ICTXT_SAVE], DESCA_1XP [ICTXT = DESCA_1XP( 2 )] |
| ICTXT_SAVE | <--- | ICTXT [ICTXT_SAVE = ICTXT] |
| INFO | <--- | BIGNUM [INFO = BIGNUM], INFO [INFO = -INFO; INFO = -INFO*DESCMULT; INFO = -INFO / DESCMULT; INFO = -INFO], DESCMULT [INFO = -INFO*DESCMULT; INFO = -INFO / DESCMULT] |
| JA_NEW | <--- | JA, NB [JA_NEW = MOD( JA-1, NB ) + 1] |
| LLDA | <--- | DESCA_1XP [LLDA = DESCA_1XP( 6 )] |
| LLDB | <--- | DESCB_PX1 [LLDB = DESCB_PX1( 6 )] |
| NB | <--- | DESCA_1XP [NB = DESCA_1XP( 4 )] |
| NP | <--- | JA_NEW, N, NB [NP = ( JA_NEW+N-2 ) / NB + 1], NP_SAVE [NP = NP_SAVE], NPCOL, NPROW [NP = NPROW*NPCOL] |
| NP_SAVE | <--- | NP [NP_SAVE = NP] |
| PARAM_CHECK | <--- | IB [PARAM_CHECK( 11, 1 ) = IB], IDUM1 [PARAM_CHECK( 1, 1 ) = IDUM1], IDUM3 [PARAM_CHECK( 2, 1 ) = IDUM3], JA [PARAM_CHECK( 6, 1 ) = JA], BW [PARAM_CHECK( 4, 1 ) = BW], N [PARAM_CHECK( 3, 1 ) = N], NRHS [PARAM_CHECK( 5, 1 ) = NRHS] |
| PART_OFFSET | <--- | JA [PART_OFFSET = NB*( ( JA-1 ) / ( NPCOL*NB ) )], NB [PART_OFFSET = NB*( ( JA-1 ) / ( NPCOL*NB ) ); PART_OFFSET = PART_OFFSET + NB; PART_OFFSET = PART_OFFSET - NB], NPCOL [PART_OFFSET = NB*( ( JA-1 ) / ( NPCOL*NB ) )], PART_OFFSET [PART_OFFSET = PART_OFFSET + NB; PART_OFFSET = PART_OFFSET - NB] |
| STORE_M_B | <--- | DESCB_PX1 [STORE_M_B = DESCB_PX1( 3 )] |
| STORE_N_A | <--- | DESCA_1XP [STORE_N_A = DESCA_1XP( 3 )] |
| WORK | <--- | WORK_SIZE_MIN [WORK( 1 ) = WORK_SIZE_MIN, assigned twice] |
| WORK_SIZE_MIN | <--- | BW, NRHS [WORK_SIZE_MIN = ( BW*NRHS )] |
Analysis elements of the routine PDPBTRS()
Assigned variables |
| | | BIGNUM , BLOCK_CYCLIC_2D , CSRC , CSRC_ , CTXT_ , DESCA_1XP , DESCB_PX1 , DESCMULT , DLEN_ , DTYPE_ , FIRST_PROC , ICTXT , ICTXT_SAVE , IDUM1 , IDUM3 , INFO , INT_ONE , JA_NEW , LLD_ , LLDA , LLDB , M_ , MB_ , N_ , NB , NB_ , NP , NP_SAVE , PARAM_CHECK , PART_OFFSET , RSRC_ , STORE_M_B , STORE_N_A , WORK , WORK_SIZE_MIN |
|
Active variables |
| | | A , AF , B , BIGNUM , BLOCK_CYCLIC_2D , BW , CSRC , CSRC_ , CTXT_ , DESCA , DESCA_1XP , DESCB , DESCB_PX1 , DESCMULT , DLEN_ , DTYPE_ , FIRST_PROC , IB , ICTXT , ICTXT_NEW , ICTXT_SAVE , IDUM1 , IDUM3 , INFO , INT_ONE , JA , JA_NEW , LAF , LLD_ , LLDA , LLDB , LSAME , LWORK , M_ , MB_ , MYCOL , MYROW , N , N_ , NB , NB_ , NP , NP_SAVE , NPCOL , NPROW , NRHS , PARAM_CHECK , PART_OFFSET , RETURN_CODE , RSRC_ , STORE_M_B , STORE_N_A , UPLO , WORK , WORK_SIZE_MIN |
|
Allocated variables [ statement : associated variable ] |
| | (none) |
|
Deallocated variables [ statement : associated variable ] |
| | (none) |
|
Accessed arrays [ array name : associated index ] |
| | A | : PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 |
| | DESCA | : 1 , 3 , 4 , 5 |
| | DESCA_1XP | : 1 , 2 , 2 , 2 , 3 , 4 , 4 , 5 , 5 , 6 , 7 |
| | DESCB | : 1 , 2 , 3 , 4 , 5 |
| | DESCB_PX1 | : 1 , 2 , 2 , 3 , 4 , 5 , 6 , 7 |
| | LSAME | : UPLO, 'L' , UPLO, 'L' , UPLO, 'L' , UPLO, 'U' |
| | PARAM_CHECK | : 1, 1 , 1, 2 , 1, 3 , 10, 1 , 10, 2 , 11, 1 , 11, 2 , 12, 1 , 12, 2 , 13, 1 , 13, 2 , 14, 1 , 14, 2 , 15, 1 , 15, 2 , 16, 1 , 16, 2 , 16, 3 , 2, 1 , 2, 2 , 3, 1 , 3, 2 , 4, 1 , 4, 2 , 5, 1 , 5, 2 , 6, 1 , 6, 2 , 7, 1 , 7, 2 , 8, 1 , 8, 2 , 9, 1 , 9, 2 |
| | WORK | : 1 , 1 |
|
Conditional statements [ statement : associated predicate ] |
| | if | : ( RETURN_CODE.NE.0 ) , ( RETURN_CODE.NE.0 ) , ( DESCA_1XP( 2 ).NE.DESCB_PX1( 2 ) ) , ( DESCA_1XP( 4 ).NE.DESCB_PX1( 4 ) ) , ( DESCA_1XP( 5 ).NE.DESCB_PX1( 5 ) ) , ( LSAME( UPLO, 'U' ) ) , ( LSAME( UPLO, 'L' ) ) , ( LWORK.LT.-1 ) , ( LWORK.EQ.-1 ) , ( N.LT.0 ) , ( N+JA-1.GT.STORE_N_A ) , ( ( BW.GT.N-1 ) .OR. ( BW.LT.0 ) ) , ( LLDA.LT.( BW+1 ) ) , ( NB.LE.0 ) , ( N+IB-1.GT.STORE_M_B ) , ( LLDB.LT.NB ) , ( NRHS.LT.0 ) , ( JA.NE.IB ) , ( NPROW.NE.1 ) , ( N.GT.NP*NB-MOD( JA-1, NB ) ) , ( ( JA+N-1.GT.NB ) .AND. ( NB.LT.2*BW ) ) , ( LWORK.LT.WORK_SIZE_MIN ) , ( LWORK.NE.-1 ) , ( INFO.GE.0 ) , ( INFO.LT.-DESCMULT ) , ( INFO.EQ.BIGNUM ) , ( MOD( INFO, DESCMULT ).EQ.0 ) , ( INFO.LT.0 ) , ( N.EQ.0 ) , ( NRHS.EQ.0 ) , ( ( MYCOL-CSRC ).LT.( JA-PART_OFFSET-1 ) / NB ) , ( MYCOL.LT.CSRC ) , ( MYROW.LT.0 ) , ( LSAME( UPLO, 'L' ) ) , ( LSAME( UPLO, 'L' ) ) , ( ICTXT_SAVE.NE.ICTXT_NEW ) |
|