|
|
| |
| # lines: |
798 | | # code: |
798 | | # comment: | 0 | |
# blank: | 0 |
| # Variables: | 60 |
| # Callers: | 1 |
| # Callings: | 1 |
| # Words: | 243 |
| # Keywords: | 142 |
|
|
|
|
|
..
.. Array Arguments ..
..
Purpose
=======
PZPTTRS solves a system of linear equations
A(1:N, JA:JA+N-1) * X = B(IB:IB+N-1, 1:NRHS)
where A(1:N, JA:JA+N-1) is the matrix used to produce the factors
stored in A(1:N,JA:JA+N-1) and AF by PZPTTRF.
A(1:N, JA:JA+N-1) is an N-by-N complex
tridiagonal symmetric positive definite distributed
matrix.
Depending on the value of UPLO, A stores either U or L in the equn
A(1:N, JA:JA+N-1) = U'D *U or L*D L' as computed by PZPTTRF.
Routine PZPTTRF MUST be called first.
=====================================================================
Arguments
=========
UPLO (global input) CHARACTER
= 'U': Upper triangle of A(1:N, JA:JA+N-1) is stored;
= 'L': Lower triangle of A(1:N, JA:JA+N-1) is stored.
N (global input) INTEGER
The number of rows and columns to be operated on, i.e. the
order of the distributed submatrix A(1:N, JA:JA+N-1). N >= 0.
NRHS (global input) INTEGER
The number of right hand sides, i.e., the number of columns
of the distributed submatrix B(IB:IB+N-1, 1:NRHS).
NRHS >= 0.
D (local input/local output) COMPLEX*16 pointer to local
part of global vector storing the main diagonal of the
matrix.
On exit, this array contains information containing the
factors of the matrix.
Must be of size >= DESCA( NB_ ).
E (local input/local output) COMPLEX*16 pointer to local
part of global vector storing the upper diagonal of the
matrix. Globally, DU(n) is not referenced, and DU must be
aligned with D.
On exit, this array contains information containing the
factors of the matrix.
Must be of size >= DESCA( NB_ ).
JA (global input) INTEGER
The index in the global array A that points to the start of
the matrix to be operated on (which may be either all of A
or a submatrix of A).
DESCA (global and local input) INTEGER array of dimension DLEN.
if 1D type (DTYPE_A=501 or 502), DLEN >= 7;
if 2D type (DTYPE_A=1), DLEN >= 9.
The array descriptor for the distributed matrix A.
Contains information of mapping of A to memory. Please
see NOTES below for full description and options.
B (local input/local output) COMPLEX*16 pointer into
local memory to an array of local lead dimension lld_b>=NB.
On entry, this array contains the
the local pieces of the right hand sides
B(IB:IB+N-1, 1:NRHS).
On exit, this contains the local piece of the solutions
distributed matrix X.
IB (global input) INTEGER
The row index in the global array B that points to the first
row of the matrix to be operated on (which may be either
all of B or a submatrix of B).
IMPORTANT NOTE: The current version of this code supports
only IB=JA
DESCB (global and local input) INTEGER array of dimension DLEN.
if 1D type (DTYPE_B=502), DLEN >=7;
if 2D type (DTYPE_B=1), DLEN >= 9.
The array descriptor for the distributed matrix B.
Contains information of mapping of B to memory. Please
see NOTES below for full description and options.
AF (local output) COMPLEX*16 array, dimension LAF.
Auxiliary Fillin Space.
Fillin is created during the factorization routine
PZPTTRF and this is stored in AF. If a linear system
is to be solved using PZPTTRS after the factorization
routine, AF *must not be altered* after the factorization.
LAF (local input) INTEGER
Size of user-input Auxiliary Fillin space AF. Must be >=
(NB+2)
If LAF is not large enough, an error code will be returned
and the minimum acceptable size will be returned in AF( 1 )
WORK (local workspace/local output)
COMPLEX*16 temporary workspace. This space may
be overwritten in between calls to routines. WORK must be
the size given in LWORK.
On exit, WORK( 1 ) contains the minimal LWORK.
LWORK (local input or global input) INTEGER
Size of user-input workspace WORK.
If LWORK is too small, the minimal acceptable size will be
returned in WORK(1) and an error code is returned. LWORK>=
(10+2*min(100,NRHS))*NPCOL+4*NRHS
INFO (local output) INTEGER
= 0: successful exit
< 0: If the i-th argument is an array and the j-entry had
an illegal value, then INFO = -(i*100+j), if the i-th
argument is a scalar and had an illegal value, then
INFO = -i.
=====================================================================
Restrictions
============
The following are restrictions on the input parameters. Some of these
are temporary and will be removed in future releases, while others
may reflect fundamental technical limitations.
Non-cyclic restriction: VERY IMPORTANT!
P*NB>= mod(JA-1,NB)+N.
The mapping for matrices must be blocked, reflecting the nature
of the divide and conquer algorithm as a task-parallel algorithm.
This formula in words is: no processor may have more than one
chunk of the matrix.
Blocksize cannot be too small:
If the matrix spans more than one processor, the following
restriction on NB, the size of each block on each processor,
must hold:
NB >= 2
The bulk of parallel computation is done on the matrix of size
O(NB) on each processor. If this is too small, divide and conquer
is a poor choice of algorithm.
Submatrix reference:
JA = IB
Alignment restriction that prevents unnecessary communication.
=====================================================================
Notes
=====
If the factorization routine and the solve routine are to be called
separately (to solve various sets of righthand sides using the same
coefficient matrix), the auxiliary space AF *must not be altered*
between calls to the factorization routine and the solve routine.
The best algorithm for solving banded and tridiagonal linear systems
depends on a variety of parameters, especially the bandwidth.
Currently, only algorithms designed for the case N/P >> bw are
implemented. These go by many names, including Divide and Conquer,
Partitioning, domain decomposition-type, etc.
For tridiagonal matrices, it is obvious: N/P >> bw(=1), and so D&C
algorithms are the appropriate choice.
Algorithm description: Divide and Conquer
The Divide and Conqer algorithm assumes the matrix is narrowly
banded compared with the number of equations. In this situation,
it is best to distribute the input matrix A one-dimensionally,
with columns atomic and rows divided amongst the processes.
The basic algorithm divides the tridiagonal matrix up into
P pieces with one stored on each processor,
and then proceeds in 2 phases for the factorization or 3 for the
solution of a linear system.
1) Local Phase:
The individual pieces are factored independently and in
parallel. These factors are applied to the matrix creating
fillin, which is stored in a non-inspectable way in auxiliary
space AF. Mathematically, this is equivalent to reordering
the matrix A as P A P^T and then factoring the principal
leading submatrix of size equal to the sum of the sizes of
the matrices factored on each processor. The factors of
these submatrices overwrite the corresponding parts of A
in memory.
2) Reduced System Phase:
A small ((P-1)) system is formed representing
interaction of the larger blocks, and is stored (as are its
factors) in the space AF. A parallel Block Cyclic Reduction
algorithm is used. For a linear system, a parallel front solve
followed by an analagous backsolve, both using the structure
of the factored matrix, are performed.
3) Backsubsitution Phase:
For a linear system, a local backsubstitution is performed on
each processor in parallel.
Descriptors
===========
Descriptors now have *types* and differ from ScaLAPACK 1.0.
Note: tridiagonal codes can use either the old two dimensional
or new one-dimensional descriptors, though the processor grid in
both cases *must be one-dimensional*. We describe both types below.
Each global data object is described by an associated description
vector. This vector stores the information required to establish
the mapping between an object element and its corresponding process
and memory location.
Let A be a generic term for any 2D block cyclicly distributed array.
Such a global array has an associated description vector DESCA.
In the following comments, the character _ should be read as
"of the global array".
NOTATION STORED IN EXPLANATION
--------------- -------------- --------------------------------------
DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
DTYPE_A = 1.
CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
the BLACS process grid A is distribu-
ted over. The context itself is glo-
bal, but the handle (the integer
value) may vary.
M_A (global) DESCA( M_ ) The number of rows in the global
array A.
N_A (global) DESCA( N_ ) The number of columns in the global
array A.
MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
the rows of the array.
NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
the columns of the array.
RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
row of the array A is distributed.
CSRC_A (global) DESCA( CSRC_ ) The process column over which the
first column of the array A is
distributed.
LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
array. LLD_A >= MAX(1,LOCr(M_A)).
Let K be the number of rows or columns of a distributed matrix,
and assume that its process grid has dimension p x q.
LOCr( K ) denotes the number of elements of K that a process
would receive if K were distributed over the p processes of its
process column.
Similarly, LOCc( K ) denotes the number of elements of K that a
process would receive if K were distributed over the q processes of
its process row.
The values of LOCr() and LOCc() may be determined via a call to the
ScaLAPACK tool function, NUMROC:
LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
An upper bound for these quantities may be computed by:
LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
One-dimensional descriptors:
One-dimensional descriptors are a new addition to ScaLAPACK since
version 1.0. They simplify and shorten the descriptor for 1D
arrays.
Since ScaLAPACK supports two-dimensional arrays as the fundamental
object, we allow 1D arrays to be distributed either over the
first dimension of the array (as if the grid were P-by-1) or the
2nd dimension (as if the grid were 1-by-P). This choice is
indicated by the descriptor type (501 or 502)
as described below.
However, for tridiagonal matrices, since the objects being
distributed are the individual vectors storing the diagonals, we
have adopted the convention that both the P-by-1 descriptor and
the 1-by-P descriptor are allowed and are equivalent for
tridiagonal matrices. Thus, for tridiagonal matrices,
DTYPE_A = 501 or 502 can be used interchangeably
without any other change.
We require that the distributed vectors storing the diagonals of a
tridiagonal matrix be aligned with each other. Because of this, a
single descriptor, DESCA, serves to describe the distribution of
of all diagonals simultaneously.
IMPORTANT NOTE: the actual BLACS grid represented by the
CTXT entry in the descriptor may be *either* P-by-1 or 1-by-P
irrespective of which one-dimensional descriptor type
(501 or 502) is input.
This routine will interpret the grid properly either way.
ScaLAPACK routines *do not support intercontext operations* so that
the grid passed to a single ScaLAPACK routine *must be the same*
for all array descriptors passed to that routine.
NOTE: In all cases where 1D descriptors are used, 2D descriptors
may also be used, since a one-dimensional array is a special case
of a two-dimensional array with one dimension of size unity.
The two-dimensional array used in this case *must* be of the
proper orientation:
If the appropriate one-dimensional descriptor is DTYPEA=501
(1 by P type), then the two dimensional descriptor must
have a CTXT value that refers to a 1 by P BLACS grid;
If the appropriate one-dimensional descriptor is DTYPEA=502
(P by 1 type), then the two dimensional descriptor must
have a CTXT value that refers to a P by 1 BLACS grid.
Summary of allowed descriptors, types, and BLACS grids:
DTYPE 501 502 1 1
BLACS grid 1xP or Px1 1xP or Px1 1xP Px1
-----------------------------------------------------
A OK OK OK NO
B NO OK NO OK
Note that a consequence of this chart is that it is not possible
for *both* DTYPE_A and DTYPE_B to be 2D_type(1), as these lead
to opposite requirements for the orientation of the BLACS grid,
and as noted before, the *same* BLACS context must be used in
all descriptors in a single ScaLAPACK subroutine call.
Let A be a generic term for any 1D block cyclicly distributed array.
Such a global array has an associated description vector DESCA.
In the following comments, the character _ should be read as
"of the global array".
NOTATION STORED IN EXPLANATION
--------------- ---------- ------------------------------------------
DTYPE_A(global) DESCA( 1 ) The descriptor type. For 1D grids,
TYPE_A = 501: 1-by-P grid.
TYPE_A = 502: P-by-1 grid.
CTXT_A (global) DESCA( 2 ) The BLACS context handle, indicating
the BLACS process grid A is distribu-
ted over. The context itself is glo-
bal, but the handle (the integer
value) may vary.
N_A (global) DESCA( 3 ) The size of the array dimension being
distributed.
NB_A (global) DESCA( 4 ) The blocking factor used to distribute
the distributed dimension of the array.
SRC_A (global) DESCA( 5 ) The process row or column over which the
first row or column of the array
is distributed.
Ignored DESCA( 6 ) Ignored for tridiagonal matrices.
Reserved DESCA( 7 ) Reserved for future use.
=====================================================================
Code Developer: Andrew J. Cleary, University of Tennessee.
Current address: Lawrence Livermore National Labs.
This version released: August, 2001.
=====================================================================
..
.. Parameters ..
|
|
|
|
001 SUBROUTINE PZPTTRS( UPLO , N , NRHS , D , E , JA , DESCA , B , IB , DESCB ,
002 $AF , LAF , WORK , LWORK , INFO )
003
004 * -- ScaLAPACK routine(version 1.7) --
005 * University of Tennessee , Knoxville , Oak Ridge National Laboratory ,
006 * and University of California , Berkeley.
007 * August 7 , 2001
008
009 * .. Scalar Arguments ..
010 CHARACTER UPLO
011 INTEGER IB , INFO , JA , LAF , LWORK , N , NRHS
012 DOUBLE PRECISION ONE , ZERO
013 PARAMETER( ONE = 1.0D + 0 )
014 PARAMETER( ZERO = 0.0D + 0 )
015 COMPLEX*16 CONE , CZERO
016 PARAMETER( CONE =( 1.0D + 0 , 0.0D + 0 ) )
017 PARAMETER( CZERO =( 0.0D + 0 , 0.0D + 0 ) )
018 INTEGER INT_ONE
019 PARAMETER( INT_ONE = 1 )
020 INTEGER DESCMULT , BIGNUM
021 PARAMETER(DESCMULT = 100 , BIGNUM = DESCMULT * DESCMULT)
022 INTEGER BLOCK_CYCLIC_2D , CSRC_ , CTXT_ , DLEN_ , DTYPE_ ,
023 $LLD_ , MB_ , M_ , NB_ , N_ , RSRC_
024 PARAMETER( BLOCK_CYCLIC_2D = 1 , DLEN_ = 9 , DTYPE_ = 1 ,
025 $CTXT_ = 2 , M_ = 3 , N_ = 4 , MB_ = 5 , NB_ = 6 ,
026 $RSRC_ = 7 , CSRC_ = 8 , LLD_ = 9 )
027 * ..
028 * .. Local Scalars ..
029 INTEGER CSRC , FIRST_PROC , I , ICTXT , ICTXT_NEW ,
030 $ICTXT_SAVE , IDUM1 , IDUM3 , JA_NEW , LLDA , LLDB ,
031 $MYCOL , MYROW , MY_NUM_COLS , NB , NP , NPCOL ,
032 $NPROW , NP_SAVE , ODD_SIZE ,
033 $PART_OFFSET , PART_SIZE , RETURN_CODE , STORE_M_B ,
034 $STORE_N_A , TEMP , WORK_SIZE_MIN
035 * ..
036 * .. Local Arrays ..
037 INTEGER DESCA_1XP( 7 ) , DESCB_PX1( 7 ) ,
038 $PARAM_CHECK( 15 , 3 )
039 * ..
040 * .. External Subroutines ..
041 EXTERNAL BLACS_GRIDINFO , DESC_CONVERT , GLOBCHK , PXERBLA ,
042 $PZPTTRSV , RESHAPE
043 * ..
044 * .. External Functions ..
045 LOGICAL LSAME
046 INTEGER NUMROC
047 EXTERNAL LSAME , NUMROC
048 * ..
049 * .. Intrinsic Functions ..
050 INTRINSIC ICHAR , MIN , MOD
051 * ..
052 * .. Executable Statements ..
053
054 * Test the input parameters
055
056 INFO = 0
057
058 * Convert descriptor into standard form for easy access to
059 * parameters , check that grid is of right shape.
060
061 DESCA_1XP( 1 ) = 501
062 DESCB_PX1( 1 ) = 502
063
064 TEMP = DESCA( DTYPE_ )
065 IF( TEMP .EQ. 502 ) THEN
066 * Temporarily set the descriptor type to 1xP type
066
067 DESCA( DTYPE_ ) = 501
068 ENDIF
069
070 CALL DESC_CONVERT( DESCA , DESCA_1XP , RETURN_CODE )
071
072 DESCA( DTYPE_ ) = TEMP
073
074 IF( RETURN_CODE .NE. 0) THEN
074
075 INFO = - ( 6*100 + 2 )
076 ENDIF
077
078 CALL DESC_CONVERT( DESCB , DESCB_PX1 , RETURN_CODE )
079
080 IF( RETURN_CODE .NE. 0) THEN
080
081 INFO = - ( 9*100 + 2 )
082 ENDIF
083
084 * Consistency checks for DESCA and DESCB.
085
086 * Context must be the same
087 IF( DESCA_1XP( 2 ) .NE. DESCB_PX1( 2 ) ) THEN
087
088 INFO = - ( 9*100 + 2 )
089 ENDIF
090
091 * These are alignment restrictions that may or may not be removed
092 * in future releases. - Andy Cleary , April 14 , 1996.
093
094 * Block sizes must be the same
095 IF( DESCA_1XP( 4 ) .NE. DESCB_PX1( 4 ) ) THEN
095
096 INFO = - ( 9*100 + 4 )
097 ENDIF
098
099 * Source processor must be the same
100
101 IF( DESCA_1XP( 5 ) .NE. DESCB_PX1( 5 ) ) THEN
101
102 INFO = - ( 9*100 + 5 )
103 ENDIF
104
105 * Get values out of descriptor for use in code.
106
107 ICTXT = DESCA_1XP( 2 )
108 CSRC = DESCA_1XP( 5 )
109 NB = DESCA_1XP( 4 )
110 LLDA = DESCA_1XP( 6 )
111 STORE_N_A = DESCA_1XP( 3 )
112 LLDB = DESCB_PX1( 6 )
113 STORE_M_B = DESCB_PX1( 3 )
114
115 * Get grid parameters
116
117 CALL BLACS_GRIDINFO( ICTXT , NPROW , NPCOL , MYROW , MYCOL )
118 NP = NPROW * NPCOL
119
120 IF( LSAME( UPLO , 'U' ) ) THEN
120
121 IDUM1 = ICHAR( 'U' )
122 ELSE IF( LSAME( UPLO , 'L' ) ) THEN
122
123 IDUM1 = ICHAR( 'L' )
124 ELSE
124
125 INFO = - 1
126 END IF
127
128 IF( LWORK .LT. - 1) THEN
128
129 INFO = - 13
130 ELSE IF( LWORK .EQ. - 1 ) THEN
130
131 IDUM3 = - 1
132 ELSE
132
133 IDUM3 = 1
134 ENDIF
135
136 IF( N .LT. 0 ) THEN
136
137 INFO = - 2
138 ENDIF
139
140 IF( N + JA - 1 .GT. STORE_N_A ) THEN
140
141 INFO = - ( 6*100 + 6 )
142 ENDIF
143
144 IF( N + IB - 1 .GT. STORE_M_B ) THEN
144
145 INFO = - ( 9*100 + 3 )
146 ENDIF
147
148 IF( LLDB .LT. NB ) THEN
148
149 INFO = - ( 9*100 + 6 )
150 ENDIF
151
152 IF( NRHS .LT. 0 ) THEN
152
153 INFO = - 3
154 ENDIF
155
156 * Current alignment restriction
157
158 IF( JA .NE. IB) THEN
158
159 INFO = - 5
160 ENDIF
161
162 * Argument checking that is specific to Divide & Conquer routine
163
164 IF( NPROW .NE. 1 ) THEN
164
165 INFO = - ( 6*100 + 2 )
166 ENDIF
167
168 IF( N .GT. NP*NB - MOD( JA - 1 , NB )) THEN
168
169 INFO = - ( 2 )
170 CALL PXERBLA( ICTXT ,
171 $ 'PZPTTRS , D&C alg. : only 1 block per proc' ,
172 $ - INFO )
173 RETURN
174 ENDIF
175
176 IF((JA + N - 1.GT.NB) .AND.( NB.LT.2*INT_ONE )) THEN
176
177 INFO = - ( 6*100 + 4 )
178 CALL PXERBLA( ICTXT ,
179 $ 'PZPTTRS , D&C alg. : NB too small' ,
180 $ - INFO )
181 RETURN
182 ENDIF
183
184 WORK_SIZE_MIN =
185 $(10 + 2*MIN(100 , NRHS))*NPCOL + 4*NRHS
186
187 WORK( 1 ) = WORK_SIZE_MIN
188
189 IF( LWORK .LT. WORK_SIZE_MIN ) THEN
189
190 IF( LWORK .NE. - 1 ) THEN
190
191 INFO = - 13
192 CALL PXERBLA( ICTXT ,
193 $ 'PZPTTRS : worksize error' ,
194 $ - INFO )
195 ENDIF
196 RETURN
197 ENDIF
198
199 * Pack params and positions into arrays for global consistency check
200
201 PARAM_CHECK( 15 , 1 ) = DESCB(5)
202 PARAM_CHECK( 14 , 1 ) = DESCB(4)
203 PARAM_CHECK( 13 , 1 ) = DESCB(3)
204 PARAM_CHECK( 12 , 1 ) = DESCB(2)
205 PARAM_CHECK( 11 , 1 ) = DESCB(1)
206 PARAM_CHECK( 10 , 1 ) = IB
207 PARAM_CHECK( 9 , 1 ) = DESCA(5)
208 PARAM_CHECK( 8 , 1 ) = DESCA(4)
209 PARAM_CHECK( 7 , 1 ) = DESCA(3)
210 PARAM_CHECK( 6 , 1 ) = DESCA(1)
211 PARAM_CHECK( 5 , 1 ) = JA
212 PARAM_CHECK( 4 , 1 ) = NRHS
213 PARAM_CHECK( 3 , 1 ) = N
214 PARAM_CHECK( 2 , 1 ) = IDUM3
215 PARAM_CHECK( 1 , 1 ) = IDUM1
216
217 PARAM_CHECK( 15 , 2 ) = 905
218 PARAM_CHECK( 14 , 2 ) = 904
219 PARAM_CHECK( 13 , 2 ) = 903
220 PARAM_CHECK( 12 , 2 ) = 902
221 PARAM_CHECK( 11 , 2 ) = 901
222 PARAM_CHECK( 10 , 2 ) = 8
223 PARAM_CHECK( 9 , 2 ) = 605
224 PARAM_CHECK( 8 , 2 ) = 604
225 PARAM_CHECK( 7 , 2 ) = 603
226 PARAM_CHECK( 6 , 2 ) = 601
227 PARAM_CHECK( 5 , 2 ) = 5
228 PARAM_CHECK( 4 , 2 ) = 3
229 PARAM_CHECK( 3 , 2 ) = 2
230 PARAM_CHECK( 2 , 2 ) = 13
231 PARAM_CHECK( 1 , 2 ) = 1
232
233 * Want to find errors with MIN( ) , so if no error , set it to a big
234 * number. If there already is an error , multiply by the the
235 * descriptor multiplier.
236
237 IF( INFO.GE.0 ) THEN
237
238 INFO = BIGNUM
239 ELSE IF( INFO.LT. - DESCMULT ) THEN
239
240 INFO = - INFO
241 ELSE
241
242 INFO = - INFO * DESCMULT
243 END IF
244
245 * Check consistency across processors
246
247 CALL GLOBCHK( ICTXT , 15 , PARAM_CHECK , 15 ,
248 $PARAM_CHECK( 1 , 3 ) , INFO )
249
250 * Prepare output : set info = 0 if no error , and divide by DESCMULT
251 * if error is not in a descriptor entry.
252
253 IF( INFO.EQ.BIGNUM ) THEN
253
254 INFO = 0
255 ELSE IF( MOD( INFO , DESCMULT ) .EQ. 0 ) THEN
255
256 INFO = - INFO / DESCMULT
257 ELSE
257
258 INFO = - INFO
259 END IF
260
261 IF( INFO.LT.0 ) THEN
261
262 CALL PXERBLA( ICTXT , 'PZPTTRS' , - INFO )
263 RETURN
264 END IF
265
266 * Quick return if possible
267
268 IF( N.EQ.0 )
268
269 $ RETURN
270
271 IF( NRHS.EQ.0 )
271
272 $ RETURN
273
274 * Adjust addressing into matrix space to properly get into
275 * the beginning part of the relevant data
276
277 PART_OFFSET = NB*((JA - 1) / (NPCOL*NB) )
278
279 IF((MYCOL - CSRC) .LT.(JA - PART_OFFSET - 1) / NB ) THEN
280 PART_OFFSET = PART_OFFSET + NB
281 ENDIF
282
283 IF( MYCOL .LT. CSRC ) THEN
283
284 PART_OFFSET = PART_OFFSET - NB
285 ENDIF
286
287 * Form a new BLACS grid(the "standard form" grid) with only procs
288 * holding part of the matrix , of size 1xNP where NP is adjusted ,
289 * starting at csrc = 0 , with JA modified to reflect dropped procs.
290
291 * First processor to hold part of the matrix :
292
293 FIRST_PROC = MOD(( JA - 1 ) / NB + CSRC , NPCOL )
294
295 * Calculate new JA one while dropping off unused processors.
296
297 JA_NEW = MOD( JA - 1 , NB ) + 1
298
299 * Save and compute new value of NP
300
301 NP_SAVE = NP
302 NP =( JA_NEW + N - 2 ) / NB + 1
303
304 * Call utility routine that forms "standard-form" grid
305
306 CALL RESHAPE( ICTXT , INT_ONE , ICTXT_NEW , INT_ONE ,
307 $ FIRST_PROC , INT_ONE , NP )
308
309 * Use new context from standard grid as context.
310
311 ICTXT_SAVE = ICTXT
312 ICTXT = ICTXT_NEW
313 DESCA_1XP( 2 ) = ICTXT_NEW
314 DESCB_PX1( 2 ) = ICTXT_NEW
315
316 * Get information about new grid.
317
318 CALL BLACS_GRIDINFO( ICTXT , NPROW , NPCOL , MYROW , MYCOL )
319
320 * Drop out processors that do not have part of the matrix.
321
322 IF( MYROW .LT. 0 ) THEN
322
323 GOTO 1234
324 ENDIF
325
326 * ********************************
327 * Values reused throughout routine
328
329 * User - input value of partition size
330
331 PART_SIZE = NB
332
333 * Number of columns in each processor
334
335 MY_NUM_COLS = NUMROC( N , PART_SIZE , MYCOL , 0 , NPCOL )
336
337 * Offset in columns to beginning of main partition in each proc
338
339 IF( MYCOL .EQ. 0 ) THEN
339
340 PART_OFFSET = PART_OFFSET + MOD( JA_NEW - 1 , PART_SIZE )
341 MY_NUM_COLS = MY_NUM_COLS - MOD(JA_NEW - 1 , PART_SIZE )
342 ENDIF
343
344 * Size of main(or odd) partition in each processor
345
346 ODD_SIZE = MY_NUM_COLS
347 IF( MYCOL .LT. NP - 1 ) THEN
347
348 ODD_SIZE = ODD_SIZE - INT_ONE
349 ENDIF
350
351 * Begin main code
352
353 INFO = 0
354
355 * Call frontsolve routine
356
357 IF( LSAME( UPLO , 'L' ) ) THEN
358
358
359 CALL PZPTTRSV ( 'L' , 'N' , N , NRHS , D( PART_OFFSET + 1 ) ,
360 $ E( PART_OFFSET + 1 ) , JA_NEW , DESCA_1XP , B , IB ,
361 $ DESCB_PX1 , AF , LAF , WORK , LWORK , INFO )
362
363 ELSE
364
364
365 CALL PZPTTRSV ( 'U' , 'C' , N , NRHS , D( PART_OFFSET + 1 ) ,
366 $ E( PART_OFFSET + 1 ) , JA_NEW , DESCA_1XP , B , IB ,
367 $ DESCB_PX1 , AF , LAF , WORK , LWORK , INFO )
368
369 ENDIF
370
371 * Divide by the main diagonal : B <- D^{ - 1} B
372
373 * The main partition is first
374
375 DO 10 I = PART_OFFSET + 1 , PART_OFFSET + ODD_SIZE
376 CALL ZSCAL( NRHS , DCMPLX( CONE / D( I ) ) , B( I ) , LLDB )
377 10 CONTINUE
378
379 * Reduced system is next
380
380
381 IF( MYCOL .LT. NPCOL - 1 ) THEN
381
382 I = PART_OFFSET + ODD_SIZE + 1
383 CALL ZSCAL( NRHS , CONE / AF( ODD_SIZE + 2 ) , B( I ) , LLDB )
384 ENDIF
385
386 * Call backsolve routine
387
388 IF( LSAME( UPLO , 'L' ) ) THEN
389
389
390 CALL PZPTTRSV ( 'L' , 'C' , N , NRHS , D( PART_OFFSET + 1 ) ,
391 $ E( PART_OFFSET + 1 ) , JA_NEW , DESCA_1XP , B , IB ,
392 $ DESCB_PX1 , AF , LAF , WORK , LWORK , INFO )
393
394 ELSE
395
395
396 CALL PZPTTRSV ( 'U' , 'N' , N , NRHS , D( PART_OFFSET + 1 ) ,
397 $ E( PART_OFFSET + 1 ) , JA_NEW , DESCA_1XP , B , IB ,
398 $ DESCB_PX1 , AF , LAF , WORK , LWORK , INFO )
399
400 ENDIF
401 1000 CONTINUE
402
403 * Free BLACS space used to hold standard - form grid.
404
405 IF( ICTXT_SAVE .NE. ICTXT_NEW ) THEN
405
406 CALL BLACS_GRIDEXIT( ICTXT_NEW )
407 ENDIF
408
409 1234 CONTINUE
410
411 * Restore saved input parameters
412
413 ICTXT = ICTXT_SAVE
414 NP = NP_SAVE
415
416 * Output minimum worksize
417
418 WORK( 1 ) = WORK_SIZE_MIN
419
420 RETURN
421
422 * End of PZPTTRS
423
424 END108
43
|
|
Variables in Routine PZPTTRS()
| Summary Report |
| Data Type | Quantity | Size(byte) |
| CHARACTER | 1 | 1 |
| COMPLEX*16 | 2 | ? |
| DOUBLE PRECISION | 2 | 8 |
| INTEGER | 52 | 320 |
| LOGICAL | 1 | 1 |
| REAL | 2 | 8 |
| TOTAL | 60 | 338 |
List of Variables
CHARACTER
COMPLEX*16
DOUBLE PRECISION
INTEGER
| BIGNUM | BLOCK_CYCLIC_2D | CSRC | CSRC_ | CTXT_ |
| DESCA_1XP( 7 ) | DESCB_PX1( 7 ) | DESCMULT | DLEN_ | DTYPE_ |
| FIRST_PROC | I | IB | ICTXT | ICTXT_NEW |
| ICTXT_SAVE | IDUM1 | IDUM3 | INFO | INT_ONE |
| JA | JA_NEW | LAF | LLD_ | LLDA |
| LLDB | LWORK | M_ | MB_ | MY_NUM_COLS |
| MYCOL | MYROW | N | N_ | NB |
| NB_ | NP | NP_SAVE | NPCOL | NPROW |
| NRHS | NUMROC | ODD_SIZE | PARAM_CHECK( 15, 3 ) | PART_OFFSET |
| PART_SIZE | RETURN_CODE | RSRC_ | STORE_M_B | STORE_N_A |
| TEMP | WORK_SIZE_MIN | | | |
LOGICAL
REAL
Variables Dependence Graph Put the mouse over a right hand side variable to display the corresponding line of the dependence | | - | | - | - | | CSRC | <--- | DESCA_1XPCSRC = DESCA_1XP( 5 ) |
| DESCA | <--- | TEMPDESCA( DTYPE_ ) = TEMP |
| DESCA_1XP | <--- | ICTXT_NEWDESCA_1XP( 2 ) = ICTXT_NEW |
| DESCB_PX1 | <--- | ICTXT_NEWDESCB_PX1( 2 ) = ICTXT_NEW |
| FIRST_PROC | <--- | JAFIRST_PROC = MOD( ( JA-1 )/NB+CSRC, NPCOL ), NBFIRST_PROC = MOD( ( JA-1 )/NB+CSRC, NPCOL ), CSRCFIRST_PROC = MOD( ( JA-1 )/NB+CSRC, NPCOL ), NPCOLFIRST_PROC = MOD( ( JA-1 )/NB+CSRC, NPCOL ) |
| I | <--- | ODD_SIZEDO 10 I=PART_OFFSET+1, PART_OFFSET+ODD_SIZE{2I=PART_OFFSET+ODD_SIZE+1}, PART_OFFSETDO 10 I=PART_OFFSET+1, PART_OFFSET+ODD_SIZE{2I=PART_OFFSET+ODD_SIZE+1} |
| ICTXT | <--- | ICTXT_NEWICTXT = ICTXT_NEW, ICTXT_SAVEICTXT = ICTXT_SAVE, DESCA_1XPICTXT = DESCA_1XP( 2 ) |
| ICTXT_SAVE | <--- | ICTXTICTXT_SAVE = ICTXT |
| INFO | <--- | BIGNUMINFO = BIGNUM, DESCMULTINFO = -INFO * DESCMULT{2INFO = -INFO / DESCMULT}, INFOINFO = -INFO{2INFO = -INFO * DESCMULT, 3INFO = -INFO / DESCMULT, 4INFO = -INFO} |
| JA_NEW | <--- | JAJA_NEW = MOD( JA-1, NB ) + 1, NBJA_NEW = MOD( JA-1, NB ) + 1 |
| LLDA | <--- | DESCA_1XPLLDA = DESCA_1XP( 6 ) |
| LLDB | <--- | DESCB_PX1LLDB = DESCB_PX1( 6 ) |
| MY_NUM_COLS | <--- | JA_NEWMY_NUM_COLS = MY_NUM_COLS - MOD(JA_NEW-1, PART_SIZE ), MY_NUM_COLSMY_NUM_COLS = MY_NUM_COLS - MOD(JA_NEW-1, PART_SIZE ), MYCOLMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ), NMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ), NPCOLMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ), NUMROCMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ), PART_SIZEMY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ){2MY_NUM_COLS = MY_NUM_COLS - MOD(JA_NEW-1, PART_SIZE )} |
| NB | <--- | DESCA_1XPNB = DESCA_1XP( 4 ) |
| NP | <--- | JA_NEWNP = ( JA_NEW+N-2 )/NB + 1, NNP = ( JA_NEW+N-2 )/NB + 1, NBNP = ( JA_NEW+N-2 )/NB + 1, NP_SAVENP = NP_SAVE, NPCOLNP = NPROW * NPCOL, NPROWNP = NPROW * NPCOL |
| NP_SAVE | <--- | NPNP_SAVE = NP |
| ODD_SIZE | <--- | INT_ONEODD_SIZE = ODD_SIZE - INT_ONE, MY_NUM_COLSODD_SIZE = MY_NUM_COLS, ODD_SIZEODD_SIZE = ODD_SIZE - INT_ONE |
| PARAM_CHECK | <--- | IBPARAM_CHECK( 10, 1 ) = IB, IDUM1PARAM_CHECK( 1, 1 ) = IDUM1, IDUM3PARAM_CHECK( 2, 1 ) = IDUM3, JAPARAM_CHECK( 5, 1 ) = JA, NPARAM_CHECK( 3, 1 ) = N, NRHSPARAM_CHECK( 4, 1 ) = NRHS, DESCAPARAM_CHECK( 9, 1 ) = DESCA(5){2PARAM_CHECK( 8, 1 ) = DESCA(4), 3PARAM_CHECK( 7, 1 ) = DESCA(3), 4PARAM_CHECK( 6, 1 ) = DESCA(1)} |
| PART_OFFSET | <--- | JAPART_OFFSET = NB*( (JA-1)/(NPCOL*NB) ), JA_NEWPART_OFFSET = PART_OFFSET+MOD( JA_NEW-1, PART_SIZE ), NBPART_OFFSET = NB*( (JA-1)/(NPCOL*NB) ){2PART_OFFSET = PART_OFFSET + NB, 3PART_OFFSET = PART_OFFSET - NB}, NPCOLPART_OFFSET = NB*( (JA-1)/(NPCOL*NB) ), PART_OFFSETPART_OFFSET = PART_OFFSET + NB{2PART_OFFSET = PART_OFFSET - NB, 3PART_OFFSET = PART_OFFSET+MOD( JA_NEW-1, PART_SIZE )}, PART_SIZEPART_OFFSET = PART_OFFSET+MOD( JA_NEW-1, PART_SIZE ) |
| PART_SIZE | <--- | NBPART_SIZE = NB |
| STORE_M_B | <--- | DESCB_PX1STORE_M_B = DESCB_PX1( 3 ) |
| STORE_N_A | <--- | DESCA_1XPSTORE_N_A = DESCA_1XP( 3 ) |
| TEMP | <--- | DTYPE_TEMP = DESCA( DTYPE_ ), DESCATEMP = DESCA( DTYPE_ ) |
| WORK | <--- | WORK_SIZE_MINWORK( 1 ) = WORK_SIZE_MIN{2WORK( 1 ) = WORK_SIZE_MIN} |
| WORK_SIZE_MIN | <--- | NPCOLWORK_SIZE_MIN =, NRHSWORK_SIZE_MIN = |
|
|
Analysis elements of the routine PZPTTRS() Put the mouse over each element to display detailed matching information
Assigned variables |
| | | BIGNUM , BLOCK_CYCLIC_2D , CONE , CSRC , CSRC_ , CTXT_ , CZERO , DESCA_1XP , DESCB_PX1 , DESCMULT , DLEN_ , DTYPE_ , FIRST_PROC , I , ICTXT , ICTXT_SAVE , IDUM1 , IDUM3 , INFO , INT_ONE , JA_NEW , LLD_ , LLDA , LLDB , M_ , MB_ , MY_NUM_COLS , N_ , NB , NB_ , NP , NP_SAVE , ODD_SIZE , ONE , PARAM_CHECK , PART_OFFSET , PART_SIZE , RSRC_ , STORE_M_B , STORE_N_A , TEMP , WORK , WORK_SIZE_MIN , ZERO |
|
Active variables |
| | | AF , B , BIGNUM , BLOCK_CYCLIC_2D , CONE , CSRC , CSRC_ , CTXT_ , CZERO , D , DESCA , DESCA_1XP , DESCB , DESCB_PX1 , DESCMULT , DLEN_ , DTYPE_ , E , FIRST_PROC , I , IB , ICTXT , ICTXT_NEW , ICTXT_SAVE , IDUM1 , IDUM3 , INFO , INT_ONE , JA , JA_NEW , LAF , LLD_ , LLDA , LLDB , LSAME , LWORK , M_ , MB_ , MY_NUM_COLS , MYCOL , MYROW , N , N_ , NB , NB_ , NP , NP_SAVE , NPCOL , NPROW , NRHS , NUMROC , ODD_SIZE , ONE , PARAM_CHECK , PART_OFFSET , PART_SIZE , RETURN_CODE , RSRC_ , STORE_M_B , STORE_N_A , TEMP , UPLO , WORK , WORK_SIZE_MIN , ZERO |
|
Allocated variables [ statement : associated variable ] |
| | new | : a, about, Calculate, compute, Use |
|
Desallocated variables [ statement : associated variable ] |
| | free | : BLACS |
|
Accessed arrays [ array name : associated index ] |
| | AF | : ODD_SIZE+2 |
| | B | : I , I |
| | D | : I , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 |
| | DESCA | : 1 , 3 , 4 , 5 , DTYPE_ , DTYPE_ , DTYPE_ |
| | DESCA_1XP | : 1 , 2 , 2 , 2 , 3 , 4 , 4 , 5 , 5 , 6 , 7 |
| | DESCB | : 1 , 2 , 3 , 4 , 5 |
| | DESCB_PX1 | : 1 , 2 , 2 , 3 , 4 , 5 , 6 , 7 |
| | E | : PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 , PART_OFFSET+1 |
| | LSAME | : UPLO, 'L' , UPLO, 'L' , UPLO, 'L' , UPLO, 'U' |
| | NUMROC | : N, PART_SIZE, MYCOL, 0, NPCOL |
| | PARAM_CHECK | : 1, 1 , 1, 2 , 1, 3 , 10, 1 , 10, 2 , 11, 1 , 11, 2 , 12, 1 , 12, 2 , 13, 1 , 13, 2 , 14, 1 , 14, 2 , 15, 1 , 15, 2 , 15, 3 , 2, 1 , 2, 2 , 3, 1 , 3, 2 , 4, 1 , 4, 2 , 5, 1 , 5, 2 , 6, 1 , 6, 2 , 7, 1 , 7, 2 , 8, 1 , 8, 2 , 9, 1 , 9, 2 |
| | WORK | : 1 , 1 |
|
Conditional statements [ statement : associated predicate ] |
| | do | : ( not have part of the matrix. ) , ( 10 I=PART_OFFSET + 1 , PART_OFFSET + ODD_SIZE ) |
| | for | : ( easy access to ) , ( DESCA and DESCB. ) , ( use in code. ) , ( global consistency check ) |
| | if | : ( TEMP .EQ. 502 ) , ( RETURN_CODE .NE. 0 ) , ( RETURN_CODE .NE. 0 ) , ( (DESCA_1XP( 2 ) .NE. DESCB_PX1( 2 ) ) ) , ( (DESCA_1XP( 4 ) .NE. DESCB_PX1( 4 ) ) ) , ( (DESCA_1XP( 5 ) .NE. DESCB_PX1( 5 ) ) ) , ( (LSAME( UPLO , 'U' ) ) ) , ( (LSAME( UPLO , 'L' ) ) ) , ( LWORK .LT. - 1 ) , ( LWORK .EQ. - 1 ) , ( N .LT. 0 ) , ( N+JA-1 .GT. STORE_N_A ) , ( N+IB-1 .GT. STORE_M_B ) , ( LLDB .LT. NB ) , ( NRHS .LT. 0 ) , ( JA .NE. IB ) , ( NPROW .NE. 1 ) , ( (N .GT. NP*NB - MOD( JA - 1 , NB )) ) , ( ((JA+N - 1.GT.NB) .AND. ( NB.LT.2*INT_ONE )) ) , ( LWORK .LT. WORK_SIZE_MIN ) , ( LWORK .NE. - 1 ) , ( no error , set it to a big ) , ( there already is an error , multiply by the the ) , ( INFO.GE.0 ) , ( INFO.LT. - DESCMULT ) , ( no error , and divide by DESCMULT ) , ( error is not in a descriptor entry. ) , ( INFO.EQ.BIGNUM ) , ( (MOD( INFO , DESCMULT ) .EQ. 0 ) ) , ( INFO.LT.0 ) , ( possible ) , ( N.EQ.0 ) , ( NRHS.EQ.0 ) , ( ((MYCOL - CSRC) .LT. (JA - PART_OFFSET - 1) / NB ) ) , ( MYCOL .LT. CSRC ) , ( MYROW .LT. 0 ) , ( MYCOL .EQ. 0 ) , ( MYCOL .LT. NP - 1 ) , ( (LSAME( UPLO , 'L' ) ) ) , ( MYCOL .LT. NPCOL - 1 ) , ( (LSAME( UPLO , 'L' ) ) ) , ( ICTXT_SAVE .NE. ICTXT_NEW ) |
| | while | : ( dropping off unused processors. ) |
|
| List of variables | BIGNUM BLOCK_CYCLIC_2D CONE CSRC CSRC_ CTXT_ CZERO
| DESCA DESCA_1XP( 7 ) DESCB_PX1( 7 ) DESCMULT DLEN_ DTYPE_ FIRST_PROC I
| IB ICTXT ICTXT_NEW ICTXT_SAVE IDUM1 IDUM3 INFO INT_ONE
| JA JA_NEW LAF LLD_ LLDA LLDB LSAME LWORK
| M_ MB_ MY_NUM_COLS MYCOL MYROW N N_ NB
| NB_ NP NP_SAVE NPCOL NPROW NRHS NUMROC ODD_SIZE
| ONE PARAM_CHECK( 15, 3 ) PART_OFFSET PART_SIZE RETURN_CODE RSRC_ STORE_M_B STORE_N_A
| TEMP UPLO WORK WORK_SIZE_MIN ZERO | | close
| |
BIGNUM
BLOCK_CYCLIC_2D
CONE
CSRC
CSRC_
CTXT_
CZERO
DESCA
DESCA_1XP( 7 )
DESCB_PX1( 7 )
DESCMULT
DLEN_
DTYPE_
FIRST_PROC
I
IB
ICTXT
ICTXT_NEW
ICTXT_SAVE
IDUM1
IDUM3
INFO
INT_ONE
JA
JA_NEW
LAF
LLD_
LLDA
LLDB
LSAME
LWORK
M_
MB_
MY_NUM_COLS
MYCOL
MYROW
N
N_
NB
NB_
NP
NP_SAVE
NPCOL
NPROW
NRHS
NUMROC
ODD_SIZE
ONE
PARAM_CHECK( 15, 3 )
PART_OFFSET
PART_SIZE
RETURN_CODE
RSRC_
STORE_M_B
STORE_N_A
TEMP
UPLO
WORK
WORK_SIZE_MIN
ZERO
571
| |