The MPI mchf implementation is based on the program
structure of the serial code, with the most CPU-intensive
operations modified for parallel execution.
The program initialization is similar to the serial version:
node 0 in mpi_mchf_sun() processes the parameters
provided by the user in interactive mode and broadcasts
them to the other processors. mpi_data() calls
wavefn(), which reads the input
wave function estimate from wfn.inp if it is present in the
working directory; otherwise, wavefn() creates hydrogenic estimates.
All initial parameters are broadcast from node 0 to the
rest of the nodes. Then mpi_data() proceeds by
calling mpi_spintgrl(), which allocates memory on
each node (by calling mpi_spalcsts()) and reads the
angular data files supplied by nonh_mpi.
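Each node reads its own angular file. As a minimal sketch,
assuming the .nnn suffix of c.lst.nnn encodes the node number, a
node can derive its file name from its MPI rank as follows (the
variable names and unit number are illustrative):

   program percfile_sketch
      ! Sketch: each node opens its own angular data file, assuming
      ! the .nnn suffix of c.lst.nnn encodes the node number.
      implicit none
      include 'mpif.h'
      integer :: ierr, myid
      character(len=16) :: fname

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)

      ! e.g. 'c.lst.003' on node 3
      write (fname, '(a,i3.3)') 'c.lst.', myid
      open (unit=31, file=fname, form='unformatted', status='old')
      ! ... read angular coefficient records here ...
      close (31)

      call MPI_FINALIZE(ierr)
   end program percfile_sketch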
mpi_spalcsts()
has sufficient information to determine the size of the arrays, and
attempts to allocate heap memory for all of them. If the
calculation is too large, the coefficient data from c.lst.nnn
are instead read from disk on each scf() iteration. In this
case, considerably smaller arrays are allocated and used
to buffer the input data. The parameter LSDIM=30000
controls the size of the
buffer, and it can be adjusted for efficient I/O processing.
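The allocate-or-buffer decision can be pictured as follows. This
is a sketch under assumptions: coeff and inptr are array names
taken from this document, but ntotal and the fallback logic shown
are invented for illustration:

   program alloc_sketch
      ! Sketch of the allocation strategy: try to hold the whole
      ! coefficient list in memory; fall back to an LSDIM-sized
      ! I/O buffer if the allocation fails.
      implicit none
      integer, parameter :: LSDIM = 30000
      integer :: ntotal, astat
      double precision, allocatable :: coeff(:)
      integer, allocatable :: inptr(:)
      logical :: ondisk

      ntotal = 50000000       ! total number of coefficients (example)

      allocate (coeff(ntotal), inptr(ntotal), stat=astat)
      ondisk = (astat /= 0)
      if (ondisk) then
         ! Not enough memory: allocate small buffers instead and
         ! reread c.lst.nnn from disk on every scf() iteration.
         if (allocated(coeff)) deallocate (coeff)
         if (allocated(inptr)) deallocate (inptr)
         allocate (coeff(LSDIM), inptr(LSDIM))
      end if
   end program alloc_sketch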
This matters only when the coefficient data is on
disk, since the entire coefficient list is processed on each
scf() iteration. The user should therefore avoid computing from
disk: by increasing the number of processors, all coefficient
and pointer data can be kept in memory.
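When the data is on disk, each scf() iteration streams the entire
list through the LSDIM-sized buffer, along the following lines.
The unformatted record layout shown here is an assumption, not
the actual c.lst.nnn format:

   program bufread_sketch
      ! Sketch of buffered processing of the coefficient list from
      ! disk; the record layout is assumed for illustration.
      implicit none
      integer, parameter :: LSDIM = 30000
      double precision :: coeff(LSDIM)
      integer :: inptr(LSDIM)
      integer :: n, i, ios

      open (unit=31, file='c.lst.000', form='unformatted', status='old')
      do
         ! Each record holds up to LSDIM coefficients and pointers.
         read (31, iostat=ios) n, coeff(1:n), inptr(1:n)
         if (ios /= 0) exit     ! end of file: list fully processed
         do i = 1, n
            ! ... accumulate contributions from coeff(i), inptr(i) ...
         end do
      end do
      close (31)
   end program bufread_sketch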
The scf() iteration has
exactly the same structure as described for the serial mchf.
The first phase solves the
differential equation for each radial function in the
sequence shown in Figure 6.14. Then, during the second phase,
diag() solves the
eigenvalue problem and updates the integral coefficients
of the radial functions (Figure 6.16).
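Schematically, one scf() cycle can be sketched as below; the
routine names are placeholders standing in for the actual mchf
routines, and the loop bounds are example values:

   program scf_sketch
      ! Schematic of the two-phase scf() iteration described above.
      implicit none
      integer, parameter :: nwf = 5, maxit = 20
      integer :: it, i
      logical :: converged

      do it = 1, maxit
         ! Phase 1: improve each radial function by solving its
         ! differential equation (the sequence of Figure 6.14).
         do i = 1, nwf
            call solve_radial_de(i)
         end do
         ! Phase 2: diag() solves the eigenvalue problem and updates
         ! the integral coefficients of the radial functions
         ! (Figure 6.16).
         call diag_and_update(converged)
         if (converged) exit
      end do

   contains
      subroutine solve_radial_de(i)     ! stub standing in for phase 1
         integer, intent(in) :: i
      end subroutine
      subroutine diag_and_update(done)  ! stub standing in for phase 2
         logical, intent(out) :: done
         done = .true.
      end subroutine
   end program scf_sketch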
mchf performs
complicated computational tasks, including parallel and
serial I/O, complex arithmetic loops, and matrix algebra.
The efficiency of mchf depends on the number of
processors; it drops significantly below 0.6 when more than
16-24 processors are used. The most time-consuming operations are
the coefficient updates, the matrix diagonalization, and the
exchange procedure. The parallel version, mchf_mpi, is
structurally similar to the serial mchf program. However,
it has only two levels of memory allocation: Level 1, in which all
arrays are in memory, and Level 2, in which coeff and inptr are on
disk while hmx, ih, and ico are stored in memory. It is assumed that
the number of processors can be increased as needed so that
all data is stored in memory. The speed of an iteration may
decrease considerably when the data is on disk,
as opposed to having all of it in memory.
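For reference, assuming the efficiency quoted above is the usual
parallel efficiency, it is defined as

   E(p) = T(1) / (p * T(p)),

where T(p) is the execution time on p processors. An efficiency
of 0.6 on 16 processors thus corresponds to a speedup T(1)/T(16)
of about 0.6 * 16 = 9.6.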