
Introduction

The MPI mchf implementation is based on the program structure of the serial code, with the most CPU-intensive operations modified for parallel execution. The program initialization is similar to the serial version: node 0 in mpi_mchf_sun() processes the parameters provided by the user in interactive mode and broadcasts them to the other processors. mpi_data() calls wavefn(), which reads the input wave function estimate from wfn.inp if it is present in the working directory; otherwise, wavefn() creates hydrogenic estimates. All initial parameters are broadcast from node 0 to the rest of the nodes.

Then mpi_data() proceeds to call mpi_spintgrl(), which allocates memory on each node (by calling mpi_spalcsts()) and reads the angular data files supplied by nonh_mpi. mpi_spalcsts() has sufficient information about the size of the arrays and attempts to allocate heap memory for all of them. If the calculation is too large, the coefficient data from c.lst.nnn are instead read from disk on each scf() iteration. In this case, considerably smaller arrays are allocated and used to buffer the input data. The parameter LSDIM=30000 controls the size of the buffer, and it can be adjusted for efficient I/O processing. This matters only when the coefficient data is on disk, since the entire coefficient list is processed on each scf() iteration. However, the user should avoid computing from disk; by increasing the number of processors, all coefficient and pointer data may be kept in memory.
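
The broadcast of the interactive input from node 0, mentioned at the start of this initialization sequence, follows the usual MPI pattern. The sketch below is only an illustration, not the mchf_mpi source; the parameter names nclosd, nwf, and scftol are placeholders.

  program bcast_sketch
    use mpi
    implicit none
    integer :: myid, nprocs, ierr
    integer :: nclosd, nwf              ! illustrative parameters only
    double precision :: scftol

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

    if (myid == 0) then
       ! node 0 processes the interactive input
       read (*, *) nclosd, nwf, scftol
    end if

    ! broadcast the initial parameters from node 0 to the other nodes
    call MPI_BCAST(nclosd, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
    call MPI_BCAST(nwf,    1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
    call MPI_BCAST(scftol, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

    call MPI_FINALIZE(ierr)
  end program bcast_sketch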

The scf() iterations have exactly the same structure as described for the serial mchf. The first phase solves the differential equation for each radial function, following the sequence shown in Figure 6.14. Then, during the second phase, diag() solves the eigenvalue problem and updates the integral coefficients of the radial functions, Figure 6.16.
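
The two-phase structure of an scf() iteration can be summarized by the following schematic sketch. The routines de_solve and diag_update are placeholders standing in for the actual radial-equation solver and diag(), and the loop bounds are arbitrary.

  program scf_sketch
    implicit none
    integer, parameter :: nwf = 5, niter = 3   ! arbitrary example sizes
    integer :: it, i

    do it = 1, niter
       ! Phase 1: improve each radial function in turn by solving its
       ! differential equation (the sequence of Figure 6.14)
       do i = 1, nwf
          call de_solve(i)
       end do
       ! Phase 2: solve the eigenvalue problem and update the integral
       ! coefficients of the radial functions (Figure 6.16)
       call diag_update()
    end do

  contains

    subroutine de_solve(i)
      integer, intent(in) :: i
      ! placeholder for the solution of the radial equation of orbital i
      print *, 'solving radial equation for orbital', i
    end subroutine de_solve

    subroutine diag_update()
      ! placeholder for the diagonalization and coefficient update
      print *, 'diagonalizing and updating coefficients'
    end subroutine diag_update

  end program scf_sketch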

mchf performs complicated computational tasks, including parallel and serial I/O, complex arithmetic loops, and matrix algebra. The efficiency of mchf depends on the number of processors; it drops significantly, below 0.6, when more than 16-24 processors are used. The time-consuming operations are the coefficient updates, the matrix diagonalization, and the exchange procedure. The parallel version, mchf_mpi, is structurally similar to the serial mchf program. However, it has only two levels of memory allocation: Level 1, in which all arrays are in memory, and Level 2, in which coeff and inptr are on disk while hmx, ih, and ico are stored in memory. It is assumed that the number of processors can be increased as needed so that all data is stored in memory. The speed of an iteration may decrease considerably when the data is on disk, as opposed to having all data in memory.
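
In a Level 2 run the coefficient list is streamed from disk in LSDIM-sized blocks on every scf() iteration rather than being held in memory, which is why the iteration slows down. The following sketch illustrates such a buffered read; the file name, record layout, and process_block routine are assumptions made for the example, not the actual mchf_mpi I/O code.

  program coeff_buffer_sketch
    implicit none
    integer, parameter :: lsdim = 30000   ! size of the coefficient buffer
    double precision :: coeff(lsdim)
    integer :: inptr(lsdim)
    integer :: iu, ios, nread, j

    ! hypothetical record layout: each unformatted record holds a block
    ! of at most LSDIM coefficient/pointer pairs, preceded by its length
    open (newunit=iu, file='c.lst.000', form='unformatted', status='old')
    do
       read (iu, iostat=ios) nread, (coeff(j), inptr(j), j=1, nread)
       if (ios /= 0) exit                 ! end of the coefficient list
       call process_block(coeff, inptr, nread)
    end do
    close (iu)

  contains

    subroutine process_block(c, ip, n)
      double precision, intent(in) :: c(:)
      integer, intent(in) :: ip(:), n
      ! placeholder: accumulate this block's contributions to the
      ! interaction matrix and energy expression
      print *, 'processed a block of', n, 'coefficients'
    end subroutine process_block

  end program coeff_buffer_sketch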

