
Parallel atsp2K on IBM/SP

On seaborg, MPI jobs are managed by poe (the Parallel Operating Environment), and every MPI application is launched through poe. In addition to the name of the application, poe takes the number of nodes and the number of processors as arguments:

poe ${ATSP}/bin/nonh_mpi \
        -nodes ${NN} -procs ${NP}  # for ibmSP use poe
poe ${ATSP}/bin/mchf_mpi -nodes ${NN} -procs ${NP} \
               > out_${s}.${Z}-${n} << EOF

The same applies to bp_ang_mpi, bp_mat_mpi, and bp_eiv_mpi:

poe ${ATSP}/bin/bp_ang_mpi -nodes ${NN} -procs ${NP} \
                     <in_ang_${D}  # generate angular data
poe ${ATSP}/bin/bp_mat_mpi -nodes ${NN} -procs ${NP} \
                     <in_mat_${D}   #  compute all contributions
poe ${ATSP}/bin/bp_eiv_mpi -nodes ${NN} -procs ${NP} \
                     <in_eiv_${D}_${Z}   #  compute eigenvectors

In addition to the variable $ATSP (described at the beginning of this section), the scripts use a number of local variables, such as s, Z, n, D, NN, and NP.
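
As an illustration of how NN and NP enter the poe command line, the following is a minimal dry-run sketch that only prints the commands instead of executing them. It assumes that NP is the total number of MPI tasks, that is, the number of nodes times the tasks per node (2 x 16, the values requested in the batch script ll_bp shown later in this section). TASKS_PER_NODE and poe_command are hypothetical helpers introduced for this example and are not part of the atsp2K scripts.

```shell
#!/bin/sh
# Dry run: build the poe command lines without executing poe.
# Assumption: NP = NN * TASKS_PER_NODE (total number of MPI tasks).
# TASKS_PER_NODE and poe_command are hypothetical, for illustration only.
ATSP="${ATSP:-$HOME/atsp2K}"   # fall back to the default location
NN=2                           # number of nodes
TASKS_PER_NODE=16              # tasks started on each node
NP=$((NN * TASKS_PER_NODE))    # total number of processors

poe_command() {
    # $1 = name of the program under ${ATSP}/bin
    printf 'poe %s/bin/%s -nodes %s -procs %s\n' "$ATSP" "$1" "$NN" "$NP"
}

# Print the command line for each of the three bp_* programs.
for prog in bp_ang_mpi bp_mat_mpi bp_eiv_mpi; do
    poe_command "$prog"
done
```

Printing the command first is a cheap way to check the node and processor counts before a job is charged to the batch queue.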

In order to run batch jobs, the user will need to address several issues:

  1. Proper setup of the environment. This is accomplished by editing .cshrc.ext in the $HOME directory. A minimal .cshrc.ext is shown below:

    # start .cshrc.ext
    
    if ($?tcsh) then
       set modules_shell="tcsh"
    else
       set modules_shell="csh"
    endif
    #alias module 'eval `/opt/modules/modules/bin/modulecmd $modules_shell \!*`'
    
    set path = ( $path /u2/georgio/SPII/atsp2K/bin \
                       /u2/georgio/graspVU/bin \
                       ${HOME}/atsp2K/bin )
    
    
    source /usr/common/usg/Modules/3.1.1/init/csh #initialize module env
    module load gnu KCC       # load modules 
    # put any user defined aliases here
    setenv FC "xlf"           # the fortran compiler
    setenv FC_MPI "mpxlf"     # FORTRAN compiler for MPI
    setenv FFLAGS "-O3 "      # FORTRAN flags
    setenv MALLOC ibmSP       # memory allocation routines
    setenv MPI_FFLAGS         #
    setenv LDFLAGS            #
    setenv CC KCC             #
    setenv CCFLAGS "-O3"      #
    setenv lapack "/usr/common/usg/LAPACK/3.0a/lapack_SP.a"  #
    setenv blas
    setenv ATSP ${HOME}/atsp2K    # this can be set to /usr/common/homes/g/georgio/atsp2K
    #setenv XLFRTEOPTS "buffering=disable_all"  # a debugging option, slows down the application
    
    alias   l        'ls -l'
    alias   ll       'ls -la'
    alias   vim      'vi Makefile'
    alias   cds      'cd /scratch/scratchdirs/${USER}/'
    alias   llh      'llqs | head -30'
    alias   llg      'llqs | grep ${USER}'
    
    # end .cshrc.ext
    

  2. Use the command llsubmit and a special batch script to submit jobs. The MPI tests are started with:
    cd atsp2K/run/N_like/
    llsubmit ll_bp
    

    The command "llsubmit ll_bp" submits the script ll_bp to the batch queue. ll_bp is only an initializing script: it defines the parameters required for the batch job and then calls the actual run script. It contains the following:

    #!/usr/bin/csh
    #@ job_name        = mpi_test         # job identifier
    #@ output          = mpi_test.out     # where stdout is redirected
    #@ error           = mpi_test.err     # stderr
    #@ job_type        = parallel         # parallel job
    #@ class           = premium          # regular, premium, debug
    #@ environment     = COPY_ALL         # use the env variables
    #@ tasks_per_node  = 16               #
    #@ node            = 2                #
    #@ wall_clock_limit= 0:60:00          #
    #@ notification    = never            #
    #@ network.MPI     = css0,not_shared,us  #
    #@ node_usage      = not_shared          #
    #@ queue
    
    cd /scratch/scratchdirs/georgio/atsp2K/run/N_like
    echo "changed directory to " `pwd`
    echo "starting script po_breit at " `date` " ..."
    ./sh_ALL_mpi_ibmSP
    echo " time is: " `date`
    echo "  at " `date` " script po_breit finished!"
    

    After the job has been submitted, the system responds with:

    % llsubmit ll_bp
    subfilter: default repo mp52 will be charged
    llsubmit: Processed command file through Submit Filter: "/usr/common/nsg/etc/subfilter".
    llsubmit: The job "s03513.nersc.gov.531" has been submitted.
    

  3. Monitoring the queue and job progress. To check whether the job has been submitted correctly, type:
    % llqs | grep ${USER}
    s03513.531.0     mpi_test        georgio  premium I    2 01:00:00  9/30 09:47
    
    The first entry is the step ID, which begins with the name of the node from which the job was submitted. The next entries are the job name (the job identifier set by job_name) and the user name. The job class is shown as premium; jobs are normally submitted in the regular class, and for debugging this entry is debug. The job status is shown by I, which means the job is waiting in the queue; running jobs have an R entry. After the status come the number of requested nodes (2) and the remaining wall-clock time (1 hr), which for jobs not yet running equals the requested time. The last two entries are the date and time of submission. The user may cancel the job with "llcancel s03513.531". Occasionally it is helpful to monitor the status of the whole queue; llqs | head -20 shows the top of the queue listing:

    % llqs | head -20
    Step Id          JobName         UserName  Class  ST NDS WallClck Submit Time
    ---------------- --------------- -------- ------- -- --- -------- -----------
    s02901.470.0     xCmod1          xu       regular R    4 07:56:22  9/25 14:00
    s02813.471.0     s02813.nersc.go kogut    regular R    4 03:20:57  9/26 06:13
    s02901.475.0     s02901.nersc.go kogut    regular R    4 03:36:22  9/26 06:16
    ....
    

    The user may monitor a running job by inspecting the stderr and stdout files, which are redirected to mpi_test.err and mpi_test.out.
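
The fields of the queue line described in step 3 can also be read by a script. The sketch below works on the sample line from the listing, not on live queue output, and assumes that the job name contains no blanks, so that the status is always the fifth field. The functions job_state and cancel_id are hypothetical helpers written for this example.

```shell
#!/bin/sh
# Sample line copied from the queue listing in step 3.
line='s03513.531.0     mpi_test        georgio  premium I    2 01:00:00  9/30 09:47'

job_state() {
    # Field 5 is the status column: I = waiting in the queue, R = running.
    st=$(printf '%s\n' "$1" | awk '{ print $5 }')
    case "$st" in
        I) echo "queued" ;;
        R) echo "running" ;;
        *) echo "other ($st)" ;;
    esac
}

cancel_id() {
    # Field 1 is the step ID; dropping the trailing step number (".0")
    # gives the identifier used with llcancel in the text.
    step_id=$(printf '%s\n' "$1" | awk '{ print $1 }')
    printf '%s\n' "${step_id%.*}"
}

echo "state: $(job_state "$line")"
echo "cancel with: llcancel $(cancel_id "$line")"
```

For the sample line this reports the job as queued and prints the same llcancel command that is quoted in step 3.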

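The batch script requests wall_clock_limit = 0:60:00, while the queue listing shows 01:00:00 for the same job. Both denote one hour. The sketch below shows the arithmetic with a hypothetical helper, normalize_wallclock, that converts an H:M:S value to seconds and prints it as HH:MM:SS; it only illustrates the equivalence and makes no claim about how the batch system formats the value internally.

```shell
#!/bin/sh
# Normalize an H:M:S time limit to HH:MM:SS.
# normalize_wallclock is a hypothetical helper for illustration.
normalize_wallclock() {
    printf '%s\n' "$1" | awk -F: '{
        total = $1 * 3600 + $2 * 60 + $3      # limit in seconds
        printf "%02d:%02d:%02d\n", int(total / 3600), int((total % 3600) / 60), total % 60
    }'
}

normalize_wallclock "0:60:00"    # prints 01:00:00
```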

2001-10-11