poe ${ATSP}/bin/nonh_mpi \
    -nodes ${NN} -procs ${NP}
# for ibmSP use poe
poe ${ATSP}/bin/mchf_mpi -nodes ${NN} -procs ${NP} \
    > out_${s}.${Z}-${n} << EOF
Similarly, bp_ang_mpi, bp_mat_mpi, and bp_eiv_mpi are started with poe:
poe ${ATSP}/bin/bp_ang_mpi -nodes ${NN} -procs ${NP} \
    < in_ang_${D}            # generate angular data
poe ${ATSP}/bin/bp_mat_mpi -nodes ${NN} -procs ${NP} \
    < in_mat_${D}            # compute all contributions
poe ${ATSP}/bin/bp_eiv_mpi -nodes ${NN} -procs ${NP} \
    < in_eiv_${D}_${Z}       # compute eigenvectors
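The three Breit-Pauli programs must run in order, each reading its own input file. The following POSIX sh sketch chains them and stops at the first failure. The function name run_bp, its argument order, and the DRYRUN switch are hypothetical additions for illustration; only the program names, the poe options, and the input-file patterns come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: run bp_ang_mpi, bp_mat_mpi and bp_eiv_mpi in order,
# stopping at the first failure. ATSP, NN and NP must already be set;
# set DRYRUN=1 to print the commands instead of launching them.
run_bp() {
  d_label=$1    # value substituted for ${D} in the input-file names
  z_label=$2    # value substituted for ${Z} in in_eiv_${D}_${Z}
  for stage in ang mat eiv; do
    if [ "$stage" = eiv ]; then
      input="in_eiv_${d_label}_${z_label}"
    else
      input="in_${stage}_${d_label}"
    fi
    cmd="poe ${ATSP}/bin/bp_${stage}_mpi -nodes ${NN} -procs ${NP}"
    if [ "${DRYRUN:-0}" = 1 ]; then
      echo "$cmd < $input"
    else
      # $cmd is left unquoted on purpose so it splits into program + arguments.
      $cmd < "$input" || { echo "bp_${stage}_mpi failed" >&2; return 1; }
    fi
  done
}
```

Running it once with DRYRUN=1 is a cheap way to check the generated command lines before spending batch time on them.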
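In addition to the variable $ATSP (described at the beginning of this section), each script uses a number of local variables: s, Z, n, and D, which appear in the input and output file names, and NN and NP, the number of nodes and the number of processes passed to poe.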
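Because every poe command line is built from these variables, an unset one silently produces a wrong path or an empty -procs value. A minimal sh sketch of a guard that a run script could call before launching anything follows; the function name check_vars is hypothetical, and the variable list (ATSP, s, Z, n, NN, NP, plus D from the Breit-Pauli input-file names) is taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: report every required variable that is unset or empty,
# and return non-zero if any is missing, before poe is ever called.
check_vars() {
  missing=0
  for name in ATSP s Z n D NN NP; do
    eval "value=\${$name:-}"    # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script would then start with something like `check_vars || exit 1`.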
To run batch jobs, the user must address several issues:
# start .cshrc.ext
if ($?tcsh) then
    set modules_shell="tcsh"
else
    set modules_shell="csh"
endif
#alias module 'eval `/opt/modules/modules/bin/modulecmd $modules_shell \!*`'
set path = ( $path /u2/georgio/SPII/atsp2K/bin \
             /u2/georgio/graspVU/bin \
             ${HOME}/atsp2K/bin )
source /usr/common/usg/Modules/3.1.1/init/csh   # initialize module env
module load gnu KCC                             # load modules
# put any user-defined aliases here
setenv FC        "xlf"      # the Fortran compiler
setenv FC_MPI    "mpxlf"    # Fortran compiler for MPI
setenv FFLAGS    "-O3 "     # Fortran flags
setenv MALLOC    ibmSP      # memory allocation routines
setenv MPI_FFLAGS
setenv LDFLAGS
setenv CC        KCC
setenv CCFLAGS   "-O3"
setenv lapack    "/usr/common/usg/LAPACK/3.0a/lapack_SP.a"
setenv blas
setenv ATSP      ${HOME}/atsp2K   # can also be set to /usr/common/homes/g/georgio/atsp2K
#setenv XLFRTEOPTS "buffering=disable_all"   # a debugging option; slows down the application
alias l   'ls -l'
alias ll  'ls -la'
alias vim 'vi Makefile'
alias cds 'cd /scratch/scratchdirs/${USER}/'
alias llh 'llqs | head -30'
alias llg 'llqs | grep ${USER}'
# end .cshrc.ext
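The start-up file above is csh; its essential job for ATSP2K is to set ATSP and put ${ATSP}/bin on the search path. As an illustration only, here is a POSIX sh sketch (a different shell from the file above, chosen so the check can be placed in a run script) that verifies both before a job starts. The function name check_atsp_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify that ATSP is set and that ${ATSP}/bin is on
# PATH, which is what the .cshrc.ext settings are meant to guarantee.
check_atsp_env() {
  if [ -z "${ATSP:-}" ]; then
    echo "ATSP is not set" >&2
    return 1
  fi
  # Surround PATH with colons so the first and last entries also match.
  case ":${PATH}:" in
    *":${ATSP}/bin:"*) return 0 ;;
    *) echo "${ATSP}/bin is not on PATH" >&2; return 1 ;;
  esac
}
```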
cd atsp2K/run/N_like/
llsubmit ll_bp
The command "llsubmit ll_bp" submits the script ll_bp to the LoadLeveler batch queue. ll_bp is only an initializing script: it defines the parameters required for the batch job and then runs the calculation script sh_ALL_mpi_ibmSP. It contains the following:
#!/usr/bin/csh
#@ job_name         = mpi_test             # job identifier
#@ output           = mpi_test.out         # where stdout is redirected
#@ error            = mpi_test.err         # where stderr is redirected
#@ job_type         = parallel             # parallel job
#@ class            = premium              # regular, premium, debug
#@ environment      = COPY_ALL             # copy the environment variables of the submitting shell
#@ tasks_per_node   = 16                   # MPI tasks per node
#@ node             = 2                    # number of nodes
#@ wall_clock_limit = 0:60:00              # maximum elapsed time
#@ notification     = never                # no mail notification
#@ network.MPI      = css0,not_shared,us   # switch adapter css0, not shared, user-space protocol
#@ node_usage       = not_shared           # do not share the nodes with other jobs
#@ queue
cd /scratch/scratchdirs/georgio/atsp2K/run/N_like
echo "changed directory to " `pwd`
echo "starting script po_breit at " `date` " ..."
./sh_ALL_mpi_ibmSP
echo " time is: " `date`
echo " at " `date` " script po_breit finished!"
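The LoadLeveler request in ll_bp (node = 2, tasks_per_node = 16) fixes how many tasks the job receives, so NN and NP in the run script presumably need to agree with it. A small sh sketch of that arithmetic, assuming one MPI process per allocated task; np_for is a hypothetical name.

```shell
#!/bin/sh
# Sketch: derive NN and NP from the LoadLeveler request in ll_bp, assuming
# the run script should start one MPI process per allocated task.
np_for() {
  # $1 = node, $2 = tasks_per_node  ->  total number of MPI processes
  echo $(( $1 * $2 ))
}
NN=2                        # "#@ node = 2" in ll_bp
NP=$(np_for "$NN" 16)       # "#@ tasks_per_node = 16", so NP = 32
echo "NN=$NN NP=$NP"
```

If the request in ll_bp is changed, recomputing NP this way keeps poe's -procs value consistent with it.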
Once the job has been submitted, the system responds with:
% llsubmit ll_bp
subfilter: default repo mp52 will be charged
llsubmit: Processed command file through Submit Filter: "/usr/common/nsg/etc/subfilter".
llsubmit: The job "s03513.nersc.gov.531" has been submitted.
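The job identifier in the last line is what later commands such as llcancel need. The following sh sketch extracts it and shortens it to the host.number form used with llcancel in this section. The function names are hypothetical, and the sed pattern assumes the message wording is exactly as shown above.

```shell
#!/bin/sh
# Hypothetical helpers for the llsubmit message shown above.
job_id_from_llsubmit() {
  # Read llsubmit output on stdin; print the quoted job identifier.
  sed -n 's/^llsubmit: The job "\(.*\)" has been submitted\.$/\1/p'
}

short_job_id() {
  # s03513.nersc.gov.531 -> s03513.531 (first and last dot-separated fields)
  printf '%s.%s\n' "${1%%.*}" "${1##*.}"
}
```

If llsubmit prints these messages on stdout, something like `id=$(llsubmit ll_bp | job_id_from_llsubmit)` would capture the identifier at submission time.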
The status of the job can be checked with llstat:
% llstat | grep ${USER}
s03513.531.0 mpi_test georgio premium I 2 01:00:00 9/30 09:47
The first entry, s03513.531.0, identifies the job by the node from which it was submitted (s03513) and the job number (531). The next entry is the job name (mpi_test, set by job_name in ll_bp), followed by the user name. The class (priority) is premium; jobs are normally submitted with the regular class, and the debug class is used for debugging. The status I (idle) means that the job is waiting in the queue; running jobs show R. The status is followed by the number of requested nodes (2) and the remaining wall-clock time, which for a job that has not yet started equals the requested limit (01:00:00, i.e. 1 hour). The last two entries give the date and time of submission. The job can be cancelled with "llcancel s03513.531". It is also occasionally helpful to monitor the whole queue: "llqs | head -30" lists the first 30 jobs (the example below shows the first 20):
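The column layout just described (step id, job name, user, class, status, ...) can be read mechanically. A minimal sh/awk sketch that reports the state of one job step follows; the function name job_state and the words "queued" and "running" are illustrative choices, and any status other than I or R is printed unchanged.

```shell
#!/bin/sh
# Hypothetical helper: read llstat output on stdin and report the state of
# the job step given as $1 (e.g. s03513.531.0). Column 5 is the status.
job_state() {
  awk -v id="$1" '$1 == id {
    if ($5 == "I") print "queued"
    else if ($5 == "R") print "running"
    else print $5
  }'
}
```

For example, `llstat | job_state s03513.531.0` would print "queued" for the job shown above.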
% llqs | head -20
Step Id          JobName         UserName Class   ST NDS WallClck Submit Time
---------------- --------------- -------- ------- -- --- -------- -----------
s02901.470.0     xCmod1          xu       regular R  4   07:56:22 9/25 14:00
s02813.471.0     s02813.nersc.go kogut    regular R  4   03:20:57 9/26 06:13
s02901.475.0     s02901.nersc.go kogut    regular R  4   03:36:22 9/26 06:16
....
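To see how many jobs a given user has in the queue, the llqs listing can be filtered on its UserName column. This sh/awk sketch assumes the two header lines and the column order shown above (UserName is the third field of each job line); count_user_jobs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count the jobs of user $1 in llqs output read from
# stdin, skipping the two header lines.
count_user_jobs() {
  awk -v user="$1" 'NR > 2 && $3 == user { n++ } END { print n + 0 }'
}
```

Usage would be along the lines of `llqs | count_user_jobs ${USER}`.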
A running job can be monitored by inspecting its standard output and standard error, which are redirected to mpi_test.out and mpi_test.err, respectively.
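Since ll_bp ends by echoing "... script po_breit finished!" to its standard output, the presence of that line in mpi_test.out is a simple completion test. A minimal sh sketch, with the hypothetical function name job_finished:

```shell
#!/bin/sh
# Hypothetical helper: succeed once ll_bp has written its final line,
# " at <date>  script po_breit finished!", to the job's stdout file.
job_finished() {
  grep -q 'script po_breit finished!' "${1:-mpi_test.out}" 2>/dev/null
}
```

While the job runs, `tail -f mpi_test.out` follows the output as it is written. A loop such as `until job_finished; do sleep 60; done` waits for completion, but note that it never ends if the job is killed before ll_bp reaches its last echo.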