Manuscript Title: FLY MPI-2: a parallel tree code for LSS
Authors: U. Becciani, M. Comparato, V. Antonuccio-Delogu
Program title: FLY 3.1
Catalogue identifier: ADSC_v2_0
Distribution format: tar.gz
Journal reference: Comput. Phys. Commun. 174(2006)605
Programming language: Fortran 90, C.
Computer: Beowulf cluster, PC, MPP systems.
Operating system: Linux, AIX.
RAM: 100M words
Keywords: Tree N-body code, Parallel Computing, MPI-2, Cosmological simulations, Astrophysics.
PACS: 95.75.Pq, 95.75.-z, 98.80.Bp.
Classification: 1.9.

Does the new version supersede the previous version?: Yes

Nature of problem:
FLY is a parallel collisionless N-body code for the computation of the gravitational forces in cosmological simulations of large-scale structure.

Solution method:
FLY is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986).
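
The heart of any Barnes-Hut tree code is the cell-opening test: a tree cell is accepted as a single multipole (monopole plus quadrupole in FLY) when its angular size, as seen from the target particle, falls below an opening parameter theta; otherwise its children are examined. The following is a minimal illustrative sketch of that test, not code taken from the FLY sources; all variable names and the value of theta are hypothetical.

! Illustrative Barnes-Hut cell-opening test (a sketch, not FLY source code).
program bh_open_test
  implicit none
  real(kind=8) :: p_pos(3), c_pos(3), cell_size, theta, d
  logical :: use_multipole

  p_pos = (/ 0.0d0, 0.0d0, 0.0d0 /)   ! target particle
  c_pos = (/ 10.0d0, 0.0d0, 0.0d0 /)  ! centre of mass of a tree cell
  cell_size = 4.0d0                   ! linear size of the cell
  theta = 0.8d0                       ! opening angle parameter

  ! Accept the cell as a single multipole when cell_size/d < theta;
  ! otherwise its eight children would be examined recursively.
  d = sqrt(sum((c_pos - p_pos)**2))
  use_multipole = (cell_size < theta * d)
  print *, 'accept cell as multipole? ', use_multipole
end program bh_open_test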

Reasons for new version:
The new version of FLY is implemented using the MPI-2 standard: the distributed version 3.1 was developed with the MPICH2 library on a Linux PC cluster. Its current performance places FLY among the most powerful parallel codes for tree N-body simulations.
Another important new feature is the availability of an interface to hydrodynamical codes based on Paramesh. Simulations must follow a box large enough to represent accurately the power spectrum of fluctuations on very large scales, so that they can be compared meaningfully with real data; the number of particles then sets the mass resolution of the simulation, which we would like to make as fine as possible. Building an interface between two codes with different and complementary cosmological tasks allows complex cosmological simulations to be carried out with FLY, which is specialized for the dark-matter evolution, coupled to a code specialized for the hydrodynamical components that uses a Paramesh block structure.

Summary of revisions:
The parallel communication scheme was completely changed. The new version adopts the MPICH2 library, so FLY can now be executed on any Unix system providing an MPI-2 standard library. The main data structures are declared in a module of FLY (the fly_h.F90 file). FLY creates an MPI window object, used for one-sided communication, for each of the shared arrays, with a call like the following:

CALL MPI_WIN_CREATE(pos, size, real8, MPI_INFO_NULL, MPI_COMM_WORLD, win_pos, ierr)

The following main window objects are created:
  1. win_pos, win_vel, win_acc: particle positions, velocities and accelerations;
  2. win_pos_cell, win_mass_cell, win_quad, win_subp, win_grouping: cell positions, masses, quadrupole moments, tree structure and grouping cells.
Other windows are created for the dynamic load balancing and for global counters; the resulting one-sided access pattern is sketched in the example below.
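
As an illustration of the one-sided scheme, in the sketch below each process exposes its local block of particle positions through an RMA window and reads data owned by a neighbouring process with MPI_GET. This is a minimal, self-contained example, not code taken from the FLY sources: the array size nbody and all variable names are hypothetical, fence synchronization is used only for simplicity, and the displacement unit 8 corresponds to REAL*8 elements as in the call quoted above.

program rma_sketch
  use mpi
  implicit none
  integer, parameter :: nbody = 1000        ! hypothetical local particle count
  integer :: ierr, rank, nprocs, win_pos
  integer(kind=MPI_ADDRESS_KIND) :: winsize, target_disp
  real(kind=8), allocatable :: pos(:)
  real(kind=8) :: remote(3)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  allocate(pos(3*nbody))
  pos = real(rank, 8)                       ! dummy local positions

  ! Expose the local positions through an RMA window; the displacement
  ! unit is 8 bytes, i.e. one REAL*8 element.
  winsize = 8_MPI_ADDRESS_KIND * 3 * nbody
  call MPI_WIN_CREATE(pos, winsize, 8, MPI_INFO_NULL, MPI_COMM_WORLD, win_pos, ierr)

  ! One-sided read of the first particle owned by the next rank.
  call MPI_WIN_FENCE(0, win_pos, ierr)
  target_disp = 0
  call MPI_GET(remote, 3, MPI_DOUBLE_PRECISION, mod(rank+1, nprocs), &
               target_disp, 3, MPI_DOUBLE_PRECISION, win_pos, ierr)
  call MPI_WIN_FENCE(0, win_pos, ierr)

  if (rank == 0) print *, 'first component of remote position:', remote(1)

  call MPI_WIN_FREE(win_pos, ierr)
  deallocate(pos)
  call MPI_FINALIZE(ierr)
end program rma_sketch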

Restrictions:
The program uses the leapfrog integration scheme, which can, however, be changed by the user.
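
For reference, a single kick-drift-kick leapfrog step for one particle has the following form. This is a schematic, self-contained sketch with a placeholder point-mass acceleration; it is not the FLY integration routine, and all names and the fixed time step dt are hypothetical.

! Schematic kick-drift-kick leapfrog step (illustrative, not the FLY routine).
program leapfrog_step
  implicit none
  real(kind=8) :: pos(3), vel(3), acc(3), dt

  pos = (/ 1.0d0, 0.0d0, 0.0d0 /)
  vel = (/ 0.0d0, 1.0d0, 0.0d0 /)
  dt  = 1.0d-2

  acc = accel(pos)
  vel = vel + 0.5d0*dt*acc      ! half kick
  pos = pos + dt*vel            ! drift
  acc = accel(pos)              ! recompute the force at the new position
  vel = vel + 0.5d0*dt*acc      ! half kick
  print *, 'pos =', pos

contains

  ! Placeholder acceleration: a point mass at the origin with G*M = 1.
  function accel(x) result(a)
    real(kind=8), intent(in) :: x(3)
    real(kind=8) :: a(3), r
    r = sqrt(sum(x**2))
    a = -x / r**3
  end function accel

end program leapfrog_step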

Unusual features:
FLY uses the MPI-2 standard: on Linux systems the MPICH2 library was adopted. To run this version of FLY, the working directory must be shared among all the processors that execute FLY.

Additional comments:
Full documentation for the program is included in the distribution in the form of a README file, a User Guide and a Reference manuscript.

Running time:
Performance tests were carried out on the IBM Linux Cluster 1350 at Cineca: 512 nodes, each with 2 processors and 2 GB of RAM per processor. Processor type: Intel Xeon Pentium IV, 3.0 GHz, 512 KB cache (128 nodes have Nocona processors). Internal network: Myricom LAN cards, "C" and "D" versions. Operating system: Linux SuSE SLES 8. The code was compiled with the mpif90 compiler, version 8.1, using only basic optimization options, so that the measured performance can be compared meaningfully with that of other generic clusters.

Processors   Elapsed time (s)
    16            2630.98
    24            1790.89
    32            1427.42
    48            1015.41
    64             822.64

The table shows the elapsed time in seconds per time step for a simulation with 64 million particles on the Linux cluster system described above.
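For example, going from 16 to 64 processors (a factor of 4 more processors) reduces the elapsed time from 2630.98 s to 822.64 s, a speedup of about 3.2, corresponding to a parallel efficiency of roughly 80%.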