JINR used the DIRAC platform to triple the speed of scientific computations

Since 2016, JINR has been creating and developing a service for unified access to heterogeneous distributed computing resources based on the DIRAC Interware open platform; it now includes all the main computing resources of JINR. The platform has already made it possible to accelerate the processing of large sets of jobs by about three times. This speed is provided, among other things, by the integration of the clouds of scientific organizations of the JINR Member States, the cluster of the National Autonomous University of Mexico, and the resources of the National Research Computer Network of Russia (NIKS), which provides access to network infrastructure for more than 200 organizations of higher education and science. At the moment, the DIRAC-based service is used to solve the tasks of the collaborations of all three experiments at the Accelerator Complex of the NICA Megascience Project, MPD, BM@N, and SPD, as well as of the Baikal-GVD Neutrino Telescope.

“In fact, the integration made it possible to combine all the large computing resources of JINR. Without it, users would most likely have to choose one of the resources and configure all their workflows for it. Using a coherent infrastructure allows one not to be tied to a specific resource, but to use all the resources that are available. For large sets of jobs, this speeds up execution by about three times,” said Igor Pelevanyuk, a researcher in the Distributed Systems Sector at MLIT. He noted that without the DIRAC Interware-based service, the mass production of simulated data for the MPD Experiment would take longer to compute and would fully occupy a separate computing resource for several months.

As of January 2023, thanks to the integration of resources using DIRAC, 1.9 million jobs had been completed on the distributed platform. The amount of computing performed is estimated at 13 million HEPSPEC2006-days, the equivalent of 1,900 years of calculations on a single core of a modern CPU. Thus, the average duration of one job in the system was almost nine hours.

To estimate computing speed, the scientists used the HEPSPEC2006 benchmark, a programme that measures how fast a processor performs calculations similar to Monte Carlo data generation. Knowing the benchmark results for different resources makes it possible to combine processors of different speeds in one system and to evaluate the contribution of each resource involved in the distributed computing (a toy illustration of this accounting is given at the end of this section).

“The more resources we have, the faster, on average, we can complete a given amount of calculations. A total of 45% of the calculations were processed on the Tier1 and Tier2 resources. The Govorun Supercomputer has done about the same amount of work, but on it we run the jobs that are particularly demanding of RAM and free disk space. Such jobs often cannot be processed effectively on the other resources available to us,” Igor Pelevanyuk commented.

The DIRAC Interware at JINR is used mainly where a large amount of computation is required that can be divided into tens of thousands of independent jobs; a sketch of how such a job is submitted follows below. As a rule, for scientific computations where the total amount of calculations is not so large, it is enough for scientists to use one resource.
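For readers unfamiliar with DIRAC, jobs are usually described and submitted through its Python API. The following is a minimal sketch using DIRAC's standard Job and Dirac client classes; the payload script, its arguments, and the job name are hypothetical placeholders, not part of JINR's actual production workflows.

```python
# Minimal sketch: describing and submitting one job to a DIRAC-managed
# infrastructure. Job and Dirac are the standard DIRAC client classes;
# "simulate_events.sh" and its arguments are hypothetical placeholders.
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialises the DIRAC client environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("mc-production-example")
job.setExecutable("simulate_events.sh", arguments="--events 1000")
job.setCPUTime(3600)  # requested CPU time, in seconds

result = Dirac().submitJob(job)
print(result)  # on success: {'OK': True, 'Value': <job ID>, ...}
```

A production campaign of the kind described in the article would submit tens of thousands of such jobs, each processing its own slice of the workload, and DIRAC would route them to whichever integrated resource has free capacity.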
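The figures above can also be tied together with a short back-of-the-envelope calculation. The sketch below is purely illustrative: the per-core HS06 score of about 19 is an assumed value for a modern core (the article does not state one); only the 13 million HEPSPEC2006-days and 1.9 million jobs come from the text.

```python
# Toy illustration of HEPSPEC2006 (HS06) accounting across heterogeneous
# resources: total work in HS06-days divided by an assumed per-core score
# yields single-core-equivalent time.
TOTAL_WORK_HS06_DAYS = 13_000_000  # total work reported for the platform
JOBS_COMPLETED = 1_900_000         # jobs completed as of January 2023
HS06_PER_CORE = 19.0               # ASSUMPTION: score of one modern core

core_days = TOTAL_WORK_HS06_DAYS / HS06_PER_CORE
core_years = core_days / 365.25
print(f"single-core equivalent: {core_years:,.0f} years")   # ~1,900 years

avg_job_hours = core_days * 24 / JOBS_COMPLETED
print(f"average job duration:   {avg_job_hours:.1f} hours")  # ~9 hours
```

The same bookkeeping works in reverse: once each resource's HS06 score per core is known, the work it delivers can be summed in common units, which is how the contributions of the Tier1/Tier2 centres, Govorun, and the clouds can be compared.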
On these resources, part of the computing power is allocated to tasks sent through DIRAC. The share is determined by the policy of a particular resource, its current load, and the amount of work to be done in a given period of time. The largest shares are at the Tier-1 and Tier-2 centres, where two thousand cores are allocated to DIRAC; up to two thousand cores (depending on the load) are available at the Govorun Supercomputer; and up to 500 cores are allocated for DIRAC tasks in the cloud.

According to Igor Pelevanyuk, if new experiments appear that can make effective use of the DIRAC Interware-based service, it will be easier for them to get started, since the basic schemes of work within the distributed infrastructure have already been developed and tested.

“Many researchers work within one of the infrastructures, e.g. the supercomputer, the cloud, or the NICA Computing Cluster. This is enough for a number of jobs, and if there are no prerequisites for a significant increase in the computing load in the future, then, most likely, the transition to the system that we have created will not be required; it is not a universal replacement for standard approaches. However, for complex computational tasks, the created platform provides a new approach that allows one to reach a new level of complexity and to increase the amount of resources that can be used for research by an order of magnitude,” the scientist said.

The most active user of the created infrastructure at the moment is the collaboration of the MPD Experiment. This facility, one of the two detectors at the NICA Collider, accounts for 85% of the calculations performed. The current calculations for MPD, carried out while the detector has not yet started operation, are devoted to Monte Carlo simulation. With the help of special computer programmes that “collide” particles virtually and trace the decay products through the matter of the experimental facility, it is possible to debug and tune the reconstruction algorithms and detector data analysis, while helping to shape the scientific programme. Model data continue to be generated during experimental runs as well.

“We use a set of generators that allows us to create such events; then we run a real experiment and collect two sets of data: real data from the detector and data generated by us. If there is no significant difference between them for a selected and well-studied physical process, then the experimental data we have collected correspond to reality. Together with the set of software created in the experiment for data reconstruction and analysis, these data can be used to search for new physics.” (A toy version of such a data/simulation comparison is sketched below.)

Some of the jobs in this project were performed by colleagues from Mexico, which officially joined the implementation of the NICA Megascience Project in 2019. The participation of the computing cluster of the National Autonomous University of Mexico, a collaborator of the MPD Experiment at NICA, showed that the developed service can also be used to integrate resources outside JINR.

The integration of the cloud infrastructures of the JINR Member States deserves special attention. To implement it, a special software module had to be developed that made it possible to integrate cloud resources running the OpenNebula software into the DIRAC system.
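Returning to the validation procedure quoted above: in essence, a measured distribution is compared with a simulated one for a well-understood observable, and agreement is quantified bin by bin. The toy sketch below illustrates the idea on synthetic numbers only; it is not the experiment's analysis code, and the observable, sample sizes, and binning are invented for the example.

```python
# Toy data/simulation comparison: histogram two samples of a well-understood
# observable and compute a per-bin chi-square as an agreement measure.
# All numbers here are synthetic placeholders, not experiment data.
import numpy as np

rng = np.random.default_rng(1)
measured = rng.normal(0.0, 1.0, 100_000)   # stand-in for detector data
simulated = rng.normal(0.0, 1.0, 100_000)  # stand-in for Monte Carlo output

bins = np.linspace(-5.0, 5.0, 51)
h_meas, _ = np.histogram(measured, bins)
h_sim, _ = np.histogram(simulated, bins)

# Two-sample chi-square for Poisson-distributed bin counts; values near
# one per bin indicate the samples are statistically compatible.
mask = (h_meas + h_sim) > 0
chi2 = np.sum((h_meas[mask] - h_sim[mask]) ** 2 / (h_meas[mask] + h_sim[mask]))
print(f"chi2/ndf = {chi2 / mask.sum():.2f}")
```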
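The DIRAC module that JINR developed for OpenNebula clouds is not shown in the article. As a rough illustration of the mechanism such an integration relies on, the sketch below provisions a worker virtual machine through OpenNebula's XML-RPC API using the official pyone client; the endpoint URL, credentials, and template ID are hypothetical placeholders, and the real module handles far more (pilot lifecycle, accounting, teardown).

```python
# Rough illustration: starting a worker VM in an OpenNebula cloud via its
# XML-RPC API with the official pyone client. Endpoint, credentials, and
# template ID are hypothetical placeholders.
import pyone

one = pyone.OneServer("http://cloud.example.org:2633/RPC2",
                      session="diracuser:secret")

# Instantiate a VM from a prepared template that boots a pilot agent able
# to pull jobs from DIRAC. Arguments: template ID, VM name, start on hold,
# extra template attributes, persistent copy.
vm_id = one.template.instantiate(42, "dirac-worker-01", False, "", False)

vm = one.vm.info(vm_id)
print(vm.NAME, vm.STATE)  # once running, the pilot fetches DIRAC jobs
```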
“The integration of external resources is a window of opportunity for other countries that would like to participate in computing for such large scientific collaborations as Baikal-GVD, MPD, SPD, and BM@N. If the participants decide to provide part of their contribution in the form of computing, their resources can be integrated into the existing system, and the question will only be how many resources they are able to allocate,” the scientist said.

The series of works “Development and implementation of a unified access to heterogeneous distributed resources of JINR and the Member States on the DIRAC platform” was awarded the Second JINR Prize for 2021 in the nomination “Scientific-research and scientific-technical papers”. The research was carried out jointly at the Meshcheryakov Laboratory of Information Technologies and the Veksler and Baldin Laboratory of High Energy Physics, JINR, and at the Center for Particle Physics, Aix-Marseille University (Marseille, France), by a team of authors: Vladimir Korenkov, Nikolay Kutovskiy, Valery Mitsyn, Andrey Moshkin, Igor Pelevanyuk, Dmitry Podgainy, Oleg Rogachevskiy, Vladimir Trofimov, and Andrey Tsaregorodtsev.

www.jinr.ru