iPIC3D: implicit Particle-in-Cell code for Space Weather Applications (KTH)
Space Weather is the study of the processes originating on the Sun and propagating through the Solar System, which effects people and technology in Space and on the Earth, ranging from auroras in the polar regions to electromagnetic disturbances causing disruptive currents in infrastructure such as power and communication lines.
KTH, together with KU Leuven, has implemented the massively parallel Particle-in-Cell code, iPIC3D, as a C++ program using MPI and OpenMP. iPIC3D simulates the interaction of Solar Wind and Solar Storms with the Earth’s Magnetosphere. The magnetosphere is a large system with many complex physical processes, requiring realistic domain sizes and billions of computational particles. In the PIC model, plasma particles from the solar wind are mimicked by computational particles. At each computational cycle, the velocity and location of each particle are updated by solving the equation of motion, the current and charge density are interpolated to the mesh grid, and Maxwell’s equations are solved.
Figure 1: Distribution of the plasma particles interacting with the Earth’s magnetic field
iPIC3D, as the AllScale pilot application, focuses on the plasma particles interacting with the Earth’s magnetic field. Some highly energetic particles are trapped within the Van Allen radiation belts while others escape the confinement of the Earth’s magnetic field. This leads to large load imbalances since most of the particles are concentrate close to the Earth (violet cloud in the figure), while few particles are located in other space regions (almost not visible particle outside the violet cloud in the figure).
The current implementation of iPIC3D has considerable load imbalance problems. In AllScale, the numerical scheme of iPIC3D is redesigned to exploit the nested recursive parallelism, in order to benefit from the dynamic load balancing and resilience functionality, as supported by the AllScale Environment.
By redesigning the numerical scheme for iPIC3D to enable task-based nested recursive parallelism, we aim at leveraging nested recursive parallelism based on the AllScale API as well as reducing the total amount of global communication and synchronization by 80%, which will eventually lead to a much better scaling behavior for iPIC3D on massively parallel computing systems. Furthermore, the AllScale Environment will enable reduction of either energy or resource consumption, by at least 25%, while not causing an execution penalty of more than 10%.
As an example of such numerical scheme redesign, we provide the particle mover. Many PIC implementations try to use explicit solvers – such as leapfrog approximations or Tajima’s scheme – to solve the equation of motion. Obviously, these approaches have their advantages, however they might not ensure numerical stability as shown in Figure 1.
Figure 2: Incorrect result obtained with the forward integration and the explicit Tajima’s scheme of vxB rotation. The particle is gaining numerical energy, as shown by its orbit spiraling away. The analytical result is a closed circular orbit at the Larmor radius, which is shown by the solid blue line.
To secure numerical stability, we propose to employ a second order scheme, which is called the Boris mover, for solving the equation of motion. A code snippet of the Boris mover implementation for computing velocity is shown in Figure 2 (a). Another benefit that our implementation brings is the preservation of the mathematical structure of equations (this means that mathematical equations can be easily mapped to the lines of code) – thanks to the power of C++, in particular to composed data structures and overloaded operations. This feature makes it easier to read and understand the code. To note, the particle position, see Figure 2 (b), is simply updated by a product of the calculated velocity and the time step.
This scheme from 70s involves also computations on a half time step points and strengthens numerical stability. In addition, it is well suited for task-based nested recursive parallelism as each particle is processed independently.
(a) Computing particle velocity
(b) Computing particle position
Figure 3: A code snippet of the Boris mover implementation in the AllScale iPIC3D code
The Boris mover implementation using the AllScale Toolchain proves to work properly (see Figure 3). These results are similar to the ones in Figure 1, but with the correct circular orbit at the Larmor radius.
Figure 4: Larmor radius computations using the Boris mover making use of the AllScale Toolchain
Relative scalability using the AllScale Toolchain To note the adjustment of the iPIC3D pilot towards better utilization of the AllScale API/ SDK capabilities is still ongoing. Currently, we expose the code to new scheduling policies as well as perform first distributed-memory runs, harvesting preliminary results. Here, we present our preliminary scalability results for the particle mover part of the code. This part is the most compute-intensive segment of the code, especially in the Earth’s radiation belts simulations. For both the original iPIC3D and the AllScale iPIC3D particle mover, we used the same input file that corresponds to the benchmark, aiming to simulate the Van Allen radiation belts. But, the compilers were of different versions: gcc v.4.9.2 for the original particle mover since the code failed to compile with the most recent version of gcc; gcc v.6.2.0 for the AllScale particle mover. The experiments were carried out on an Intel Ivy Bridge CPU with 1TB RAM per node on the Tegner cluster at PDC Center for High-Performance Computing at KTH. We fixed the problem size and varied the number of cores used for the AllScale implementation – the strong scaling test. The same was applied to the original code although with a slight difference in the 3D configuration of MPI processes. For instance, we used a configuration of eight MPI processes (2x2x2) on eight cores. These results indicate that AllScale is capable of boosting the performance of the iPIC3D compute-intensive part. The next section presents more detailed study as well as large simulations, involving the entire pilot.
Figure 5: Strong scaling of the particle mover part of the iPIC3D code implemented with the AllScale API/SDK and compared against the original version on an Intel Ivy Bridge node at PDC