iPIC3D: implicit Particle-in-Cell code for Space Weather Applications (KTH)

Space Weather is the study of the processes originating on the Sun and propagating through the Solar System, which effects people and technology in Space and on the Earth, ranging from auroras in the polar regions to electromagnetic disturbances causing disruptive currents in infrastructure such as power and communication lines.

KTH, together with KU Leuven,  has implemented the massively parallel Particle-in-Cell code, iPIC3D, as a C++ program using MPI and OpenMP.

iPIC3D simulates the interaction of Solar Wind and Solar Storms with the Earth’s Magnetosphere. The magnetosphere is a large system with many complex physical processes, requiring realistic domain sizes and billions of computational particles. In the PIC model, plasma particles from the solar wind are mimicked by computational particles. At each computational cycle, the velocity and location of each particle are updated by solving the equation of motion, the current and charge density are interpolated to the mesh grid, and Maxwell’s equations are solved.

iPIC3D, as the AllScale pilot application, focuses on the plasma particles interacting with the Earth’s magnetic field. Some highly energetic particles are trapped within the Van Allen radiation belts while others escape the confinement of the Earth’s magnetic field. This leads to large load imbalances since most of the particles are concentrate close to the Earth (violet cloud in the figure), while few particles are located in other space regions (almost not visible particle outside the violet cloud in the figure).

The current implementation of iPIC3D has considerable load imbalance problems. In AllScale, the numerical scheme of iPIC3D is redesigned to exploit the nested recursive parallelism, in order to benefit from the dynamic load balancing and resilience functionality, as supported by the AllScale Environment.


By redesigning the numerical scheme for iPIC3D to enable task-based nested recursive parallelism, we aim at leveraging nested recursive parallelism based on the AllScale API as well as reducing the total amount of global communication and synchronization by 80%, which will eventually lead to a much better scaling behavior for iPIC3D on massively parallel computing systems. Furthermore, the AllScale Environment will enable reduction of either energy or resource consumption, by at least 25%, while not causing an execution penalty of more than 10%. Finally, the AllScale iPIC3D will allow us to perform the PIC simulation of the Earth’s radiation belts formation, on a grid with 5,000x5,000x5,000 cells with at least 10,000 particles per cell, requiring 10²²-10²³ FLOPS or 27-hour-simulation on an ExaScale super-computer.

AllScale Potential

The current implementation of iPIC3D suffers from considerable load imbalance problems. As part of the AllScale project, we restructured the iPIC3D code to exploit recursive parallelism that can then benefit from the dynamic load balancing and resilience functionality supported by the AllScale Environment.

We already changed the numerical scheme of iPIC3D, switching from the implicit formulation of the PIC numerical scheme to the explicit formulation, in order to allow for the recursive parallelism of AllScale to be introduced at global level. Our major effort was on the following two components: the field solver for the Maxwell’s equation and the particle mover for the equation of motion. Consequently, we started developing a prototype version of iPIC3D allowing us to avoid global synchronization points and to use the AllScale Environment potential.

Using AllScale API

All four stages of iPIC3D are essentially parallel loops iterating over all cells in the 3D-space grid and such parallel loops can be directly mapped to a recursive formulation. 
To implement a parallel loop using nested recursive parallel tasks, each tasks covers a range of elements. The initial task covers the entire loop range. If split, the task's range is split in half, producing two new tasks. This general principle is encoded into a generic pfor operator offering the same abstract interface like a conventional loop.

Thus, by utilizing the pfor operation on the structure of the 3D grid of the PIC code, tasks covering sub-regions of the overall 3D grid are generated and distributed throughout the system. Furthermore, operations on particles (blue dots on the figure) within cells are additionally parallelized by utilizing parallel loops over the corresponding lists of particles – based on the same, generic pfor operator. However, within the system, all tasks are following the nested recursive paradigm and may those also be managed as such. 

The grid of cells needs to be adapted to facilitate its dynamic distribution among multiple address spaces. To this end, facilities to address sub-regions of the overall grid have to be provided to the runtime system to be able to address fractions of the overall data block as well as to manage and manipulate the data distribution. While several different formats for addressing sub-regions are conceivable, for the PIC simulation unions of axis-aligned boxes are utilized. Thus, arbitrary sub-blocks of the grid may be moved between address spaces by the AllScale runtime system.

Since n-D grids are a common concept, the AllScale infrastructure provides a predefined, generic grid implementation covering the corresponding implementation details similar to containers in the standard template library (STL) provided by any C++ implementation. Thus, the adaptation of the data structure narrows down to using a container type instead of a raw array within the simulation code.

Example of code for the particle mover (Boris mover algorithm)

void moveParticlesBorisStyle(const UniverseProperties & universeProperties, Cell & cell, const utils::Coordinate < 3 > & pos, const Field & field) {

  assert_true(pos.dominatedBy(universeProperties.size)) << "Position " << pos << " is outside universe of size " << universeProperties.size;

  // quick-check
  if (cell.particles.empty()) return;

  // -- move the particles in space --

  // extract forces
  Vector3 < double > Es[2][2][2];
  Vector3 < double > Bs[2][2][2];
  for (int i = 0; i < 2; i++) {
   for (int j = 0; j < 2; j++) {
    for (int k = 0; k < 2; k++) {
     utils::Coordinate < 3 > cur({
      pos[0] + i,
      pos[1] + j,
      pos[2] + k
     cur += utils::Coordinate < 3 > (1); // shift because of the boundary cells
     Es[i][j][k] = field[cur].E;
     Bs[i][j][k] = field[cur].B;

  const auto cellCenter = getCenterOfCell(pos, universeProperties);

  double vol = universeProperties.cellWidth.x * universeProperties.cellWidth.y * universeProperties.cellWidth.z;

  // update particles
  allscale::api::user::pfor(cell.particles, [ & ](Particle & p) {
   // Docu: https://www.particleincell.com/2011/vxb-rotation/
   // Code: https://www.particleincell.com/wp-content/uploads/2011/07/ParticleIntegrator.java

   // get relative position of particle within cell
   const auto relPos = allscale::api::user::data::elementwiseDivision((p.position - (cellCenter - universeProperties.cellWidth * 0.5)), (universeProperties.cellWidth));

   // interpolate
   auto E = trilinearInterpolationF2P(Es, relPos, vol);
   auto B = trilinearInterpolationF2P(Bs, relPos, vol);

   // update velocity
   p.updateVelocityBorisStyle(E, B, universeProperties.dt);

   // update position



 void updateVelocityBorisStyle(const Force & E,
  const Force & B, double dt) {

  auto k = q / mass * 0.5 * dt;

  auto t = k * B;

  auto t_mag2 = allscale::api::user::data::sumOfSquares(t);

  auto s = (2.0 * t) / (1 + t_mag2);

  auto v_minus = velocity + k * E;

  auto v_prime = v_minus + crossProduct(v_minus, t);

  auto v_plus = v_minus + crossProduct(v_prime, s);

  velocity = v_plus + k * E;


 void updatePosition(double dt) {
  position += velocity * dt;


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671603

Contact Details

General Coordinator

Thomas Fahringer

Scientific Coordinator

Herbert Jordan