Specification and design of the AllScale environment
Early prototype of the AllScale environment and pilot applications
Advanced prototype of the AllScale environment and pilot applications
Final prototype of the AllScale environment and pilot applications
Tested, evaluated and tuned AllScale environment; project completion
WP2: Requirements and overall system architecture design
The main objective of WP2 is to identify and collect application and system requirements, and to develop a consistent overall architecture for the AllScale Environment that satisfies those requirements. The stated requirements and architecture details are continuously monitored and updated over the course of the project. The major results of this effort are deliverables documenting requirements and objectives, external technological developments, system architecture details, and the design of the AllScale API. The latter is the interface the AllScale Environment offers to the AllScale pilot applications, as well as to future applications built upon the results of the AllScale project.
The main tasks of WP2 over the first two years of the project have been the following:
Definition of requirements;
Collection of external technological developments;
Revision of the AllScale Objectives to more precisely capture and represent the project goals;
Establishment of the overall system design, including details on the various interfaces between the involved components, in particular:
Interface for the AllScale API.
This effort led to an accurate definition of requirements and external project-related developments, described in Deliverable D2.1 submitted in month 14. The updated and final version of the document, Deliverable D2.2, was delivered in month 22.
All deliverables have been successfully produced and submitted on time, as indicated in the following table:
Relationship with other WPs
This work package has direct links with the technical work packages in the project. Such links were established right from the beginning of the project in order to derive requirements from the pilot applications (WP6) towards the AllScale Toolchain (WPs3-5) and vice versa; for instance, the AllScale Compiler (WP3) and the AllScale Runtime System (WP4) imposed requirements regarding external library support on the pilots. This effort was extended to the definition of the AllScale system architecture, including the interfaces between the AllScale Toolchain components, such as compiler-runtime, runtime-monitoring, monitoring-resilience, and scheduler-monitoring. It was also accompanied by the definition of both the User- and Core-Level AllScale APIs, again involving the whole stack of the technical work packages (WPs2-6).
These linkages are clearly reflected in the system architecture diagram shown here, which illustrates the layered architecture of the AllScale system.
No work has been imported from outside the project. There is considerable linkage between WP2 and the other technical WPs in the project.
The initial requirements on the AllScale Environment were established within the first several months of the project to drive and facilitate the project development. At first, a detailed investigation of the pilot applications was carried out, resulting in a preliminary description of the application-specific requirements for the AllScale Environment. This work was followed by the component-wise definition of the interface and functionality requirements within the AllScale Environment. In addition, the AllScale Environment also imposed requirements on the pilot applications. The definition of requirements and constraints has been regularly discussed and revised during WP2 bi-weekly conference calls as well as during the cross-reviewing phases. Moreover, some of the AllScale objectives and their measures of success have been revised to more precisely capture and represent the project goals. Furthermore, a corresponding technology watch function was set up and external technological developments have been collected to ensure that the AllScale Environment adapts and responds to both hardware and software evolutions.
The AllScale Architecture has been gradually refined, starting from the first design outlined in the project proposal. The runtime application model, comprising the concepts of work and data items (developed as part of this project) as well as a hardware model, was concretized in a bilateral meeting in Erlangen in early 2016. A further overhaul of the design of those core concepts, together with their presentation to the project partners involved in the runtime-related components (scheduler, resilience manager, and monitoring component), was conducted during a technical meeting in Erlangen in February 2016. Additionally, derived interfaces for the runtime-related components have been drafted. Following the meeting, various details of the design were evaluated for feasibility using prototype implementations, which serve as the evaluation platform for the development of the system’s architecture. Deliverable D2.3, documenting the architecture, was submitted in month 16.
In addition to the internal architecture of the AllScale Toolchain (compiler and runtime system), the design of the AllScale API, the visible interface to end users, is the other major obligation of this work package. As covered in the original proposal, the AllScale API is subdivided into the Core and User-Level APIs. Starting from a shared-memory-based prototype implementation of the Core API, fully working prototypes of the AllScale pilot applications have been created in C++. From these, the User-Level API component requirements have been deduced, and the corresponding solutions have been added to the User-Level API design. Additionally, the Core API has been extended with two major improvements to support the implementation of required User-Level components:
Support for decomposable futures, to realize fine-grained, scalable task synchronization;
C++ concepts-like elements to specify the traits of data structures to be distributed and managed by the AllScale Runtime System.
Both of those have been integrated into the design of the AllScale Core API.
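To illustrate the idea behind decomposable futures, the following is a minimal, purely synchronous stand-in: all names and members here are invented for the sketch and are not the actual AllScale Core API (a real treeture resolves asynchronously). The key point it demonstrates is that a recursive task yields a handle whose sub-task handles can be synchronized on individually, instead of waiting on the whole computation.

```cpp
#include <cassert>
#include <memory>

// Toy "treeture"-style decomposable future (hypothetical sketch, not the
// real API). It stores an already-computed value plus the two child
// handles produced by a recursive split.
template <typename T>
struct toy_treeture {
    T value;
    std::shared_ptr<toy_treeture<T>> left, right;  // sub-task handles

    // Wait for completion and obtain the result (trivially synchronous here).
    T get() const { return value; }

    // Fine-grained synchronization: wait only on one branch of the task tree.
    const toy_treeture<T>& get_left()  const { return *left; }
    const toy_treeture<T>& get_right() const { return *right; }
};

// Build a treeture for a recursive sum over [a,b), splitting in half.
inline toy_treeture<int> sum_range(int a, int b) {
    if (b - a <= 1) return { a < b ? a : 0, nullptr, nullptr };
    int m = a + (b - a) / 2;
    auto l = std::make_shared<toy_treeture<int>>(sum_range(a, m));
    auto r = std::make_shared<toy_treeture<int>>(sum_range(m, b));
    return { l->value + r->value, l, r };
}
```

A caller can thus do `sum_range(0, 4).get_left().get()` to synchronize on only the left half of the task tree, which is the behaviour the Core API extension enables at scale.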
Starting from October 2016, WP2 took on a new activity: we extended our external technology watch to derive a taxonomy of task-based HPC technologies, covering programming APIs, runtime systems, scheduling approaches, monitoring frameworks, and fault tolerance mechanisms, motivated by the absence of any comprehensive overview or classification of task-based technologies for HPC. In an article that appeared in the Proceedings of the PPAM conference, we also demonstrated the usefulness of our taxonomy by classifying several state-of-the-art task-based environments in use today. We have since extended this work into a full journal article by introducing a new clustering of API characteristics that groups related characteristics and clarifies their interdependencies; clarifying the association between APIs and one or more runtime systems; extending the list of runtimes with further examples such as Qthreads and Argobots; and including more detailed descriptions and clarifications of the API and runtime classification.
- The pilot descriptions have been updated.
- The required modifications to HPX to realise failure tolerance have been updated; HPX was originally presumed to be resilient but turned out to require additional modifications.
- The resilience manager interface description has been updated to fit the revised software design for this component.
- The objective function specification format for the multi-objective optimiser has been updated to match the format developed for the updated version of the runtime scheduler component.
We have produced the final definition of requirements and a solid overview of external technological developments; completed the AllScale API specification; and established the detailed software architecture of the AllScale Environment, comprising all its components, their interfaces and their responsibilities.
A Taxonomy of Task-Based Parallel Programming Technologies for High-Performance Computing:
- Conference article: springer.com
WP3: High-level parallel API and API-aware optimising source-to-source compiler
WP3 covered research and development operations in the context of the AllScale compiler. It developed the Core and User API layers to facilitate the recursive description of algorithms and to establish compiler awareness regarding the parallel structure of processed programs. Furthermore, research and implementation work on AllScale-specific code optimisation techniques, multi-versioning solutions, and automatic serialization code generation for user-defined types was performed, enabling the automated porting of input codes to distributed memory systems. Finally, a major part of the compiler work in WP3 was the development of a sophisticated set of analyses deriving data requirement functions for application-defined work items.
Over the course of the project, the API and compiler components progressed towards the full implementation of the AllScale system. What follows is a short overview of the highlights of these components.
In the API:
- Completion of the core API, covering the recursive parallel operator and treeture-based synchronization.
- Full feature-completeness in the user API layer as required by the pilots. This includes adaptive grid and unstructured multi-resolution mesh data structures and high-level operators for them.
- The introduction of additional user API data structures beyond those required by pilots, including binary trees and k-D trees.
- Extensive online documentation of the API, including several tutorials of different levels of complexity.
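The shape of the recursive parallel operator mentioned above can be illustrated with a small sequential stand-in. The name `toy_prec` and its parameter order are loosely modelled on the operator described in the AllScale documentation, but this is an assumption for illustration only; the real Core API operator spawns tasks in parallel and returns treetures rather than plain values.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch of a "prec"-style recursive parallel operator: it is
// built from a base-case test, a base-case handler, and a step function that
// may recurse on sub-problems. This stand-in evaluates sequentially.
template <typename P, typename B, typename S>
auto toy_prec(P base_test, B base_case, S step) {
    // Return a callable applying the recursive scheme to an argument.
    return [=](auto x) {
        std::function<decltype(base_case(x))(decltype(x))> rec =
            [&](decltype(x) v) -> decltype(base_case(x)) {
                if (base_test(v)) return base_case(v);
                return step(v, rec);   // step recurses via the handle 'rec'
            };
        return rec(x);
    };
}

// Example: Fibonacci expressed through the operator.
inline int fib(int n) {
    auto f = toy_prec(
        [](int v) { return v < 2; },                                // test
        [](int v) { return v; },                                    // base
        [](int v, auto& rec) { return rec(v - 1) + rec(v - 2); });  // step
    return f(n);
}
```

The design point this illustrates is that the user supplies only the three ingredients of the recursion; how the recursive calls are turned into tasks and scheduled is left entirely to the toolchain.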
In the compiler:
- Distributed code generation by integrating data requirements and accesses to the Runtime System’s data item manager as required.
- Auto-serialization support for user-defined C++ types, which alleviates application programmers from the responsibilities of dealing with serialization of their types for distributed memory execution.
- A compilation performance improvement of more than a factor of 100 for large-scale C++ codes over the course of the project. This was achieved by various algorithmic improvements in core components of the compiler, as well as a significant optimization effort in the analysis framework.
- A significant extension to the unit testing coverage for all compiler components.
- Extensions to the AllScale compiler frontend, backend and core for improved support of pilot (and other application) compilation.
- AllScale-specific optimization passes, primarily related to hoisting data item access requests out of loops and some types of nested functions, in order to reduce the workload for the runtime data item manager component.
- Implementation of many features to improve user interaction, including interactive HTML reporting, as well as analysis debugging tools. These tools now also support reporting of distributed memory version generation issues.
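The hoisting optimization in the list above can be sketched with a toy model. The data-item manager, its `acquire` call, and all names here are stand-ins invented for the illustration (not the real AllScale runtime interface); access acquisition is modelled as a counter so the effect of the pass is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy data item whose access acquisitions are counted.
struct toy_data_item {
    std::vector<int> data;
    int acquisitions = 0;                      // how often access was requested
    std::vector<int>& acquire() { ++acquisitions; return data; }
};

// Naive version: requests access on every iteration, as unoptimised
// generated code would.
inline int sum_naive(toy_data_item& item) {
    int s = 0;
    for (std::size_t i = 0; i < item.data.size(); ++i)
        s += item.acquire()[i];                // per-iteration acquisition
    return s;
}

// Hoisted version: a single acquisition covering the whole loop, mirroring
// the compiler pass that moves the access request out of the loop body.
inline int sum_hoisted(toy_data_item& item) {
    std::vector<int>& v = item.acquire();      // one up-front acquisition
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}
```

Both variants compute the same result, but the hoisted one issues one access request instead of one per iteration, which is exactly the reduction in data item manager workload the pass targets.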
In the course of this work, close collaboration with WP4 was required in order to correctly and efficiently utilize the decentralized data manager component API within the generated compiler output code. In parallel, the continuous development of the AllScale API was carried out based on feedback from the pilot application groups within WP6.
The full AllScale compiler prototype was delivered in project month 31 and is available (with ongoing further refinements) on GitHub. The current API specification and implementation has been available on GitHub for several months.
Figure 1: User Interface Snapshot
Figure 1 above shows a snapshot of the conversion report generated by the AllScale compiler. It offers insight into the conversion of each parallel recursive execution block, including the analysis reports.
Figure 2: User Code and Variable References as well as Hint Generation
In the case of failures, the current compiler version is also capable of tracing out the complete reasoning for many errors, and even of generating hints for resolving common issues. Figure 2 shows an example where the type and name of the variable being captured are derived and displayed, as well as the path over which it is captured into the work item. Additionally, the compiler shows a hint on how this type of error is commonly resolved.
Figure 3: Unit, Integration and Distribution Testing
As the API and compiler are central parts of AllScale, they have been kept stable on several configurations throughout, despite very active development, through the use of a large-scale testing infrastructure. The screenshot in Figure 3 gives an overview of all the integration-testing jobs which validated the AllScale compiler and API.
All four planned deliverables have been produced in WP3:
Deliverable D3.1: API Implementation for Recursive Parallelism (a) describes in generic terms the first AllScale User and Core API implementation and the associated source code maintained in GitLab. D3.1 was evaluated during the first interim review and no major changes were required; however, the deliverable was updated following the reviewers’ feedback.
Deliverable D3.2: API Implementation for Recursive Parallelism (b) is a source code deliverable, providing implementations of the AllScale API.
Deliverable D3.3: AllScale Compiler Prototype is a very short deliverable describing the AllScale compiler prototype, which is freely available on GitHub.
Deliverable D3.4: Full AllScale Compiler Prototype is a source code deliverable plus an accompanying overview of the implementation and components of the full AllScale compiler prototype.
WP4: Unified runtime system for extreme scale systems
WP4 covers the research and development of the scalable AllScale runtime system, targeting parallel computing systems of any scale, to effectively support dynamic, nested (recursive) parallelism and to provide sophisticated support for data migration. The latter serves as the foundation both for dynamic load balancing across address space boundaries and for resiliency support, by moving data to and from persistent storage devices. Furthermore, innovative techniques are applied to incorporate alternative parallel code versions targeting specific hardware (e.g. accelerators) and to enable multi-objective auto-tuning.
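The work-item/data-item model underpinning this can be sketched as follows. All names here are invented for the illustration and are not the real AllScale runtime API: the point is that a work item declares which region of a data item it needs, so the runtime can check the requirement and migrate data (for load balancing or checkpointing) before executing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a data requirement: a half-open index range.
struct region { std::size_t begin, end; };

// A data item is a managed container the runtime can migrate.
struct toy_data_item {
    std::vector<double> storage;
};

// A work item names the region it reads, so the runtime can reason about it
// before scheduling or migrating the task.
struct toy_work_item {
    region reads;                              // declared data requirement

    // The runtime can validate a requirement against the item's extent.
    bool satisfiable(const toy_data_item& d) const {
        return reads.begin <= reads.end && reads.end <= d.storage.size();
    }

    // Execute the task: here, a sum over the declared region.
    double run(const toy_data_item& d) const {
        double s = 0;
        for (std::size_t i = reads.begin; i < reads.end; ++i)
            s += d.storage[i];
        return s;
    }
};
```

Because requirements are explicit, a scheduler can move either the task to the data or the data to the task without inspecting the task body, which is what makes cross-address-space load balancing and checkpointing tractable.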
Relationship with other WPs
WP4 is the backbone of the AllScale project. While the pilot applications (WP6) do not interact with the runtime system directly, their code is compiled (WP3) against the outcome of WP4. WP5 likewise depends heavily on the runtime and uses the provided interfaces directly.
Seven deliverables have been produced:
Deliverable D4.1 “AllScale runtime system interface specification” was evaluated during the first interim review; it clearly describes the runtime interface specification with an emphasis on the strategies developed for optimizing task-based recursive parallelism. There is effective collaboration with WP5 to include resilience strategies in the decisions of the scheduler.
Deliverable D4.2 “Early AllScale runtime system prototype” is a source code deliverable providing the first early prototype of the AllScale runtime system.
Deliverable D4.3 “AllScale runtime system monitoring infrastructure”: one of the key aspects of the AllScale runtime system is the interaction between the monitoring component and the runtime regarding the non-functional parameters of an application. In order to accomplish goals like dynamic load balancing, a monitoring infrastructure needs to be provided.
Deliverable D4.4 “Data Management Support”: efficient data management is also needed to accomplish goals like dynamic load balancing and resiliency, as these features depend on the efficient handling of the data dependencies of work items.
Deliverable D4.5 “Resource Management Support”: within AllScale, we envisioned the capability for the runtime to dynamically monitor and configure hardware parameters such as core frequency in order to reduce power consumption. This enables the scheduler to dynamically select the set of resources and the hardware configuration that achieves the user’s requested objectives.
Deliverable D4.6 “Multi-Objective Dynamic Optimizer (a)” covers the design and implementation of a multi-objective dynamic optimizer component within the AllScale architecture.
The AllScale runtime system interface has been presented at the workshop “MTAGS16: 9th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers”, co-located with the Supercomputing Conference 2016 in Salt Lake City.
The first prototype has been developed and, thanks to good interaction with the other work packages, further refinements have been made. There is a noticeable effort to reach standardization bodies in the hope of influencing ongoing C++ standardization to match the project’s runtime system development needs.
No update on the status of the Data Management Infrastructure is available within the deliverables, but an update was given during the review meeting, showing no deviations from the DoA.
WP5: Cross layer resilience and online analysis for non-functional parameters
WP5 develops language and tool support for continuous monitoring of application performance and error resilience, as well as support for application-specific, algorithmic error detection and recovery from both errors and performance anomalies. The activity in the WP has been divided into five tasks, which work together to deliver the objectives.
T5.1 Application Specific Resilience Analysis
T5.2 On-demand, on-line introspection of non-functional parameters
T5.3 Resilience Primitives
T5.4 Development of application specific resilience techniques
T5.5 Resilience manager
A key objective is to develop scalable event detection and notification mechanisms that attempt to localise the impact of execution-time anomalies due to errors or performance variability in hardware.
Relationship with other WPs
WP5 has direct links with the technical work packages in the project. It contributes to WP2, defining the overall system architecture for AllScale. In addition, the code implemented in WP5 links to the compiler prototype through the incorporation of resilience and monitoring into the compiler. Furthermore, WP5 is inextricably linked to the research and development carried out in WP4, the overarching activity which extends the HPX runtime system in order to create the AllScale runtime system prototype.
These linkages are clearly reflected in the system architecture diagram, which reflects the layered architecture of the AllScale system. The WP5 tasks correlate to the purple shaded boxes to the middle right of the figure. WP5 is implicitly linked to the application exemplars in WP6 through the use of the compiler toolchain and runtime system to enable the execution of these applications. Thus the resilience and monitoring components are not directly visible to the application, but these components nevertheless influence the execution of the applications.
There is considerable linkage between the other tech WPs in the project and WP5.
Deliverable D5.1 “Application-specific resilience strategy” has been produced. It contains the detailed results of the questionnaire gathering resilience requirements from the three pilot applications. The results show that application developers lack knowledge and understanding of what a fault is and of the benefits and possible approaches of application-specific resilience. This clearly limits the benefit of the questionnaire and its ability to properly gather user requirements, and demonstrates the need to better explain the work of AllScale on resiliency aspects. This lack of understanding should be taken into account for the dissemination of the project results.
Deliverable D5.2 “On-Demand, On-Line Monitoring Infrastructure (a)” is a source code deliverable that provides the first prototype of the AllScale performance monitoring infrastructure.
Deliverable D5.4 “Resilience Primitives”: the main objectives of the task are:
1. Identify requirements towards the AllScale Runtime System to implement resilience primitives.
2. Provide an implementation for resilience primitives in the AllScale Runtime System.
3. Create a cost model for task-level checkpoint-restart functionality.
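Objective 3 above calls for a cost model for task-level checkpoint-restart. A generic, minimal sketch of such a model is shown below; it uses Young’s classic first-order approximation for the optimal checkpoint interval, tau_opt ≈ sqrt(2·C·M), where C is the cost of writing one checkpoint and M is the mean time between failures. This is a textbook illustration, not the model actually developed in the deliverable.

```cpp
#include <cassert>
#include <cmath>

// Young's first-order approximation of the optimal checkpoint interval.
// checkpoint_cost (C) and mtbf (M) must be in the same time unit.
inline double optimal_checkpoint_interval(double checkpoint_cost,
                                          double mtbf) {
    return std::sqrt(2.0 * checkpoint_cost * mtbf);
}

// Expected overhead fraction per interval: checkpoint cost plus the
// expected rework after a failure, relative to the interval length.
// Failures within an interval occur with probability ~ tau/M and lose
// tau/2 of work on average.
inline double overhead_fraction(double tau, double checkpoint_cost,
                                double mtbf) {
    double rework = (tau / mtbf) * (tau / 2.0);  // expected lost work
    return (checkpoint_cost + rework) / tau;
}
```

For example, with a 6-second checkpoint cost and a one-day MTBF (86400 s), the model suggests checkpointing roughly every 1000 seconds; intervals half or twice that size both yield a higher expected overhead.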
Deliverable D5.5 “Implementation and Evaluation of Application Specific Resilience Techniques (a)” details the ongoing work on a generic task-level resilience protocol.
Deliverable D5.6 “Implementation and Evaluation of Application Specific Resilience Techniques (b)” follows from the work already reported in project deliverables D5.1 and D5.5. In D5.1, a horizon-scanning activity that took place in the first six months of the project, we reviewed the literature on resilience techniques applicable to ExaScale computing and surveyed the AllScale pilot applications on their current resiliency mechanisms and their plans for the emerging ExaScale systems. It emerged that checkpoint-restart for hard faults was the preferred solution for the three pilot applications in AllScale.
Deliverable D5.7 “Resilience Manager” The resilience manager needs to implement both the detection and recovery strategies, following the protocols detailed in D5.5.
Work in progress
M7 – M20 T5.3 Resilience Primitives
There is a close coupling between the resilience manager and the scheduling component. For example, when a task dies, the scheduler can no longer activate that task and should be informed of the task’s state by the resilience manager. Furthermore, while the scheduler is being actively developed there are bugs in the code, which can impact testing of the resilience manager. The scheduler work, performed in WP4, is built upon HPX.
Thomas Heller has built an extension to the HPX scheduler, reliant on treetures, making that component specific to the recursive decomposition in AllScale. However, WP4, led by IBM, is developing more advanced scheduling on top of the treeture-based AllScale scheduler mentioned above. To date the issues faced by the WP5 resilience manager relate to the treeture-based component and not to the more advanced WP4 components.
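The notification flow between the resilience manager and the scheduler described above can be sketched as follows. All class and function names are hypothetical stand-ins for this illustration (the real components communicate through the runtime’s interfaces): the essential contract is that a detected task failure is propagated to the scheduler, which then stops considering that task for activation.

```cpp
#include <cassert>
#include <set>

// Toy scheduler tracking which task ids are eligible to run.
struct toy_scheduler {
    std::set<int> runnable;

    void add_task(int id) { runnable.insert(id); }

    // Called by the resilience manager when a task has died.
    void on_task_failed(int id) { runnable.erase(id); }

    // The scheduler must not activate a task it knows to be dead.
    bool can_activate(int id) const { return runnable.count(id) != 0; }
};

// Toy resilience manager that detects failures and informs the scheduler.
struct toy_resilience_manager {
    toy_scheduler* sched;

    void report_failure(int task_id) { sched->on_task_failed(task_id); }
};
```

In the real system the recovery side is more involved (the resilience manager also triggers restart from a checkpoint, per the protocols in D5.5), but the state-synchronisation obligation sketched here is the coupling the text refers to.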
A first prototype of the distributed monitoring component has been implemented and is currently being tested and improved. There is also ongoing work testing the tracing capabilities of the shared-memory (non-distributed) version of the monitoring component with the iPIC3D pilot application.
Final results envisaged
At the end of the project the software generated by WP5 will be instantiated as components within the complete toolchain and runtime that the project will deliver. While these components will be identifiable, they will be intrinsically linked to the templates and APIs used in the AllScale system. This means that the components will not immediately be available for use in other runtime environments without significant porting effort. Nevertheless, the components will enable substantial experiments to be conducted using the AllScale system and so contribute to results suitable for academic publication.
WP6: Integration, testing and pilot applications
WP6 establishes the AllScale computing infrastructure, comprising a range of computing systems provided by various partners, and prepares the pilot applications that are used to drive the development, tuning and proof of value of the AllScale environment. WP6 has been divided into four tasks, which work together to deliver the targeted objectives.
T6.1 AllScale computing infrastructure deployment and management
T6.2 Preparation and porting of pilot applications
T6.3 Assembling the AllScale environment and pilot applications
T6.4 Testing, evaluation and tuning of the AllScale environment with the pilot applications
Fine/Open large industrial unsteady CFD simulations preparation and evaluation (a)
Fine/Open large industrial unsteady CFD simulations preparation and evaluation (b)
Testing, evaluation and tuning of the AllScale environment with the pilot applications
Relationship with other WPs
WP6 has direct links with the complete AllScale toolchain, with the ultimate focus on testing, evaluation and tuning of the AllScale environment with the pilot applications. As a starting point, this WP involved all partners in identifying the characteristics of the partners’ computing infrastructure and the desired external computing infrastructure to enable a successful ExaScale implementation. Interaction with WP2 focused on providing user requirements from the three pilot applications to guide the architecture and system design of the AllScale environment. Deliverables 2.1 and 2.2 summarise these requirements.
As part of Task 5.1, partners liaised with WP5 to analyse the pilot applications and identify the most appropriate resiliency strategies for the AllScale developments. Objectives included:
Understand the vulnerability of application tasks and data structures and
Design application-independent checkpointing strategies.
AllScale selected existing industry-standard pilot application codes from different domains (space weather, CFD, environment).
WP6 results cover three different areas:
Requirements and infrastructure,
Pilot application development and porting and
Testing and evaluation of the AllScale environment.
For Deliverable 6.1, partners identified the available computing infrastructure and the necessary external supercomputers. Further, an appropriate continuous integration workflow was implemented to enable efficient development of the AllScale environment and pilot applications (iPIC3D, Fine/Open and AMDADOS). These developments are described in detail in deliverables 6.2, 6.4 and 6.6 respectively. In January 2017 (M16), the first successful implementation of the AllScale environment was achieved, reaching MS2 of the project. Deliverable 6.8 details the status and structure of the achieved installation, integration, and deployment of the AllScale environment and pilot applications. This was complemented by an advanced prototype of the AllScale environment in M24.
Five deliverables were produced.
Deliverable D6.1 "AllScale Computing Infrastructure" gives an overview of the computing infrastructure available for the AllScale project.
Deliverable D6.2 "iPIC3D implicit particle-in-cell code for space weather applications preparation and evaluation (a)" describes the work done with one use case iPIC3D;
Deliverable D6.4 provides information on the work done with the CFD code FINE, while deliverable D6.6 covers AMDADOS.
Very often a new paradigm is adopted together with a completely new algorithm, and it would be valuable to understand how much easier or faster implementation becomes with the AllScale API compared to already widely used programming models. This aspect is important, especially for the exploitation of the project results: SMEs, and commercial companies more generally, would be more interested in adopting the new AllScale paradigm if they could save a substantial amount of time (and therefore money) while still reaching Exascale.
In all cases, scalability results are for now very limited and constrained to shared-memory architectures. The impact of further optimizations in the AllScale runtime, the implementation of the pilot applications, and the implementation of the appropriate structures in the User API should be evaluated.
Deliverable D6.6 "AMDADOS Deepwater Horizon application preparation and evaluation (a)" In this pilot application, DA and AM are used jointly and embedded in a modelling implementation of the advection diffusion equations for simulating the Deepwater Horizon accident.
Deliverable D6.7 "AMDADOS Deepwater Horizon application preparation and evaluation (b)" The acronym AMDADOS stands for Adaptive Meshing and Data Assimilation for Dispersion of Oil Spills.
Deliverable D6.8 "Installation Integration Deployment AllScale Environment Pilot Applications" describes the technical infrastructure in place to share and develop the various parts of the software stack.
Work in progress
This report is being written midway through M27 of the project. The start and end dates for each project task are shown below:
M4 – M36 T6.1 AllScale computing infrastructure deployment and management
M4 – M32 T6.2 Preparation and porting of pilot applications
M15 – M36 T6.3 Assembling the AllScale environment and pilot applications
M16 – M36 T6.4 Testing, evaluation and tuning of the AllScale environment with the pilot applications
Current work focuses on assembling the AllScale environment and pilot applications, and on testing the assembled environment with the pilot applications.
Final results envisaged
Developing an efficient and effective computing environment such as AllScale requires continuous testing from both software and hardware perspectives, together with high-quality demonstrations of its capabilities. WP6 aims to provide both for AllScale. The final goal is to provide ExaScale-level demonstrations of the AllScale environment across a number of high-value simulation domains (space weather, CFD and environment).