Workpackages

 

Milestone number | Milestone title | Date
MS1 | Specification and design of the AllScale environment | M7
MS2 | Early prototype of the AllScale environment and pilot applications | M16
MS3 | Advanced prototype of the AllScale environment and pilot applications | M24
MS4 | Final prototype of the AllScale environment and pilot applications | M32
MS5 | Tested, evaluated and tuned AllScale environment; project completion | M36

 


 

WP2: Requirements and overall system architecture design

Objectives

The main objective of WP2 is to identify and collect application and system requirements, and to develop a consistent overall architecture for the AllScale Environment that satisfies those requirements. The stated requirements and architecture details are continuously monitored and updated over the course of the project. The major results of this effort are deliverables documenting requirements and objectives, external technological developments, system architecture details, and the design of the AllScale API. The latter is the interface that the AllScale Environment offers to the AllScale pilot applications, as well as to future applications built upon the results of the AllScale project.

The main tasks of WP2 over 2 years of the project have been the following:

  1. Definition of requirements;

  2. Collection of external technological developments;

  3. Revision of the AllScale Objectives to more precisely capture and represent the project goals;

  4. Establishment of the overall system design, including details on the various interfaces between the involved components, in particular the interface for the AllScale API.

Deliverables

This effort led to an accurate definition of requirements and external project-related developments, described in Deliverable D2.1, submitted in month 14. The updated and final version of the document, Deliverable D2.2, was delivered in month 22.

The work on the AllScale API specification resulted in a solid initial draft, submitted as Deliverable D2.5 in month 17, and its finalized version, Deliverable D2.6, in month 25.

Deliverable D2.3, documenting the overall AllScale architecture, was submitted in project month 16.

Deliverable | Title | Date
D2.1 | Requirement specifications and reports on external technological developments (a) | M14
D2.2 | Requirement specifications and reports on external technological developments (b) | M22
D2.3 | AllScale system architecture (a) | M16
D2.4 | AllScale system architecture (b) | M32
D2.5 | AllScale API specification (a) | M17
D2.6 | AllScale API specification (b) | M25

Relationship with other WPs

This work package has direct links with the technical work packages in the project. Such links were established right from the beginning of the project in order to derive requirements from the pilot applications (WP6) towards the AllScale Toolchain (WPs 3-5) and vice versa. Examples include requirements regarding external library support imposed on the pilots by the AllScale Compiler (WP3) and the AllScale Runtime System (WP4), as well as the desired resilience strategies and mechanisms (WP5) to be provided. This effort was extended to the definition of the AllScale system architecture, including the interfaces between the AllScale Toolchain components, such as compiler-runtime, runtime-monitoring, monitoring-resilience, and scheduler-monitoring. It was accompanied by the definition of both the User-Level and Core-Level AllScale APIs, again involving the whole stack of technical work packages (WPs 2-6).

These linkages are reflected in the system architecture diagram, which shows the layered architecture of the AllScale system.

Results imported

No work has been imported from outside the project. There is considerable linkage between the other technical WPs in the project and WP2.

Results produced

The initial requirements on the AllScale Environment were established within the first several months of the project to drive and facilitate the project development. First, a detailed investigation of the pilot applications was carried out, resulting in a preliminary description of the application-specific requirements for the AllScale Environment. This work was followed by a component-wise definition of the interface and functionality requirements within the AllScale Environment. In addition, the AllScale Environment also imposed requirements on the pilot applications. The definition of requirements and constraints has been regularly discussed and revised during the bi-weekly WP2 conference calls as well as during the cross-reviewing phases. Moreover, some of the AllScale objectives and their measures of success have been revised to more precisely capture and represent the project goals. Furthermore, a corresponding technology watch function was set up, and external technological developments have been collected to ensure that the AllScale Environment adapts and responds to both hardware and software evolution.

The AllScale architecture has been gradually refined, starting from a first design outlined in the project proposal. The runtime application model, comprising the concept of work and data items developed as part of this project, as well as a hardware model, were made concrete in a bilateral meeting in Erlangen in early 2016. An additional overhaul of the design of those core concepts, plus a presentation of the concepts to the project partners involved in runtime-related components (scheduler, resilience manager, and monitoring component), was conducted during a technical meeting in Erlangen in February 2016. In addition, derived interfaces for the runtime-related components were drafted. Following the meeting, various details of the design were evaluated for their feasibility using prototype implementations. Those prototypes serve as the current evaluation platform for the development of the system’s architecture. Deliverable D2.3, documenting the architecture, was submitted in month 16.

In addition to the internal architecture of the AllScale Toolchain (compiler and runtime system), the design of the AllScale API, the interface visible to end users, is the other major obligation of this work package. As covered in the original proposal, the AllScale API is subdivided into the Core API and the User-Level API. Starting from a shared-memory prototype implementation of the Core API, fully working prototypes of the AllScale pilot applications have been created in C++. From these, the requirements for User-Level API components have been deduced, and the corresponding solutions have been added to the User-Level API design. Additionally, the Core API has been extended by 2 major improvements to support the implementation of the required User-Level components:

  1. Support for decomposable futures, to realize fine-grained, scalable task synchronization

  2. Elements resembling C++ concepts, to specify the traits of data structures to be distributed and managed by the AllScale Runtime System

Both of those have been integrated into the design of the AllScale Core API.
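
To illustrate the idea behind decomposable futures, the following is a minimal Python sketch. It is not the AllScale Core API (which is C++); it only assumes that the future of a recursively split task exposes the futures of its two sub-tasks, so that a consumer can synchronize on just the part of the result it needs. The names DecomposableFuture and parallel_sum are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class DecomposableFuture:
    """Hypothetical future whose sub-task futures stay individually accessible."""

    def __init__(self, leaf=None, left=None, right=None, combine=None):
        self._leaf = leaf          # concurrent.futures.Future of a leaf task
        self.left = left           # DecomposableFuture of the left sub-task
        self.right = right         # DecomposableFuture of the right sub-task
        self._combine = combine    # merges the two sub-results

    def result(self):
        # Leaves wait on their own task; inner nodes wait on both children.
        if self._leaf is not None:
            return self._leaf.result()
        return self._combine(self.left.result(), self.right.result())


def parallel_sum(pool, data, lo, hi, cutoff=4):
    """Recursively split data[lo:hi]; only leaf ranges are submitted to the pool."""
    if hi - lo <= cutoff:
        return DecomposableFuture(leaf=pool.submit(sum, data[lo:hi]))
    mid = (lo + hi) // 2
    return DecomposableFuture(
        left=parallel_sum(pool, data, lo, mid, cutoff),
        right=parallel_sum(pool, data, mid, hi, cutoff),
        combine=lambda a, b: a + b,
    )


with ThreadPoolExecutor(max_workers=4) as pool:
    root = parallel_sum(pool, list(range(16)), 0, 16)
    left_half = root.left.result()   # waits only for the left sub-tree
    total = root.result()            # waits for the whole computation
```

In this sketch a consumer that only needs the left half of the data never blocks on tasks of the right half, which is the fine-grained synchronization the first improvement aims at.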

Since October 2016, WP2 has had one new activity: we decided to extend our external technology watch to derive a taxonomy of task-based HPC technologies (programming APIs, runtime systems, scheduling approaches, monitoring frameworks, and fault tolerance mechanisms), because there is no comprehensive overview or classification of task-based technologies for HPC. In our article, to appear in the Proceedings of the PPAM conference, we also demonstrate the usefulness of our taxonomy by classifying some of the state-of-the-art task-based environments in use today.

M18 review

5 deliverables are produced:

  1. Deliverable 2.1 - "Requirement Specifications and Reports on External Technological Developments (a)" presents the AllScale environment requirements. The link with the project use cases is well demonstrated and it shows a good interaction with WP6.

  2. Deliverable 2.2 - "Requirement Specifications and Reports on External Technological Developments (b)"

  3. Deliverable 2.3 - "AllScale System Architecture" presents the overall AllScale System architecture and again the project’s use cases played a meaningful role in the final design.

  4. Deliverable 2.5 - "AllScale API Specification (a)" is well written with a noticeable user oriented approach.

  5. Deliverable 2.6 - "AllScale API Specification (b)"

and the first two milestones have been successfully reached, as follows:

  1. MS1 “Specification and design of the AllScale environment”

  2. MS2 “Early prototype of the AllScale environment and pilot applications”

The “continuous” interaction with other workpackages, and in particular WP6 (the project’s use cases), has been very beneficial and led to the development of prototypes (MS2) that will help to make the necessary adjustments to the overall AllScale system.

Work in progress

WP2 has only one ongoing task – the update of the AllScale system architecture design.

Final results envisaged   

By the end of the project, we expect to have the final definition of requirements and a solid overview of external technological developments; a complete AllScale API specification; and a complete, detailed software architecture of the AllScale Environment comprising all its components, their interfaces and responsibilities.

External Links

A Taxonomy of Task-Based Parallel Programming Technologies for High-Performance Computing


 

WP3: High-level parallel API and API-aware optimising source-to-source compiler

Objectives

WP3 covers the incremental research and development activities around the AllScale compiler. It develops the Core and User API layers to facilitate the recursive description of algorithms and to make the compiler aware of the parallel structure of the processed programs. Furthermore, it researches automated code optimisation techniques, as well as multi-versioning solutions enabling the automated porting of input codes to distributed memory systems and accelerators. It facilitates research into compiler-aided dynamic workload and data management techniques, and it explores compiler-aided checkpointing techniques to be utilised by the resilience component.
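
As a rough illustration of what a recursive description of an algorithm can look like, the sketch below builds a recursive function from three parts: a base-case test, a base case, and a step case. It is a sequential Python stand-in under our own assumptions, not the AllScale API; the name prec and its signature here are hypothetical. The point is that the recursive calls appear explicitly in the step case, which is the kind of structure a compiler or runtime could exploit to execute them in parallel.

```python
def prec(is_base, base_case, step_case):
    """Build a recursive function from its three parts (hypothetical helper)."""
    def rec(x):
        if is_base(x):
            return base_case(x)
        # The step case receives `rec`, so the recursive calls are explicit.
        return step_case(x, rec)
    return rec


# Fibonacci expressed through the three parts.
fib = prec(
    is_base=lambda n: n < 2,
    base_case=lambda n: n,
    step_case=lambda n, rec: rec(n - 1) + rec(n - 2),
)

fib_10 = fib(10)
```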

Deliverable | Title | Date
D3.1 | API implementation for recursive parallelism (a) | M6
D3.2 | API implementation for recursive parallelism (b) | M21
D3.3 | AllScale Compiler prototype | M15
D3.4 | Full AllScale Compiler prototype | M31

Results produced

Over the past year, the API and compiler components have progressed towards the full implementation of the AllScale system. Highlights of this progress include:

  • Full feature-completeness in the AllScale API for the iPic3D pilot.

  • A more than tenfold improvement in compilation performance for large-scale C++ codes, achieved through various algorithmic improvements in core components of the compiler.

  • A significant extension to the unit testing coverage for all compiler components.

  • Extensions to the AllScale compiler frontend, backend and core leading to compilation of the iPic3D pilot.

  • Implementation of many features to improve user interaction, including interactive HTML reporting, as well as analysis debugging tools.

In the course of this work, close collaboration with WP4 was required in order to finalize and tweak the interface between the API and the code generated by the compiler on the one side and the AllScale runtime system on the other side. Also, the continuous development of the AllScale API was carried out based on feedback from the pilot application groups within WP6.

The first AllScale compiler prototype was delivered in project month 15, and the revised API specification and implementation have been available on GitHub for several months. Based on this progress, project milestones 2 and 3 were achieved.

Figure 1: User Interface Snapshot

The image above shows a snapshot of the conversion report generated by the AllScale compiler. It offers insight into the conversion of each parallel recursive execution block, including analysis reports. These might, for example, tell the user why a specific parallel region could not be automatically ported to distributed memory.

Figure 2: Unit, Integration and Distribution Testing

As the API and compiler are central to the project, and must remain stable on several configurations throughout very active development, we are using a relatively large-scale testing infrastructure. The screenshot above gives a current overview of all the integration testing jobs for the AllScale compiler and API.

M18 review

2 deliverables have been produced:

  1. Deliverable D3.1 describes in generic terms the first AllScale User and Core API implementation; the associated source code is maintained in GitLab. D3.1 was evaluated during the first interim review and no major changes were requested. However, the deliverable was updated following the reviewers’ feedback.

  2. Deliverable D3.3 is a very short deliverable describing the AllScale compiler prototype. The compiler prototype is freely available on GitHub.


 

WP4: Unified runtime system for extreme scale systems

Objectives

WP4 covers the research and development of a scalable AllScale runtime system for parallel computing systems of any scale, which effectively supports dynamic, nested (recursive) parallelism and provides sophisticated support for data migration. The latter serves as the foundation for both dynamic load balancing across address space boundaries and resiliency support, by moving data to and from persistent storage devices. Furthermore, innovative techniques for incorporating alternative parallel code versions targeting specific hardware (e.g. accelerators) and for multi-objective auto-tuning are applied.

Deliverable | Title | Date
D4.1 | AllScale runtime system interface specification | M6
D4.2 | Early AllScale runtime system prototype | M14
D4.3 | AllScale runtime system monitoring infrastructure | M30
D4.4 | Data management support | M30
D4.5 | Resource management support | M30
D4.6 | Multi-Objective Dynamic Optimiser (a) | M16
D4.7 | Multi-Objective Dynamic Optimiser (b) | M32

Relationship with other WPs

WP4 is the backbone of the AllScale project. While the pilot applications (WP6) do not interact directly with the runtime system, their code gets compiled (WP3) using the outcome of WP4. WP5 also depends heavily on the runtime and uses the provided interfaces directly.

M18 review

3 deliverables are produced:

Deliverable D4.1 “AllScale runtime system interface specification” has been evaluated during the first interim review and clearly describes the runtime interface specification with an emphasis on the strategies developed for optimizing task-based recursive parallelism. There is an effective collaboration with WP5 to include resilience strategies into the decisions of the scheduler.

The AllScale runtime system interface has been presented at the workshop “MTAGS16: 9th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers” co-located with the Supercomputing Conference 2016 in Salt Lake City.

The first prototype has been developed and, thanks to good interaction with other workpackages, further refinements have been made. There is a noticeable effort to reach standardization bodies in the hope of influencing ongoing C++ standardization to match the needs of the project’s runtime system development.

No update is available on the status of Data Management Infrastructure within the deliverables, but an update has been given during the review meeting, showing no deviations to the DoA.

Work in progress

WP4 is currently working towards optimizing the runtime for the pilot applications as well as for distributed memory systems. The first results are promising.


 

WP5: Cross layer resilience and online analysis for non-functional parameters

Objectives

WP5 develops language and tool support for continuous monitoring of application performance and error resilience, as well as support for application-specific, algorithmic error detection and recovery from both errors and performance anomalies. The activity in the WP has been divided into 5 tasks, which work together to deliver the objectives.

T5.1 Application Specific Resilience Analysis

T5.2 On-demand, on-line introspection of non-functional parameters

T5.3 Resilience Primitives

T5.4 Development of application specific resilience techniques

T5.5 Resilience manager

A key objective is to develop scalable event detection and notification mechanisms that attempt to localise the impact of execution-time anomalies due to errors or performance variability in hardware.

Deliverable | Title | Date
D5.1 | Application Specific Resilience Strategies | M6
D5.2 | On-Demand, On-Line Monitoring Infrastructure (a) | M16
D5.3 | On-Demand, On-Line Monitoring Infrastructure (b) | M32
D5.4 | Resilience Primitives | M20
D5.5 | Implementation and Evaluation of Application Specific Resilience Techniques (a) | M16
D5.6 | Implementation and Evaluation of Application Specific Resilience Techniques (b) | M30
D5.7 | Resilience Manager | M30

Relationship with other WPs

WP5 has direct links with the technical work packages in the project. It contributes to WP2, defining the overall system architecture for AllScale. In addition, the code implemented in WP6 links to the compiler prototype and the incorporation of resilience and monitoring into the compiler. Furthermore, WP6 is inextricably linked to the research and development carried out in WP4, the overarching activity which extends the HPX runtime system in order to create the AllScale runtime system prototype.

These linkages are reflected in the system architecture diagram, which shows the layered architecture of the AllScale system. The WP5 tasks correspond to the purple shaded boxes to the middle right of the figure. WP5 is implicitly linked to the application exemplars in WP6 through the use of the compiler toolchain and runtime system to enable the execution of these applications. Thus the resilience and monitoring components are not directly visible to the application, but they nevertheless influence the execution of the applications.

Results imported

There is considerable linkage between the other tech WPs in the project and WP5.

Results produced

The shared memory version of the monitoring component is in place, providing real-time introspection data as well as post-mortem profiles and performance traces. One paper about this first prototype has been submitted to the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2018).

At the outset, the research in this WP considered exploring self-stabilising and self-healing algorithms for the AllScale pilot applications; however, this work has not been pursued.

Work continues on a simulator to explore our proposed local recovery strategy, which is based on recovery from local checkpoints and full re-computation. There are several new results and a paper is being prepared to record this. This work is part of the evaluation of the cost of resilience strategies in WP5.
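
The kind of experiment such a simulator supports can be sketched as follows. This is our own minimal Python illustration, not the project's simulator: it assumes a computation made of equal steps, a local checkpoint taken every few steps, and injected failures that force a rollback to the last checkpoint followed by re-computation of the lost steps. All names and parameters are hypothetical.

```python
def run_with_local_checkpoints(step_fn, state, n_steps, checkpoint_every, fail_at=()):
    """Run n_steps of step_fn, rolling back to the last checkpoint on failure.

    Returns the final state and the number of steps that had to be recomputed.
    """
    checkpoint = (0, state)            # (step index, saved state)
    pending_failures = set(fail_at)    # each injected failure strikes once
    recomputed = 0
    i = 0
    while i < n_steps:
        if i in pending_failures:
            pending_failures.discard(i)
            recomputed += i - checkpoint[0]   # work lost since the checkpoint
            i, state = checkpoint             # local recovery
            continue
        state = step_fn(state)
        i += 1
        if i % checkpoint_every == 0:
            checkpoint = (i, state)
    return state, recomputed


# Ten increments, a checkpoint every 4 steps, one failure just before step 6.
final_state, redone = run_with_local_checkpoints(
    step_fn=lambda s: s + 1,
    state=0,
    n_steps=10,
    checkpoint_every=4,
    fail_at={6},
)
```

Varying checkpoint_every in such a model exposes the trade-off between checkpointing more often and recomputing more work after a failure.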

The implementation of the failure detector within the HPX multi-threaded environment is raising a number of issues. Two lengthy debugging sessions, supported by Thomas Heller, took place. One issue was found in the AllScale runtime (in the treeture implementation) and one in HPX (the sanitizer reports errors at shutdown). At the time of writing, the current detector does not work for the fibonacci test case, showing non-deterministic failures such as:

  1. srun examples/fibonacci (standard run, fib(10), 1 iteration) never completes successfully.

  2. srun examples/fibonacci 20 100 (fib(20), 100 iterations) completes all computations, but crashes at shutdown.

These unresolved bugs represent a potential showstopper at this time. Alternative algorithms may need to be investigated. The root of the present problems appears to be that all threads have the same priority in HPX, so a long-running application thread can block a heartbeat thread from executing.
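
To make the suspected root cause concrete, here is a minimal Python sketch of a timeout-based heartbeat failure detector. It is an illustration under our own assumptions, not the AllScale or HPX implementation; the class name, the timeout value and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are hypothetical. A node is suspected when its last heartbeat is older than the timeout, so a heartbeat that is merely delayed by a long-running application thread cannot be told apart from a failed node.

```python
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector driven by explicit timestamps."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # node name -> time of the last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def suspects(self, now):
        # A node is suspected once its last heartbeat is older than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


detector = HeartbeatDetector(timeout=3.0)
detector.heartbeat("node-a", now=0.0)
detector.heartbeat("node-b", now=0.0)
detector.heartbeat("node-a", now=2.0)

# node-b is alive, but its heartbeat thread is starved until t = 4.5.
suspected_at_4 = detector.suspects(now=4.0)   # node-b is wrongly suspected
detector.heartbeat("node-b", now=4.5)
suspected_at_5 = detector.suspects(now=5.0)   # the suspicion is lifted again
```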

M18 review

Deliverable D5.1 “Application-specific resilience strategy” has been produced. It contains the detailed results of the questionnaire gathering resilience requirements from the 3 pilot applications. The results show that application developers lack knowledge and understanding of what a fault is and of the benefits and possible approaches of application-specific resilience. This clearly limits the benefit of the questionnaire and its ability to properly gather user requirements, and it demonstrates the need to better explain the work of AllScale on resiliency aspects. This lack of understanding should be taken into account in the dissemination of the project results.

Work in progress

This report is being written midway through M27 of the project. The start and end dates for each project task are shown below.

M2 – M6 T5.1 Application Specific Resilience Analysis

M7 – M30 T5.2 On-demand, on-line introspection of non-functional parameters

M7 – M20 T5.3 Resilience Primitives

M7 – M30 T5.4 Development of application specific resilience techniques

M21 – M32 T5.5 Resilience manager

Thus, tasks 5.1 and 5.3 are completed and tasks 5.2 and 5.4 are nearing completion. Task 5.5 draws together much of the work to produce executable software. Intense activity is taking place in this task at the present time.

There is a close coupling between the resilience manager and the scheduling component. For example, when a task dies, the scheduler can no longer activate that task and should be informed of the task state by the resilience manager. Furthermore, while the scheduler is being actively developed there are bugs in the code, which can impact testing of the resilience manager. The scheduler work performed in WP6 is built upon HPX.

Thomas Heller has built an extension to the HPX scheduler that relies on treetures, making that component specific to the recursive decomposition in AllScale. However, WP6, led by IBM, is developing more advanced scheduling on top of the AllScale treeture-based scheduler mentioned in the previous sentence. To date, the issues faced by the WP5 resilience manager relate to the treeture-based component and not to the more advanced WP6 components.

A first prototype of the distributed monitoring component has been implemented. This prototype is currently being tested and improved. There is also ongoing work testing the tracing capabilities of the shared memory (non-distributed) version of the monitoring component with the iPIC3D pilot application.

Final results envisaged   

At the end of the project, the software generated by WP6 will be instantiated as a component within the complete toolchain and runtime that the project will deliver. While these components will be identifiable, they will be intrinsically linked to the templates and APIs used by the AllScale system. This means that the components will not immediately be available for use in other runtime environments without significant porting efforts. Nevertheless, the components will enable substantial experiments to be conducted using the AllScale system and so contribute to results suitable for academic publication.


 

WP6: Integration, testing and pilot applications

Objectives

WP6 establishes the AllScale computing infrastructure comprising a range of computing systems provided by various partners and prepares pilot applications that are used to drive the development, tuning and proof of value of the AllScale environment. WP6 has been divided into 4 tasks, which work together to deliver the targeted objectives.

T6.1 AllScale computing infrastructure deployment and management

T6.2 Preparation and porting of pilot applications

T6.3 Assembling the AllScale environment and pilot applications

T6.4 Testing, evaluation and tuning of the AllScale environment with the pilot applications

Deliverable | Title | Date
D6.1 | AllScale computing infrastructure | M18
D6.2 | iPIC3D implicit particle-in-cell code for space weather applications preparation and evaluation (a) | M18
D6.3 | iPIC3D implicit particle-in-cell code for space weather applications preparation and evaluation (b) | M31
D6.4 | Fine/Open large industrial unsteady CFD simulations preparation and evaluation (a) | M18
D6.5 | Fine/Open large industrial unsteady CFD simulations preparation and evaluation (b) | M31
D6.6 | AMDADOS Deepwater Horizon application preparation and evaluation (a) | M18
D6.7 | AMDADOS Deepwater Horizon application preparation and evaluation (b) | M31
D6.8 | Installation, integration, and deployment of the AllScale environment and pilot applications (a) | M16
D6.9 | Installation, integration, and deployment of the AllScale environment and pilot applications (b) | M32
D6.10 | Testing, evaluation and tuning of the AllScale environment with the pilot applications | M36

 

Relationship with other WPs

WP6 has direct links with the complete AllScale toolchain, with the ultimate focus on testing, evaluation and tuning of the AllScale environment with the pilot applications. As a starting point, this WP involves all partners to identify the characteristics of the partners’ computing infrastructure and the external computing infrastructure needed to enable a successful ExaScale implementation. Interaction with WP2 focused on providing user requirements from the 3 pilot applications to guide the architecture and system design of the AllScale environment. Deliverables 2.1 and 2.2 summarise these requirements.

As part of Task 5.1, partners liaised with WP5 to analyse pilot applications and identify the most appropriate resiliency strategies for AllScale developments. Objectives included:

  1. Understand the vulnerability of application tasks and data structures and

  2. Design application-independent checkpointing strategies.

Results imported

AllScale selected existing industry-standard pilot application codes from different domains (space weather, CFD, environment).

Results produced

WP6 results cover 3 different areas:

  1. Requirements and infrastructure,

  2. Pilot application development and porting and

  3. Testing and evaluation of the AllScale environment.

As part of Deliverable 6.1, partners identified the available computing infrastructure and the necessary external supercomputers. Further, an appropriate continuous integration workflow was implemented to enable efficient development of the AllScale environment and pilot applications (iPIC3D, Fine/Open and AMDADOS). These developments are described in detail in Deliverables 6.2, 6.4 and 6.6 respectively. In January 2017 (M16), the first successful implementation of the AllScale environment was achieved, reaching MS2 of the project. Deliverable 6.8 details the status and structure of the achieved installation, integration, and deployment of the AllScale environment and pilot applications. This was complemented by an advanced prototype of the AllScale environment in M24.

M18 review

5 deliverables were produced.

Deliverable D6.1 gives an overview of the computing infrastructure available for the AllScale project.

Deliverable D6.2 describes the work done with one use case, iPIC3D.

Deliverable D6.4 provides information on the work done with the CFD code Fine/Open, and Deliverable D6.6 on the work done with AMDADOS.

Very often a new paradigm is used with a completely new algorithm, and it would be valuable to understand how much easier or faster it is to implement such algorithms with the AllScale API compared to already widely used programming models. This aspect is important, especially for the exploitation of the project results: SMEs, or more generally commercial companies, would be more interested in using the new AllScale paradigm if they can save a substantial amount of time (and therefore money) while still reaching Exascale.

In all cases, scalability results are for now very limited and constrained to shared-memory architectures. The impact of further optimizations in the AllScale runtime, the implementation of the pilot applications and the implementation of the appropriate structures in the User API should be evaluated.

Deliverable D6.8 describes the technical infrastructure in place to share and develop the various parts of the software stack.

Work in progress

This report is being written midway through M27 of the project. The start and end dates for each project task are shown below.

M4 – M36 T6.1 AllScale computing infrastructure deployment and management

M4 – M32 T6.2 Preparation and porting of pilot applications

M15 – M36 T6.3 Assembling the AllScale environment and pilot applications

M16 – M36 T6.4 Testing, evaluation and tuning of the AllScale environment with the pilot applications

Current work is focused on assembling the AllScale environment and the pilot applications, and on testing the environment with those applications.

Final results envisaged   

Developing an efficient and effective computing environment such as AllScale requires continuous testing from both software and hardware perspectives, together with high-quality demonstrations of its capabilities. WP6 aims to provide both for AllScale. The final goal is to provide ExaScale-level demonstrations of the AllScale environment across a number of high-value simulation domains (space weather, CFD and environment).

 

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671603

Contact Details

General Coordinator

Thomas Fahringer

Scientific Coordinator

Herbert Jordan