- UIBK collaborates with Prof. Shajulin Benedict from St. Xavier's Catholic College of Engineering, Nagercoil, India on the topic of performance models for task parallel programs. The AllScale compiler is the basis for this collaboration.
The team collaborated on building a performance model for message-passing parallel programs, which used the AllScale compiler and led to a published paper: Philipp Gschwandtner, Alex Hirsch, Shajulin Benedict, Thomas Fahringer. "Towards Automatic Compiler-assisted Performance and Energy Modeling for Message-Passing Parallel Programs". In Proceedings of the 13th Workshop on Parallel Systems and Algorithms (PASA) 2018, accepted, Braunschweig, Germany, IEEE, 2018.
This collaboration with Prof. Shajulin Benedict is now focusing on mixed parallel programs written in MPI/OpenMP.
- On the topic of high-level C++ APIs and optimization for accelerators and GPU architectures, UIBK collaborates with Biagio Cosenza, senior researcher at the Technische Universitaet Berlin. The AllScale API design regarding GPU support is informed by the results of this collaboration.
- FAU collaborates with John Biddiscombe on improving the messaging layer for the AllScale Runtime by providing highly optimised routines for High Speed Interconnects.
FAU is actively engaged in the C++ Standardisation process by contributing to various key proposals regarding massive parallelism for C++. Those proposals include: P0443R1, P0361R0 and P0234R1.
FAU collaborates widely with researchers on HPX, the underlying parallel programming framework for AllScale. Collaborators include Hartmut Kaiser (LSU), John Biddiscombe (CSCS), Rolf Rabenseifner (HLRS), Dalvan Griebler (PUCRS), Patrick Diehl (POLYMTL) and Michael Wong (Codeplay).
At QUB, AllScale collaborates closely with the UniServer and NanoStreams projects, as briefly explained below:
UniServer (UNIversal Micro-Server EcoSystem by Exceeding the Energy and Performance Scaling Boundaries) is funded by the EC under Call H2020-ICT-2015-4 of the ICT Theme. It is a three-year project that develops a universal system architecture and software ecosystem for servers targeting cloud data-centres as well as upcoming edge-computing markets. UniServer probes intrinsic hardware heterogeneity and exposes it to the software layers, which in turn adapt to the changing compute environment.
A key part of handling hardware heterogeneity in UniServer is adapting the run-time to handle dynamic information on hardware performance margins, which depend on CPU frequencies and RAM voltages, and then making the run-time fault aware. AllScale uses a different run-time from UniServer: it adapts and develops the HPX system. WP5 in AllScale is adding resilience to the AllScale run-time, concentrating in particular on detecting node failures.
Overall, the tasks faced by the two projects have a great deal in common. The two project teams at QUB therefore have frequent discussions on algorithmic approaches that can be deployed in each case.
NanoStreams received funding from the European Community’s Seventh Framework Programme [FP7] under grant agreement no. 610509 and ran from September 2013 to April 2017. The project carried out research on energy efficient computing on streams of real-time data.
C language extensions for dataflow programming and a NanoStreams runtime environment were developed to handle dynamic concurrency throttling. The dataflow approach can be readily contrasted with the recursive nested parallelism used in AllScale. On the other hand, the research on thread allocation policies provides some insight into gaining maximum performance, which could be leveraged to an extent in the AllScale run-time.
The dynamic resource allocation policy in NanoStreams aims to improve both the performance and the energy consumption of a system. Applied to a single application, it reduces energy consumption by throttling the number of threads to the minimum needed to meet the Quality of Service (QoS) target, assuming the target is achievable on the hardware. In a multi-application setup, it improves both energy consumption and performance by adjusting the number of threads per application so that each application meets its QoS target with the fewest threads needed.
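The throttling idea described above can be illustrated with a minimal sketch. This is not the NanoStreams implementation: the function names, the latency-based QoS target, the greedy first-come allocation across applications and the toy `work / n` latency model are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type: maps a thread count to an observed latency (seconds).
LatencyProbe = Callable[[int], float]


def min_threads_for_qos(measure_latency: LatencyProbe,
                        qos_target: float,
                        max_threads: int) -> Optional[int]:
    """Return the smallest thread count whose latency meets the QoS target.

    Returns None if the target cannot be met with max_threads threads,
    i.e. the target is not achievable on the available hardware.
    """
    for n in range(1, max_threads + 1):
        if measure_latency(n) <= qos_target:
            return n
    return None


def allocate_threads(apps: Dict[str, Tuple[LatencyProbe, float]],
                     total_threads: int) -> Dict[str, Optional[int]]:
    """Give each application the fewest threads that meet its own target.

    Applications are served in insertion order (an assumption of this
    sketch); an application whose target cannot be met with the remaining
    threads is recorded as None and consumes no threads.
    """
    allocation: Dict[str, Optional[int]] = {}
    remaining = total_threads
    for name, (probe, target) in apps.items():
        n = min_threads_for_qos(probe, target, remaining)
        allocation[name] = n
        if n is not None:
            remaining -= n
    return allocation


def ideal_scaling(work: float) -> LatencyProbe:
    """Toy latency model (assumed): latency = work / threads."""
    return lambda n: work / n


if __name__ == "__main__":
    apps = {
        "a": (ideal_scaling(8.0), 2.0),  # needs latency <= 2.0 s
        "b": (ideal_scaling(6.0), 3.0),  # needs latency <= 3.0 s
    }
    print(allocate_threads(apps, total_threads=8))
```

In a real system the probe would come from run-time measurements rather than a closed-form model, and threads left over after all targets are met could be kept idle to save energy.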
NanoStreams defined application-related but platform-independent energy and performance metrics to allow for fair comparison in the context of data-centres. These metrics can be adopted, and perhaps extended, to cover the recursive nested parallelism that AllScale applies to the mathematical applications in the project. As such, they would provide a new way to compare AllScale performance on small and large systems.