HPC Cloud

This task will support the deployment and management of the execution codes coming from the services of the upper layer (basically, the MADFs and certain end-user applications to be deployed in MSO4SC). The main activities of this task will be to:
  • Analyse the performance impact on supercomputers and the loss of flexibility in cloud environments for the target applications;
  • Implement a simultaneous job submission mechanism for heterogeneous multi-cloud, cluster and supercomputer systems, transparent to the user, by adapting SLURM as needed;
  • Run multiparameter simulations and MPI development tasks in the cloud in such a way that they can be moved to the HPC infrastructure transparently to the user, while achieving good performance on the HPC supercomputer;
  • Monitor the cloud and HPC infrastructures, providing the probes needed to display the information in the MSO Portal.
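The transparent submission described above can be sketched as a thin wrapper that renders the same user request into a SLURM batch script for whichever backend is selected. This is an illustrative sketch only: the backend names, partition names and resource figures below are hypothetical, not the project's actual configuration.

```python
import subprocess

# Hypothetical backend catalogue: the same user-facing call targets a cloud
# system, a cluster or a supercomputer simply by switching the entry used.
# Partition names and task counts are made up for illustration.
BACKENDS = {
    "cloud":         {"partition": "cloud",      "ntasks": 16},
    "cluster":       {"partition": "plexi",      "ntasks": 32},
    "supercomputer": {"partition": "thinnodes",  "ntasks": 48},
}

def build_batch_script(job_name, command, backend):
    """Render a SLURM batch script for the chosen backend."""
    cfg = BACKENDS[backend]
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={cfg['partition']}",
        f"#SBATCH --ntasks={cfg['ntasks']}",
        "#SBATCH --time=01:00:00",
        f"srun {command}",
    ])

def submit(job_name, command, backend):
    """Pipe the generated script into sbatch (requires a SLURM installation)."""
    script = build_batch_script(job_name, command, backend)
    result = subprocess.run(["sbatch"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()
```

The user never writes backend-specific scripts; moving a simulation from the cloud to the supercomputer amounts to changing the `backend` argument.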

As a result, MSO4SC will provide a job and container submission mechanism able to obtain the maximum benefit from HPC infrastructures, and a monitoring mechanism that will supply the Orchestrator with information about infrastructure status and available resources.
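One way such a monitoring probe could work is by periodically parsing `sinfo` output and aggregating per-partition node availability for the Orchestrator; this is a sketch under that assumption, and the sample output below is fabricated for demonstration. The field layout corresponds to `sinfo -h -o "%P %a %D %T"` (partition, availability, node count, node state).

```python
# Sketch of a monitoring probe: turn sinfo output into a per-partition
# summary (total nodes, idle nodes, partition up/down) that a portal or
# orchestrator could consume.

def parse_sinfo(output):
    """Aggregate idle/total node counts per partition from sinfo output."""
    status = {}
    for line in output.strip().splitlines():
        partition, avail, count, state = line.split()
        entry = status.setdefault(
            partition, {"total": 0, "idle": 0, "up": avail == "up"})
        entry["total"] += int(count)
        if state == "idle":
            entry["idle"] += int(count)
    return status

# Fabricated sample in the `sinfo -h -o "%P %a %D %T"` layout.
sample = """\
thinnodes up 250 allocated
thinnodes up 67 idle
plexi up 20 mixed
plexi up 6 idle
"""
print(parse_sinfo(sample))
```

In a deployed probe the sample string would be replaced by the live output of `subprocess.run(["sinfo", "-h", "-o", "%P %a %D %T"], ...)`.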

During the first year of the project, two HPC and cloud infrastructures will be available for the development and execution of the simulations. ATOS will provide access to additional HPC and cloud resources during the project, offering users high availability and increased performance.

CESGA will provide access to the FinisTerrae-II supercomputer, a Linux-based cluster with an InfiniBand FDR low-latency network interconnecting 317 computing nodes based on Intel Xeon Haswell processors, 7712 cores in total. Together, these nodes provide a computing power of 328 TFLOPS, 44.8 TB of RAM and 1.5 PB of disk capacity. The system includes a high-performance parallel storage system able to achieve speeds of more than 20 GB/s.
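The quoted peak figure can be sanity-checked from the core count with a back-of-the-envelope calculation. The sketch assumes 16 double-precision FLOPs per cycle per Haswell core (two 256-bit AVX2 FMA units) and a nominal clock around 2.66 GHz; both figures are assumptions not stated in the text, and any accelerator contribution is ignored.

```python
# Back-of-the-envelope peak-performance check for FinisTerrae-II.
# Assumed (not from the text): 16 DP FLOPs/cycle/core on Haswell
# (two 256-bit AVX2 FMA units) and a ~2.66 GHz nominal clock.
cores = 7712
flops_per_cycle = 16          # double precision, AVX2 FMA
clock_ghz = 2.66              # assumed nominal clock

peak_tflops = cores * flops_per_cycle * clock_ghz / 1000
print(f"{peak_tflops:.0f} TFLOPS")  # close to the quoted 328 TFLOPS
```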

SZE will support the project with Plexi, a Linux-based cluster consisting of 26 computing nodes housing 312 Intel Xeon CPU cores, 12 NVIDIA Tesla GPUs with a total of 5888 GPU cores, and 1.2 TB of RAM. The nodes are interconnected with InfiniBand QDR and connected to redundant storage with 12 TB of disk capacity.