The Power of Many: Running Many Simulations on Many : Dr. Shantenu Jha, Rutgers University (57 mins, ~28 MB)

Released Tuesday, 2nd April 2013

Several important science and engineering problems require the coordinated execution of multiple high-performance simulations. Common scenarios include, but are not limited to, "an ensemble of tasks", "loosely-coupled simulations of tightly-coupled simulations" and "multi-component multi-physics simulations". Historically, however, supercomputing centers have supported and prioritised the execution of single "jobs" on supercomputers. Not surprisingly, the tools and capabilities to support coordinated multiple simulations are limited.

A promising way to overcome this common limitation is the use of a Pilot-Job, which can be defined as a container or placeholder job that provides multi-level scheduling via an application-level scheduling overlay on top of the system scheduler. We discuss both the theory and practice of Pilot-Jobs: specifically, we introduce the P* Model of Pilot-Jobs and present "BigJob" as a SAGA-based extensible, interoperable and scalable implementation of the P* Model. We then discuss several science problems that have used, or are using, BigJob to execute multiple simulations at unprecedented scales on a range of supercomputers and distributed supercomputing infrastructure such as XSEDE.
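As a rough illustration of the Pilot-Job idea described above, the following toy Python sketch acquires a fixed set of "slots" once and then schedules many small tasks inside that allocation at the application level. It does not use BigJob, SAGA or any real batch system: `ToyPilot`, `simulate` and `run_ensemble` are hypothetical names invented for this example, and a local thread pool stands in for the resources a real placeholder job would hold.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyPilot:
    """Hypothetical stand-in for a pilot job holding a fixed allocation of slots."""

    def __init__(self, slots):
        # Assumption: in a real system this allocation would come from one
        # submission to the system scheduler; here it is only a thread pool.
        self.slots = slots
        self._pool = ThreadPoolExecutor(max_workers=slots)

    def submit(self, func, *args):
        # Application-level scheduling: tasks queue inside the pilot and run
        # as slots become free, without a new system-level job per task.
        return self._pool.submit(func, *args)

    def shutdown(self):
        # Release the allocation once all queued tasks have finished.
        self._pool.shutdown(wait=True)


def simulate(member_id):
    # Placeholder for one ensemble member; returns a deterministic value.
    return member_id * member_id


def run_ensemble(n_tasks, slots):
    # One pilot, many tasks: the second scheduling level lives in the pilot.
    pilot = ToyPilot(slots)
    futures = [pilot.submit(simulate, i) for i in range(n_tasks)]
    results = [f.result() for f in futures]  # collected in submission order
    pilot.shutdown()
    return results


if __name__ == "__main__":
    print(run_ensemble(8, slots=4))
```

The point of the sketch is only the shape of the pattern: the allocation is requested once, and the many simulations are then placed onto it by the application rather than by the system scheduler.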

This talk was given as part of our MSc in HPC's 'HPC Ecosystem' course.