Webinar Series: Best Practices for HPC Software Developers

This series of webinars will present best practices that will help users of HPC systems carry out their software development more productively. Initial topics include:

What All Codes Should Do: Overview of Best Practices in HPC Software Development

Developing, Configuring, Building, and Deploying HPC Software

Distributed Version Control and Continuous Integration Testing

Testing and Documenting your Code

How the HPC Environment is Different from the Desktop (and Why)

An Introduction to High-Performance Parallel I/O

Basic Performance Analysis and Optimization – An Ant Farm Approach

The webinars will occur at approximately two-week intervals and last about one hour each. Audience questions and discussion will be encouraged. The sessions are independent, so join any or all.

What’s New? This updated announcement includes schedule details and abstracts for the complete series of webinars. Please note that the schedule and order of presentations have changed slightly from the original plan. We have also provided notes about participation and follow-up. (Last updated 2016-06-11.)

Who should attend: This series is designed for HPC software developers who are seeking help in increasing their team’s productivity, as well as facility staff who interact extensively with users.

Organizers: This series is a joint endeavor of the IDEAS scientific software productivity project (http://ideas-productivity.org) and the ALCF, NERSC, and OLCF computing facilities. For further information, please contact Fernanda Foertter, 865-576-9391.

Registration: https://www.olcf.ornl.gov/training-event/webinar-series-best-practices-for-hpc-software-developers/

Note: A single registration will suffice to receive announcements for all sessions in the series.

Participation Details: We will be using Zoom to host these webinars. Connection information (different for each session) will be included in the reminders sent to all registrants. If you are unable to connect to Zoom with your browser, you can download the presentation slides from the registration web site, dial in to Zoom from your phone for audio, and follow along.

Follow-Up: Presentation slides and recordings of the sessions will be posted on the training web site (same URL as for registration). We encourage follow-up discussion of the presented material on the cse-forum@cse-software.org mailing list. Please see http://cse-software.org to join the mailing list.


Session 1: What All Codes Should Do: Overview of Best Practices in HPC Software Development

Slides: PDF

Video: YouTube

Date: Wednesday, May 4, 2016

Time: 1:00-2:00 pm ET

Presenter: Anshu Dubey, ANL, http://www.mcs.anl.gov/person/anshu-dubey

Description: Scientific code developers have increasingly been adopting software processes derived from the mainstream (non-scientific) community. Software practices are typically adopted when continuing without them becomes impractical. However, many software best practices need modification and/or customization, partly because the codes are used for research and exploration, and partly because of the combined funding and sociological challenges. This presentation will describe the lifecycle of scientific software and important ways in which it differs from other software development. We will provide a compilation of software engineering best practices that have generally been found to be useful by science communities, and we will provide guidelines for adoption of practices based on the size and the scope of the project.

Session 2: Developing, Configuring, Building, and Deploying HPC Software

Slides: PPTX

Video: YouTube

Date: Wednesday, May 18, 2016

Time: 1:00-2:00 pm ET

Presenter: Barry Smith, ANL, http://www.mcs.anl.gov/person/barry-smith

Description: The process of developing HPC software requires consideration of issues in software design as well as practices that support the collaborative writing of well-structured code that is easy to maintain, extend, and support. This presentation will provide an overview of development environments and how to configure, build, and deploy HPC software using some of the tools that are frequently used in the community. We will also discuss ways in which these and other tools are best utilized by various categories of scientific software developers, ranging from small teams (for example, a faculty member and graduate students who are writing research code intended primarily for their own use) through moderate/large teams (for example, collaborating developers spread among multiple institutions who are writing publicly distributable code intended for use by others in the community).

Session 3: Distributed Version Control and Continuous Integration Testing

Slides: PDF

Video: YouTube

Date: Thursday, June 2, 2016

Time: 1:00-2:00 pm ET

Presenter: Jeff Johnson, LBNL, http://esd.lbl.gov/profiles/jeffrey-n-johnson/

Description: Recently, many tools and workflows have emerged in the software industry that have greatly enhanced the productivity of development teams. GitHub, a site that hosts projects in Git repositories, is a popular platform for open source and closed source projects. GitHub has encoded several best practices into easily followed procedures such as pull requests, which enrich the software engineering vocabularies of non-professionals and professionals alike. GitHub also integrates with other services (for example, continuous integration services such as Travis CI, which allow code changes to be automatically tested before they are merged into a master development branch). This presentation will discuss how to set up a project on GitHub, illustrate the use of pull requests to incorporate code changes, and show how Travis CI can be used to boost confidence that changes will not break existing code.

Session 4: Testing and Documenting your Code

Slides: PDF

Video: YouTube

Date: Wednesday, June 15, 2016

Time: 1:00-2:00 pm ET

Presenter: Alicia Klinvex, SNL, http://www.cs.sandia.gov/cr-amklinv

Description: Software verification and validation are needed for high-quality and reliable scientific codes. For software with moderate to long lifecycles, a strong automated testing regime is indispensable for continued reliability. Similarly, comprehensive and comprehensible documentation is vital for code maintenance and extensibility. This presentation will provide guidelines on testing and documentation that can help to ensure high-quality and long-lived HPC software. We will present methodologies, with examples, for developing tests and adopting regular automated testing. We also will provide guidelines for minimum, adequate, and good documentation practices depending on the available resources of the development team.

Session 5: How the HPC Environment is Different from the Desktop (and Why)

Slides: to be posted

Video: to be posted

Date: Thursday, July 14, 2016

Time: 1:00-2:00 pm ET

Presenter: Katherine Riley, ALCF, https://www.alcf.anl.gov/staff-directory/katherine-riley

Description: High performance computing has transformed how science and engineering research is conducted. Answering a question in 30 minutes that used to take 6 months can quickly change the way one asks questions. Large computing facilities provide access to some of the world’s largest computing, data, and network resources. Indeed, the DOE complex has the highest concentration of supercomputing capability in the world. However, by their very nature, the largest computers in the world can be uniquely challenging to use. This talk will discuss how supercomputers are unique and explain how that affects their use.

Session 6: An Introduction to High-Performance Parallel I/O

Slides: to be posted

Video: to be posted

Date: Thursday, July 28, 2016

Time: 1:00-2:00 pm ET

Presenter: Feiyi Wang, ORNL

Presenter Bio: Feiyi Wang received his Ph.D. in Computer Engineering from North Carolina State University (NCSU). Before he joined Oak Ridge National Laboratory as a research scientist, he worked at Cisco Systems and the Microelectronics Center of North Carolina (MCNC) as a lead developer and principal investigator for several DARPA-funded projects. His current research interests include high-performance storage systems, parallel I/O and file systems, fault tolerance and system simulation, and scientific data management and integration. Dr. Wang is a Joint Faculty Professor in the EECS Department at the University of Tennessee and a senior member of IEEE.

Description: Parallel data management is a complex problem in large-scale HPC environments. The HPC I/O stack can be viewed as a multi-layered cake that presents a high-level abstraction to scientists. While this abstraction shields users from many of the I/O system details, it is very hard to obtain good parallel I/O performance or functionality without understanding the end-to-end hierarchical I/O stack in today’s complex HPC environments. This talk will introduce the basic parallel I/O concepts and will provide guidelines on obtaining better I/O performance on large-scale parallel platforms.

Session 7: Basic Performance Analysis and Optimization – An Ant Farm Approach

Slides: to be posted

Video: to be posted

Date: Tuesday, August 9, 2016

Time: 1:00-2:00 pm ET

Presenter: Jack Deslippe, NERSC, http://www.nersc.gov/about/nersc-staff/application-performance/jack-deslippe/

Description: How is optimizing HPC applications like an Ant Farm? Attend this presentation to find out. We’ll discuss the basic concepts around optimizing code for the HPC systems of today and tomorrow. These systems require codes to effectively exploit both parallelism between nodes and an ever-growing amount of parallelism on-node. We’ll discuss profiling strategies, tools (for profiling and debugging), and common issues with both internode communication and on-node parallelism. We will give an overview of traditional optimization areas in HPC applications, such as parallel I/O and MPI strong and weak scaling, as well as topics relevant to modern GPU and many-core systems, such as threading, SIMD/AVX, SIMT, and effective use of cache and memory hierarchies. The “Ant Farm” approach places a heavy emphasis on the roofline performance model and encourages users to understand the compute, bandwidth, and latency sensitivity of their applications and kernels through a series of easy-to-perform experiments and an easy-to-follow flow chart. Finally, we’ll discuss what we expect to change in the optimization process as we move towards exascale computers.

Brief announcement for facility weekly emails

NEW WEBINAR SERIES: BEST PRACTICES FOR HPC SOFTWARE DEVELOPERS.

These webinars will present best practices that will help users of HPC systems carry out their software development more productively. The webinar series is a collaboration of the IDEAS scientific software productivity project, ALCF, NERSC, and OLCF. The first webinar will take place on May 4. For more information or to register, please visit: https://www.olcf.ornl.gov/training-event/webinar-series-best-practices-for-hpc-software-developers/