Guix-HPC Activity Report, 2018
This document is also available as PDF (printable booklet).
This document is also available as PDF (printable booklet).
In the quest for truly reproducible workflows I set out to create an example of a reproducible workflow using GNU Guix, IPFS, and CWL. GNU Guix provides content-addressable, reproducible, and verifiable software deployment. IPFS provides content-addressable storage, and CWL describes workflows that can run on specifically supported backend hardware system. In principle, this combination of tools should be enough to provide reproducibility with provenance and improved security.
December 2018 the Akalin lab at the Berlin Institute of Medical Systems Biology (BIMSB) published a paper about a collection of reproducible genomics pipelines called PiGx that are made available through GNU Guix. The article was awarded third place in the GigaScience ICG-13 Prize. Representing the authors, Ricardo Wurmus was invited to present the work on PiGx and Guix in Shenzhen, China at ICG-13.
Ricardo urged the audience of wet lab scientists and bioinformaticians to apply the same rigorous standards of experimental design to experiments involving software: all variables need to be captured and constrained. To demonstrate that this does not need to be complicated, Ricardo reported the experiences of the Akalin lab in building a collection of reproducibly built automated genomics workflows using GNU Guix.
Due to technical difficulties the recording of the talk was lost, so Ricardo re-recorded the talk a few weeks later.
Version 0.16.0 of Guix was released yesterday. It’s slated to be the last release before 1.0, and as usual, it brings noteworthy packages and features for HPC and reproducible research.
Version 0.15.0 of Guix was released today. As usual, it brings packages and features that we hope HPC users and sysadmins will enjoy. This release brings us close to our goals for 1.0, so it’s probably one of the last zero-dot-something releases.
I’m happy to announce that the bioinformatics group at the Max Delbrück Center that I’m working with has released a preprint of a paper on reproducibility with the title Reproducible genomics analysis pipelines with GNU Guix.
Guix follows a transparent source/binary deployment model: it will
download pre-built binaries when they’re available—like apt-get
or
yum
—and otherwise falls back to building from source. Most of the
time the project’s build farm provides binaries so that users don’t have
to spend resources building from source. Pre-built binaries may be
missing when you’re installing a custom package, or when the build farm
hasn’t caught up yet. However, deployment of binaries is often seen as
incompatible with high-performance requirements—binaries are “generic”,
so how can they take advantage of cutting-edge HPC hardware? In this
post, we explore the issue and solutions.
GNU Guix will be present at FOSDEM, the main yearly free software developer conference in Europe, and in particular in the HPC track.
Version 0.14.0 of Guix was announced yesterday. In this post we look at the many goodies that made it into Guix during this release cycle.
Previously we discussed ways to use Guix-produced packages on a cluster where Guix is not installed. In this post we look at how a cluster sysadmin can install Guix for system-wide use, and discuss the various tradeoffs.
In the previous
post, we saw that
Guix’s build daemon needs to run as root
, and for a good reason:
that’s currently the only way to create isolated build environments for
packages on GNU/Linux. This requirement means that you cannot use Guix
on a cluster where the sysadmins have not already installed it. In this
article, we discuss how to take advantage of Guix on clusters that lack
a proper Guix installation.
Guix is a good fit for multi-user environments such as clusters:
it
allows non-root users to install packages at will without interfering with each other.
However, a common complaint is that installing Guix requires administrator
privileges. More precisely, guix-daemon
, the system-wide daemon that
spawns package builds and downloads on behalf of
users,
must be running as root
.
This is not much of a problem on one's laptop but it surely makes it
harder to adopt Guix on an HPC cluster.
This post marks the debut of Guix-HPC, an effort to optimize GNU Guix for reproducible scientific workflows in high-performance computing (HPC). Guix-HPC is a joint effort between Inria, the Max Delbrück Center for Molecular Medicine (MDC), and the Utrecht Bioinformatics Center (UBC). Ludovic Courtès, Ricardo Wurmus, Roel Janssen, and Pjotr Prins are driving the effort in each of these institutes, each one focusing specific areas of interest within this overall Guix-HPC effort. Our institutes have in common that they are users of HPC, and that, as scientific research institutes, they have an interest in using reproducible methodologies to carry out their research.