Installing Guix on a cluster

Ludovic Courtès — November 23, 2017

Previously we discussed ways to use Guix-produced packages on a cluster where Guix is not installed. In this post we look at how a cluster sysadmin can install Guix for system-wide use, and discuss the various tradeoffs.

Setting up a “master” node

The recommended approach is to set up one master node running guix-daemon and exporting /gnu/store over NFS to compute nodes.

Remember that guix-daemon is responsible for spawning build processes and downloads on behalf of clients, and more generally accessing /gnu/store, which contains all the package binaries built by all the users. “Client” here refers to all the Guix commands that users see, such as guix package. On a cluster, these commands may be running on the compute nodes and we’ll want them to talk to the master node’s guix-daemon instance.

To begin with, the master node can be installed following the binary installation instructions, which should be straightforward.

Since we want guix-daemon to be reachable not just from the master node but also from the compute nodes, we’ll use the new TCP transport recently added as part of the Guix-HPC effort and part of the forthcoming 0.14.0 release:

root@master# vi /etc/systemd/system/guix-daemon.service

and from there we’ll add --listen arguments to the ExecStart line:

ExecStart=/var/guix/profiles/per-user/root/guix-profile/bin/guix-daemon --build-users-group=guixbuild --listen=/var/guix/daemon-socket/socket --listen=0.0.0.0

The --listen=0.0.0.0 bit means that guix-daemon will process all incoming TCP connections on port 44146. This is usually fine in a cluster setup where the master node is reachable exclusively from the cluster’s LAN—you don’t want that to be exposed to the Internet!

The next step is to define our NFS exports in /etc/exports by adding something along these lines:

/gnu/store  *(ro)
/var/guix   *(rw, async)

The /gnu/store directory can be exported read-only since only guix-daemon on the master node will ever modify it. /var/guix contains user profiles as managed by guix package; thus, to allow users to install packages with guix package, this must be read-write.

Users can create as many profiles as they like in addition to the default profile, ~/.guix-profile. For instance, guix package -p ~/dev/python-dev -i python installs Python in a profile reachable from the ~/dev/python-dev symlink. To make sure that this profile is protected from garbage collection—i.e., that Python will not be removed from /gnu/store while this profile exists—, home directories should be mounted on the master node as well so that guix-daemon knows about these non-standard profiles and avoids collecting software they refer to.

It may be a good idea to periodically remove unused bits from /gnu/store by running guix gc. This can be done by adding a crontab entry on the master node:

root@master# crontab -e

… with something like this:

# Every day at 5AM, run the garbage collector to make sure
# at least 10 GB are free on /gnu/store.
0 5 * * 1  /var/guix/profiles/per-user/root/guix-profile/bin/guix gc -F10G

We’re done with the master node! Let’s look at compute nodes now.

Setup on compute nodes

First of all, we need to tell guix to talk to the daemon running on our master node, by adding these lines to /etc/profile:

GUIX_DAEMON_SOCKET="guix://master.guix.example.org"
export GUIX_DAEMON_SOCKET

To avoid warnings and make sure guix uses the right locale, we need to tell it to use locale data provided by Guix:

GUIX_LOCPATH=/var/guix/profiles/per-user/root/guix-profile/lib/locale
export GUIX_LOCPATH

# Here we must use a valid locale name.  Try "ls $GUIX_LOCPATH/*"
# to see what names can be used.
LC_ALL=fr_FR.utf8
export LC_ALL

For convenience, guix package automatically generates ~/.guix-profile/etc/profile, which defines all the environment variables necessary to use the packages—PATH, C_INCLUDE_PATH, PYTHONPATH, etc. Thus it’s a good idea to source it from /etc/profile:

GUIX_PROFILE="$HOME/.guix-profile"
if [ -f "$GUIX_PROFILE/etc/profile" ]; then
  . "$GUIX_PROFILE/etc/profile"
fi

Last but not least, Guix provides command-line completion notably for Bash and zsh. In /etc/bashrc, consider adding this line:

. /var/guix/profiles/per-user/root/guix-profile/etc/bash_completion.d/guix

Voilà!

You can check that everything’s in place by logging in on a compute node and running:

guix package -i hello

The daemon on the master node should download pre-built binaries on your behalf and unpack them in /gnu/store, and guix package should create ~/.guix-profile containing the ~/.guix-profile/bin/hello command.

Network access

Guix requires network access to download source code and pre-built binaries. The good news is that only the master node needs that since compute nodes simply delegate to it.

It is customary for cluster nodes to have access at best to a white list of hosts. Our master node needs at least mirror.hydra.gnu.org in this white list since this is where it gets pre-built binaries from, for all the packages that are in Guix proper.

Incidentally, mirror.hydra.gnu.org also serves as a content-addressed mirror of the source code of those packages. Consequently, it is sufficient to have only mirror.hydra.gnu.org in that white list.

Software packages maintained in a separate repository like that of Inria or that of MDC/BIMSB of course isn’t mirror on mirror.hydra.gnu.org. For these packages, the situation is different. One solution is to run your own mirror on the local network. Another solution, as a last resort, is to let users download source on their workstation and add it to the cluster’s /gnu/store, like this:

workstation$ GUIX_DAEMON_SOCKET=ssh://compute-node.example.org \
  guix download http://starpu.gforge.inria.fr/files/starpu-1.2.3/starpu-1.2.3.tar.gz

The above command downloads starpu-1.2.3.tar.gz and sends it to the cluster’s guix-daemon instance over SSH.

Air-gapped clusters require more work. At the moment, our suggestion would be to download all the necessary source code on a workstation running Guix. For instance, using the --sources option of guix build, the example below downloads all the source code the openmpi package depends on:

$ guix build --sources=transitive openmpi

…

/gnu/store/xc17sm60fb8nxadc4qy0c7rqph499z8s-openmpi-1.10.7.tar.bz2
/gnu/store/s67jx92lpipy2nfj5cz818xv430n4b7w-gcc-5.4.0.tar.xz
/gnu/store/npw9qh8a46lrxiwh9xwk0wpi3jlzmjnh-gmp-6.0.0a.tar.xz
/gnu/store/hcz0f4wkdbsvsdky3c0vdvcawhdkyldb-mpfr-3.1.5.tar.xz
/gnu/store/y9akh452n3p4w2v631nj0injx7y0d68x-mpc-1.0.3.tar.gz
/gnu/store/6g5c35q8avfnzs3v14dzl54cmrvddjm2-glibc-2.25.tar.xz
/gnu/store/p9k48dk3dvvk7gads7fk30xc2pxsd66z-hwloc-1.11.8.tar.bz2
/gnu/store/cry9lqidwfrfmgl0x389cs3syr15p13q-gcc-5.4.0.tar.xz
/gnu/store/7ak0v3rzpqm2c5q1mp3v7cj0rxz0qakf-libfabric-1.4.1.tar.bz2
/gnu/store/vh8syjrsilnbfcf582qhmvpg1v3rampf-rdma-core-14.tar.gz
…

(In case you’re wondering, that’s more than 320 MiB of compressed source code.)

We can then make a big archive containing all of this:

$ guix archive --export \
    `guix build --sources=transitive openmpi` \
    > openmpi-source-code.nar

… and we can eventually transfer that archive to the cluster on removable storage and unpack it there:

$ guix archive --import < openmpi-source-code.nar

This process has to be repeated every time new source code needs to be brought to the cluster.

As we write this, the research institutes involved in Guix-HPC do not have air-gapped clusters though. If you have experience with such setups, we would like to hear feedback and suggestions.

Disk usage

A common concern of sysadmins’ is whether this is all going to eat a lot of disk space. If anything, if something is going to exhaust disk space, it’s going to be scientific data sets rather than compiled software. With more than three years of experience running Guix on the cluster of the Max Delbrück Center, Ricardo Wurmus notes that disk usage does grow, but that overall Guix’s store is not a major contributor. Nevertheless, it’s worth taking a look at how Guix contributes to disk usage.

First, having several versions or variants of a given package in /gnu/store does not necessarily cost much, because guix-daemon implements deduplication of identical files, and package variants are likely to have a number of common files.

As mentioned above, we recommend having a cron job to run guix gc periodically, which removes unused software from /gnu/store. However, there’s always a possibility that users will keep lots of software in their profiles, or lots of old generations of their profiles, which is “live” and cannot be deleted from the viewpoint of guix gc.

The solution to this is for users to regularly remove old generations of their profile. For instance, the following command removes generations that are more than two-month old:

$ guix package --delete-generations=2m

Likewise, it’s a good idea to invite users to regularly upgrade their profile, which can reduce the number of variants of a given piece of software stored in /gnu/store:

$ guix pull
$ guix package -u

As a last resort, it is always possible for sysadmins to do some of this on behalf of their users. Nevertheless, one of the strengths of Guix is the freedom and control users get on their software environment, so we strongly recommend leaving users in control.

Security considerations

On an HPC cluster, Guix is typically used to manage scientific software. Security-critical software such as the operating system kernel and system services such as sshd and the batch scheduler remain under control of sysadmins.

The Guix project has a good track record delivering security updates in a timely fashion. To get security updates, users have to run guix pull && guix package -u.

Because Guix uniquely identifies software variants, it is easy to see if a vulnerable piece of software is in use. For instance, to check whether the glibc 2.25 variant without the mitigation patch against “Stack Clash”, one can check whether user profiles refer to it at all:

$ guix gc --referrers /gnu/store/…-glibc-2.25

This will report whether profiles exist that refer to this specific glibc variant.

Summary

Guix can readily be installed cluster-wide on a cluster. The task primarily involves installing Guix on a master node and exporting /gnu/store and /var/guix over NFS to compute node, and possibly augmenting the firewall’s white list to allow the master node to retrieve software binaries and source code.

This setup gives cluster users a great level of control over their computing environment. Users can reproduce the exact same environment on their laptop and on other clusters using Guix, which we think is key to the reproducibility of scientific experiments.

Acknowledgments

Thanks to Ricardo Wurmus at the Max Delbrück Center and to Julien Lelaurain at Inria for their feedback on an earlier draft of this post.

  • MDC
  • Inria
  • UBC