This post is part of a series about trying to set up a GitLab runner based on
systemd-nspawn. I published the polished
result as nspawn-runner on GitHub.

Here I try to figure out possible ways of invoking nspawn for the prepare,
run, and cleanup steps of GitLab custom runners. The resulting invocations
might be useful beyond GitLab as well.

I begin with a chroot which will be the base for our build environments:

debootstrap --variant=minbase --include=git,build-essential buster workdir

Fully ephemeral nspawn

This would be fantastic: set up a reusable chroot, mount it read-only, run the CI
in a working directory mounted on tmpfs. It sets up quickly, it cleans up after
itself, and it would make prepare and cleanup no-ops:

mkdir workdir/var/lib/gitlab-runner
systemd-nspawn --read-only --directory workdir --tmpfs /var/lib/gitlab-runner "$@"

However, run gets run multiple times, so I need the side effects of run to
persist inside the chroot between runs.

Also, if the CI uses a large amount of disk space, the tmpfs may run out of room.

nspawn with overlay

Federico used --overlay
to keep the base chroot read-only while allowing persistent writes to a
temporary directory on the filesystem.

Note that using --overlay requires systemd and systemd-container from
buster-backports because of systemd bug #3847.

Example:

mkdir -p tmp-overlay
systemd-nspawn --quiet -D workdir \
  --overlay="`pwd`/workdir:`pwd`/tmp-overlay:/"

I can run this twice, and changes in the file system will persist between
systemd-nspawn executions. Great! However, any process will be killed at the
end of each execution.
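
The --overlay argument wants absolute paths, which is why the example above uses pwd. A minimal sketch of a helper that builds the argument follows; the name overlay_arg is made up for illustration, and it assumes both directories already exist and contain no colons in their paths:

```shell
# Hypothetical helper: print the --overlay argument for a base chroot ($1)
# and a writable upper directory ($2), both resolved to absolute paths and
# mounted on / inside the container.
overlay_arg() {
    base=$(cd "$1" && pwd) || return 1
    upper=$(cd "$2" && pwd) || return 1
    printf '%s\n' "--overlay=$base:$upper:/"
}

# Usage (needs root and systemd-nspawn, so it is left commented out):
#   mkdir -p tmp-overlay
#   systemd-nspawn --quiet -D workdir "$(overlay_arg workdir tmp-overlay)"
```

Using a different upper directory per CI job would keep the writes of each job separate while still sharing the same base chroot.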

machinectl

I can give a name to systemd-nspawn invocations using --machine, and it
allows me to run multiple commands during the machine lifespan using
machinectl and systemd-run.

In theory machinectl can also fully manage chroots and disk images in
/var/lib/machines, but I haven’t found a way with machinectl to start
multiple machines sharing the same underlying chroot.

It’s ok, though: I managed to do that with systemd-nspawn invocations.

I can use the --machine=name argument to systemd-nspawn to make it visible
to machinectl. I can use the --boot argument to systemd-nspawn to start
enough infrastructure inside the container to allow machinectl to interact
with it.

This gives me any number of persistent and named running systems that share
the same underlying chroot, and can clean up after themselves. I can run
commands in any of those systems as I like, and their side effects persist
until a system is stopped.

The chroot needs systemd and dbus for machinectl to be able to interact with it:

debootstrap --variant=minbase --include=git,systemd,dbus,build-essential buster workdir

Let’s boot the machine:

mkdir -p overlay
systemd-nspawn --quiet -D workdir \
    --overlay="`pwd`/workdir:`pwd`/overlay:/" \
    --machine=test --boot

Let’s try machinectl:

# machinectl list
MACHINE CLASS     SERVICE        OS     VERSION ADDRESSES
test    container systemd-nspawn debian 10      -

1 machines listed.
# machinectl shell --quiet test /bin/ls -la /
total 60
[…]

To run commands, rather than machinectl shell, I need to use
systemd-run --wait --pipe --machine=name, otherwise machined won’t forward
the exit code. The result however is pretty good, with working
stdin/stdout/stderr redirection and forwarded exit code.
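
As a sketch of how this could be wrapped for repeated use: the function name run_in_machine and the SYSTEMD_RUN override are assumptions of this example, not something systemd provides. The override only exists so the command line can be inspected without a running machine:

```shell
# Hypothetical wrapper: run a command inside the machine named $1,
# forwarding stdin/stdout/stderr and the exit code of the command.
# SYSTEMD_RUN is an assumed override hook; set it to "echo" to print the
# arguments instead of running systemd-run.
run_in_machine() {
    machine=$1
    shift
    ${SYSTEMD_RUN:-systemd-run} --quiet --wait --pipe --machine="$machine" "$@"
}

# Usage, with a machine called "test" already booted:
#   run_in_machine test /bin/sh -c 'exit 3'; echo "exit code: $?"
```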

Good, I’m getting somewhere.

The terminal where I ran systemd-nspawn is currently showing a nice getty for
the booted system, which is cute, but not what I want for the setup process of
a CI.

Spawning machines without needing a terminal

machinectl uses /lib/systemd/system/systemd-nspawn@.service to start
machines. I suppose there’s limited magic in there: start systemd-nspawn as a
service, use --machine to give it a name, and machinectl manages it as if
it started it itself.

What if, instead of installing a unit file for each CI run, I try to do the
same thing with systemd-run?

systemd-run \
  -p 'KillMode=mixed' \
  -p 'Type=notify' \
  -p 'RestartForceExitStatus=133' \
  -p 'SuccessExitStatus=133' \
  -p 'Slice=machine.slice' \
  -p 'Delegate=yes' \
  -p 'TasksMax=16384' \
  -p 'WatchdogSec=3min' \
  systemd-nspawn --quiet -D `pwd`/workdir \
    --overlay="`pwd`/workdir:`pwd`/overlay:/" \
    --machine=test --boot

It works! I can interact with it using machinectl, and fine-tune DevicePolicy
as needed to lock CI machines down.

This setup has a race condition: if I try to run a command inside the
machine in the short window before it has finished booting, the command
fails:

# systemd-run […] systemd-nspawn […] ; machinectl --quiet shell test /bin/ls -la /
Failed to get shell PTY: Protocol error
# machinectl shell test /bin/ls -la /
Connected to machine test. Press ^] three times within 1s to exit session.
total 60
[…]

systemd-nspawn has the option --notify-ready=yes that solves exactly this
problem:

# systemd-run […] systemd-nspawn […] --notify-ready=yes ; machinectl --quiet shell test /bin/ls -la /
Running as unit: run-r5a405754f3b740158b3d9dd5e14ff611.service
total 60
[…]
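
Putting the pieces together, a minimal sketch of a function that boots a named machine in the background could look like this. The name start_machine and the SYSTEMD_RUN override are made up for this example; the properties are the same ones used above, and the paths are assumed to be absolute:

```shell
# Hypothetical helper: boot machine $1 from base chroot $2, with writes
# going to upper directory $3 (both absolute paths).
# SYSTEMD_RUN is an assumed override hook; set it to "echo" for a dry run.
start_machine() {
    name=$1 base=$2 upper=$3
    ${SYSTEMD_RUN:-systemd-run} --quiet \
        -p KillMode=mixed \
        -p Type=notify \
        -p RestartForceExitStatus=133 \
        -p SuccessExitStatus=133 \
        -p Slice=machine.slice \
        -p Delegate=yes \
        -p TasksMax=16384 \
        -p WatchdogSec=3min \
        systemd-nspawn --quiet -D "$base" \
            --overlay="$base:$upper:/" \
            --machine="$name" --boot --notify-ready=yes
}

# Usage (needs root):
#   mkdir -p "$PWD/overlay"
#   start_machine test "$PWD/workdir" "$PWD/overlay"
```

With Type=notify and --notify-ready=yes, the function should return only once the machine reports that it has booted, so commands can be run in it right away.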

On nspawn’s side, I should now have all I need.

Next steps

My next step will be wrapping it all together in
a GitLab runner.