RPi5 Docker socket: upgrade to mTLS (2376) and wire the Terraform runner #33

Open
opened 2026-06-16 13:53:07 +01:00 by lyrathorpe · 1 comment
Owner

Background

Follow-up to #31 / PR #32, which added the lyrathorpe-rpi5 Docker host with the
daemon on plain TCP 2375, no TLS, protected only by a LAN-source nftables
rule. This issue tracks upgrading that to mutual TLS (mTLS) on 2376 and wiring
the client end into the OpenTofu/Terraform repo (lyrathorpe/Terraform), whose
Gitea Actions runner drives the Docker provider against this host.

Plain 2375 is root-equivalent to anyone who reaches the port. mTLS makes both
ends prove identity with certificates and removes the "trust the whole subnet"
assumption.

How the pieces connect today

  • Server: system/machine/RPi5/docker.nix exposes dockerd on plain TCP 2375 via
    the systemd docker.socket ListenStream; firewall rule scoped to
    10.187.1.0/24.
  • Client: lyrathorpe/Terraform root main.tf has an empty
    provider "docker" {} (env-driven). .gitea/workflows/apply.yml and
    validate.yml set DOCKER_HOST: ${{ vars.DOCKER_HOST }} and run
    runs-on: ubuntu-latest, i.e. inside the act_runner's job containers. The
    Docker provider already honours DOCKER_HOST, and additionally
    DOCKER_TLS_VERIFY + DOCKER_CERT_PATH — so no provider-block change is
    required if those env vars are set.

Certificates (prerequisite for both ends)

  • Stand up a small CA (e.g. step-ca/cfssl/openssl). Keep the CA key
    offline.
  • Server cert for the Pi. SANs MUST include every name/address clients
    will use in DOCKER_HOST: lyrathorpe-rpi5, its FQDN if any, and its LAN
    IP (an IP SAN is required if DOCKER_HOST uses an IP). extendedKeyUsage = serverAuth.
  • Client cert for the Terraform runner. extendedKeyUsage = clientAuth.
  • Decide certificate lifetime + a rotation plan; document renewal and
    revocation (CRL or short-lived certs). These are secrets — never place
    private keys in the world-readable Nix store.
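
For illustration, the CA, server, and client certificates above could be produced with plain openssl roughly as follows. This is a minimal sketch, not the chosen tooling: the IP 10.187.1.10, the CN values, the file names, the RSA 2048 key size, and the 3650/90-day lifetimes are all assumptions made for the example, and a real CA key would be generated on offline media rather than in a scratch directory.

```shell
set -eu

# Scratch directory for the example only; keep the real CA key offline.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical values: replace with the real host name and LAN address.
PI_NAME="lyrathorpe-rpi5"
PI_IP="10.187.1.10"   # assumption: some address inside 10.187.1.0/24

# 1. Self-signed CA (key in ca-key.pem, certificate in ca.pem).
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=docker-mtls-ca" -keyout ca-key.pem -out ca.pem

# 2. Server key + CSR, then a CA-signed cert carrying the SANs and serverAuth.
openssl req -newkey rsa:2048 -nodes -subj "/CN=$PI_NAME" \
  -keyout server-key.pem -out server.csr
printf 'subjectAltName = DNS:%s,IP:%s\nextendedKeyUsage = serverAuth\n' \
  "$PI_NAME" "$PI_IP" > server.ext
openssl x509 -req -in server.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
  -days 90 -extfile server.ext -out server-cert.pem

# 3. Client key + CSR for the Terraform runner, signed with clientAuth.
#    Named key.pem/cert.pem so the directory can serve as DOCKER_CERT_PATH.
openssl req -newkey rsa:2048 -nodes -subj "/CN=terraform-runner" \
  -keyout key.pem -out client.csr
printf 'extendedKeyUsage = clientAuth\n' > client.ext
openssl x509 -req -in client.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
  -days 90 -extfile client.ext -out cert.pem

# Private keys must not be group- or world-readable.
chmod 600 ca-key.pem server-key.pem key.pem
```

If DOCKER_HOST will only ever use the host name, the IP entry can be dropped from the subjectAltName line; if it uses an IP, that IP has to appear there.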

Server side — nixfiles (system/machine/RPi5/docker.nix)

  • Move dockerd from socket-activated plain TCP to a TLS listener. TLS can't
    be done via socket activation (fd://); dockerd must own the listener:
    daemon.settings.hosts = [ "unix:///run/docker.sock" "tcp://0.0.0.0:2376" ]
    plus tls = true; tlsverify = true; tlscacert/tlscert/tlskey = <paths>.
  • Reconcile the hosts/socket conflict (the tricky bit): daemon.json
    hosts conflicts with the unit's -H fd://. Either drop the -H fd://
    from systemd.services.docker (ExecStart override) or disable
    systemd.sockets.docker so dockerd owns all listeners. Pick one; document
    why. Remove the current systemd.sockets.docker.socketConfig.ListenStream
    TCP addition.
  • Provision the server cert/key/CA onto the Pi out-of-band. The repo has
    no secrets manager today; introduce one (sops-nix or agenix) and reference
    the decrypted paths, or deploy the files to e.g. /var/lib/docker-certs
    via a non-Nix channel and point the daemon settings at them. Files must be
    root-readable, key 0600.
  • Firewall: replace the 2375 rule with a source-restricted 2376 rule;
    remove 2375 entirely. (networking.firewall.extraInputRules.)
  • Update system/machine/RPi5/README.md and docker.nix comments: flip the
    "mTLS is the upgrade path" note to the implemented design; document the
    cert paths and the secrets mechanism.
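
If the non-Nix deployment route is chosen, placing the files on the Pi could look roughly like the helper below. It is a sketch under assumptions: the function name install_docker_certs and the file names ca.pem, server-cert.pem, and server-key.pem are hypothetical, and it is meant to run as root on the Pi so the installed files end up root-owned. With sops-nix or agenix this step would not be needed, since those tools place the decrypted files themselves.

```shell
set -eu

# install_docker_certs SRC_DIR DEST_DIR
# Copies the CA cert, server cert and server key into DEST_DIR with
# restrictive modes. Run as root so the resulting files are root-owned.
install_docker_certs() {
  src=$1
  dest=$2

  # Directory only accessible to its owner.
  install -d -m 0700 "$dest"

  # Certificates are public material; the key must be 0600.
  install -m 0644 "$src/ca.pem"          "$dest/ca.pem"
  install -m 0644 "$src/server-cert.pem" "$dest/server-cert.pem"
  install -m 0600 "$src/server-key.pem"  "$dest/server-key.pem"
}

# Example call (commented out; the source path is hypothetical):
# install_docker_certs /root/incoming-certs /var/lib/docker-certs
```

The tlscacert/tlscert/tlskey daemon settings would then point at the three files under the destination directory.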

Client side — lyrathorpe/Terraform (Gitea Actions runner)

  • Store the client cert triple as Gitea secrets (base64): DOCKER_CA_PEM,
    DOCKER_CERT_PEM, DOCKER_KEY_PEM.
  • In apply.yml and validate.yml, add a step that decodes those into a
    dir (e.g. $RUNNER_TEMP/docker-certs/{ca,cert,key}.pem, chmod 600 key.pem)
    and set the env:
    - DOCKER_HOST=tcp://lyrathorpe-rpi5:2376 (update the existing
    vars.DOCKER_HOST; must match a SAN on the server cert)
    - DOCKER_TLS_VERIFY=1
    - DOCKER_CERT_PATH=$RUNNER_TEMP/docker-certs
    The empty provider "docker" {} block then needs no change.
  • Alternative to per-job secrets: bake the certs onto the act_runner host
    and bind-mount a read-only cert dir into job containers via the runner
    config (container.options / valid_volumes in the act_runner config),
    with the same three env vars. Choose secrets-in-workflow (portable) vs
    mounted-volume (no secrets in job logs/files); document the choice.
  • Ensure the act_runner container has network reachability to the Pi on
    2376 (routing/firewall from the runner's network to 10.187.1.0/24).
  • If the act_runner container definition lives in this Terraform repo
    (Services/Docker/), wire the cert volume/secret there too.
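
For the secrets-in-workflow option, the decode step's script body might look like the sketch below. It assumes the workflow step exports the three secrets as environment variables with the same names (DOCKER_CA_PEM, DOCKER_CERT_PEM, DOCKER_KEY_PEM) and that GNU base64 is available in the job container; the function name write_docker_certs is hypothetical.

```shell
set -eu

# write_docker_certs CERT_DIR
# Decodes the base64 secrets into CERT_DIR/{ca,cert,key}.pem, the file names
# the Docker client looks for under DOCKER_CERT_PATH.
write_docker_certs() (
  cert_dir=$1

  # Fail early with a clear message if a secret was not passed in.
  : "${DOCKER_CA_PEM:?DOCKER_CA_PEM is not set}"
  : "${DOCKER_CERT_PEM:?DOCKER_CERT_PEM is not set}"
  : "${DOCKER_KEY_PEM:?DOCKER_KEY_PEM is not set}"

  # Subshell body: the umask change does not leak into the caller.
  umask 077
  mkdir -p "$cert_dir"
  printf '%s' "$DOCKER_CA_PEM"   | base64 -d > "$cert_dir/ca.pem"
  printf '%s' "$DOCKER_CERT_PEM" | base64 -d > "$cert_dir/cert.pem"
  printf '%s' "$DOCKER_KEY_PEM"  | base64 -d > "$cert_dir/key.pem"
  chmod 600 "$cert_dir/key.pem"
)

# Example use inside the step (commented out so the sketch has no side effects):
# write_docker_certs "$RUNNER_TEMP/docker-certs"
# export DOCKER_HOST=tcp://lyrathorpe-rpi5:2376
# export DOCKER_TLS_VERIFY=1
# export DOCKER_CERT_PATH="$RUNNER_TEMP/docker-certs"
```

The exports shown in the comments only affect the step that runs them; the tofu steps need the same three variables in their own env.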

Acceptance criteria

  • From the runner: docker -H tcp://lyrathorpe-rpi5:2376 --tlsverify info
    succeeds; the same call without valid client certs is refused.
  • Port 2375 is closed on the Pi; 2376 reachable only from the trusted source.
  • tofu plan/apply in the Terraform repo runs against the Pi over mTLS
    from CI with no plaintext Docker exposure.
  • No private key is committed or lands in the Nix store; CA key is offline.
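
The first two criteria could be checked from the runner with something like the sketch below. It assumes docker and curl are installed in the job container and that the cert directory uses the ca.pem/cert.pem/key.pem layout; the function name check_docker_mtls and the 5-second timeout are hypothetical choices. It cannot confirm that 2376 is unreachable from untrusted sources, which has to be tested from a host outside the allowed range.

```shell
set -eu

# check_docker_mtls HOST CERT_DIR
# Returns 0 only if: (1) the daemon answers over mTLS with the client certs,
# (2) a TLS request without a client cert fails, (3) plain 2375 does not answer.
check_docker_mtls() {
  host=$1
  certs=$2

  # 1. Positive case: full mTLS handshake with the client cert triple.
  if ! docker -H "tcp://$host:2376" --tlsverify \
      --tlscacert "$certs/ca.pem" --tlscert "$certs/cert.pem" \
      --tlskey "$certs/key.pem" info >/dev/null 2>&1; then
    echo "FAIL: mTLS connection with client certs did not succeed" >&2
    return 1
  fi

  # 2. Negative case: trust the CA but present no client cert.
  if curl --silent --fail --max-time 5 --cacert "$certs/ca.pem" \
      "https://$host:2376/_ping" >/dev/null 2>&1; then
    echo "FAIL: daemon answered a client that presented no certificate" >&2
    return 1
  fi

  # 3. The old plaintext port must not answer.
  if curl --silent --fail --max-time 5 "http://$host:2375/_ping" >/dev/null 2>&1; then
    echo "FAIL: plain TCP 2375 is still answering" >&2
    return 1
  fi

  echo "OK: mTLS enforced on 2376, 2375 not answering"
}

# Example call (commented out; requires network access to the Pi):
# check_docker_mtls lyrathorpe-rpi5 "$RUNNER_TEMP/docker-certs"
```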

Notes

  • Cross-repo: server tasks land in nixfiles; client tasks in
    lyrathorpe/Terraform. Consider a sibling issue on that repo referencing this
    one.
Author
Owner

Client-side counterpart filed on the Terraform repo: lyrathorpe/Terraform#59
(https://code.emmathe.dev/lyrathorpe/Terraform/issues/59)
Reference: lyrathorpe/nixfiles#33