K3s on Hetzner

After running a k3s.io cluster at home for a year, I decided to spin up a cloud cluster for greater learning (and internet capabilities - damn my pitiful 20/4Mbps home connection). I chose Hetzner, specifically their ARM instances, for the price & capabilities.

After an initial setup (a breeze) for my server (main) node, and my setup of fluxcd, I decided to add a second node. In doing so, I hit a few hiccups, mainly whenever a pod was scheduled onto my agent (secondary) node. With a Deployment, Service + IngressRoute (traefik CRD) setup, I would get a glaring Gateway Timeout.

I tried a variety of debugging techniques:

  • Deleted and re-added all the manifests (deployment, svc + ingressroute)
  • Checked traefik pod logs, svclb logs
  • Recreated the Hetzner private networks
  • Begged & cursed k8s to work
  • Got a couple of friends involved
  • Recreated the agent (secondary) node
  • Recreated the entire cluster

Still no dice.

I then searched "hetzner k3s setup", just in case I had missed anything, and came across https://ellie.wtf/notes/hetzner-k3s. Firstly, I notice they pass a few startup args to k3s, but one stands out: --flannel-iface=enp7s0.

Whilst debugging, I had run ifconfig, seen the usual eth0, cni0, and other interfaces, and remembered that eth0 held the public IP. Surely k3s wouldn't be trying to route over the public IP, hence my issue?
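For anyone wanting to find the private interface name on their own node, here is a rough sketch rather than anything official. It assumes iproute2's `ip -o -4 addr show` output format, and `private_ifaces` is a helper name I made up. It prints interfaces holding an RFC 1918 address and skips the ones k3s or container runtimes create themselves (cni0, flannel.*, veth*, docker0), since those also sit in private ranges.

```shell
# Hypothetical helper: print the name of each interface that holds a private
# (RFC 1918) IPv4 address. Expects `ip -o -4 addr show` style lines on stdin,
# where field 2 is the interface name and field 4 is the address/prefix.
private_ifaces() {
    awk '
        # Assumption: these names belong to loopback or container networking,
        # not to the cloud private network we are looking for.
        $2 ~ /^(lo|cni0|docker0)$/ || $2 ~ /^(flannel|veth)/ { next }
        {
            split($4, a, "/")
            if (a[1] ~ /^10\./ || a[1] ~ /^192\.168\./ || a[1] ~ /^172\.(1[6-9]|2[0-9]|3[01])\./)
                print $2
        }'
}

# On a node, it would be fed like this:
# ip -o -4 addr show | private_ifaces
```

On my nodes the name that matters is enp7s0, but yours may differ, so check before copying the flag.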

I scroll further, I see:

Otherwise, as I enabled private networking on my Hetzner + want my cluster to use that, I’ve pointed flannel towards the private network interface.

Hot damn. It was.

I spin up two instances, do the k3s setup again but with the minimal arguments needed.

# server
curl -sfL https://get.k3s.io | sh -s - server \
    --flannel-iface=enp7s0

# agent
curl -sfL https://get.k3s.io | K3S_URL=https://<ip>:6443 K3S_TOKEN=<token> sh -s - agent \
    --flannel-iface=enp7s0

Setup instructions

I apply my minimal-reproduction k8s manifests, and hey-presto, it works.

Heading over to my standalone instance, I added the new k3s argument to /etc/systemd/system/k3s.service (edited with nano) so that the Flannel CNI uses the private interface, rebooted, and found it worked.
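If you'd rather not reboot, something along these lines should do the same job. It is a sketch under assumptions: both function names are made up for illustration, the server unit is assumed to be called k3s (agents installed via get.k3s.io normally get a k3s-agent unit instead), and the check is only a plain grep for the flag in the unit file.

```shell
# Hypothetical helper: succeed if the given systemd unit file already
# passes --flannel-iface to k3s.
unit_has_flannel_iface() {
    grep -q -- '--flannel-iface=' "$1"
}

# Hypothetical helper: reload systemd and restart the k3s unit so an edited
# unit file takes effect. Defaults to the server unit name "k3s".
apply_k3s_unit_change() {
    sudo systemctl daemon-reload
    sudo systemctl restart "${1:-k3s}"
}

# On the node, after editing the unit file:
# unit_has_flannel_iface /etc/systemd/system/k3s.service && apply_k3s_unit_change k3s
```

The restart briefly interrupts the node, same as the reboot did, just faster.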

Onwards and upwards to the next issue I'll encounter...