K3s on Hetzner
After running a k3s (k3s.io) cluster at home for a year, I decided to spin up a cloud cluster to keep learning (and for better connectivity: damn my pitiful 20/4 Mbps home connection). I chose Hetzner, specifically their ARM instances, for the price and capabilities.
After a breezy initial setup of my server (main) node, followed by fluxcd, I decided to add a second node. That's where the hiccups started, mainly whenever a pod was scheduled onto my agent (secondary) node. With a Deployment, Service and IngressRoute (Traefik CRD) setup, I would get a glaring Gateway Timeout.
I tried a variety of debugging techniques:
- Deleted and re-added all the manifests (deployment, svc + ingressroute)
- Checked traefik pod logs, svclb logs
- Recreated the Hetzner private networks
- Begging & cursing k8s to work
- Getting a couple of friends involved
- Recreated the agent (secondary) node
- Recreated the entire cluster
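To take Traefik out of the picture, one option is to test pod-to-pod traffic across nodes directly. Below is a rough sketch, assuming kubectl is pointed at the cluster and the busybox:1.36 image can be pulled; the helper names and the pod name net-debug are hypothetical, not something from the original setup.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the net-debug pod name are hypothetical.

# node_pin_overrides NODE: JSON for `kubectl run --overrides` that places the
# pod directly onto NODE by setting spec.nodeName (bypassing the scheduler).
node_pin_overrides() {
  printf '{"spec":{"nodeName":"%s"}}' "$1"
}

# cross_node_fetch NODE URL: start a throwaway busybox pod on NODE, fetch URL
# with a 5 second timeout, and delete the pod afterwards (--rm).
cross_node_fetch() {
  kubectl run net-debug --rm -i --restart=Never --image=busybox:1.36 \
    --overrides="$(node_pin_overrides "$1")" \
    -- wget -qO- -T 5 "$2"
}
```

For example, cross_node_fetch <server-node> http://<pod-ip>:<port>, using the IP of a pod on the agent node from kubectl get pods -o wide. A timeout here, but not when both pods share a node, would point at the overlay network rather than at Traefik.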
Still no dice.
I then searched "hetzner k3s setup", just in case I'd missed anything, and came across https://ellie.wtf/notes/hetzner-k3s. They pass a few startup args to k3s, but one stood out: --flannel-iface=enp7s0
Whilst debugging, I had run ifconfig and seen the usual eth0, cni0 and other interfaces, and I remembered that eth0 held the public IP. Surely k3s wasn't routing over the public IP, and that was my issue?
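A quicker way to see which interface carries which kind of address is to classify the output of ip -o -4 addr show. This is a minimal sketch assuming iproute2's one-line output format; the helper names are hypothetical. Note that cni0 and flannel.1 will also show up as private, since they carry pod-network addresses (10.42.0.0/16 is k3s's default cluster CIDR), so the Hetzner private NIC is the private interface that isn't one of those.

```shell
#!/usr/bin/env bash
# Sketch only: label each IPv4 address from `ip -o -4 addr show` as
# loopback, private (RFC 1918) or public. Helper names are hypothetical.

# is_rfc1918 ADDR: succeed if ADDR is in 10/8, 172.16/12 or 192.168/16.
is_rfc1918() {
  local a b
  IFS=. read -r a b _ _ <<<"$1"
  [ "$a" -eq 10 ] && return 0
  [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0
  [ "$a" -eq 192 ] && [ "$b" -eq 168 ] && return 0
  return 1
}

# classify_interfaces: read `ip -o -4 addr show` lines on stdin and print
# "<iface> <addr> <kind>" for each one.
classify_interfaces() {
  local _idx iface _family cidr _rest addr kind
  while read -r _idx iface _family cidr _rest; do
    addr=${cidr%%/*}
    case $addr in
      127.*) kind=loopback ;;
      *) if is_rfc1918 "$addr"; then kind=private; else kind=public; fi ;;
    esac
    printf '%s %s %s\n' "$iface" "$addr" "$kind"
  done
}
```

Usage: ip -o -4 addr show | classify_interfaces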
Scrolling further, I saw:
Otherwise, as I enabled private networking on my Hetzner + want my cluster to use that, I’ve pointed flannel towards the private network interface.
Hot damn. It was.
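To confirm which address flannel actually picked on each node, something like the sketch below could help. It assumes flannel records its chosen address in the flannel.alpha.coreos.com/public-ip node annotation (which is what I'd expect k3s's embedded flannel to do), that kubectl is configured, and that the Hetzner private network range is passed in as a CIDR; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the annotation name is an
# assumption about how flannel advertises its chosen address.

# ip_to_int ADDR: convert a dotted IPv4 address to an integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_cidr ADDR CIDR: succeed if ADDR lies inside CIDR (e.g. 10.0.0.0/16).
in_cidr() {
  local bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
}

# check_flannel_ips CIDR: for every node, print the address flannel advertises
# and whether it falls inside CIDR.
check_flannel_ips() {
  local cidr=$1 node addr
  for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
    addr=$(kubectl get node "$node" \
      -o jsonpath='{.metadata.annotations.flannel\.alpha\.coreos\.com/public-ip}')
    if [ -n "$addr" ] && in_cidr "$addr" "$cidr"; then
      echo "$node $addr ok"
    else
      echo "$node ${addr:-unset} outside $cidr"
    fi
  done
}
```

Usage: check_flannel_ips 10.0.0.0/16, substituting whatever range the Hetzner network was created with.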
I spun up two fresh instances and did the k3s setup again, this time with only the minimal arguments needed:
# server
curl -sfL https://get.k3s.io | sh -s - server \
  --flannel-iface=enp7s0
# agent
curl -sfL https://get.k3s.io | K3S_URL=https://<ip>:6443 K3S_TOKEN=<token> sh -s - agent \
  --flannel-iface=enp7s0
I applied my minimal-reproduction k8s manifests and, hey presto, it worked.
Heading over to my original instance, I opened /etc/systemd/system/k3s.service in nano, added the new k3s argument so the Flannel CNI uses the private interface, and rebooted to find it worked.
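If you'd rather not hand-edit the unit file, k3s also reads flags from /etc/rancher/k3s/config.yaml, using the flag name without the leading dashes as the key. Here is a minimal sketch of that approach; the helper names are hypothetical and enp7s0 is simply the interface name from the notes above, so check yours first. When editing the unit directly instead, systemctl daemon-reload followed by a service restart should be enough; a full reboot isn't required.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; enp7s0 may differ per host.

# set_flannel_iface IFACE [CONFIG]: append "flannel-iface: IFACE" to the k3s
# config file (default /etc/rancher/k3s/config.yaml), refusing to add a second one.
set_flannel_iface() {
  local iface=$1 config=${2:-/etc/rancher/k3s/config.yaml}
  mkdir -p "$(dirname "$config")"
  if grep -q '^flannel-iface:' "$config" 2>/dev/null; then
    echo "flannel-iface is already set in $config" >&2
    return 1
  fi
  printf 'flannel-iface: %s\n' "$iface" >>"$config"
}

# restart_k3s: restart the k3s service on this node; the install script names
# the unit k3s on servers and k3s-agent on agents.
restart_k3s() {
  if [ -f /etc/systemd/system/k3s.service ]; then
    systemctl restart k3s
  else
    systemctl restart k3s-agent
  fi
}
```

Usage (as root): set_flannel_iface enp7s0 && restart_k3s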
Onwards and upwards to the next issue I'll encounter...