I've been experimenting with Kamal for deploying an Elixir application. The application was previously deployed on Fly.io and relies on the Erlang VM's clustering capabilities, so I had to get this working on the Hetzner deployment I'd spun up using Kamal.
It took some effort to get clustering working, so here are my notes on what I did. The setup isn't perfect yet, so if you have any suggestions, do let me know!
Initial setup (without clustering)
Because mix already has a task to generate a Dockerfile, and that's all you really need for Kamal, this step was fairly easy.
I created a setup with two roles:
servers:
  web:
    hosts:
      - WEB_IP
  pipeline:
    hosts:
      - PIPELINE_IP
I also created a private network in Hetzner.
We'll use this private network later for our clustering setup.
At this point, I was able to deploy to my servers and see my Phoenix app running, but the nodes weren't in a cluster.
libcluster_hcloud
I set up the Hetzner Cloud clustering strategy for libcluster using libcluster_hcloud. This uses the Hetzner API to find VMs to cluster with, filtered using label selectors, so make sure the label you set in the config is also set on your VMs.
config :libcluster,
  topologies: [
    labels_example: [
      strategy: Elixir.ClusterHcloud.Strategy.Labels,
      config: [
        hcloud_api_access_token: "xxx",
        label_selector: "cluster",
        app_prefix: "my_app",
        show_debug: false,
        private_network_name: "private-network",
        polling_interval: 10_000
      ]
    ]
  ]
At this point, if you redeploy the app, you should see in the logs that libcluster finds the VMs but isn't able to connect to them.
2024-10-28T08:29:58.801497484Z 08:29:58.800 [warning] [libcluster:labels_example] unable to connect to :"my_app@10.0.1.1"
Setting up distribution
The problem is that libcluster tries to find :"my_app@10.0.1.1" but we haven't named our node as such. So let's do that. In rel/env.sh.eex I added:
#!/bin/sh
# Resolve the private IP: use HOST_PRIVATE_IP if set, otherwise ask the Hetzner
# metadata endpoint, then fall back to hostname -i, then to localhost.
IP_ADDR=${HOST_PRIVATE_IP:-$(curl -s http://169.254.169.254/hetzner/v1/metadata/private-networks | grep -m1 "ip: " | awk '{print $3}')}
[ -n "$IP_ADDR" ] || IP_ADDR="$(hostname -i 2>/dev/null | awk '{print $1}')"
[ -n "$IP_ADDR" ] || IP_ADDR="127.0.0.1"
export RELEASE_DISTRIBUTION="name"
export RELEASE_NODE="my_app@${IP_ADDR}"
Setting RELEASE_DISTRIBUTION="name" makes the node use a long (fully qualified) name, which it needs in order to talk to BEAMs in other VMs.
We use the private IP address as the host part of RELEASE_NODE because that's the name libcluster tries to connect to.
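To make the naming rule easier to check in isolation, here is a minimal sketch of the same logic split into two small shell functions. Both function names are hypothetical helpers of mine, not part of Kamal, Mix releases, or libcluster, and the sketch assumes the metadata endpoint returns entries shaped like "- ip: 10.0.1.1", which is what the grep/awk pipeline above relies on.

```shell
#!/bin/sh
# Sketch only: first_private_ip and node_name are hypothetical helpers.

# Print the first private IP found in Hetzner-style metadata read from stdin.
# Assumes entries look like "- ip: 10.0.1.1" (an assumption about the format).
first_private_ip() {
  grep -m1 "ip: " | awk '{print $3}'
}

# Build a node name from an app prefix ($1) and an IP ($2, may be empty).
# The prefix has to match app_prefix in the libcluster config.
# Falls back to 127.0.0.1 when no IP could be resolved.
node_name() {
  printf '%s@%s\n' "$1" "${2:-127.0.0.1}"
}
```

For example, piping a saved copy of the metadata response into first_private_ip and passing the result to node_name my_app should give the same name that libcluster logs when it tries to connect.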
Disabling EPMD and publishing distribution port
By default, enabling distribution means that there are two ports at play: the EPMD port (4369) and a randomly assigned port for each node in the cluster.
Initially I tried constraining the range from which the port is randomly assigned using the flags described here, but ran into issues connecting iex remotely.
Instead, it is possible to use an EPMD-less approach (see here and here):
# In vm.args.eex
-start_epmd false -erl_epmd_port 6789
# In remote.vm.args.eex
-start_epmd false -erl_epmd_port 6789 -dist_listen false
This way we only have to expose port 6789 from our container.
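Both files have to agree on the port, since the remote shell uses the same value to find the running node. As a rough sanity check, the sketch below reads the value after -erl_epmd_port from each file and compares them. The function names are hypothetical, and the rel/ paths in the usage note assume the default release layout.

```shell
#!/bin/sh
# Sketch only: epmd_port_of and same_dist_port are hypothetical helpers.

# Print the value that follows -erl_epmd_port in a vm.args-style file ($1).
# Comment lines (starting with #) are skipped.
epmd_port_of() {
  awk '/^[ \t]*#/ { next }
       { for (i = 1; i < NF; i++) if ($i == "-erl_epmd_port") { print $(i + 1); exit } }' "$1"
}

# Succeed only if both files ($1, $2) set the same, non-empty port.
same_dist_port() {
  a=$(epmd_port_of "$1")
  b=$(epmd_port_of "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Calling same_dist_port rel/vm.args.eex rel/remote.vm.args.eex before a deploy would catch the case where only one of the two files was updated.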
This is the config I had to add to my deploy.yml to publish the port:
accessories:
  traefik:
    service: traefik
    image: traefik:v3.1
    roles:
      - web
      - pipeline
    options:
      publish:
        - 6789:6789
    cmd: "--providers.docker --providers.docker.exposedByDefault=false --entryPoints.epmd.address=:6789 --log.level=INFO"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
labels:
  traefik.enable: true
  traefik.tcp.routers.epmd.rule: "ClientIP(`0.0.0.0/0`)"
  traefik.tcp.routers.epmd.priority: 5
  traefik.tcp.routers.epmd.entryPoints: epmd
  traefik.tcp.routers.epmd.service: epmd
  traefik.tcp.services.epmd.loadBalancer.server.port: 6789
This seems like another of Kamal's limitations: if you were invoking Docker directly, you could just publish the port. Instead, you have to set up Traefik to expose "extra" ports from your services.
And with that, once you redeploy, libcluster should be able to connect to the nodes it finds.
iex(pod_clipper@10.0.2.1)1> Node.list()
[:"pod_clipper@10.0.1.1"]
iex(pod_clipper@10.0.2.1)2>