HA with Raft

Run 3-5 control plane nodes with embedded Raft consensus for high availability. No external etcd, no external database.

Configuration

mode: "controlplane"
controlPlane:
  address: ":8080"
  storePath: "/data"
  raft:
    nodeId: "${POD_NAME}"
    bindAddress: ":7000"
    advertiseAddress: "${POD_IP}:7000"
    discovery:
      dns: "vrata-headless.vrata.svc.cluster.local"

All Raft fields

| Field | Default | Description |
|---|---|---|
| raft.nodeId | required | Unique node identifier (use ${POD_NAME} in Kubernetes) |
| raft.bindAddress | required | Address for Raft inter-node communication |
| raft.advertiseAddress | | Address other nodes use to reach this one (use ${POD_IP}:7000 in Kubernetes) |
| raft.discovery.dns | | Headless Service FQDN for automatic peer discovery |
| raft.peers | | Static peer list (alternative to DNS discovery) |

Set either discovery.dns or peers, but not both.
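The rule can be pictured as a small pre-flight check on the parsed config. This is a minimal sketch in Python, not Vrata's own validation: the function name `validate_raft` and the error messages are hypothetical, and it assumes the `raft` block has already been loaded into a plain dictionary.

```python
def validate_raft(raft: dict) -> None:
    """Raise ValueError if the raft block breaks the rules in the table above.

    Hypothetical helper for illustration only; Vrata's real checks may differ.
    """
    # nodeId and bindAddress are the two fields marked "required".
    for field in ("nodeId", "bindAddress"):
        if not raft.get(field):
            raise ValueError(f"raft.{field} is required")

    # "discovery:" may be present but empty, so fall back to an empty dict.
    has_dns = bool((raft.get("discovery") or {}).get("dns"))
    has_peers = bool(raft.get("peers"))

    # Exactly one peer source must be configured: neither and both are errors.
    if has_dns == has_peers:
        raise ValueError("set exactly one of raft.discovery.dns or raft.peers")


# A config equivalent to the YAML in the Configuration section passes.
validate_raft({
    "nodeId": "cp-0",
    "bindAddress": ":7000",
    "discovery": {"dns": "vrata-headless.vrata.svc.cluster.local"},
})
```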

How it works

Peer discovery

raft:
  discovery:
    dns: "vrata-headless.vrata.svc.cluster.local"

Vrata resolves the headless Service and discovers all peers automatically. New nodes join the cluster as they appear in DNS.
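To make the idea concrete: a headless Service returns one DNS A record per pod, so resolving its name yields the peer set. The sketch below shows that lookup in Python using only the standard library. It illustrates the general technique and is not Vrata's implementation; `discover_peers`, its parameters, and the IPv4-only address formatting are assumptions made for the example.

```python
import socket


def discover_peers(service_fqdn: str, raft_port: int = 7000,
                   resolver=socket.getaddrinfo) -> list[str]:
    """Resolve a headless Service name to a sorted list of "ip:port" strings.

    Hypothetical helper: the resolver is injectable so the function can be
    exercised without a real cluster. Only IPv4 records are requested.
    """
    try:
        infos = resolver(service_fqdn, raft_port,
                         family=socket.AF_INET, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name does not resolve yet (for example, no pods are running).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr is (ip, port) for IPv4. A set removes duplicate records.
    ips = {info[4][0] for info in infos}
    return sorted(f"{ip}:{raft_port}" for ip in ips)
```

Calling `discover_peers("vrata-headless.vrata.svc.cluster.local")` from inside the cluster would return one address per control plane pod currently listed in DNS; polling it periodically is one way a node could notice new peers.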

Static peers

raft:
  peers:
    - "cp-0=10.0.0.1:7000"
    - "cp-1=10.0.0.2:7000"
    - "cp-2=10.0.0.3:7000"

Use a static peer list on bare metal or in environments without DNS service discovery.
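The entries above appear to follow a `nodeId=host:port` pattern. Assuming that format (it is inferred from the example, not from a stated specification), a tool that generates or checks the list could parse each entry as in the Python sketch below. `parse_peer` is a hypothetical helper.

```python
def parse_peer(entry: str) -> tuple[str, str, int]:
    """Split "nodeId=host:port" into (node_id, host, port).

    Assumes the entry format shown in the static peers example.
    """
    node_id, eq, address = entry.partition("=")
    # rpartition splits on the last colon, so the port is always the tail.
    host, colon, port = address.rpartition(":")
    if not (eq and node_id and colon and host and port.isdigit()):
        raise ValueError(f"expected 'nodeId=host:port', got {entry!r}")
    return node_id, host, int(port)


peers = [parse_peer(e) for e in (
    "cp-0=10.0.0.1:7000",
    "cp-1=10.0.0.2:7000",
    "cp-2=10.0.0.3:7000",
)]

# Node IDs must be unique, so duplicates in the list are worth catching early.
node_ids = [node_id for node_id, _, _ in peers]
assert len(node_ids) == len(set(node_ids)), "duplicate node ID in peer list"
```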

Scaling

| Nodes | Tolerates | Recommendation |
|---|---|---|
| 1 | 0 failures | Development only |
| 3 | 1 failure | Minimum for production |
| 5 | 2 failures | Maximum recommended |

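Avoid even numbers: Raft needs a majority for quorum (2 of 3, 3 of 5), so an even-sized cluster tolerates no more failures than the next smaller odd size. A 4-node cluster needs 3 nodes for quorum and still tolerates only 1 failure, the same as 3 nodes. Beyond 5 nodes, write latency increases without meaningful benefit.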
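The figures in the table follow from the majority rule: quorum is more than half the nodes, and the cluster tolerates however many nodes are left over. A short Python sketch of that arithmetic (the function names are illustrative):

```python
def quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while a majority remains available."""
    return nodes - quorum(nodes)


# Print quorum and fault tolerance for cluster sizes 1 through 6.
for n in range(1, 7):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)}")
```

Sizes 3 and 4 both tolerate one failure, and sizes 5 and 6 both tolerate two, which is why adding a node to an odd-sized cluster costs an extra replica without improving fault tolerance.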

Kubernetes example

# Helm values
controlPlane:
  replicas: 3
  config:
    controlPlane:
      address: ":8080"
      storePath: "/data"
      raft:
        nodeId: "${POD_NAME}"
        bindAddress: ":7000"
        advertiseAddress: "${POD_IP}:7000"
        discovery:
          dns: "vrata-control-plane-headless.vrata.svc.cluster.local"