Deployment and storage
The Vault OSS VM cluster — three Raft voters plus a separate transit-seal Vault — running on dedicated VMs to keep secret delivery independent of any OpenShift cluster.
Vault in this lab is HashiCorp Vault OSS running on dedicated Ubuntu VMs, on the lab /16, completely outside any OpenShift cluster. This page covers the topology, the storage backend choice, and the rationale.
The decision (per ADR vault-oss-vm-plan.md)
All previous RKE2/OpenShift-hosted Vault deployments are retired for the clean rebuild. The new active target is a Vault OSS service running directly on dedicated VMs on the private lab network. Vault must be independent of OpenShift and RKE2 so OpenShift secret delivery does not depend on a cluster that is itself being rebuilt.
In one sentence: OpenShift depends on Vault, not the other way around.
That principle means Vault cannot live in OpenShift. If it did, every cluster bootstrap (registry pull secrets, htpasswd identity, certs, ESO sync targets) would race against Vault’s own bootstrap. By keeping Vault on VMs, the dependency graph is acyclic: build the lab → bring up Vault → install OpenShift → wire ESO from OpenShift to Vault.
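The ordering argument can be checked mechanically. Below is a minimal sketch using Python's standard `graphlib` (3.9+). The step names are hypothetical labels for the four stages above, and each one maps to the steps it depends on. A valid bring-up order exists only while the graph is acyclic. Making Vault depend on OpenShift, as an in-cluster Vault would, creates a cycle, and `graphlib` refuses to order it.

```python
from graphlib import CycleError, TopologicalSorter

# Each step maps to the steps that must finish before it (hypothetical labels).
BOOTSTRAP_DEPS = {
    "lab": set(),
    "vault": {"lab"},
    "openshift": {"lab", "vault"},
    "eso-wiring": {"openshift", "vault"},
}

def bootstrap_order(deps: dict) -> list:
    """Return a valid bring-up order; raises CycleError if none exists."""
    return list(TopologicalSorter(deps).static_order())

def is_acyclic(deps: dict) -> bool:
    """True if the dependency graph admits any bring-up order at all."""
    try:
        bootstrap_order(deps)
        return True
    except CycleError:
        return False
```

With the VM-hosted layout, the only possible order is lab, vault, openshift, eso-wiring. Changing `vault` to depend on `openshift` makes `is_acyclic` return `False`.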
The four VMs
| Hostname | Role |
|---|---|
| vault-seal-0 | Transit-seal Vault. Single-node. Shamir 5/3 (5 unseal shares, threshold 3). Manual unseal after restart. |
| vault-0 | Main Vault Raft voter. |
| vault-1 | Main Vault Raft voter. |
| vault-2 | Main Vault Raft voter. |
DNS:
- `vault-seal-0.sub.comptech-lab.com`: direct VM name. ESO clients do not use it. Only the main Vault nodes use it, for transit auto-unseal traffic.
- `vault-0.sub.comptech-lab.com`, `vault-1.sub.comptech-lab.com`, `vault-2.sub.comptech-lab.com`: direct VM names, used for Raft cluster traffic and for operator CLI work against a specific node.
- `vault.sub.comptech-lab.com`: DNS round-robin across the three Raft voters. Every ESO `SecretStore`, every `VAULT_ADDR` environment variable, and every operator CLI session points here. If a client lands on a standby, Vault's HA handling (request forwarding or a redirect) gets the request to the active node.
The version
| Item | Value |
|---|---|
| Vault | OSS 1.21.1 |
| Source | Official HashiCorp release archive, checksum-verified at install |
| Storage backend | Vault Integrated Storage (Raft) — no separate Consul or etcd |
| API endpoint | https://vault.sub.comptech-lab.com:8200 |
| Raft cluster traffic | :8201 (peer-to-peer between the three voters) |
Vault OSS, not Enterprise. The lab does not need Enterprise namespaces or DR replication; OSS Raft handles three-node HA, snapshot/restore, and autopilot.
Storage: Raft, on dedicated data disk per VM
Each of the four VMs (three voters + seal) has a dedicated data disk:
- OS disk: small (~100G), root + binaries.
- Data disk: dedicated qcow2 mounted at `/var/lib/vault/raft`. The path is the same on the voters and on the seal Vault, which is simply a one-node Raft cluster.
The Vault config points the storage backend at this dedicated path:
```hcl
storage "raft" {
  path    = "/var/lib/vault/raft"
  node_id = "vault-0" # unique per voter
}
```
Why dedicated:
- Independent fsync. Vault’s Raft path needs to fsync on every entry; sharing the OS disk would couple the OS’s buffered writes (apt, journald) with Vault’s durability path.
- Independent sizing. Vault’s data path grows with secret count + revision history; the OS disk shouldn’t have to.
- Clean rebuild/restore. Replacing a node means rebuilding the OS, leaving the data disk intact, and starting the daemon again, which is cleaner than swapping a full qcow2 image.
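You can confirm that a node actually got the dedicated-disk layout by comparing device IDs. If the Raft path reports the same `st_dev` as `/`, it is on the OS disk. Below is a minimal Python sketch that assumes it runs on the Vault VM itself. It checks placement only and says nothing about fsync behavior.

```python
import os

RAFT_PATH = "/var/lib/vault/raft"  # data path from the storage stanza above

def on_separate_filesystem(data_path: str, root: str = "/") -> bool:
    """True if data_path sits on a different device than root (e.g. its own disk)."""
    return os.stat(data_path).st_dev != os.stat(root).st_dev

def check_raft_disk(data_path: str = RAFT_PATH) -> str:
    """Return 'ok' or a short description of what is wrong with the layout."""
    if not os.path.isdir(data_path):
        return f"{data_path} missing"
    if not on_separate_filesystem(data_path):
        return f"{data_path} shares a device with / (OS disk)"
    return "ok"
```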
Listener
```hcl
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault.d/tls/vault.crt"
  tls_key_file  = "/etc/vault.d/tls/vault.key"
}

# Per-node FQDN; vault-1 and vault-2 use their own names here.
api_addr     = "https://vault-0.sub.comptech-lab.com:8200"
cluster_addr = "https://vault-0.sub.comptech-lab.com:8201"
```
- `address = "0.0.0.0:8200"`: listens on every NIC. The firewall (ufw/iptables) is what restricts who can reach it.
- TLS required. No plaintext Vault traffic anywhere.
- `api_addr` and `cluster_addr` use the per-node FQDN. When a client sends a request to a standby, the standby forwards it to the active node over `cluster_addr` (port 8201) by default. If forwarding fails, it falls back to redirecting the client to the active node's `api_addr`. A redirect target has to name one specific node, so the DNS RR endpoint is not suitable. That is why each Vault node knows its own name.
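Only `node_id`, `api_addr`, and `cluster_addr` differ between voters, so the per-node file can be generated from the hostname. Below is a minimal Python sketch that assumes the paths and domain shown above. `node_config` is a hypothetical helper, not part of any Vault tooling. It deliberately rejects the round-robin name, so a redirect target can never point at the RR endpoint.

```python
DOMAIN = "sub.comptech-lab.com"
RR_NAME = f"vault.{DOMAIN}"  # round-robin name: never a valid redirect target

def node_config(node_id: str) -> str:
    """Render the storage and listener stanzas shown above for one voter."""
    fqdn = f"{node_id}.{DOMAIN}"
    if fqdn == RR_NAME:
        raise ValueError("api_addr must name one node, not the DNS RR endpoint")
    return f'''storage "raft" {{
  path    = "/var/lib/vault/raft"
  node_id = "{node_id}"
}}

listener "tcp" {{
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault.d/tls/vault.crt"
  tls_key_file  = "/etc/vault.d/tls/vault.key"
}}

api_addr     = "https://{fqdn}:8200"
cluster_addr = "https://{fqdn}:8201"
'''
```

Rendering for `vault-0`, `vault-1`, and `vault-2` produces three files that differ only in those three lines.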
Why three voters (and not five, or one)
| Cluster size | Quorum survives loss of | Trade-off |
|---|---|---|
| 1 voter | Nothing (any loss = quorum loss) | Cheapest; least durable |
| 3 voters | 1 node | Lab-appropriate; survives a single VM failure or one hypervisor down |
| 5 voters | 2 nodes | Better fault tolerance; doubles the disk + RAM cost |
The lab picked three. With one operator, three hypervisors max, and a relatively small secret population, three voters is the right cost/availability balance. The hypervisor placement convention is to spread the three voters across distinct hosts so a single hypervisor failure doesn’t take quorum.
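The table's numbers follow from Raft's majority rule. A cluster of n voters needs n // 2 + 1 of them to commit writes and elect a leader, so it tolerates n minus that quorum in failures. Below is a minimal Python sketch of that arithmetic plus the placement rule. The hypervisor names (`hv-a`, `hv-b`, `hv-c`) are hypothetical.

```python
from collections import Counter

def quorum(voters: int) -> int:
    """Votes needed to commit writes and elect a leader: a strict majority."""
    return voters // 2 + 1

def failure_tolerance(voters: int) -> int:
    """Voters that can be lost while a majority is still reachable."""
    return voters - quorum(voters)

def survives_single_host_loss(placement: dict) -> bool:
    """placement maps voter -> hypervisor; losing the busiest host must not break quorum."""
    per_host = Counter(placement.values())
    return max(per_host.values()) <= failure_tolerance(len(placement))
```

For 1, 3, and 5 voters this gives tolerances of 0, 1, and 2, matching the table. It also shows why an even count does not help: 4 voters need 3 for quorum and still tolerate only one loss.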
Why a separate transit-seal Vault
The main Vault Raft cluster auto-unseals against a separate Vault running on vault-seal-0. The seal Vault uses Shamir unseal (5 shares, threshold 3) — operators bring it out of seal manually after each restart. Once unsealed, it exposes a single transit key (transit/keys/main-vault-auto-unseal) that the main Vault encrypts/decrypts its root key with.
That gives the cluster:
- Automatic restart of voters. A reboot or `systemctl restart vault` on a voter brings it back without operator interaction: it asks the seal Vault to decrypt its sealed root key.
- Operator-gated re-bootstrap. If the seal Vault is sealed (e.g., after a power loss), no voter can unseal. That’s the explicit guard: the lab refuses to fully auto-recover Vault from an unknown state without an operator.
Auto-unseal flow:
1. `vault-0` starts and reads its sealed root key from `/var/lib/vault/raft`.
2. It talks to `vault-seal-0:8200` using its transit token.
3. It asks transit to decrypt the sealed root key.
4. It unseals itself and joins Raft.
If vault-seal-0 is sealed, step 2 fails and vault-0 stays sealed. The operator unseals vault-seal-0 first (Shamir 3-of-5), and the voters then unseal themselves on next restart.
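The gate in this flow is easiest to see as a small state model. Below is a minimal Python sketch with no real cryptography:

- `SealVault` stands in for `vault-seal-0`.
- `transit_decrypt` stands in for the transit decrypt call against `main-vault-auto-unseal`.
- Share IDs 1 to 5 are placeholders for the offline-custody Shamir shares.

It models only the ordering: voters can unseal only while the seal Vault is unsealed.

```python
from dataclasses import dataclass, field

@dataclass
class SealVault:
    """Stand-in for vault-seal-0: Shamir-sealed until `threshold` distinct shares arrive."""
    threshold: int = 3
    total_shares: int = 5
    sealed: bool = True
    provided: set = field(default_factory=set)

    def submit_share(self, share_id: int) -> bool:
        """Record one share (duplicates ignored); return whether still sealed."""
        if not 1 <= share_id <= self.total_shares:
            raise ValueError(f"unknown share {share_id}")
        self.provided.add(share_id)
        if len(self.provided) >= self.threshold:
            self.sealed = False
            self.provided.clear()
        return self.sealed

    def restart(self) -> None:
        """Any restart (e.g. power loss) re-seals it and needs operators again."""
        self.sealed = True
        self.provided.clear()

    def transit_decrypt(self, ciphertext: str) -> str:
        """Placeholder for the transit decrypt call; fails while sealed."""
        if self.sealed:
            raise RuntimeError("vault-seal-0 is sealed")
        return ciphertext.removeprefix("sealed:")


def start_voter(name: str, seal: SealVault) -> str:
    """Steps 1-4 of the flow above for one voter."""
    sealed_root_key = f"sealed:{name}-root-key"  # step 1: read from Raft storage (placeholder)
    try:
        seal.transit_decrypt(sealed_root_key)    # steps 2-3: ask transit to decrypt
    except RuntimeError:
        return "sealed"                          # seal Vault unavailable: stay sealed
    return "unsealed"                            # step 4: unseal and join Raft
```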
The seal Vault’s unseal shares are kept in offline custody, separated from the main Vault VMs. This is the “key not next to the data it protects” principle — auto-unseal that hides the key on the same disk would be theater.
VM hardening baseline (per vault-oss-vm-plan.md)
- Minimal supported Linux OS, fully patched from the local mirror where possible.
- Dedicated `vault` user and group; binary owned by `vault:vault`.
- Vault binary pinned to `1.21.1` and checksum-verified before install.
- `systemd` service with restricted filesystem access.
- Swap disabled.
- Core dumps disabled.
- Firewall allows:
  - `8200/tcp` from approved clients and the lab-internal HAProxy / DNS RR clients only;
  - `8201/tcp` only between Vault nodes;
  - SSH only from the approved admin subnet.
- TLS required for all Vault client and cluster traffic.
- Vault data directory writable only by the `vault` user.
- Audit log path writable only where needed, with rotation and disk alerts.
- VM disks protected with the lab disk-encryption standard where practical.
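The checksum item can be covered with a short Python sketch. It assumes the release's `SHA256SUMS` file (HashiCorp publishes one per release, named like `vault_1.21.1_SHA256SUMS`) has already been pulled from the local mirror and its GPG signature checked separately. The archive filename in the test follows HashiCorp's release naming, but the `linux_amd64` architecture is an assumption.

```python
import hashlib
from pathlib import Path

def expected_sha256(sums_text: str, filename: str) -> str:
    """Look up filename in SHA256SUMS-format text: '<hex digest>  <filename>' per line."""
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == filename:
            return parts[0].lower()
    raise KeyError(f"{filename} not listed in SHA256SUMS")

def sha256_file(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_archive(path: Path, sums_text: str) -> bool:
    """True only if the archive's digest matches its SHA256SUMS entry."""
    return sha256_file(path) == expected_sha256(sums_text, path.name)
```

Installation should stop on `False` or `KeyError` rather than fall back to an unverified binary.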
What clients see
| Use case | What clients write |
|---|---|
| ESO (OpenShift) | SecretStore spec.provider.vault.server: https://vault.sub.comptech-lab.com:8200 |
| Operator CLI | export VAULT_ADDR=https://vault.sub.comptech-lab.com:8200 |
| Vault snapshot job (on a voter) | VAULT_ADDR=https://127.0.0.1:8200 (loopback against the local voter) |
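For the ESO and operator CLI cases, the resolver is the lab recursor. The DNS RR endpoint returns the three voter addresses and the client picks one. If that node is a standby, it forwards or redirects the request to the active node. The snapshot job bypasses DNS and talks to its local voter over loopback.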
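To find which voter behind the RR name is currently active, you can query each node's `/v1/sys/health` endpoint by its per-node FQDN. Below is a minimal Python sketch that uses only the standard library. The status codes are Vault's documented defaults for that endpoint: 200 active, 429 unsealed standby, 501 not initialized, 503 sealed. The CA bundle path is a hypothetical placeholder for the lab CA.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional

# Default GET /v1/sys/health status codes (no query-parameter overrides).
HEALTH_ROLES = {200: "active", 429: "standby", 501: "uninitialized", 503: "sealed"}
CA_FILE = "/etc/vault.d/tls/lab-ca.pem"  # hypothetical path to the lab CA bundle

def classify(status: int) -> str:
    """Map a sys/health status code to a node role."""
    return HEALTH_ROLES.get(status, "unknown")

def probe(fqdn: str, ca_file: str = CA_FILE, timeout: float = 3.0) -> str:
    """Query one node directly; urllib raises HTTPError for the non-2xx codes."""
    ctx = ssl.create_default_context(cafile=ca_file)
    url = f"https://{fqdn}:8200/v1/sys/health"
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        return classify(err.code)

def find_active(roles: dict) -> Optional[str]:
    """Return the single active node, or None if there is not exactly one."""
    active = [node for node, role in roles.items() if role == "active"]
    return active[0] if len(active) == 1 else None
```

An unreachable node raises `urllib.error.URLError`, which this sketch leaves to the caller.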
Production readiness gates (still pending)
From vault-oss-vm-plan.md, the gates that must pass before Vault is treated as production-trusted:
- `vault status` shows initialized, unsealed, HA enabled.
- `vault operator raft list-peers` shows three healthy voters.
- `vault operator raft autopilot state` is healthy.
- Active node failover works when the leader is stopped.
- Restart behavior matches the approved seal strategy.
- Audit device is enabled and log rotation is validated.
- Snapshot backup and isolated restore both pass.
- OpenShift ESO can reach Vault over TLS.
- Vault can reach each OpenShift API TokenReview endpoint.
- A low-risk smoke `ExternalSecret` syncs without printing values.
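The `list-peers` gate lends itself to a scripted check. Below is a minimal Python sketch. It assumes the JSON from `vault operator raft list-peers -format=json` mirrors the `/v1/sys/storage/raft/configuration` response, with `data.config.servers[]` entries carrying `node_id`, `address`, `leader`, and `voter`. The expected node IDs come from the VM table. It checks membership only; autopilot health is a separate gate.

```python
import json

EXPECTED_VOTERS = frozenset({"vault-0", "vault-1", "vault-2"})

def check_peers(raw_json: str, expected: frozenset = EXPECTED_VOTERS) -> list:
    """Return a list of problems; an empty list means the gate passes."""
    servers = json.loads(raw_json)["data"]["config"]["servers"]
    problems = []
    voters = {s["node_id"] for s in servers if s.get("voter")}
    if voters != expected:
        problems.append(f"voters {sorted(voters)} != expected {sorted(expected)}")
    leaders = [s["node_id"] for s in servers if s.get("leader")]
    if len(leaders) != 1:
        problems.append(f"expected exactly one leader, found {leaders}")
    return problems
```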
The current state of the gates is in the rebuild plan; several of them (notably restore drill, audit device with rotation) remain open.
References
- `opp-full-plat/plans/disconnected-rebuild/environments/dc-lab/vault-oss-vm-plan.md`
- `opp-full-plat/connection-details/vault-app-secrets.md`
- `opp-full-plat/plans/disconnected-rebuild/environments/dc-lab/allocation-table.md` (Vault OSS VM Allocation)
- HashiCorp Vault docs: developer.hashicorp.com/vault/docs