Deployment and storage

The Vault OSS VM cluster (three Raft voters plus a separate transit-seal Vault) runs on dedicated VMs to keep secret delivery independent of any OpenShift cluster.

Vault in this lab is HashiCorp Vault OSS running on dedicated Ubuntu VMs, on the lab /16, completely outside any OpenShift cluster. This page covers the topology, the storage backend choice, and the rationale.

The decision (per ADR `vault-oss-vm-plan.md`)

All previous RKE2/OpenShift-hosted Vault deployments are retired for the clean rebuild. The new active target is a Vault OSS service running directly on dedicated VMs on the private lab network. Vault must be independent of OpenShift and RKE2 so OpenShift secret delivery does not depend on a cluster that is itself being rebuilt.

In one sentence: OpenShift depends on Vault, not the other way around.

That principle means Vault cannot live in OpenShift. If it did, every cluster bootstrap (registry pull secrets, htpasswd identity, certs, ESO sync targets) would race against Vault’s own bootstrap. By keeping Vault on VMs, the dependency graph is acyclic: build the lab → bring up Vault → install OpenShift → wire ESO from OpenShift to Vault.

The four VMs

[Topology diagram: ESO on each cluster (hub, spoke) and the operator CLI / API (VAULT_ADDR) point at vault.sub.comptech-lab.com (DNS RR to 3 voters, :8200), which fronts the Raft voters vault-0, vault-1, and vault-2; vault-seal-0 is the transit-seal Vault (Shamir 5/3, manual unseal); MinIO is the snapshot target; secret paths are secret/ocp/platform/*, secret/ocp/<cluster>/*, and secret/apps/<div>/<app>/*.]

Hostname	Role
`vault-seal-0`	Transit-seal Vault. Single-node. Shamir 5/3 (5 unseal shares, threshold 3). Manual unseal after restart.
`vault-0`	Main Vault Raft voter.
`vault-1`	Main Vault Raft voter.
`vault-2`	Main Vault Raft voter.

DNS:

vault-seal-0.sub.comptech-lab.com — direct VM name; not used by ESO clients, only by the main Vault nodes for transit auto-unseal traffic.
vault-0.sub.comptech-lab.com, vault-1.sub.comptech-lab.com, vault-2.sub.comptech-lab.com — direct VM names; used for Raft cluster traffic and operator CLI.
vault.sub.comptech-lab.com — DNS round-robin to the three Raft voters. This is what every ESO SecretStore, every VAULT_ADDR environment variable, every operator CLI session points at. The Raft cluster handles “this client got the standby” via HA redirects.

The version

Item	Value
Vault	OSS `1.21.1`
Source	Official HashiCorp release archive, checksum-verified at install
Storage backend	Vault Integrated Storage (Raft) — no separate Consul or etcd
API endpoint	`https://vault.sub.comptech-lab.com:8200`
Raft cluster traffic	`:8201` (peer-to-peer between the three voters)

Vault OSS, not Enterprise. The lab does not need Enterprise namespaces or DR replication; OSS Raft handles three-node HA, snapshot/restore, and autopilot.

Storage: Raft, on dedicated data disk per VM

Each of the four VMs (three voters + seal) has a dedicated data disk:

OS disk: small (~100G), root + binaries.
Data disk: dedicated qcow2 mounted at /var/lib/vault/raft on all four VMs (the seal Vault uses the same path as a one-node Raft).

The Vault config points the storage backend at this dedicated path:

storage "raft" {
  path    = "/var/lib/vault/raft"
  node_id = "vault-0"   # unique per voter
}

Why dedicated:

Independent fsync. Vault’s Raft path needs to fsync on every entry; sharing the OS disk would couple the OS’s buffered writes (apt, journald) with Vault’s durability path.
Independent sizing. Vault’s data path grows with secret count + revision history; the OS disk shouldn’t have to.
Clean snapshot/restore. Replacing a node means rebuilding the OS, leaving the data disk intact, and starting the daemon again. Cleaner than a full qcow2 swap.
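For illustration, a voter's storage stanza can also list its peers so that a rebuilt node rejoins the cluster when the daemon starts. This is a minimal sketch, not the lab's actual config: it assumes cluster formation uses `retry_join` (the plan excerpted here does not say how voters join), and the CA bundle path is hypothetical.

```hcl
# Sketch for vault-1; node_id and the peer list differ on each voter.
storage "raft" {
  path    = "/var/lib/vault/raft"
  node_id = "vault-1"

  # Assumption: peers are found via retry_join instead of a manual
  # `vault operator raft join` after each rebuild.
  retry_join {
    leader_api_addr     = "https://vault-0.sub.comptech-lab.com:8200"
    leader_ca_cert_file = "/etc/vault.d/tls/ca.crt" # hypothetical path
  }

  retry_join {
    leader_api_addr     = "https://vault-2.sub.comptech-lab.com:8200"
    leader_ca_cert_file = "/etc/vault.d/tls/ca.crt" # hypothetical path
  }
}
```

The per-node FQDNs are used here rather than the round-robin name, so a joining node always addresses a specific peer.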

Listener

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault.d/tls/vault.crt"
  tls_key_file  = "/etc/vault.d/tls/vault.key"
}

api_addr     = "https://vault-0.sub.comptech-lab.com:8200"
cluster_addr = "https://vault-0.sub.comptech-lab.com:8201"

address = 0.0.0.0:8200 — listens on every NIC. Firewall (ufw / iptables) is what restricts who can reach it.
TLS required. No plaintext Vault traffic anywhere.
api_addr and cluster_addr use the per-node FQDN. Standby nodes forward requests to the active node over the cluster port by default; when forwarding is unavailable, the standby instead redirects the client to the api_addr of the active node. The DNS RR endpoint isn’t suitable as a redirect target because the redirect has to point at a specific node, so each Vault node advertises its own name.

Why three voters (and not five, or one)

Cluster size	Quorum survives loss of	Trade-off
1 voter	Nothing (any loss = quorum loss)	Cheapest; least durable
3 voters	1 node	Lab-appropriate; survives a single VM failure or one hypervisor down
5 voters	2 nodes	Better fault tolerance; adds two more VMs of disk + RAM

The lab picked three. With one operator, three hypervisors max, and a relatively small secret population, three voters is the right cost/availability balance. The hypervisor placement convention is to spread the three voters across distinct hosts so a single hypervisor failure doesn’t take quorum.
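The numbers in the table follow from the Raft majority rule. As a short worked form, with $n$ the number of voters, $q(n)$ the quorum size, and $f(n)$ the number of voter losses the cluster survives (the symbols are notation introduced here, not taken from the plan):

```latex
\[
  q(n) = \left\lfloor \tfrac{n}{2} \right\rfloor + 1,
  \qquad
  f(n) = n - q(n) = \left\lfloor \tfrac{n-1}{2} \right\rfloor
\]
\[
  q(1)=1,\ f(1)=0; \qquad q(3)=2,\ f(3)=1; \qquad q(5)=3,\ f(5)=2
\]
```

An even voter count buys nothing: $f(4) = 1$, the same as three voters, which is why the table compares only odd sizes.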

Why a separate transit-seal Vault

The main Vault Raft cluster auto-unseals against a separate Vault running on vault-seal-0. The seal Vault uses Shamir unseal (5 shares, threshold 3) — operators bring it out of seal manually after each restart. Once unsealed, it exposes a single transit key (transit/keys/main-vault-auto-unseal) that the main Vault encrypts/decrypts its root key with.

That gives the cluster:

Automatic restart of voters. A reboot or systemctl restart vault on a voter brings it back without operator interaction; it asks the seal Vault to decrypt its sealed root key.
Operator-gated re-bootstrap. If the seal Vault is sealed (e.g., after a power loss), no voter can come up. That’s the explicit guard: the lab refuses to fully auto-recover Vault from an unknown state without an operator.

Auto-unseal flow:

1. vault-0 starts. Reads its sealed root key from /var/lib/vault/raft.
2. Talks to vault-seal-0:8200 via its transit token.
3. Asks transit to decrypt the sealed root key.
4. Unseals itself, joins Raft.

If vault-seal-0 is sealed, step 2 fails and vault-0 stays sealed. The operator unseals vault-seal-0 first (Shamir 3-of-5), and the voters then unseal themselves on next restart.
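On the voter side, this flow is driven by a `seal "transit"` stanza in the Vault config. The following is a sketch under stated assumptions rather than the lab's actual file: the mount path `transit/` is inferred from the key path given above, the CA bundle path is hypothetical, and the token is assumed to be supplied through the service environment.

```hcl
# Sketch of the seal stanza on a main voter (vault-0, vault-1, vault-2).
seal "transit" {
  address    = "https://vault-seal-0.sub.comptech-lab.com:8200"
  mount_path = "transit/"
  key_name   = "main-vault-auto-unseal"

  # Assumption: CA bundle location; adjust to where the lab CA is installed.
  tls_ca_cert = "/etc/vault.d/tls/ca.crt"

  # The transit token is deliberately not written in this file. Vault can
  # also read it from the VAULT_TOKEN environment variable of the service.
}
```

Keeping the token out of the config file means the file itself can be stored in version control without exposing a credential.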

The seal Vault’s unseal shares are kept in offline custody, separated from the main Vault VMs. This is the “key not next to the data it protects” principle — auto-unseal that hides the key on the same disk would be theater.
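In the same spirit, the token that the voters present to the seal Vault can be restricted to the one transit key they need. A minimal sketch of such a policy on vault-seal-0, assuming the transit engine is mounted at `transit/` as the key path above implies (the policy's name and how the token is issued are left to the operator):

```hcl
# Policy on vault-seal-0 for the voters' transit token.
# Grants encrypt and decrypt on the auto-unseal key and nothing else.
path "transit/encrypt/main-vault-auto-unseal" {
  capabilities = ["update"]
}

path "transit/decrypt/main-vault-auto-unseal" {
  capabilities = ["update"]
}
```

A token scoped this way cannot read, rotate, or export the key, so a compromised voter gains no control over the seal Vault itself.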

VM hardening baseline (per `vault-oss-vm-plan.md`)

Minimal supported Linux OS, fully patched from the local mirror where possible.
Dedicated vault user and group; binary owned by vault:vault.
Vault binary pinned to 1.21.1 and checksum-verified before install.
systemd service with restricted filesystem access.
Swap disabled.
Core dumps disabled.
Firewall allows:
- 8200/tcp from approved clients and the lab-internal HAProxy / DNS RR clients only;
- 8201/tcp only between Vault nodes;
- SSH only from the approved admin subnet.
TLS required for all Vault client and cluster traffic.
Vault data directory writable only by the vault user.
Audit log path writable only where needed, with rotation and disk alerts.
VM disks protected with the lab disk-encryption standard where practical.

What clients see

Use case	What clients write
ESO (OpenShift)	`SecretStore.spec.provider.vault.server: https://vault.sub.comptech-lab.com:8200`
Operator CLI	`export VAULT_ADDR=https://vault.sub.comptech-lab.com:8200`
Vault snapshot job (on a voter)	`VAULT_ADDR=https://127.0.0.1:8200` (loopback against the local voter)

For ESO and the operator CLI, the resolver is the lab recursor; the DNS RR endpoint returns the three voter addresses; the client picks one and is HA-redirected to the active node if needed. The snapshot job bypasses DNS and talks to its local voter over loopback.

Production readiness gates (still pending)

From vault-oss-vm-plan.md, the gates that must pass before Vault is treated as production-trusted:

vault status shows initialized, unsealed, HA enabled.
vault operator raft list-peers shows three healthy voters.
vault operator raft autopilot state is healthy.
Active node failover works when the leader is stopped.
Restart behavior matches the approved seal strategy.
Audit device is enabled and log rotation is validated.
Snapshot backup and isolated restore both pass.
OpenShift ESO can reach Vault over TLS.
Vault can reach each OpenShift API TokenReview endpoint.
A low-risk smoke ExternalSecret syncs without printing values.

The current state of the gates is in the rebuild plan; several of them (notably restore drill, audit device with rotation) remain open.

References

opp-full-plat/plans/disconnected-rebuild/environments/dc-lab/vault-oss-vm-plan.md
opp-full-plat/connection-details/vault-app-secrets.md
opp-full-plat/plans/disconnected-rebuild/environments/dc-lab/allocation-table.md (Vault OSS VM Allocation)
HashiCorp Vault docs: developer.hashicorp.com/vault/docs