Installation Manual - 70 Old Vault cold-retention soak

Read-only cold-retention validation after the old lost-custody Vault VMs were powered off.

This chapter records the first cold-retention soak checkpoint after the old lost-custody v7 Vault VMs were powered off.

No VM was started. No disk image was deleted. No DNS, GitOps, OpenShift, Vault R1, or MinIO state was changed.

Governance

Field              Value
Issue              OP-GF-VAULTRECOVERY-1 / #389
Milestone          Workspace Governance
ADR                ADR 0028: Greenfield Vault Replacement After Custody Loss
Existing controls  ADR 0016 and ADR 0025

Validation Boundary

The checkpoint validated the current cold-retention state:

  • old Vault VMs remained powered off;
  • old Vault autostart remained disabled;
  • old Vault disk images remained retained;
  • replacement Vault R1 remained running;
  • stable Vault DNS still resolved to R1;
  • active OpenShift consumers remained healthy on R1-backed stores;
  • OADP, RHACS, nodes, and ClusterOperators remained healthy.

This checkpoint did not observe a new scheduled backup after old Vault VM power-off. The latest normal scheduled backups completed before the power-off gate and were rechecked as still Completed. A later post-power-off backup window remains the next checkpoint before any disk-retention decision.

VM State

VM                       State     Autostart
gf-ocp-vault-seed-01     shut off  disabled
gf-ocp-vault-01          shut off  disabled
gf-ocp-vault-02          shut off  disabled
gf-ocp-vault-03          shut off  disabled
gf-ocp-vault-r1-seed-01  running   enabled
gf-ocp-vault-r1-01       running   enabled
gf-ocp-vault-r1-02       running   enabled
gf-ocp-vault-r1-03       running   enabled

Old disk images remained present under /var/lib/libvirt/images/.
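
The VM state recorded above can be expressed as a small pass/fail check. The following is a minimal sketch, not the tooling used for this checkpoint: the `observed` dictionary layout and the function name `check_cold_retention` are hypothetical, and the state strings are simply the values recorded in this chapter.

```python
# Expected state per VM group, using the values recorded in this chapter.
OLD_VMS = [
    "gf-ocp-vault-seed-01",
    "gf-ocp-vault-01",
    "gf-ocp-vault-02",
    "gf-ocp-vault-03",
]
R1_VMS = [
    "gf-ocp-vault-r1-seed-01",
    "gf-ocp-vault-r1-01",
    "gf-ocp-vault-r1-02",
    "gf-ocp-vault-r1-03",
]


def check_cold_retention(observed):
    """Return a list of violations; an empty list means the check passed.

    `observed` maps VM name -> (state, autostart). This layout is an
    assumption made for illustration.
    """
    problems = []
    expectations = [(name, "shut off", "disabled") for name in OLD_VMS]
    expectations += [(name, "running", "enabled") for name in R1_VMS]
    for name, want_state, want_autostart in expectations:
        state, autostart = observed.get(name, ("missing", "unknown"))
        if state != want_state:
            problems.append(f"{name}: state is {state}, expected {want_state}")
        if autostart != want_autostart:
            problems.append(
                f"{name}: autostart is {autostart}, expected {want_autostart}"
            )
    return problems
```

An empty result corresponds to the state recorded at this checkpoint. Any old VM reported as running, or with autostart enabled, would appear as a violation.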

DNS And Vault Health

DNS remained unchanged:

Name                                      Result
vault.v7.comptech-lab.com                 30.30.200.35, 30.30.200.36, 30.30.200.37
gf-ocp-vault-seed-01.v7.comptech-lab.com  30.30.200.30
gf-ocp-vault-01.v7.comptech-lab.com       no A record
gf-ocp-vault-02.v7.comptech-lab.com       no A record
gf-ocp-vault-03.v7.comptech-lab.com       no A record

PowerDNS zone serial remained 44.

Direct health checks against the old Vault endpoints remained unreachable after power-off:

Endpoint           HTTP code
30.30.200.30:8200  000
30.30.200.31:8200  000
30.30.200.32:8200  000
30.30.200.33:8200  000

Replacement Vault R1 health checks returned 200 on all three endpoints with standby-ok enabled:

Endpoint           HTTP code
30.30.200.35:8200  200
30.30.200.36:8200  200
30.30.200.37:8200  200
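
The two health tables can be compared against expectations with a short helper. This is a sketch under stated assumptions: the helper names are hypothetical, the `https` scheme is assumed (the chapter lists only host and port), and a code of 000 is treated as "no HTTP response received", which is how curl's `%{http_code}` reports a failed connection. Whether curl was the tool used here is not stated. The `standbyok=true` query parameter on Vault's `sys/health` endpoint makes standby nodes return the active status code.

```python
# Expected result per endpoint, taken from the tables in this chapter.
EXPECTED = {
    "30.30.200.30:8200": "unreachable",
    "30.30.200.31:8200": "unreachable",
    "30.30.200.32:8200": "unreachable",
    "30.30.200.33:8200": "unreachable",
    "30.30.200.35:8200": "healthy",
    "30.30.200.36:8200": "healthy",
    "30.30.200.37:8200": "healthy",
}


def health_url(host, port=8200, scheme="https", standby_ok=True):
    """Build a sys/health URL. The scheme is an assumption, not from the source."""
    query = "?standbyok=true" if standby_ok else ""
    return f"{scheme}://{host}:{port}/v1/sys/health{query}"


def classify(http_code):
    """Map a recorded HTTP code to a coarse health label."""
    code = str(http_code)
    if code == "000":
        return "unreachable"  # no HTTP response was received
    if code == "200":
        return "healthy"      # active, or standby when standbyok=true
    return "unexpected"


def mismatches(observed):
    """Return the endpoints whose observed code does not match EXPECTED."""
    return sorted(
        endpoint
        for endpoint, want in EXPECTED.items()
        if classify(observed.get(endpoint, "")) != want
    )
```

With the codes recorded above, `mismatches` returns an empty list. An old endpoint answering with any HTTP code at all would be flagged, which is the signal that a powered-off VM has come back.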

OpenShift Validation

Both clusters remained on OpenShift 4.20.18.

Cluster      Node state  ClusterOperator exceptions
hub-dc-v7    all Ready   none
spoke-dc-v7  all Ready   none

Argo CD:

Context     Application                                 Sync    Health       Revision
hub         hub-dc-v7-bootstrap                         Synced  Healthy      c138c6c
hub         OCM wrapper for spoke-dc-v7-cluster-config  Synced  Progressing  c138c6c
OCM report  spoke-cluster-config-pull                   Synced  Healthy      c138c6c
spoke       spoke-dc-v7-cluster-config                  Synced  Healthy      c138c6c

The hub-side wrapper remained Progressing, but the managed-cluster report showed the spoke application healthy:

clusters=1 synced=1 healthy=1 inProgress=0 notHealthy=0 notSynced=0
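
A report line in this `key=value` form is easy to evaluate mechanically. The sketch below is illustrative only: the function names are hypothetical, and the pass rule (every cluster synced and healthy, all other counters zero) is an assumed reading of the report, not a documented contract.

```python
def parse_report(line):
    """Parse 'key=value key=value ...' into a dict of integer counters."""
    counts = {}
    for pair in line.split():
        key, value = pair.split("=", 1)
        counts[key] = int(value)
    return counts


def all_healthy(counts):
    """True when every reported cluster is synced and healthy."""
    return (
        counts["clusters"] > 0
        and counts["synced"] == counts["clusters"]
        and counts["healthy"] == counts["clusters"]
        and counts["inProgress"] == 0
        and counts["notHealthy"] == 0
        and counts["notSynced"] == 0
    )
```

Applied to the line above, `all_healthy(parse_report(...))` evaluates to `True`, which is why the hub-side Progressing status was not treated as a failure at this checkpoint.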

External Secrets stores:

Cluster  Store               Ready  Reason
hub      vault-r1-eso-smoke  True   Valid
hub      vault-r1-oadp       True   Valid
hub      vault-r1-rhacs      True   Valid
spoke    vault-r1-eso-smoke  True   Valid
spoke    vault-r1-oadp       True   Valid
spoke    vault-r1-rhacs      True   Valid

ExternalSecrets:

Cluster  Ready  Notes
hub      6/6    OADP and ESO smoke refreshed after old VM power-off; RHACS remained Ready
spoke    6/6    OADP, ESO smoke, and logging refreshed after old VM power-off; RHACS remained Ready

Vault egress policies continued to allow only R1 Vault CIDRs:

30.30.200.35/32
30.30.200.36/32
30.30.200.37/32
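
The "only R1" property of the egress allow-list can be verified with Python's standard `ipaddress` module. This is a minimal sketch: the function name `unexpected_egress` is hypothetical, and the input is assumed to be the list of CIDR strings extracted from the policy by some other means.

```python
import ipaddress

# R1 Vault addresses as recorded in this chapter.
R1_ADDRESSES = {
    ipaddress.ip_address("30.30.200.35"),
    ipaddress.ip_address("30.30.200.36"),
    ipaddress.ip_address("30.30.200.37"),
}


def unexpected_egress(cidrs):
    """Return the CIDRs that are not a single-host route to an R1 address."""
    rejected = []
    for cidr in cidrs:
        # strict=False tolerates host bits being set, e.g. 30.30.200.35/24.
        network = ipaddress.ip_network(cidr, strict=False)
        single_host = network.num_addresses == 1
        if not single_host or network.network_address not in R1_ADDRESSES:
            rejected.append(cidr)
    return rejected
```

For the three `/32` entries above the function returns an empty list. A wider range such as `30.30.200.0/24`, or a `/32` pointing at an old Vault address such as `30.30.200.30/32`, would be returned as rejected.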

OADP Validation

OADP remained healthy:

Cluster      DPA         BSL        Schedule    Last Backup
hub-dc-v7    Reconciled  Available  15 2 * * *  platform-resource-daily-20260518021546
spoke-dc-v7  Reconciled  Available  45 2 * * *  platform-resource-daily-20260518024523

Latest normal scheduled backups were rechecked:

Cluster      Backup                                  Phase      Items        Warnings  Errors
hub-dc-v7    platform-resource-daily-20260518021546  Completed  10169/10169  0         0
spoke-dc-v7  platform-resource-daily-20260518024523  Completed  15844/15844  0         0

These backups completed before the old VM power-off. The next checkpoint should either observe a new scheduled backup window or intentionally trigger a backup early while the old Vault VMs remain off.
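
Whether a given backup counts as post-power-off can be decided from its name, since the scheduled backup names above end in a 14-digit timestamp. The sketch below rests on assumptions: the timestamp is interpreted as UTC, the helper names are hypothetical, and the power-off time must be supplied by the operator because this chapter does not record it.

```python
from datetime import datetime, timezone


def backup_timestamp(name):
    """Extract the trailing YYYYMMDDHHMMSS stamp from a backup name.

    Interpreting the stamp as UTC is an assumption.
    """
    stamp = name.rsplit("-", 1)[1]
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def is_post_power_off(name, power_off):
    """True when the backup started after the given power-off time."""
    return backup_timestamp(name) > power_off
```

As a usage example with a purely hypothetical power-off time of 2026-05-18 12:00 UTC, both backups listed above evaluate to `False`, consistent with the statement that they completed before the power-off gate. The next gate would need a backup name for which `is_post_power_off` is `True`.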

RHACS Validation

  • Hub Central remained deployed.
  • No non-running StackRox pods were found on hub or spoke.

Result

The first cold-retention checkpoint passed.

Replacement Vault R1 remained healthy, active OpenShift consumers remained healthy, and the old lost-custody Vault VMs stayed powered off with old disks retained.

The next gate should validate a post-power-off OADP backup window before any final retention decision about old Vault VM disk images.