Installation Manual - 39 Spoke worker coredump hardening rollout

How the spoke-dc-v7 worker coredump MachineConfig control was applied through GitOps and validated.

This chapter records the supervised rollout of the rhcos4-high-worker-coredump-disable-storage control on spoke-dc-v7.

The rollout applied a worker MachineConfig through GitOps and validated that all worker hosts now disable persistent coredump storage with:

Storage=none
ProcessSizeMax=0

Target State

Item	Value
Governance issue	`OP-GF-SPOKEDCV7-26`, issue `#376`
Cluster	`spoke-dc-v7`
Control	`rhcos4-high-worker-coredump-disable-storage`
MachineConfig	`75-worker-coredump-disable-storage`
Final worker render	`rendered-worker-430d044e4d36ecc194bdcd0b451ca322`
Evidence report	`reports/compliance/spoke-dc-v7/20260517/worker-coredump-hardening-rollout.md`

Access Path

Run operational commands from the bootstrap VM through dl385-2.

ssh ze@dl385-2
ssh gf-ocp-bootstrap-01

export HUB_KUBECONFIG=/home/ze/ocp-greenfield-deployment/artifacts/openshift/hub-dc-v7/auth/kubeconfig
export SPOKE_KUBECONFIG=/home/ze/ocp-greenfield-deployment/artifacts/openshift/spoke-dc-v7/auth/kubeconfig

Do not print kubeconfigs, kubeadmin passwords, pull secrets, PAT values, repository private keys, Secret data, or full Secret manifests.

GitOps Change

The active GitOps repository is:

git@github.com:zeshaq/openshift-platform-gitops.git

Commit applied:

8175ed896909906e8317a6c1f9514c4ce4bf942a Add spoke worker coredump hardening

Files changed:

clusters/spoke-dc-v7/node-hardening/kustomization.yaml
clusters/spoke-dc-v7/node-hardening/machineconfig-worker-coredump-disable-storage.yaml

The new MachineConfig writes:

/etc/systemd/coredump.conf

with:

[Coredump]
Storage=none
ProcessSizeMax=0
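
MachineConfig file entries carry their contents as an Ignition data URL. The sketch below shows one way to produce such a value for the three-line file above. The function names coredump_conf and to_data_url are hypothetical, and the manifest in the GitOps repository may encode the contents differently (for example as URL-encoded plain text rather than base64).

```shell
# Print the file body the MachineConfig is expected to write.
coredump_conf() {
  printf '%s\n' '[Coredump]' 'Storage=none' 'ProcessSizeMax=0'
}

# Read a file body on stdin and print it as a base64 data URL,
# a form accepted by the Ignition "contents.source" field.
to_data_url() {
  printf 'data:text/plain;charset=utf-8;base64,%s\n' "$(base64 | tr -d '\n')"
}
```

Running coredump_conf | to_data_url prints a single line that can be compared against the source field in the rendered manifest.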

Server-Side Dry Run

Before pushing the GitOps commit, copy the rendered kustomization to the bootstrap VM and run a server-side dry run against the spoke cluster.

oc --kubeconfig "$SPOKE_KUBECONFIG" apply --dry-run=server \
  -k /tmp/op-gf-spokedcv7-26-node-hardening

Expected result for the coredump MachineConfig:

machineconfig.machineconfiguration.openshift.io/75-worker-coredump-disable-storage created (server dry run)

Apply Through Argo

After pushing the GitOps commit, refresh the spoke cluster-config application.

oc --kubeconfig "$HUB_KUBECONFIG" -n openshift-gitops \
  annotate applications.argoproj.io spoke-dc-v7-cluster-config \
  argocd.argoproj.io/refresh=hard --overwrite

Validate convergence:

oc --kubeconfig "$HUB_KUBECONFIG" -n openshift-gitops \
  get applications.argoproj.io spoke-dc-v7-cluster-config \
  -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status,REV:.status.sync.revision

Observed final state:

spoke-dc-v7-cluster-config  Synced  Healthy  8175ed896909906e8317a6c1f9514c4ce4bf942a
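
The convergence check above can be turned into a pass/fail gate. This is a minimal sketch that assumes the four-column output produced by the custom-columns command; the function name argo_converged is hypothetical.

```shell
# Read the custom-columns output on stdin and succeed only when the
# named application is Synced, Healthy, and at the expected revision.
# Usage: argo_converged APP_NAME EXPECTED_REVISION
argo_converged() {
  awk -v app="$1" -v rev="$2" '
    $1 == app { seen = 1; ok = ($2 == "Synced" && $3 == "Healthy" && $4 == rev) }
    END { exit !(seen && ok) }
  '
}
```

For example, pipe the get applications.argoproj.io command into argo_converged spoke-dc-v7-cluster-config 8175ed896909906e8317a6c1f9514c4ce4bf942a and inspect the exit status.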

Worker MCP Watch

Watch the worker MCP and worker node annotations until every worker is on the new render.

oc --kubeconfig "$SPOKE_KUBECONFIG" get mcp worker

oc --kubeconfig "$SPOKE_KUBECONFIG" get nodes \
  -l node-role.kubernetes.io/worker -o json \
  | jq -r '.items[] |
    [.metadata.name,
     (.spec.unschedulable // false),
     (.metadata.annotations["machineconfiguration.openshift.io/state"] // ""),
     (.metadata.annotations["machineconfiguration.openshift.io/currentConfig"] // ""),
     (.metadata.annotations["machineconfiguration.openshift.io/desiredConfig"] // "")]
    | @tsv'

Observed rollout order:

spoke-dc-v7-worker-2
spoke-dc-v7-worker-1
spoke-dc-v7-worker-0

Final MCP state:

worker rendered-worker-430d044e4d36ecc194bdcd0b451ca322 Updated=True Updating=False Degraded=False 3/3
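
To avoid reading the node annotations by eye, the TSV emitted by the jq command above can be filtered for stragglers. The sketch below assumes the five-column order used in that command and that a settled node reports the MCO state Done; the function name workers_settled is hypothetical.

```shell
# Read the jq TSV (name, unschedulable, state, currentConfig, desiredConfig)
# on stdin. Print each worker that is not yet settled on the target render
# and succeed only when no such worker is found.
# Usage: workers_settled TARGET_RENDER
workers_settled() {
  awk -F '\t' -v want="$1" '
    $2 != "false" || $3 != "Done" || $4 != want || $5 != want { print $1; bad = 1 }
    END { exit bad + 0 }
  '
}
```

Pipe the node query into workers_settled rendered-worker-430d044e4d36ecc194bdcd0b451ca322; an empty output with exit status 0 means every worker is on the new render and schedulable.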

NooBaa Primary Handling

Before the rollout, worker-1 hosted the protected NooBaa DB primary. During the worker-1 update, CNPG moved the primary to worker-0 and rescheduled the other instance to worker-2.

Before MCO updates worker-0, promote the ready instance on worker-2 with the ODF-bundled CNPG plugin:

KUBECONFIG="$SPOKE_KUBECONFIG" /tmp/kubectl-cnpg-noobaa \
  promote noobaa-db-pg-cluster noobaa-db-pg-cluster-2 \
  -n openshift-storage --request-timeout=60s

Observed final CNPG state:

ready=2/2
primary=noobaa-db-pg-cluster-2

Do not patch the noobaa-db-pg-cluster-primary PDB directly as the default workaround; promote a ready instance on another worker instead.
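
Whether a promotion is needed depends on which worker currently hosts the primary. The sketch below looks that up before a node is updated. It assumes that the CNPG pod name matches the instance name reported in status.currentPrimary; the function name primary_on_node is hypothetical.

```shell
# Succeed when the CNPG primary pod runs on the given node, which means
# a promotion to another ready instance is needed before that node is
# updated. Exit status 2 means one of the lookups failed.
# Usage: primary_on_node NODE_NAME
primary_on_node() {
  cnpg_primary=$(oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
    get cluster noobaa-db-pg-cluster \
    -o jsonpath='{.status.currentPrimary}') || return 2
  cnpg_node=$(oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
    get pod "$cnpg_primary" -o jsonpath='{.spec.nodeName}') || return 2
  [ "$cnpg_node" = "$1" ]
}
```

For example, primary_on_node spoke-dc-v7-worker-0 returning 0 before the worker-0 update indicates that the promote command should be run first.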

Final Validation

Validate the rendered worker MachineConfig includes the coredump file.

worker_render=$(oc --kubeconfig "$SPOKE_KUBECONFIG" \
  get mcp worker -o jsonpath='{.status.configuration.name}')

oc --kubeconfig "$SPOKE_KUBECONFIG" get machineconfig "$worker_render" -o json \
  | jq -r 'any(.spec.config.storage.files[]?; .path == "/etc/systemd/coredump.conf")'

Expected:

true

Validate the host file on every worker.

for node in spoke-dc-v7-worker-0 spoke-dc-v7-worker-1 spoke-dc-v7-worker-2; do
  oc --kubeconfig "$SPOKE_KUBECONFIG" debug "node/$node" --quiet -- \
    chroot /host sh -c \
    "grep -E '^(Storage|ProcessSizeMax)=' /etc/systemd/coredump.conf"
done

Observed on all three workers:

Storage=none
ProcessSizeMax=0
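
The per-node grep output can also be checked mechanically. This is a small sketch that reads the output for one worker and requires both settings with the exact values above; the function name coredump_settings_ok is hypothetical.

```shell
# Read the grep output from one worker on stdin and succeed only when
# both expected settings are present exactly as configured.
coredump_settings_ok() {
  awk '
    $0 == "Storage=none"     { storage = 1 }
    $0 == "ProcessSizeMax=0" { size = 1 }
    END { exit !(storage && size) }
  '
}
```

Inside the loop above, piping the oc debug output into coredump_settings_ok and printing the node name on failure gives a quick list of workers that need attention.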

Validate cluster and storage health:

oc --kubeconfig "$SPOKE_KUBECONFIG" get nodes
oc --kubeconfig "$SPOKE_KUBECONFIG" get co --no-headers \
  | awk '$3!="True" || $4!="False" || $5!="False" {print}'
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
  get noobaa/noobaa storagecluster/ocs-storagecluster cephcluster/ocs-storagecluster-cephcluster
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
  get cluster noobaa-db-pg-cluster \
  -o jsonpath='ready={.status.readyInstances}/{.status.instances} primary={.status.currentPrimary}{"\n"}'

Observed:

all workers Ready and schedulable
no non-steady ClusterOperators reported
NooBaa=Ready
StorageCluster=Ready
CephCluster=Ready HEALTH_OK
CNPG ready=2/2 primary=noobaa-db-pg-cluster-2
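
The CNPG status line printed by the last command can be gated in the same way. The sketch below assumes the exact ready=N/M primary=NAME format produced by that jsonpath expression (without the CNPG prefix used in the summary above); the function name cnpg_ready is hypothetical.

```shell
# Read a "ready=N/M primary=NAME" line on stdin. When every instance is
# ready, print the primary name and succeed; otherwise fail silently.
cnpg_ready() {
  awk '
    /^ready=[0-9]+\/[0-9]+ primary=.+$/ {
      split($1, r, "[=/]")            # r[2] = ready count, r[3] = total
      sub(/^primary=/, "", $2)
      if (r[2] + 0 == r[3] + 0 && r[3] + 0 > 0) { print $2; ok = 1 }
    }
    END { exit !ok }
  '
}
```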

Post-Rollout Drainability

Run server-side dry-run drain checks after the rollout because NooBaa primary placement changed.

for node in spoke-dc-v7-worker-0 spoke-dc-v7-worker-1 spoke-dc-v7-worker-2; do
  oc --kubeconfig "$SPOKE_KUBECONFIG" adm drain "$node" \
    --ignore-daemonsets --delete-emptydir-data --dry-run=server --timeout=90s
done

Observed:

Worker	Result	Reason
`spoke-dc-v7-worker-0`	passed	no NooBaa DB primary
`spoke-dc-v7-worker-1`	passed	hosts NooBaa DB replica
`spoke-dc-v7-worker-2`	failed	hosts protected NooBaa DB primary

Worker-2 failed because noobaa-db-pg-cluster-2 is the current primary and the NooBaa primary PDB allows zero voluntary disruptions.
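
The drain loop above can be wrapped so that it produces a compact result per worker. This is a minimal sketch using the same drain flags; the function name drain_report is hypothetical, and it discards the drain output, so rerun the plain command to see why a worker was blocked.

```shell
# Dry-run drain each named worker and print one "node<TAB>result" line.
# Returns non-zero if any worker would be blocked.
# Usage: drain_report NODE...
drain_report() {
  drain_rc=0
  for node in "$@"; do
    if oc --kubeconfig "$SPOKE_KUBECONFIG" adm drain "$node" \
         --ignore-daemonsets --delete-emptydir-data \
         --dry-run=server --timeout=90s >/dev/null 2>&1; then
      printf '%s\tpassed\n' "$node"
    else
      printf '%s\tfailed\n' "$node"
      drain_rc=1
    fi
  done
  return "$drain_rc"
}
```

For example, drain_report spoke-dc-v7-worker-0 spoke-dc-v7-worker-1 spoke-dc-v7-worker-2 prints three lines and returns non-zero while any worker hosts the protected primary.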

Next Step

The coredump MachineConfig control is live. If formal SCAP evidence is needed, open a tracked follow-up to rerun or observe the Compliance Operator scan and confirm rhcos4-high-worker-coredump-disable-storage reports passing.

For future worker maintenance, revalidate NooBaa DB primary placement first. Worker-2 currently hosts the protected primary, so it cannot be drained until the primary is promoted on another worker.