Installation Manual - 36 Spoke NooBaa primary relocation
How to relocate the spoke-dc-v7 NooBaa database primary before draining the worker that currently hosts it.
This chapter records the NooBaa database primary relocation gate for
spoke-dc-v7. The previous drainability gate found that worker-2 could not be
drained while it hosted the NooBaa DB primary. This gate moved the primary to
the other NooBaa DB instance and proved worker-2 drainability by server-side
dry run.
Target State
| Item | Value |
|---|---|
| Governance issue | OP-GF-SPOKEDCV7-24, issue #374 |
| Cluster | spoke-dc-v7 |
| Scope | NooBaa DB primary relocation for worker-2 drainability |
| CNPG resource | clusters.postgresql.cnpg.noobaa.io/noobaa-db-pg-cluster |
| Promoted instance | noobaa-db-pg-cluster-2 |
| Evidence report | reports/compliance/spoke-dc-v7/20260517/noobaa-primary-relocation-gate.md |
Access Path
Run operational commands from the bootstrap VM, which is reached through dl385-2.
ssh ze@dl385-2
ssh gf-ocp-bootstrap-01
export HUB_KUBECONFIG=/home/ze/ocp-greenfield-deployment/artifacts/openshift/hub-dc-v7/auth/kubeconfig
export SPOKE_KUBECONFIG=/home/ze/ocp-greenfield-deployment/artifacts/openshift/spoke-dc-v7/auth/kubeconfig
Do not print kubeconfigs, kubeadmin passwords, pull secrets, PAT values, repository private keys, Secret data, or full Secret manifests.
Supported Mechanism
The NooBaa DB pods are owned by an ODF-managed CNPG cluster:
postgresql.cnpg.noobaa.io/v1/Cluster/noobaa-db-pg-cluster
The bundled plugin exposes the supported primary relocation command:
kubectl-cnpg promote CLUSTER INSTANCE
CloudNativePG documents this as the command to promote a selected pod to primary for maintenance or switchover:
https://cloudnative-pg.io/documentation/preview/kubectl-plugin/#promote
Do not patch PDB/noobaa-db-pg-cluster-primary directly. The PDB protects the
current database primary.
Preflight
Confirm baseline health and current NooBaa DB placement.
oc --kubeconfig "$SPOKE_KUBECONFIG" get clusterversion version
oc --kubeconfig "$SPOKE_KUBECONFIG" get nodes
oc --kubeconfig "$SPOKE_KUBECONFIG" get mcp
oc --kubeconfig "$SPOKE_KUBECONFIG" get co --no-headers \
| awk '$3!="True" || $4!="False" || $5!="False" {print}'
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
get clusters.postgresql.cnpg.noobaa.io noobaa-db-pg-cluster -o json \
| jq -r '"currentPrimary=\(.status.currentPrimary) targetPrimary=\(.status.targetPrimary) readyInstances=\(.status.readyInstances) phase=\(.status.phase)"'
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
get pods -o json \
| jq -r '.items[] | select(.metadata.name | test("^noobaa-db-pg-cluster")) |
[.metadata.name, .spec.nodeName, .status.phase,
([.status.conditions[]? | select(.type=="Ready") | .status][0] // ""),
(.metadata.labels["cnpg.io/instanceRole"] // "")] | @tsv'
Pre-change state:
currentPrimary=noobaa-db-pg-cluster-1
targetPrimary=noobaa-db-pg-cluster-1
readyInstances=2
phase=Cluster in healthy state
noobaa-db-pg-cluster-1 spoke-dc-v7-worker-2 Running True primary
noobaa-db-pg-cluster-2 spoke-dc-v7-worker-1 Running True replica
Plugin Handling
The controller pod contains the matching ODF-bundled plugin. Copy it to the bootstrap VM and run it with the existing cluster-admin kubeconfig.
controller_pod=$(oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
get pods -o json \
| jq -r '.items[] | select(.metadata.name | test("^cnpg-controller-manager-")) |
.metadata.name' | head -n 1)
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
exec "$controller_pod" -- cat /usr/bin/kubectl-cnpg \
> /tmp/kubectl-cnpg-noobaa
chmod 700 /tmp/kubectl-cnpg-noobaa
/tmp/kubectl-cnpg-noobaa version --kubeconfig "$SPOKE_KUBECONFIG"
Observed version:
Build: Version 4.20.10
The in-pod status command worked, but the in-pod promote path failed on
the controller service account’s discovery path in this environment. Running
the same binary from the bootstrap VM with the cluster-admin kubeconfig avoided
that RBAC/discovery issue.
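Because the copy streams the binary through `oc exec ... cat` into a shell redirect, a failed exec can leave an empty file at the target path. The following is a minimal sketch of a sanity check to run before using the copy. The function name `check_plugin_copy` is hypothetical and was not part of the recorded procedure.

```shell
# Hypothetical helper (not from the recorded gate): confirm that the copied
# plugin file exists, is non-empty, and is executable before running it.
check_plugin_copy() {
  local path="$1"
  if [ ! -s "$path" ]; then
    echo "plugin copy missing or empty: $path" >&2
    return 1
  fi
  if [ ! -x "$path" ]; then
    echo "plugin copy not executable: $path" >&2
    return 1
  fi
  echo "plugin copy looks usable: $path"
}
```

For example, `check_plugin_copy /tmp/kubectl-cnpg-noobaa` could run after the `chmod 700` step. It only checks file attributes; the `version` call remains the real proof that the binary runs.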
Promote The Replica
Promote the replica that is not on the worker being drained.
/tmp/kubectl-cnpg-noobaa promote noobaa-db-pg-cluster noobaa-db-pg-cluster-2 \
-n openshift-storage \
--kubeconfig "$SPOKE_KUBECONFIG" \
--request-timeout=60s
Expected response:
Node noobaa-db-pg-cluster-2 in cluster noobaa-db-pg-cluster will be promoted
The cluster may briefly show:
phase=Switchover in progress
Wait until it returns to:
currentPrimary=noobaa-db-pg-cluster-2
targetPrimary=noobaa-db-pg-cluster-2
readyInstances=2
phase=Cluster in healthy state
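The wait can be scripted rather than checked by hand. The sketch below reuses the Preflight status summary and polls until it matches the expected settled line. The function names `is_settled` and `wait_for_primary`, the 10-second interval, and the 30-attempt limit are assumptions for illustration, not values recorded by this gate.

```shell
# Return 0 when a status summary line (same format as the Preflight jq output)
# shows the expected instance as both current and target primary, with the
# expected ready count and a healthy phase.
is_settled() {
  local line="$1" want_primary="$2" want_ready="$3"
  case "$line" in
    "currentPrimary=$want_primary targetPrimary=$want_primary readyInstances=$want_ready phase=Cluster in healthy state")
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical polling wrapper: re-read the CNPG cluster status until it is
# settled or the attempts run out. Interval and attempt count are assumptions.
wait_for_primary() {
  local want_primary="$1" want_ready="${2:-2}" attempts="${3:-30}"
  local i=0 line
  while [ "$i" -lt "$attempts" ]; do
    line=$(oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
      get clusters.postgresql.cnpg.noobaa.io noobaa-db-pg-cluster -o json \
      | jq -r '"currentPrimary=\(.status.currentPrimary) targetPrimary=\(.status.targetPrimary) readyInstances=\(.status.readyInstances) phase=\(.status.phase)"')
    echo "$line"
    if is_settled "$line" "$want_primary" "$want_ready"; then
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "timed out waiting for $want_primary to settle as primary" >&2
  return 1
}
```

After the promote command returns, `wait_for_primary noobaa-db-pg-cluster-2` would print each observed summary line and return non-zero if the cluster never settles within the attempt limit.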
Final Validation
Validate cluster health:
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
get storagecluster ocs-storagecluster -o jsonpath='{.status.phase}{"\n"}'
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
get cephcluster ocs-storagecluster-cephcluster -o jsonpath='{.status.ceph.health}{"\n"}'
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-storage \
get noobaa noobaa -o jsonpath='{.status.phase}{"\n"}'
oc --kubeconfig "$SPOKE_KUBECONFIG" -n openshift-logging \
get lokistack logging-loki
Validate worker drain behavior:
for node in $(oc --kubeconfig "$SPOKE_KUBECONFIG" get nodes \
-l node-role.kubernetes.io/worker= \
-o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | sort); do
oc --kubeconfig "$SPOKE_KUBECONFIG" adm drain "$node" \
--ignore-daemonsets --delete-emptydir-data --dry-run=server --timeout=90s
done
Final observed state:
noobaa-db-pg-cluster-1 spoke-dc-v7-worker-2 Running True replica
noobaa-db-pg-cluster-2 spoke-dc-v7-worker-1 Running True primary
Drain dry-run result:
| Worker | Result |
|---|---|
| spoke-dc-v7-worker-0 | passed |
| spoke-dc-v7-worker-1 | failed, now hosts protected NooBaa DB primary |
| spoke-dc-v7-worker-2 | passed |
Worker-2 evidence:
node/spoke-dc-v7-worker-2 drained (server dry run)
All workers remained schedulable after the dry-run checks.
Operating Decision
This gate made spoke-dc-v7-worker-2 drainable without weakening the NooBaa
primary PDB.
It did not make all workers simultaneously drainable. With the current two-instance NooBaa DB shape, whichever worker hosts the primary cannot be drained voluntarily, because the primary PDB blocks eviction of that pod.
If worker-1 must be drained later, first relocate the primary away from worker-1 and rerun dry-run drain validation.
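Choosing the instance to promote for a later drain can be derived from the Preflight pod listing. The sketch below reads that tab-separated output (name, node, phase, ready, role) and prints the first ready replica that is not on the worker to be drained. The function name `pick_promotion_target` is hypothetical, and the column order is assumed to match the Preflight jq command.

```shell
# Hypothetical helper: read the Preflight pod listing on stdin
# (TSV columns: name, node, phase, ready, role) and print the first
# Running, Ready replica that is NOT on the node passed as $1.
# Exits non-zero when no such replica exists.
pick_promotion_target() {
  awk -F '\t' -v node="$1" '
    $2 != node && $3 == "Running" && $4 == "True" && $5 == "replica" {
      print $1
      found = 1
      exit
    }
    END { exit (found ? 0 : 1) }
  '
}
```

Piping the Preflight pod listing into `pick_promotion_target spoke-dc-v7-worker-1` would, for the final observed state above, print `noobaa-db-pg-cluster-1`. A non-zero exit means there is no eligible replica off that worker, in which case no promotion should be attempted from this output.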