Bump an operator version
From CSV release notes to a merged MR: how to mirror the new operator/operand images, pin the startingCSV, validate the rollout, and update the operator-version-lock.
This task covers the canonical operator version bump on the v6 fleet: from CSV release notes to a merged platform-gitops MR, through oc mirror --v2 for the new images, the startingCSV pin update, the MachineConfigPool waits (when applicable), and the operator-version-lock refresh.
The procedure is the same across operators; specifics differ (channel, default CSV, MCP-touching status). The fleet’s active operator queue lives in connection-details/platform-admin-handoff.md §“Current Installed Operator Baseline” + §“Planned Operator Install Queue”, and the source-of-truth version pin file is plans/disconnected-rebuild/environments/dc-lab/operator-version-lock.md.
When this task runs
- Upstream releases a CSV with a security fix or required feature. CVE in the operator’s image, a CRD addition needed by an ADR-mandated capability, a new channel default.
- Compliance scan flags the current version. PCI-DSS scan finds an operator at a version no longer covered by a vendor’s support window.
- Operator-dependency satisfaction. A new operator install requires an existing operator to be at version >= X.Y.Z.
The cadence is event-driven, not calendar-driven. The fleet does not auto-bump operators; the explicit pin in operator-version-lock.md is the contract.
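For the dependency trigger above, deciding whether an installed operator already meets a ">= X.Y.Z" floor needs a version-aware comparison rather than a string comparison. The following is a minimal sketch, assuming GNU sort with -V is available; the function name version_ge and the CSV name are illustrative and not part of the fleet tooling.

```shell
# version_ge HAVE NEED: succeeds when HAVE >= NEED under version ordering.
# A leading "v" is stripped from both arguments. Assumes GNU sort (-V).
version_ge() {
  have="${1#v}"
  need="${2#v}"
  lowest="$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)"
  [ "$lowest" = "$need" ]
}

# The version is the part of the CSV name after ".v" (hypothetical CSV name).
csv="example-operator.v1.10.2"
installed="${csv#*.v}"

if version_ge "$installed" "1.9.0"; then
  echo "dependency floor satisfied: $installed"
else
  echo "bump required: $installed is below 1.9.0"
fi
```

A plain string comparison would rank 1.10.2 below 1.9.0; version sort avoids that.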
What is in scope
- One operator at a time. Bumping two operators in the same MR makes the validation harder and the rollback more expensive.
- The Subscription startingCSV value, the channel, the catalog source (Red Hat vs certified), the operand version compatibility.
- Image-supply: every image the new CSV references must be mirrored to Nexus before the bump.
- MachineConfigPool rollout windows: operators that ship MachineConfigs (Compliance Operator, File Integrity Operator, certain network operators) trigger MCP rollouts that take 30-60 minutes per pool.
Out of scope:
- OCP minor/patch upgrades — different procedure, ADR 0018-governed.
- Operator install (vs bump). Install follows the operator-install pattern described in MR mechanics.
- Reverting a bump that proved bad — that is an incident, not a routine task; see the break-glass procedure.
Pre-checks
Before mutating anything:
- Read the upstream CSV release notes. Note any breaking changes, new RBAC requirements (which may require an argocd-platform-extensions ClusterRole extension), changed CRD versions, new operand dependencies.
- Identify all images the new CSV references. The Subscription resolves to an InstallPlan, which resolves to a CSV YAML, which lists relatedImages. Capture the set:
# Inspect the new CSV before installing it (if available in the local catalog
# source after a fresh oc mirror):
K=/home/ze/.kube/configs/spoke-dc-v6.kubeconfig
oc --kubeconfig "$K" -n openshift-marketplace get packagemanifest <package> \
  -o jsonpath='{.status.channels[?(@.name=="<channel>")].currentCSV}{"\n"}'
- Confirm catalog coverage. The new CSV must be present in the mirrored CatalogSource. If oc get packagemanifest does not show the new version, run an oc mirror --v2 cycle from the mirror VM first (see the What To Do If An Operator Package Is Missing recipe in connection-details/platform-admin-handoff.md).
- Confirm operand image coverage. Spot-check that the operand images in the new CSV’s relatedImages are mirrored. The image-supply drift script catches this after rollout, but the cheap check beforehand is:
skopeo inspect \
  --authfile /home/ze/ocp-mirror-workspaces/dc-lab/pull-secret.merged.json \
  docker://mirror-registry.apps.sub.comptech-lab.com/<path>@sha256:<digest>
- Check ADR / governance. If the bump is non-trivial (major version, new CRDs, changed defaults), the issue should cite the governing ADR set.
- Open the GitHub issue. Branch prefix op-bump/<operator>-<from>-to-<to>.
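The operand spot-check above can be looped over a whole relatedImages list instead of one digest at a time. This is a sketch under stated assumptions: the mirror is assumed to keep the upstream repository path and digest and to swap only the registry host (adjust MIRROR_PREFIX if the Nexus layout adds a repository prefix), and the helper names to_mirror_ref and report_unmirrored are hypothetical.

```shell
MIRROR_HOST="mirror-registry.apps.sub.comptech-lab.com"
MIRROR_PREFIX=""   # assumption: no extra path prefix on the mirror
AUTHFILE="/home/ze/ocp-mirror-workspaces/dc-lab/pull-secret.merged.json"

# to_mirror_ref REF: swap the upstream registry host for the mirror host,
# keeping the repository path and the @sha256 digest unchanged.
to_mirror_ref() {
  printf '%s/%s%s\n' "$MIRROR_HOST" "$MIRROR_PREFIX" "${1#*/}"
}

# report_unmirrored: read upstream refs on stdin (one per line) and print
# the ones skopeo cannot inspect on the mirror.
report_unmirrored() {
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    skopeo inspect --authfile "$AUTHFILE" \
      "docker://$(to_mirror_ref "$ref")" >/dev/null 2>&1 \
      || printf 'MISSING %s\n' "$ref"
  done
}

# Usage, with refs.txt holding the captured relatedImages, one per line:
#   report_unmirrored < refs.txt
```

An empty report means every listed image resolved on the mirror; it does not replace the post-rollout drift script.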
The change
Step 1 — Mirror the new images (if needed)
If the new CSV is not in the local catalog yet, refresh the mirror:
# On the mirror VM:
ssh ze@oc-mirror.sub.comptech-lab.com
cd /home/ze/ocp-mirror-workspaces/dc-lab
# Edit imageset-config.yaml to add or bump the operator entry.
# Bump the version under operators.packages.<name>.versions.
# Dry-run first:
RUN_ID="$(date -u +%Y%m%dT%H%M%S)"
oc mirror --v2 \
--config imageset-config.yaml \
--workspace "file:///home/ze/ocp-mirror-workspaces/dc-lab/dryrun-$RUN_ID" \
--dry-run \
docker://mirror-registry.apps.sub.comptech-lab.com \
2>&1 | tee "logs/oc-mirror-dryrun-$RUN_ID.log"
wc -l "dryrun-$RUN_ID/working-dir/dry-run/mapping.txt"
# Review the mapping count, then run for real:
tmux new -s oc-mirror-op-bump
./tools/run-oc-mirror-fast.sh
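Before starting the real run, the dry-run mapping can be checked for more than its line count. The sketch below assumes each line of mapping.txt has the form docker://<source>=docker://<destination>; that format and the function name off_mirror_destinations are assumptions to verify against an actual dry-run file.

```shell
MIRROR_HOST="mirror-registry.apps.sub.comptech-lab.com"

# off_mirror_destinations FILE: print mapping lines whose destination is not
# on the mirror. Assumes one "source=destination" pair per line.
off_mirror_destinations() {
  awk -F= -v host="$MIRROR_HOST" '
    NF >= 2 {
      dst = $NF
      sub(/^docker:\/\//, "", dst)
      if (index(dst, host "/") != 1) print
    }' "$1"
}

# Usage on the mirror VM, after the dry-run:
#   off_mirror_destinations "dryrun-$RUN_ID/working-dir/dry-run/mapping.txt"
```

No output means every planned copy targets the mirror registry.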
After the run completes, capture the refreshed catalog image digests from the generated resources:
ls /home/ze/ocp-mirror-workspaces/dc-lab/full-workspace/working-dir/cluster-resources/ \
| grep -E "cs-redhat-operator-index-v4-20|cs-certified-operator-index-v4-20"
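To pull the digest out of a generated CatalogSource manifest without copying it by hand, something like the following may help. It is a sketch: the function name catalog_digest is hypothetical, and it assumes the generated manifest pins the image by digest. If the manifest is tag-based the function fails, and the digest has to be resolved another way (for example with skopeo inspect against the mirror).

```shell
# catalog_digest FILE: print the sha256 digest from the first "image:" line
# of a CatalogSource manifest. Fails (non-zero, no output) when the image
# reference carries no 64-hex-character sha256 digest.
catalog_digest() {
  awk '$1 == "image:" { print $2; exit }' "$1" \
    | grep -o 'sha256:[0-9a-f]\{64\}'
}

# Usage (the exact file name under cluster-resources/ is an assumption):
#   catalog_digest full-workspace/working-dir/cluster-resources/<catalogsource-file>.yaml
```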
Step 2 — Update GitOps
Branch and edit in the working copy:
cd /home/ze/ops-workspace/clones/platform-gitops
git checkout main
git pull --ff-only origin main
git checkout -b op-bump/<operator>-<from>-to-<to>
Edit clusters/<cluster>/operators/<name>/subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: <subscription-name>
namespace: <operator-namespace>
annotations:
argocd.argoproj.io/sync-wave: "10"
spec:
channel: <channel> # may also change with the bump
installPlanApproval: Automatic
name: <package-name>
source: cs-redhat-operator-index-v4-20
sourceNamespace: openshift-marketplace
startingCSV: <package>.v<new-version> # the pinned new CSV
If the catalog image digest changed (step 1 above), also update clusters/<cluster>/platform/catalogs/catalogsource-redhat-operator-index-v4-20.yaml (and the certified equivalent) to pin the refreshed digest:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: cs-redhat-operator-index-v4-20
namespace: openshift-marketplace
spec:
sourceType: grpc
image: mirror-registry.apps.sub.comptech-lab.com/<path>@sha256:<new-digest>
displayName: Red Hat Operators (mirror v4.20)
publisher: Red Hat (mirror)
If the new CSV needs an RBAC group not currently in argocd-platform-extensions, extend the ClusterRole at sync-wave 0 in the same MR or in a sequenced preceding MR. See Spoke RBAC extension memory.
Update plans/disconnected-rebuild/environments/dc-lab/operator-version-lock.md with the new pin. The version-lock file lives in opp-full-plat, not platform-gitops, so this is a cross-repo update: it cannot ride in the platform-gitops MR. Open a separate MR on opp-full-plat and sequence the two MRs.
Step 3 — Validate the build
cd /home/ze/ops-workspace/clones/platform-gitops
kubectl kustomize clusters/hub-dc-v6 > /dev/null
kubectl kustomize clusters/spoke-dc-v6 > /dev/null
If kustomize errors, stop and fix before committing.
Optional but recommended: a server-side dry-run against the target cluster (using the kubeconfig in $K):
oc --kubeconfig "$K" apply --server-side --dry-run=server \
-k clusters/spoke-dc-v6
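A clean kustomize build only proves that the tree renders; it does not prove that the rendered Subscription carries the intended pin. A small check along these lines can close that gap. It is a sketch, and the function name has_starting_csv is hypothetical.

```shell
# has_starting_csv CSV_NAME: read rendered manifests on stdin and succeed
# only if some line is exactly "startingCSV: CSV_NAME". The comparison is a
# plain string match, so dots in the CSV name are not treated as wildcards.
has_starting_csv() {
  awk -v want="$1" '
    $1 == "startingCSV:" && $2 == want { found = 1 }
    END { exit !found }'
}

# Usage from the platform-gitops working copy:
#   kubectl kustomize clusters/spoke-dc-v6 \
#     | has_starting_csv "<package>.v<new-version>" || echo "pin not rendered"
```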
Step 4 — Commit + open MR
Commit with the tracking-header block (per MR mechanics):
op-bump: bump <operator> <from> -> <to> on <cluster> (#<n>)
Issue: #<n> op-bump
Milestone: <milestone> (#<m>)
Phase: <phase>
ADRs: 0018, 0019
<rationale: why this version, what CVE/feature/dep>
<files changed>
Validation:
- Subscription resolves to InstallPlan; InstallPlan moves to Complete
- CSV phase Succeeded
- Operand pods Ready
- oc get co clean
- No new external image refs (image-supply drift script clean)
Rollback:
- Revert this MR; let Argo re-sync to the previous Subscription pin
- If operands rolled (MCP-touching), wait for MCP rollback before validating
Push and open the MR via the GitLab API (per MR mechanics).
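The tracking-header block is easy to get partly wrong by hand. As a sketch only (the function name missing_headers is hypothetical and this is not the MR mechanics tooling), a commit message can be checked for the headers shown above before pushing:

```shell
# missing_headers: read a commit message on stdin and print each tracking
# header that does not start a line. No output means all are present.
missing_headers() {
  msg="$(cat)"
  for h in 'Issue:' 'Milestone:' 'Phase:' 'ADRs:' 'Validation:' 'Rollback:'; do
    printf '%s\n' "$msg" | grep -q "^$h" || printf '%s\n' "$h"
  done
}

# Usage on the branch tip:
#   git log -1 --format=%B | missing_headers
```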
Step 5 — Watch Argo apply
oc --kubeconfig "$K" -n openshift-gitops get app spoke-dc-v6-cluster-config \
-o jsonpath='{.status.sync.status}{" "}{.status.health.status}{" "}{.status.sync.revision}{"\n"}'
Expected: Synced Healthy <merge-sha>.
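Rather than re-running the command by hand, the same query can be polled until it reports the expected state. This is a minimal sketch: the function names, the 10-second interval, and the default of 60 attempts are assumptions, and K is the kubeconfig variable set in the pre-checks.

```shell
# argo_at_sha "<sync> <health> <revision>" MERGE_SHA: succeeds only when the
# status string is exactly "Synced Healthy MERGE_SHA".
argo_at_sha() {
  [ "$1" = "Synced Healthy $2" ]
}

# wait_for_argo MERGE_SHA [TRIES]: poll the Application every 10 seconds.
wait_for_argo() {
  want="$1"
  tries="${2:-60}"
  while [ "$tries" -gt 0 ]; do
    got="$(oc --kubeconfig "$K" -n openshift-gitops get app spoke-dc-v6-cluster-config \
      -o jsonpath='{.status.sync.status} {.status.health.status} {.status.sync.revision}')"
    argo_at_sha "$got" "$want" && return 0
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}

# Usage:
#   wait_for_argo "<merge-sha>" || echo "Argo did not settle at the merge SHA"
```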
Step 6 — Watch the operator bump itself
oc --kubeconfig "$K" -n <operator-ns> get sub,installplan,csv -o wide
Expected progression:
- Subscription shows state=AtLatestKnown, currentCSV=<new>.
- A new InstallPlan for <new> is created and moves to Complete.
- The new CSV moves from Installing to Succeeded.
- The operator pod restarts on the new image; new operand pods (if applicable) come up.
If the operator ships MachineConfigs, MCP rollout begins:
oc --kubeconfig "$K" get mcp -w
Wait for UPDATED=True / UPDATING=False / DEGRADED=False / readyMachineCount==machineCount on every affected pool. This usually takes about 30 minutes per pool on a 3-node spoke, within the 30-60 minute range noted in the scope section; canary with master first if the pool selector includes both.
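The wait condition can be expressed as a filter over the default oc get mcp table, so that "done" means "the filter prints nothing". This is a sketch; the function name mcp_unsettled is hypothetical, and it relies on the default column order (NAME, CONFIG, UPDATED, UPDATING, DEGRADED, MACHINECOUNT, READYMACHINECOUNT, ...).

```shell
# mcp_unsettled: read "oc get mcp" table output on stdin and print the name
# of every pool that is not UPDATED=True, UPDATING=False, DEGRADED=False
# with READYMACHINECOUNT equal to MACHINECOUNT.
mcp_unsettled() {
  awk 'NR > 1 && ($3 != "True" || $4 != "False" || $5 != "False" || $6 != $7) { print $1 }'
}

# Usage:
#   oc --kubeconfig "$K" get mcp | mcp_unsettled
```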
Validation
A bump is complete when all of the following are true:
- Argo Application shows Synced / Healthy at the merge SHA.
- Subscription is AtLatestKnown with currentCSV=<new>.
- CSV for the new version is Succeeded.
- Operand pods are Running with no recent restarts.
- oc get co | awk 'NR==1 || $3 != "True" || $4 != "False" || $5 != "False"' is clean.
- oc get mcp shows every pool UPDATED=True / UPDATING=False / DEGRADED=False (where MCP-touching).
- The image-supply drift script reports zero uncovered external runtime references.
- The CHANGELOG entry on platform-gitops describes the bump.
- operator-version-lock.md on opp-full-plat reflects the new pin.
- The session report under reports/sessions/ captures the bump’s evidence.
Skipping the image-supply drift check (item 7 above) is how the image supply silently drifts; every bump must re-validate the drift posture.
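The operand-pod criterion in the list above can be checked the same way as the oc get co line, by filtering table output. This is a sketch with a hypothetical function name; it treats any non-zero restart count as a failure, which is stricter than "no recent restarts", and it will also flag Completed job pods, so read the output with that in mind.

```shell
# unhealthy_pods: read "oc get pods" table output on stdin and print pods
# that are not Running, not fully ready (READY x/y with x != y), or that
# have a non-zero restart count.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2] || $4 + 0 != 0) print $1
  }'
}

# Usage:
#   oc --kubeconfig "$K" -n <operator-ns> get pods | unhealthy_pods
```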
Prevention
- Use the operator-version-lock.md as the source of truth. Never rely on installPlanApproval: Manual and a human reviewer to gate auto-upgrades; gate via the explicit startingCSV pin in GitOps.
- Sequence catalog-digest-pin changes ahead of CSV bumps when both are needed. The catalog pin must be in effect before the Subscription resolves to the new CSV.
- Canary on one cluster first when multiple clusters run the same operator. The fleet has two clusters; bumping on spoke-dc-v6 first and waiting a session before bumping on hub-dc-v6 (or vice versa) catches operator-version regressions cheaply.
- For SigNoz, re-discover the API contract before bumping past the v0.122 baseline. The v0.121 -> v0.122 break (see SigNoz auth runbook) is precedent. SigNoz is not an OpenShift operator but the same pattern applies for VM-hosted services.
Forbidden actions
- Re-enabling default external OperatorHub CatalogSources to “see” a new CSV. The mirror is the source.
- Bumping two operators in the same MR.
- Bumping without updating operator-version-lock.md.
- Using installPlanApproval: Manual as a gate. Use explicit startingCSV pins.
- Bumping during an active MCP rollout window for another change.
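The one-operator-per-MR rule can be checked mechanically from the list of changed paths. This is a sketch, not an existing CI gate: the function name operators_touched is hypothetical, and it assumes the clusters/<cluster>/operators/<name>/ layout used in Step 2.

```shell
# operators_touched: read changed file paths on stdin and print the distinct
# operator directory names under clusters/<cluster>/operators/<name>/.
operators_touched() {
  awk -F/ '$1 == "clusters" && $3 == "operators" && NF >= 5 { print $4 }' | sort -u
}

# Usage on the bump branch; more than one line of output means the MR
# touches more than one operator:
#   git diff --name-only origin/main...HEAD | operators_touched
```

The same operator changed on both clusters counts once, and catalog or RBAC files outside operators/ are ignored.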
References
- opp-full-plat/connection-details/platform-admin-handoff.md §“Operator Install Workflow” + §“Catalog Refresh Workflow”
- opp-full-plat/plans/disconnected-rebuild/environments/dc-lab/operator-version-lock.md
- opp-full-plat/plans/disconnected-rebuild/environments/dc-lab/imageset-config.yaml
- opp-full-plat/adr/0018-acm-openshift-gitops-pull-model-v6.md, 0019-nexus-only-image-supply-chain.md
- Issues: #137 (hub catalog capture), #229 (this section)