Blob Stores, Lifecycle, and TLS
How Nexus blob storage, cleanup policies, hosted-repo retention, and the HAProxy TLS termination model fit together to keep the lab image supply chain reproducible.
This page covers what sits underneath the three Docker endpoints — the Nexus blob store, the per-repo cleanup policies that govern retention, the immutability rules that protect already-published tags, and the HAProxy TLS termination model that wraps everything. None of this is exposed to clients directly; clients see only the three hostnames in the three-endpoint split. But every operational decision about disk pressure, retention, image immutability, certificate rotation, and replication ends up here.
Blob storage
Sonatype Nexus 3 stores every Docker layer, manifest, image config, and metadata blob in a blob store. By default a fresh Nexus has one filesystem-backed blob store called default; the lab Nexus runs the standard single blob store for all Docker repositories.
| Concern | Setting |
|---|---|
| Blob store | default (filesystem) |
| Backing path | Dedicated data disk on the nexus-mirror VM, mounted under the Nexus work dir |
| Repositories sharing the store | ocp-mirror, docker-dev-hosted, icr-proxy, redhat-proxy, dockerhub-proxy |
| Deduplication | Content-addressable: identical layer blobs across repos cost disk once |
| Hashing | SHA-256 (Docker Registry v2 standard) |
| Compaction | Nightly “Compact blob store” task |
| Soft delete | Yes — blobs are marked deleted, then physically removed by compaction |
Operational implication. A pull-through proxy fetching the same UBI minimal layer that a hosted image uses adds zero disk; the layer is stored once. This is also why a 30-day cleanup on docker-dev-hosted does not always free obvious amounts of disk — if a layer is still referenced by an unrelated repo (or by an unrelated tag in the same repo), the underlying blob stays.
The compaction task is what physically reclaims disk. Run it nightly. Watch the data disk’s used percent; if compaction is not keeping up, the policy tuning is the lever, not adding disk.
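The interaction between deduplication, soft delete, and compaction can be sketched with a toy model. This is only an illustration of the behavior described above, not Nexus internals: the `BlobStore` class and its method names are hypothetical, and references are tracked per (repo, tag) purely for simplicity.

```python
import hashlib


class BlobStore:
    """Toy content-addressable store with soft delete and compaction (hypothetical model)."""

    def __init__(self):
        self.blobs = {}            # digest -> bytes physically on disk
        self.refs = {}             # digest -> set of (repo, tag) references
        self.soft_deleted = set()  # digests marked deleted, not yet reclaimed

    def put(self, repo, tag, data):
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical bytes are stored once
        self.refs.setdefault(digest, set()).add((repo, tag))
        self.soft_deleted.discard(digest)
        return digest

    def delete_ref(self, repo, tag, digest):
        self.refs[digest].discard((repo, tag))
        if not self.refs[digest]:
            self.soft_deleted.add(digest)  # marked only; disk is not freed yet

    def compact(self):
        # Physically remove every soft-deleted blob.
        for digest in self.soft_deleted:
            del self.blobs[digest]
            del self.refs[digest]
        self.soft_deleted.clear()

    def disk_bytes(self):
        return sum(len(data) for data in self.blobs.values())


store = BlobStore()
layer = b"ubi-minimal layer bytes"
proxy_digest = store.put("redhat-proxy", "ubi-minimal:latest", layer)
hosted_digest = store.put("docker-dev-hosted", "smoke/readiness-probe:build-8", layer)
```

In this model, deleting the hosted tag and compacting frees nothing while the proxy reference remains, which mirrors why a cleanup pass on one repo may not move the disk-usage number.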
Cleanup policies
Nexus cleanup policies are independent objects attached to repositories. Each policy specifies criteria, typically component age or time since last download, under which a tag becomes eligible for deletion. The cleanup task runs the policy and tombstones tags; compaction physically frees the blobs.
The live policies and attachments on the lab Nexus (2026-05-09):
| Repository | Cleanup policy | Approximate semantic | Attachment scope |
|---|---|---|---|
| docker-dev-hosted | docker-dev-hosted-retain-30d | Delete tags not pulled in 30 days | The hosted repo only (1 attachment) |
| icr-proxy | docker-proxy-retain-14d | Delete cached layers not used in 14 days | Proxy member (1 of 3 proxy attachments) |
| redhat-proxy | docker-proxy-retain-14d | Same semantic, different proxy | 2 of 3 |
| dockerhub-proxy | docker-proxy-retain-14d | Same semantic, different proxy | 3 of 3 |
| ocp-mirror | (none) | Platform invariant: never auto-prune | n/a |
The proxy policy keeps the cache fresh enough that mutable upstream tags don’t permanently shadow new releases; the hosted policy keeps stale dev builds from accumulating indefinitely while letting actively-pulled tags persist.
The reason ocp-mirror has no policy: any digest a cluster might pull during install or operand reconciliation must remain in the mirror until a tracked decision removes it. Aging an OpenShift release image out because nothing pulled it for 30 days would break the next cluster install in a disconnected lab. Pruning is manual, planned, and tied to release/operator retirement.
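A minimal sketch of the eligibility rule implied by the table, assuming the policies key on a last-downloaded timestamp. The function name and the `RETENTION_DAYS` mapping are hypothetical; the day counts come from the policy names above, and how a never-pulled component is dated is left to the caller.

```python
from datetime import datetime, timedelta, timezone

# Retention windows taken from the policy table; None means "never auto-prune".
RETENTION_DAYS = {
    "docker-dev-hosted": 30,
    "icr-proxy": 14,
    "redhat-proxy": 14,
    "dockerhub-proxy": 14,
    "ocp-mirror": None,
}


def eligible_for_cleanup(repo, last_downloaded, now):
    """Return True if the component has gone unused longer than the repo's window."""
    days = RETENTION_DAYS[repo]
    if days is None:
        return False  # no policy attached, so nothing ages out
    return now - last_downloaded > timedelta(days=days)


now = datetime(2026, 5, 9, tzinfo=timezone.utc)
stale_dev_build = eligible_for_cleanup("docker-dev-hosted", now - timedelta(days=31), now)
old_release_image = eligible_for_cleanup("ocp-mirror", now - timedelta(days=400), now)
```

Here `stale_dev_build` is True and `old_release_image` is False: the mirror content survives regardless of age because no window applies to it.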
Immutability
The hosted Docker repository docker-dev-hosted is configured with Allow redeploy = false. Practical effect:
- A tag like :build-8 cannot be overwritten by a later push of the same name.
- Re-tagging on the client side fails at the Nexus push step.
- Tag history is therefore append-only and content-addressable.
The same setting applies in spirit to ocp-mirror, where oc mirror’s own behavior plus the convention of pulling by digest provides equivalent immutability at the consumer side.
This is the property that makes the runtime path safe. A pod referencing app-registry.apps.sub.comptech-lab.com/smoke/readiness-probe:build-8 is referencing one specific manifest digest at the time of push and forever after. There is no class of failure where today’s :build-8 pull differs from yesterday’s.
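The effect of disabling redeploy can be modeled as a tag map that refuses to rebind an existing name. This is a sketch under stated assumptions: `HostedRepo` and `RedeployDenied` are hypothetical names, and the model rejects any push to an existing tag. Whether Nexus treats a byte-identical re-push differently is not covered here.

```python
class RedeployDenied(Exception):
    """Raised when a push would rebind an existing tag (hypothetical error type)."""


class HostedRepo:
    """Toy hosted repo: tag -> manifest digest, append-only unless redeploy is allowed."""

    def __init__(self, allow_redeploy=False):
        self.allow_redeploy = allow_redeploy
        self.tags = {}

    def push(self, tag, manifest_digest):
        if tag in self.tags and not self.allow_redeploy:
            raise RedeployDenied(f"tag {tag!r} already exists")
        self.tags[tag] = manifest_digest


repo = HostedRepo(allow_redeploy=False)
repo.push("build-8", "sha256:aaa")
try:
    repo.push("build-8", "sha256:bbb")
    overwrite_succeeded = True
except RedeployDenied:
    overwrite_succeeded = False
```

After the rejected push, `repo.tags["build-8"]` still resolves to the first digest, which is the property a pod relies on when it references the tag.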
TLS and certificate management
All three Docker endpoints — and the Nexus UI/API — terminate TLS at the HAProxy edge VM, not at Nexus. The Nexus VM speaks plain HTTP on its local ports (5000, 5001, 5002, 8081); HAProxy decrypts incoming TLS and forwards plain HTTP over the lab /24 to those ports.
| Concern | Setting |
|---|---|
| Certificate scope | *.apps.sub.comptech-lab.com (LE wildcard) |
| Issuer | Let’s Encrypt (ACME DNS-01) |
| Renewal | acme.sh cron on the pdns VM; copies into HAProxy certs dir |
| HAProxy bind | Private bind on lab /24, port 443, SNI-multiplexed |
| Nexus listener | Plain HTTP on the VM private network |
| Mutual TLS | None — HAProxy presents the cert, clients verify |
The certificate covers all four Nexus-exposed hostnames (nexus-mirror, mirror-registry, docker-group, app-registry) because they all live under *.apps.sub.comptech-lab.com. A single LE cert renewal updates all four endpoints simultaneously.
The choice to terminate at HAProxy rather than at Nexus matters:
- Single cert source of truth. One wildcard, one renewal job, one trust chain on every client.
- Nexus internals stay simple. No cert-management plugin, no Nexus-side renewal automation, no Nexus restart cadence tied to cert rotation.
- HAProxy can do SNI splits. The three Docker hostnames terminate on the same 127.0.0.1:8443 SNI plane and route to different Nexus ports based on host header.
Renewal is the standard acme.sh --dns dns_pdns flow on the pdns VM (the pdns VM owns the DNS-01 challenge because it is the authoritative DNS server). The renewed cert is then copied into /etc/haproxy/certs/wildcard-apps.pem and HAProxy is reloaded.
What is not HAProxy-fronted
- The OTLP ports of SigNoz (4317, 4318).
- Direct Nexus port debug access (5000, 5001, 5002, 8081 on the VM hostname).
- Any internal-only API surface that exists only for VM-side debugging.
This is per the lab feedback_haproxy_scope.md directive: HAProxy is for platform VM edge exposure only. Direct ports are reachable on the lab /24 for emergencies but not exposed via the LE wildcard plane.
Replication and backup
The current state on nexus-mirror:
- No active Nexus-side replication to a second Nexus instance.
- Backup posture is unresolved as a tracked item (parallel to the Jenkins backup unresolved-state per connection-details/jenkins.md).
- Effective recovery today comes from the fact that the platform mirror content can be regenerated from upstream by re-running oc mirror --v2 against the same imageset config, and developer/app images can be rebuilt from Git.
That’s a real recovery path but not a fast one. The roadmap items called out in ADR 0019 and related milestones include:
- Tested Nexus backup-and-restore drill (snapshot of the data disk + Nexus DB + blob store).
- Decision on whether to mirror docker-dev-hosted to a second hosted repo for DR.
- Decision on whether ocp-mirror content should be replicated by oc mirror itself (a “third mirror” semantic) or by a snapshot-only approach.
Until those land, treat Nexus as a single point of failure for the image supply chain. Mitigate by keeping the bootstrap host fully reproducible, the Jenkins job definitions in Git, and the imageset config under platform GitOps.
Operational tasks
A short list of Nexus tasks that should be enabled and scheduled:
| Task | Cadence | Purpose |
|---|---|---|
| Compact blob store | Nightly | Reclaim disk after policy-driven tag deletion |
| Cleanup policies | Daily | Apply per-repo retention; tombstone tags |
| Delete unused manifests | Weekly | Sweep manifests no tag references |
| Rebuild Docker repository metadata | On-demand | After bulk import/cleanup |
| Health check | Continuous | Internal Nexus self-check; surface in monitoring |
The full Nexus task schedule belongs in the Nexus operator runbook; this page records it as a checklist so the lifecycle picture is complete.
Disk pressure playbook
If the Nexus data disk approaches its high-water mark:
- Confirm compaction is running. A blocked compaction task is the #1 reason disk doesn’t drop after cleanup runs.
- Check docker-dev-hosted for stale tags. A team that pushed :build-N continuously for weeks without anything pulling them will hit the 30-day policy in a wave. Force a cleanup pass.
- Check proxy cache size. If dockerhub-proxy is hoarding incidental pulls, the 14-day policy plus a one-shot manual cleanup can reclaim significant space.
- Do not touch ocp-mirror to reclaim space. That’s platform content; reclaiming it triggers cluster pull failures. Add disk instead.
- Audit blob orphans. Nexus’s “delete unused manifests” plus compaction usually cleans these up, but a long-skipped cycle can leave them.
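A minimal sketch of the high-water check that would trigger this playbook. The 85 percent threshold and the function names are assumptions, since the page does not state the lab's actual mark; the path passed in would be the mount point of the Nexus data disk.

```python
import shutil


def used_percent(total_bytes, used_bytes):
    """Percentage of the disk that is in use."""
    return 100.0 * used_bytes / total_bytes


def over_high_water(path, threshold_percent=85.0):
    """True if the filesystem holding `path` is at or above the threshold (assumed value)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return used_percent(usage.total, usage.used) >= threshold_percent
```

Running the check before and after a compaction pass gives a quick read on whether cleanup is actually reclaiming space.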
Failure modes
Symptom: cleanup policy attached but tags aren’t being deleted
Root cause. The cleanup task isn’t enabled, or the policy condition is not actually met (the last-pulled date is recent because something is pulling the tag regularly), or the tags have been tombstoned but compaction has not yet run, so disk usage has not dropped.
Fix. Confirm the cleanup task is enabled and has been running. Inspect the per-tag last-downloaded metadata to confirm whether the policy condition is actually met. If everything is in order but disk still hasn’t dropped, run compaction manually.
Prevention. Monitoring on Nexus task success/failure; alerting on stalled cleanup tasks.
Symptom: client gets a TLS error from one of the Docker hostnames
Root cause. Either HAProxy’s cert is expired (acme.sh renewal failed silently), or the client’s trust store doesn’t include the LE root chain (rare on modern OSes), or the HAProxy backend is rewriting headers in a way that mismatches the SNI.
Fix. Verify cert validity (openssl s_client -connect); check pdns VM acme.sh log; reload HAProxy after confirming the renewed cert is in place.
Prevention. Monitor cert expiry; alert at 14 days before expiry. The renewal job is what matters — if it fails three times in a row, page.
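The alerting rule above can be sketched as a small decision function. The 14-day window and the three-failure paging rule come from the prevention note; the function names and the "ok"/"alert"/"page" labels are hypothetical. The `not_after` string is assumed to be in the text form that Python's `ssl` module reports for a peer certificate.

```python
import ssl
from datetime import datetime, timezone

ALERT_DAYS = 14          # alert window from the prevention note
PAGE_AFTER_FAILURES = 3  # consecutive renewal failures before paging


def days_until_expiry(not_after, now):
    """Whole days left; not_after looks like 'Jun  1 12:00:00 2026 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def cert_action(not_after, now, consecutive_renewal_failures):
    """Decide whether the cert state needs no action, an alert, or a page."""
    if consecutive_renewal_failures >= PAGE_AFTER_FAILURES:
        return "page"
    if days_until_expiry(not_after, now) <= ALERT_DAYS:
        return "alert"
    return "ok"
```

Renewal failures are checked first so that a broken renewal job pages even while the current certificate still has weeks left.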
Symptom: a tag exists in Nexus but pull returns 404
Root cause. Manifest exists, blob does not (compaction got ahead of manifest deletion; this is the soft-delete window). Or the proxy member is failing upstream and Nexus is returning a stale negative.
Fix. For the hosted-repo case, re-push (build was likely truly deleted). For the proxy case, invalidate the proxy cache for the affected name and retry.
Prevention. Don’t manually delete blobs. Use task-driven deletion only.
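To confirm the "manifest exists, blob does not" case, one approach is to walk the manifest and test each referenced digest. The sketch below assumes an image manifest in the Docker schema 2 or OCI layout (a `config` object plus a `layers` list); `blob_exists` is a hypothetical callable supplied by the caller, for example one that issues a HEAD request against the registry's blob endpoint.

```python
def referenced_digests(manifest):
    """Digests an image manifest points at: the config blob, then each layer."""
    digests = []
    config = manifest.get("config")
    if config:
        digests.append(config["digest"])
    digests.extend(layer["digest"] for layer in manifest.get("layers", []))
    return digests


def missing_blobs(manifest, blob_exists):
    """Return the referenced digests for which blob_exists(digest) is False."""
    return [d for d in referenced_digests(manifest) if not blob_exists(d)]


# Illustrative data only; real digests are 64 hex characters.
manifest = {
    "schemaVersion": 2,
    "config": {"digest": "sha256:cfg"},
    "layers": [{"digest": "sha256:l1"}, {"digest": "sha256:l2"}],
}
present = {"sha256:cfg", "sha256:l1"}
gaps = missing_blobs(manifest, lambda d: d in present)
```

A non-empty result points at the hosted-repo case, where a re-push is the fix; an empty result suggests looking at the proxy path instead.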
Symptom: HAProxy is reachable but returns 502 Bad Gateway for Nexus
Root cause. Nexus is restarting or the JVM is under heap pressure. The Nexus systemd unit may be cycling.
Fix. SSH to the Nexus VM, run systemctl status nexus, check the JVM logs, increase -Xmx if the heap pressure is real, and restart cleanly.
Prevention. Monitor Nexus JVM heap, response-time percentiles, and request-rate against baseline. The lab Nexus is sized for current load; scale before adding new heavy proxies (a new active dockerhub-style proxy can double request volume).
References
- opp-full-plat/connection-details/nexus.md: Current Service, Docker Group Exposure, cleanup-policy table.
- opp-full-plat/adr/0019-nexus-only-image-supply-chain.md: supply-chain framing.
- HAProxy edge VM page (Section 2A): cert renewal flow, SNI termination model.
- pdns VM page (Section 2A): DNS-01 ACME automation.