Blob Stores, Lifecycle, and TLS

How Nexus blob storage, cleanup policies, hosted-repo retention, and the HAProxy TLS termination model fit together to keep the lab image supply chain reproducible.

This page covers what sits underneath the three Docker endpoints — the Nexus blob store, the per-repo cleanup policies that govern retention, the immutability rules that protect already-published tags, and the HAProxy TLS termination model that wraps everything. None of this is exposed to clients directly; clients see only the three hostnames in the three-endpoint split. But every operational decision about disk pressure, retention, image immutability, certificate rotation, and replication ends up here.

Blob storage

Sonatype Nexus 3 stores every Docker layer, manifest, image config, and metadata blob in a blob store. By default a fresh Nexus has one filesystem-backed blob store called default; the lab Nexus runs the standard single blob store for all Docker repositories.

| Concern | Setting |
| --- | --- |
| Blob store | default (filesystem) |
| Backing path | Dedicated data disk on the nexus-mirror VM, mounted under the Nexus work dir |
| Repositories sharing the store | ocp-mirror, docker-dev-hosted, icr-proxy, redhat-proxy, dockerhub-proxy |
| Deduplication | Content-addressable: identical layer blobs across repos cost disk once |
| Hashing | SHA-256 (Docker Registry v2 standard) |
| Compaction | Nightly “Compact blob store” task |
| Soft delete | Yes: blobs are marked deleted, then physically removed by compaction |

Operational implication. A pull-through proxy fetching the same UBI minimal layer that a hosted image uses adds zero disk; the layer is stored once. This is also why a 30-day cleanup on docker-dev-hosted does not always free obvious amounts of disk — if a layer is still referenced by an unrelated repo (or by an unrelated tag in the same repo), the underlying blob stays.
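
The deduplication behavior can be illustrated with a toy content-addressable store. This is a minimal sketch, not Nexus's actual implementation: the `BlobStore` class, its methods, and the layer byte strings are all hypothetical, and only the repo and tag names come from this page.

```python
import hashlib


class BlobStore:
    """Toy content-addressable store: one copy per digest, shared by all repos."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes (the physical copy)
        self.refs = {}   # digest -> set of (repo, tag) that reference it

    def put(self, repo, tag, data):
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is stored once
        self.refs.setdefault(digest, set()).add((repo, tag))
        return digest

    def delete_tag(self, repo, tag):
        """Soft delete: drop the tag's references, leave the bytes in place."""
        for owners in self.refs.values():
            owners.discard((repo, tag))

    def compact(self):
        """Physically remove blobs that nothing references any more."""
        for digest in [d for d, owners in self.refs.items() if not owners]:
            del self.refs[digest]
            del self.blobs[digest]

    def used_bytes(self):
        return sum(len(b) for b in self.blobs.values())


store = BlobStore()
ubi_layer = b"ubi-minimal layer bytes"   # placeholder content
app_layer = b"application layer bytes"   # placeholder content

store.put("redhat-proxy", "ubi-minimal:latest", ubi_layer)
store.put("docker-dev-hosted", "smoke/readiness-probe:build-8", ubi_layer)
store.put("docker-dev-hosted", "smoke/readiness-probe:build-8", app_layer)
before = store.used_bytes()

# Cleanup removes the hosted tag; compaction then reclaims only the
# application layer, because the proxy repo still references the UBI layer.
store.delete_tag("docker-dev-hosted", "smoke/readiness-probe:build-8")
store.compact()
after = store.used_bytes()
```

After the hosted tag is cleaned up, only the layer that nothing else references is reclaimed, which is why a cleanup pass can free less disk than the deleted tags suggest.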

The compaction task is what physically reclaims disk, so run it nightly. Watch the data disk’s used percentage; if usage keeps climbing even though compaction is completing, tighten the cleanup policies before reaching for more disk.

Cleanup policies

Nexus cleanup policies are independent objects attached to repositories. Each policy specifies one or more conditions, typically age-based (time since publish) or usage-based (time since last download), under which a tag becomes eligible for deletion. The cleanup task evaluates the policy and tombstones matching tags; compaction then physically frees the blobs.

The live policies and attachments on the lab Nexus (2026-05-09):

| Repository | Cleanup policy | Approximate semantic | Attachment scope |
| --- | --- | --- | --- |
| docker-dev-hosted | docker-dev-hosted-retain-30d | Delete tags not pulled in 30 days | The hosted repo only (1 attachment) |
| icr-proxy | docker-proxy-retain-14d | Delete cached layers not used in 14 days | Proxy member (1 of 3 proxy attachments) |
| redhat-proxy | docker-proxy-retain-14d | Same semantic, different proxy | 2 of 3 |
| dockerhub-proxy | docker-proxy-retain-14d | Same semantic, different proxy | 3 of 3 |
| ocp-mirror | (none) | Platform invariant: never auto-prune | |

The proxy policy keeps the cache fresh enough that mutable upstream tags don’t permanently shadow new releases; the hosted policy keeps stale dev builds from accumulating indefinitely while letting actively-pulled tags persist.

The reason ocp-mirror has no policy: any digest a cluster might pull during install or operand reconciliation must remain in the mirror until a tracked decision removes it. Aging an OpenShift release image out because nothing pulled it for 30 days would break the next cluster install in a disconnected lab. Pruning is manual, planned, and tied to release/operator retirement.
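
The retention rules can be expressed as a small eligibility check. This is a sketch under assumptions: the function name and the `RETENTION_DAYS` mapping are illustrative, the day counts are taken from the policy names on this page, and the real Nexus evaluation may differ in detail.

```python
from datetime import datetime, timedelta, timezone

# Retention windows per repository; None means "no policy, never auto-prune".
RETENTION_DAYS = {
    "docker-dev-hosted": 30,
    "icr-proxy": 14,
    "redhat-proxy": 14,
    "dockerhub-proxy": 14,
    "ocp-mirror": None,
}


def eligible_for_cleanup(repo, last_downloaded, now):
    """Return True if a tag last pulled at `last_downloaded` has aged out."""
    days = RETENTION_DAYS.get(repo)
    if days is None:  # no policy attached (or unknown repo): always keep
        return False
    return now - last_downloaded > timedelta(days=days)


now = datetime(2026, 5, 9, tzinfo=timezone.utc)
stale = now - timedelta(days=45)

hosted_stale = eligible_for_cleanup("docker-dev-hosted", stale, now)
mirror_stale = eligible_for_cleanup("ocp-mirror", stale, now)
```

A tag untouched for 45 days is eligible in docker-dev-hosted but never in ocp-mirror, because the mirror has no policy attached.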

Immutability

The hosted Docker repository docker-dev-hosted is configured with Allow redeploy = false. Practical effect:

  • A tag like :build-8 cannot be overwritten by a later push of the same name.
  • Re-tagging on the client side fails at the Nexus push step.
  • Tag history is therefore append-only and content-addressable.

The same setting applies in spirit to ocp-mirror, where oc mirror’s own behavior plus the convention of pulling by digest provides equivalent immutability at the consumer side.

This is the property that makes the runtime path safe. A pod referencing app-registry.apps.sub.comptech-lab.com/smoke/readiness-probe:build-8 is referencing one specific manifest digest at the time of push and forever after. There is no class of failure where today’s :build-8 pull differs from yesterday’s.
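
The effect of Allow redeploy = false can be modeled as a tag map that refuses overwrites. This is a minimal sketch: `HostedRepo` and `RedeployNotAllowed` are hypothetical names, the digests are placeholders, and treating a re-push of the identical digest as a no-op is a simplifying assumption rather than documented Nexus behavior.

```python
class RedeployNotAllowed(Exception):
    """Raised when a push would move an existing tag to a different digest."""


class HostedRepo:
    def __init__(self, allow_redeploy=False):
        self.allow_redeploy = allow_redeploy
        self.tags = {}  # tag -> manifest digest

    def push(self, tag, digest):
        existing = self.tags.get(tag)
        if existing is not None and existing != digest and not self.allow_redeploy:
            raise RedeployNotAllowed(f"{tag} already points at {existing}")
        self.tags[tag] = digest


repo = HostedRepo(allow_redeploy=False)
repo.push("smoke/readiness-probe:build-8", "sha256:aaa")  # placeholder digest

try:
    repo.push("smoke/readiness-probe:build-8", "sha256:bbb")  # attempted overwrite
    overwritten = True
except RedeployNotAllowed:
    overwritten = False
```

The second push fails, so the tag keeps pointing at the digest recorded by the first push.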

TLS and certificate management

All three Docker endpoints — and the Nexus UI/API — terminate TLS at the HAProxy edge VM, not at Nexus. The Nexus VM speaks plain HTTP on its local ports (5000, 5001, 5002, 8081); HAProxy decrypts incoming TLS and forwards plain HTTP over the lab /24 to those ports.

| Concern | Setting |
| --- | --- |
| Certificate scope | *.apps.sub.comptech-lab.com (LE wildcard) |
| Issuer | Let’s Encrypt (ACME DNS-01) |
| Renewal | acme.sh cron on the pdns VM; copies into HAProxy certs dir |
| HAProxy bind | Private bind on lab /24, port 443, SNI-multiplexed |
| Nexus listener | Plain HTTP on the VM private network |
| Mutual TLS | None; HAProxy presents the cert, clients verify |

The certificate covers all four Nexus-exposed hostnames (nexus-mirror, mirror-registry, docker-group, app-registry) because they all live under *.apps.sub.comptech-lab.com. A single LE cert renewal updates all four endpoints simultaneously.
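
Why one wildcard covers all four names can be checked with a simple label comparison. This is an illustrative sketch: `covered_by_wildcard` is a hypothetical helper, and the fully qualified names are composed from the short hostnames and the wildcard domain given on this page. A certificate wildcard matches exactly one leftmost DNS label.

```python
def covered_by_wildcard(hostname, pattern):
    """True if `pattern` (for example '*.example.com') covers `hostname`.

    The wildcard stands for exactly one leftmost label, so it does not
    match the bare parent domain or names nested more than one level deep.
    """
    host_labels = hostname.lower().split(".")
    pat_labels = pattern.lower().split(".")
    if len(host_labels) != len(pat_labels):
        return False
    if pat_labels[0] != "*":
        return host_labels == pat_labels
    return bool(host_labels[0]) and host_labels[1:] == pat_labels[1:]


WILDCARD = "*.apps.sub.comptech-lab.com"
HOSTS = [
    "nexus-mirror.apps.sub.comptech-lab.com",
    "mirror-registry.apps.sub.comptech-lab.com",
    "docker-group.apps.sub.comptech-lab.com",
    "app-registry.apps.sub.comptech-lab.com",
]

all_covered = all(covered_by_wildcard(h, WILDCARD) for h in HOSTS)
```

A hostname added at a deeper level, such as a.b.apps.sub.comptech-lab.com, would not be covered and would need its own certificate or a different wildcard.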

The choice to terminate at HAProxy rather than at Nexus matters:

  • Single cert source of truth. One wildcard, one renewal job, one trust chain on every client.
  • Nexus internals stay simple. No cert-management plugin, no Nexus-side renewal automation, no Nexus restart cadence tied to cert rotation.
  • HAProxy can do SNI splits. The three Docker hostnames terminate on the same 127.0.0.1:8443 SNI plane and route to different Nexus ports based on host header.
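
The hostname-based split can be pictured as a lookup from requested host to Nexus port. This is only a sketch of the routing idea, written in Python rather than HAProxy configuration. The pairing of hostnames to ports below is an assumption for illustration: this page lists the hostnames and the ports but not which maps to which, so the real mapping should be read from the HAProxy config.

```python
# HYPOTHETICAL pairing of hostnames to Nexus ports, for illustration only.
BACKENDS = {
    "nexus-mirror.apps.sub.comptech-lab.com": 8081,
    "mirror-registry.apps.sub.comptech-lab.com": 5000,
    "docker-group.apps.sub.comptech-lab.com": 5001,
    "app-registry.apps.sub.comptech-lab.com": 5002,
}


def backend_port(host_header):
    """Pick the Nexus port for a request from its Host header value."""
    host = host_header.split(":", 1)[0].strip().lower()  # drop any :port suffix
    return BACKENDS.get(host)  # None means no backend matches this host


port = backend_port("App-Registry.apps.sub.comptech-lab.com:443")
unknown = backend_port("unknown.example.com")
```

Because TLS is already terminated at this point, the lookup works on plain HTTP host information and Nexus never sees the certificate.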

Renewal is the standard acme.sh --dns dns_pdns flow on the pdns VM (the pdns VM owns the DNS-01 challenge because it is the authoritative DNS server). The renewed cert is then copied into /etc/haproxy/certs/wildcard-apps.pem and HAProxy is reloaded.

What is not HAProxy-fronted

  • The OTLP ports of SigNoz (4317, 4318).
  • Direct Nexus port debug access (5000, 5001, 5002, 8081 on the VM hostname).
  • Any internal-only API surface that exists only for VM-side debugging.

This is per the lab feedback_haproxy_scope.md directive: HAProxy is for platform VM edge exposure only. Direct ports are reachable on the lab /24 for emergencies but not exposed via the LE wildcard plane.

Replication and backup

The current state on nexus-mirror:

  • No active Nexus-side replication to a second Nexus instance.
  • Backup posture is an unresolved tracked item, mirroring the unresolved Jenkins backup state recorded in connection-details/jenkins.md.
  • Effective recovery today comes from the fact that the platform mirror content can be regenerated from upstream by re-running oc mirror --v2 against the same imageset config, and developer/app images can be rebuilt from Git.

That’s a real recovery path but not a fast one. The roadmap items called out in ADR 0019 and related milestones include:

  • Tested Nexus backup-and-restore drill (snapshot of the data disk + Nexus DB + blob store).
  • Decision on whether to mirror docker-dev-hosted to a second hosted repo for DR.
  • Decision on whether ocp-mirror content should be replicated by oc mirror itself (a “third mirror” semantic) or by a snapshot-only approach.

Until those land, treat Nexus as a single point of failure for the image supply chain. Mitigate by keeping the bootstrap host fully reproducible, the Jenkins job definitions in Git, and the imageset config under platform GitOps.

Operational tasks

A short list of Nexus tasks that should be enabled and scheduled:

| Task | Cadence | Purpose |
| --- | --- | --- |
| Compact blob store | Nightly | Reclaim disk after policy-driven tag deletion |
| Cleanup policies | Daily | Apply per-repo retention; tombstone tags |
| Delete unused manifests | Weekly | Sweep manifests that no tag references |
| Rebuild Docker repository metadata | On-demand | After bulk import/cleanup |
| Health check | Continuous | Internal Nexus self-check; surface in monitoring |

The full Nexus task schedule belongs in the Nexus operator runbook; this page records it as a checklist so the lifecycle picture is complete.

Disk pressure playbook

If the Nexus data disk approaches its high-water mark:

  1. Confirm compaction is running. A blocked compaction task is the #1 reason disk doesn’t drop after cleanup runs.
  2. Check docker-dev-hosted for stale tags. A team that pushed :build-N continuously for weeks without anything pulling them will hit the 30-day policy in a wave. Force a cleanup pass.
  3. Check proxy cache size. If dockerhub-proxy is hoarding incidental pulls, the 14-day policy plus a one-shot manual cleanup can reclaim significant space.
  4. Do not touch ocp-mirror to reclaim space. That’s platform content; reclaiming it triggers cluster pull failures. Add disk instead.
  5. Audit blob orphans. Nexus’s “delete unused manifests” plus compaction usually cleans these up, but a long-skipped cycle can leave them.
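
The first input to this playbook is knowing when the disk is near its high-water mark. A minimal sketch of that check, using only the Python standard library, might look like the following. The 85 percent threshold and the function names are assumptions; this page does not state the lab's actual high-water mark or data disk path.

```python
import shutil


def disk_used_percent(path):
    """Used space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_high_water(path, threshold_percent=85.0):
    """True when usage is at or above the (assumed) high-water threshold."""
    return disk_used_percent(path) >= threshold_percent


# Point this at the Nexus data disk mount; "." is only a stand-in here.
current = disk_used_percent(".")
```

Running this from cron or a monitoring agent before and after the nightly compaction gives a simple signal of whether compaction is actually reclaiming space.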

Failure modes

Symptom: cleanup policy attached but tags aren’t being deleted

Root cause. The cleanup task isn’t enabled, or the policy condition is not actually met (the last-pulled date is recent because something is still pulling the tag regularly). If the tags are gone but disk usage has not dropped, compaction has not yet run.

Fix. Confirm the cleanup task is enabled and has been running. Inspect the per-tag last-downloaded metadata to confirm whether the policy condition is actually met. If everything is in order but disk still hasn’t dropped, run compaction manually.

Prevention. Monitoring on Nexus task success/failure; alerting on stalled cleanup tasks.
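
One way to monitor task health is to poll the Nexus task list and flag anything whose last run was not clean. The sketch below only filters an already-parsed payload. It assumes the payload shape of the Nexus REST tasks listing (an `items` array whose entries carry `name`, `currentState`, and `lastRunResult`); verify those field names and result values against the Nexus version in use. The sample data is invented.

```python
def unhealthy_tasks(payload):
    """Return names of tasks whose last run finished with a result other than 'OK'.

    Tasks that have never run (no lastRunResult) are not reported here.
    """
    bad = []
    for task in payload.get("items", []):
        result = task.get("lastRunResult")
        if result is not None and result != "OK":
            bad.append(task["name"])
    return bad


# Invented sample payload, shaped like the assumed API response.
sample = {
    "items": [
        {"name": "Compact blob store", "currentState": "WAITING", "lastRunResult": "OK"},
        {"name": "Cleanup policies", "currentState": "WAITING", "lastRunResult": "FAILED"},
        {"name": "Delete unused manifests", "currentState": "WAITING", "lastRunResult": None},
    ]
}

flagged = unhealthy_tasks(sample)
```

Feeding the flagged list into the existing alerting path covers the "stalled cleanup task" case without relying on disk usage as the only signal.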

Symptom: client gets a TLS error from one of the Docker hostnames

Root cause. Either HAProxy’s cert is expired (acme.sh renewal failed silently), or the client’s trust store doesn’t include the LE root chain (rare on modern OSes), or the HAProxy backend is rewriting headers in a way that mismatches the SNI.

Fix. Verify cert validity (openssl s_client -connect); check pdns VM acme.sh log; reload HAProxy after confirming the renewed cert is in place.

Prevention. Monitor cert expiry; alert at 14 days before expiry. The renewal job is what matters — if it fails three times in a row, page.
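
The 14-day expiry alert can be sketched with the standard `ssl` module. The function names are illustrative and the example dates are made up; `fetch_not_after` opens a real TLS connection and is therefore defined but not called here.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the certificate's notAfter string as presented by `host`."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now):
    """Whole days between `now` and the certificate's notAfter timestamp."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def should_alert(not_after, now, threshold_days=14):
    return days_until_expiry(not_after, now) <= threshold_days


# Made-up expiry date, in the format getpeercert() returns.
example_not_after = "Jun 15 00:00:00 2026 GMT"
early = datetime(2026, 5, 9, tzinfo=timezone.utc)
late = datetime(2026, 6, 5, tzinfo=timezone.utc)

alert_early = should_alert(example_not_after, early)
alert_late = should_alert(example_not_after, late)
```

Checking the certificate as served by HAProxy, rather than the file on the pdns VM, also catches the case where renewal succeeded but the copy or reload step did not.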

Symptom: a tag exists in Nexus but pull returns 404

Root cause. The manifest exists but a blob it references does not (compaction got ahead of manifest deletion; this is the soft-delete window). Alternatively, the proxy member is failing upstream and Nexus is serving a stale negative-cache entry.

Fix. For the hosted-repo case, re-push the image (the build was most likely truly deleted). For the proxy case, invalidate the proxy cache for the affected name and retry.

Prevention. Don’t manually delete blobs. Use task-driven deletion only.
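
To confirm the "manifest exists, blob does not" case, list the digests a manifest references and probe each one. The sketch below handles the listing for a single-image Docker v2 or OCI manifest; the `blob_exists` callable is a placeholder the caller supplies (for example a HEAD request against the registry's /v2/<name>/blobs/<digest> endpoint), and the sample manifest uses invented digests.

```python
import json


def referenced_digests(manifest_json):
    """Return the config and layer digests named by an image manifest."""
    manifest = json.loads(manifest_json)
    digests = []
    config = manifest.get("config")
    if config:
        digests.append(config["digest"])
    digests.extend(layer["digest"] for layer in manifest.get("layers", []))
    return digests


def missing_blobs(manifest_json, blob_exists):
    """Digests the manifest references for which `blob_exists(digest)` is False."""
    return [d for d in referenced_digests(manifest_json) if not blob_exists(d)]


# Invented manifest with placeholder digests.
sample_manifest = json.dumps({
    "schemaVersion": 2,
    "config": {"digest": "sha256:cfg"},
    "layers": [{"digest": "sha256:layer1"}, {"digest": "sha256:layer2"}],
})

present = {"sha256:cfg", "sha256:layer1"}  # pretend layer2 was compacted away
missing = missing_blobs(sample_manifest, lambda d: d in present)
```

A non-empty result for a hosted image points at the soft-delete window rather than at a proxy or upstream problem.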

Symptom: HAProxy is reachable but requests to Nexus return 502 Bad Gateway

Root cause. Nexus is restarting or the JVM is under heap pressure. The Nexus systemd unit may be cycling.

Fix. SSH to the Nexus VM, run systemctl status nexus, check the JVM logs, increase -Xmx if the heap pressure is real, and restart cleanly.

Prevention. Monitor Nexus JVM heap, response-time percentiles, and request-rate against baseline. The lab Nexus is sized for current load; scale before adding new heavy proxies (a new active dockerhub-style proxy can double request volume).

References

  • opp-full-plat/connection-details/nexus.md — Current Service, Docker Group Exposure, cleanup-policy table.
  • opp-full-plat/adr/0019-nexus-only-image-supply-chain.md — supply-chain framing.
  • HAProxy edge VM page (Section 2A) — cert renewal flow, SNI termination model.
  • pdns VM page (Section 2A) — DNS-01 ACME automation.

Last reviewed: 2026-05-11