Hardening pass on the original 14 slices
Pin every floating tag by content-addressable digest, make SETUP.md re-runnable end to end, and add a wso2is-data named volume so DCR clients survive recreates. Companion commits a0bd0701 + fb0e48a2 in the insurance-app repo.
The first fourteen slices built a working stack and accumulated three
classes of papercut along the way: floating image tags that drifted
when their upstream re-tagged, SETUP.md commands that assumed a
clean-slate VM and broke if you re-ran them, and a stateful container
(WSO2 IS) whose H2 database lived inside the container layer and
disappeared on every --force-recreate.
None of this is sexy. All of it is the difference between “the classroom track runs reliably on a new student’s VM” and “the student emails the instructor at 22:00 because their wso2is is rebooting in a loop.” This chapter is what closed that gap.
Companion commits: a0bd0701 (volume + image pinning) and fb0e48a2
(SETUP.md idempotency + final pinning).
Image pinning by digest
Floating tags like :latest and :7-alpine are convenient until they
aren’t. The day the upstream re-tags redisinsight:latest with a
breaking change, every fresh VM build now fails — and the failure mode
is “it worked yesterday on a different VM,” which is the hardest kind
to debug.
The fix is to pin every tagged image by its content-addressable digest (the SHA256). Once pinned, podman pulls the exact same bytes forever, regardless of what the upstream re-tags. Five floating-tag admin UIs got the treatment:
# Before
podman run -d --name adminer docker.io/library/adminer
podman run -d --name redisinsight docker.io/redis/redisinsight
podman run -d --name kafka-ui docker.io/provectuslabs/kafka-ui
podman run -d --name minio docker.io/minio/minio:latest server /data
podman run -d --name mailpit docker.io/axllent/mailpit
# After
podman run -d --name adminer docker.io/library/adminer@sha256:cc64d253...
podman run -d --name redisinsight docker.io/redis/redisinsight@sha256:85562d67...
podman run -d --name kafka-ui docker.io/provectuslabs/kafka-ui@sha256:8f2ff02d...
podman run -d --name minio docker.io/minio/minio@sha256:14cea493...
podman run -d --name mailpit docker.io/axllent/mailpit@sha256:0059ef81...
The digests come from the currently-running, 144/0-green stack —
running podman inspect <name> --format '{{.ImageDigest}}' on each
container captured the SHA of exactly what the smoke had been passing
against.
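The capture step is easy to script. A minimal sketch, assuming the five container names from the commands above and that your podman version populates `.ImageDigest` on container inspect; the function name `print_pins` is made up for illustration:

```shell
# Print "<container> <digest>" for each container name given.
# Assumes the containers exist and are named as in SETUP.md.
print_pins() {
  local name digest
  for name in "$@"; do
    # .ImageDigest is the digest of the image the container was created from
    digest=$(podman inspect "$name" --format '{{.ImageDigest}}') || return 1
    printf '%s %s\n' "$name" "$digest"
  done
}

# Usage (not run here):
#   print_pins adminer redisinsight kafka-ui minio mailpit
```

Each output line pairs a container with the digest to paste after `@` in its `podman run` line.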
A banner sits above the first pinned line in SETUP.md:
These digests are pinned to the smoke-validated image. To bump:
podman pull <repo>:<tag> interactively, capture the new digest with podman inspect --format '{{.RepoDigests}}', replace below, re-run the smoke. Do NOT re-pin without a green smoke run.
The note is the load-bearing part. A pinned digest with no rebump ritual rots silently when the upstream stops serving the original SHA. The note makes the ritual explicit.
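The "replace below" step of that ritual is the one most likely to be fumbled by hand. A hedged sketch of a helper that swaps one repo's digest in a runbook file and keeps a backup; `repin` is a hypothetical name, not something in the insurance-app repo, and it does not run the smoke for you:

```shell
# repin <file> <repo> <new-digest>
# Rewrites every "<repo>@sha256:..." reference in <file> to the new digest.
# The previous version of the file is kept as <file>.bak.
repin() {
  local file=$1 repo=$2 new=$3
  # Refuse anything that is not shaped like a full sha256 digest
  case $new in
    sha256:*) ;;
    *) echo "not a sha256 digest: $new" >&2; return 1 ;;
  esac
  # "sha256:" is 7 characters, the hex digest is 64
  [ "${#new}" -eq 71 ] || { echo "digest has wrong length: $new" >&2; return 1; }
  cp "$file" "$file.bak" &&
    sed "s|$repo@sha256:[0-9a-f]*|$repo@$new|g" "$file.bak" > "$file"
}

# Usage (not run here), with the digest elided as in the text:
#   repin SETUP.md docker.io/library/adminer sha256:...
```

The rule from the banner still applies afterwards: the new digest stays only if the smoke run is green.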
SETUP.md idempotency pass
A SETUP.md command that errors on a re-run trains students to bail at the first error instead of restarting from a known step. Every re-runnable command got a guard. The patterns to copy:
# qcow2 disk copy: only copy if missing
[ -f /var/lib/libvirt/images/insurance-app.qcow2 ] || \
cp /var/lib/libvirt/images/rhel9-base.qcow2 \
/var/lib/libvirt/images/insurance-app.qcow2
# virt-install: skip if the domain already exists
virsh dominfo insurance-app >/dev/null 2>&1 || \
virt-install --name insurance-app --memory 16384 ...
# SSH key generation: only if no key present
[ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
# Git clone: only if directory missing
[ -d ~/insurance-app ] || git clone https://github.com/zeshaq/insurance-app ~/insurance-app
# Podman network: ignore "already exists"
podman network create insurance-net 2>/dev/null || true
# SigNoz git clone + sed: clone-once, replay sed from a backup
[ -d ~/signoz ] || git clone https://github.com/SigNoz/signoz ~/signoz
cp ~/signoz/deploy/docker/clickhouse-config.xml.bak ~/signoz/deploy/docker/clickhouse-config.xml 2>/dev/null \
|| cp ~/signoz/deploy/docker/clickhouse-config.xml ~/signoz/deploy/docker/clickhouse-config.xml.bak
sed -i 's/old/new/' ~/signoz/deploy/docker/clickhouse-config.xml
# Cloudflare DNS: list-then-PUT-or-POST (idempotent upsert)
EXISTING=$(curl -sS -H "Authorization: Bearer $TOKEN" \
"https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records?type=A&name=$FQDN" \
| jq -r '.result[0].id // empty')
if [ -n "$EXISTING" ]; then
curl -X PUT .../dns_records/$EXISTING -d "$BODY"
else
curl -X POST .../dns_records -d "$BODY"
fi
# Certbot: --keep-until-expiring is the no-op-if-fresh flag
certbot certonly --dns-cloudflare ... --keep-until-expiring -d "*.insurance-app.comptech-lab.com"
The principle to copy across any runbook: read the existing state
first, take action only if needed. mkdir -p, [ -f ] guards,
upsert patterns, idempotent install flags. Every one of those is
something the student doesn’t have to think about when they restart
from where they got stuck.
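The replay-from-backup pattern used for the SigNoz config generalizes to any in-place edit. A small sketch under the assumption that the file being edited is not touched by anything else between runs; `sed_from_backup` is an illustrative name:

```shell
# sed_from_backup <file> <sed-expression>
# First run: save a pristine copy as <file>.bak.
# Every run: regenerate <file> from the pristine copy, so applying the
# same expression twice gives the same result as applying it once.
sed_from_backup() {
  local file=$1 expr=$2
  [ -f "$file.bak" ] || cp "$file" "$file.bak" || return 1
  sed "$expr" "$file.bak" > "$file"
}

# Usage (not run here), with the placeholder expression from the runbook:
#   sed_from_backup ~/signoz/deploy/docker/clickhouse-config.xml 's/old/new/'
```

Because the edit always starts from the `.bak` copy, a student who restarts a phase cannot stack the same substitution on top of itself.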
The wso2is-data named volume
WSO2 IS ships a wso2is:7.0.0 image that initializes an H2 database
on first boot. By default that database lives inside the container’s
writable layer at /home/wso2carbon/wso2is-7.0.0/repository/database/.
Any podman run --replace, --force-recreate, or podman rm -f
wipes it — and with it, every DCR-registered OIDC client, every
SCIM2-provisioned user, every admin role assignment.
Discovering this is a rite of passage. After the second time you’ve re-registered the customer-app DCR client because someone bumped the WSO2 IS container during a smoke debug, you fix it:
# Create the volume once (idempotent)
podman volume create wso2is-data 2>/dev/null || true
# Mount it on the IS container
podman run -d --replace --name wso2is --network insurance-net \
-p 9444:9443 -p 9763:9763 \
-v wso2is-data:/home/wso2carbon/wso2is-7.0.0/repository/database \
docker.io/wso2/wso2is:7.0.0
Now podman rm -f wso2is && podman run ... keeps the H2 database
intact. DCR clients survive. SCIM users survive. The IS admin
password (changed away from admin/admin?) survives.
The migration cost is one-time DCR re-registration: the original
H2 snapshot was taken from a running IS, so when H2 re-initialized on
first read of the mounted volume, it created a fresh database. Bumped
the server.xml basicRegistry user to the new client_id, rewrote
.wso2is-creds, moved on.
Future container recreates should stop wso2is first so H2 flushes cleanly:
podman stop wso2is
podman rm wso2is
podman run ...   # with the -v wso2is-data:... line
stop then rm gives H2’s writer a chance to flush in-flight pages
before the volume gets re-mounted by a new container.
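Wrapping the sequence in one function keeps anyone from reaching for `rm -f` out of habit. A sketch, assuming the container name, network, ports, and mount path from the `podman run` command earlier in this section; the function name `recreate_wso2is` is hypothetical:

```shell
# Path of the H2 database directory inside the container (from the text above)
WSO2_DB_DIR=/home/wso2carbon/wso2is-7.0.0/repository/database

recreate_wso2is() {
  # Graceful stop first so H2 can flush; tolerate "no such container"
  podman stop wso2is 2>/dev/null || true
  podman rm wso2is 2>/dev/null || true
  # Creating the volume is a no-op if it already exists
  podman volume create wso2is-data 2>/dev/null || true
  podman run -d --name wso2is --network insurance-net \
    -p 9444:9443 -p 9763:9763 \
    -v "wso2is-data:$WSO2_DB_DIR" \
    docker.io/wso2/wso2is:7.0.0
}
```

By default `podman stop` waits 10 seconds before killing the container. If IS takes longer than that to shut down on your VM, the `-t` flag raises the limit.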
Why these specific things
The temptation in a hardening pass is to do everything: scan images, add resource limits, enforce read-only root filesystems, etc. The choice to stop at three was deliberate. Each of these was a real incident in a real classroom session:
- A student’s adminer container started crashing during a section 6 demo because :latest got re-tagged with a binary that needed a newer libc than the alpine base had.
- A student got partway through SETUP.md, hit a transient network error during the SigNoz clone, restarted from “Phase 4”, and the sed-into-clickhouse-config got applied twice.
- A student following the runbook restarted wso2is after changing a port mapping, lost all DCR clients, and went hunting for “why does Liberty’s mpJwt validation fail now” for an hour.
The hardening pass closes those three specific failure modes. The suite went 144 → 198 → 205 across these chapters, and each addition captured the verify path for a real student-visible behavior.
Verify
# Image digests are pinned — re-pulling fetches the same SHA
podman pull docker.io/library/adminer@sha256:cc64d253...
podman image inspect docker.io/library/adminer@sha256:cc64d253... --format '{{.RepoDigests}}'
# the printed list includes the pinned sha256:cc64d253... digest
# SETUP.md re-runs cleanly on an already-built VM
# syntax check only: extract the bash blocks and parse them without executing
cd ~/insurance-app && \
awk '/^```bash$/{f=1; next} /^```/{f=0} f' SETUP.md | bash -n \
&& echo "bash blocks parse"
# (manual: walk a couple of phases on a VM with the stack already running)
# wso2is volume mounted — DCR client survives recreate
podman exec wso2is sh -c 'ls /home/wso2carbon/wso2is-7.0.0/repository/database'
podman stop wso2is && podman rm wso2is && podman run ... # with the volume mount
podman exec wso2is sh -c 'ls /home/wso2carbon/wso2is-7.0.0/repository/database'
# both listings show the same H2 database files
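The pinning check above covers one image at a time. To catch a reference that slipped through, something like the following can list every image in the runbook that lacks a digest. It is a sketch that assumes all images are pulled from `docker.io/`, as in the commands in this chapter; `unpinned_images` is an illustrative name:

```shell
# unpinned_images <file>
# Print each docker.io/ image reference in <file> that has no @sha256: digest.
unpinned_images() {
  grep -o 'docker\.io/[^[:space:]]*' "$1" | grep -v '@sha256:' | sort -u
}

# Usage (not run here):
#   unpinned_images ~/insurance-app/SETUP.md
```

Empty output means every reference is pinned. Version-tagged references such as `wso2is:7.0.0` are reported too, since a version tag can also be re-pushed upstream; whether to pin those is a judgement call.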
What you have
- Every floating-tag image pinned by digest, with a documented re-pin ritual.
- A SETUP.md you can interrupt at any phase and restart from without manual cleanup.
- A wso2is-data named volume so DCR clients and SCIM users survive container recreates.
- The smoke suite hitting 205/0 across all the slices in the track.