Credential Custody Rules
Where credentials live, where they must never go, how the per-division Vault tenancy model works, and the ESO bridge that delivers Vault secrets and OBC-shaped object-store credentials into the right Kubernetes Secret on the right cluster.
This page is the canonical rule set for handling credentials across the CompTech v6 fleet — where they live, where they get delivered to, how they cross trust boundaries, and the small handful of cases where a credential is allowed to exist as a one-shot bootstrap artifact rather than a Vault-managed resource. It applies to platform operators and application developers equally.
The single most important sentence on this page: credentials are never committed to Git, never pasted into issues or chat, never printed in shared logs. Everything else is implementation detail.
The custody map
There are four custody locations and one forbidden zone:
- Vault (yellow): the canonical custody for application secrets and most platform secrets. Per-division tenancy under secret/apps/<division>/<app>/.... Sealed at rest, audited, role-scoped per cluster + division.
- Cluster htpasswd Secret (gray): for the small set of admin credentials that pre-date Vault availability or that bootstrap Vault-adjacent systems. Quay admin and ACS Central admin live here today. The Secret name in the cluster is the contract; the value is never extracted to disk except for one-shot bootstrap.
- Local secrets/ directory (red, restricted): one-shot bootstrap material on the operator workstation only. Git-ignored, mode-restricted. Values flow into Vault or into a cluster Secret once during bootstrap and the local file becomes a recovery artifact, not a live credential surface.
- ESO (green dashed): the delivery mechanism. Reads from Vault (or from existing Kubernetes Secrets via the kubernetes provider) and materializes target Secrets in the right namespace on the right cluster. There are two ESO surfaces: a platform-scoped ClusterSecretStore vault-cluster and one per-tenant SecretStore vault-apps inside each tenant namespace.
- Git (red, forbidden): credential values never appear here. Credential names, paths, and Vault path references do appear (they’re configuration, not secrets).
Vault tenancy model
The per-division Vault tenancy model formalized in issue #174 (DEV-OCP-0.4) is the heart of the application-secret story. Three principles:
- One Vault path subtree per division. secret/apps/<division>/<app>/<env>/<key>. A leaked role only exposes that division’s subtree.
- One ACL policy per division. apps-<division>-read grants read on secret/data/apps/<division>/* and list/read on secret/metadata/apps/<division>/*. The policy is cluster-agnostic.
- One role per cluster + division. apps-<cluster>-<division> lives under auth/kubernetes-<cluster>/role/. It binds the policy to a service-account name + namespace glob. The namespace glob apps-<division>-* is the structural lock: only namespaces starting with that prefix can authenticate against the role.
The role JSON shape:
{
  "bound_service_account_names": ["app-eso"],
  "bound_service_account_namespaces": ["apps-<division>-*"],
  "token_policies": ["apps-<division>-read"],
  "token_ttl": "1h",
  "token_max_ttl": "4h",
  "audience": "vault"
}
The role only ever issues short-lived tokens (1-hour TTL, 4-hour max). ESO refreshes each cycle.
Per-tenant SecretStore, not ClusterSecretStore
Tenant apps use a namespace-scoped SecretStore (kind: SecretStore), not the cluster-wide ClusterSecretStore. Each tenant namespace owns its own store and cannot read other divisions’ paths, because the Vault role only issues tokens to service accounts in namespaces matching apps-<division>-*.
The canonical per-tenant SecretStore shape:
apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: vault-apps
  namespace: apps-<DIVISION>-<APP>
spec:
  provider:
    vault:
      server: https://vault.sub.comptech-lab.com:8200
      path: secret
      version: v2
      caBundle: <base64 vault CA>
      auth:
        kubernetes:
          mountPath: kubernetes-<CLUSTER>
          role: apps-<CLUSTER>-<DIVISION>
          serviceAccountRef:
            name: app-eso
            audiences:
              - vault
A placeholder copy lives in platform-gitops at clusters/spoke-dc-v6/tenants/_template/secretstore-vault-apps.yaml. The tenant template copies it into each new tenant directory and substitutes <DIVISION>, <APP>, and <CLUSTER>.
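The store authenticates as the app-eso service account, so that account has to exist in the tenant namespace. A minimal sketch, assuming the account needs no extra labels or annotations in this setup (whether the tenant template already ships an equivalent manifest is not stated here):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-eso                      # must match bound_service_account_names on the Vault role
  namespace: apps-<DIVISION>-<APP>   # must match the role's apps-<division>-* glob
```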
An ExternalSecret then materializes the Kubernetes Secret from Vault:
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: apps-payments-checkout-api
spec:
  secretStoreRef:
    kind: SecretStore
    name: vault-apps
  refreshInterval: 1h
  target:
    name: db-credentials  # the actual K8s Secret name
  data:
    - secretKey: username
      remoteRef:
        key: apps/payments/checkout-api/prod/db.username
        property: value
    - secretKey: password
      remoteRef:
        key: apps/payments/checkout-api/prod/db.password
        property: value
The Pod consumes db-credentials like any other Secret — envFrom, volumeMount, imagePullSecrets. It doesn’t know Vault exists.
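As a hedged illustration, a workload might reference the Secret like this. It is a minimal sketch: the Deployment name, labels, image placeholder, and environment variable names are hypothetical; only the Secret name, its two keys, and the namespace come from the ExternalSecret example above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api                          # hypothetical workload name
  namespace: apps-payments-checkout-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
    spec:
      containers:
        - name: app
          image: "<REGISTRY>/<IMAGE>:<TAG>"   # placeholder, not a real image
          env:
            - name: DB_USERNAME               # hypothetical variable name
              valueFrom:
                secretKeyRef:
                  name: db-credentials        # Secret materialized by ESO
                  key: username
            - name: DB_PASSWORD               # hypothetical variable name
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
```

The manifest names only the Secret and its keys, never the values, which is why a spec of this kind can live in Git.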
Platform-scoped ClusterSecretStore
For cluster-operator credentials (things like RHACS init-bundle TLS, OADP backup repo credentials, Vault root-token-equivalents needed by operators themselves), a single platform-scoped ClusterSecretStore vault-cluster reads from a separate Vault path subtree:
ocp/platform/* # platform-wide secrets (operator-level)
ocp/<cluster>/* # per-cluster secrets (cluster-specific operands)
This store is bound to the platform ESO service account (external-secrets-operator-controller-manager) and is the canonical reference for ExternalSecret resources in openshift-* and stackrox namespaces.
The two stores never read each other’s path subtrees. The split is enforced by Vault policy, not just convention.
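For orientation, the platform store could look roughly like the sketch below. It is not the manifest from the repository: the KV mount, the auth mount convention, the role name, and the service account namespace are all assumptions, shown as placeholders where this page gives no value.

```yaml
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault-cluster
spec:
  provider:
    vault:
      server: https://vault.sub.comptech-lab.com:8200
      path: secret                          # assumption: same KV mount as the tenant store
      version: v2
      caBundle: <base64 vault CA>
      auth:
        kubernetes:
          mountPath: kubernetes-<CLUSTER>   # assumption: same auth mount convention as tenants
          role: <PLATFORM-ROLE>             # hypothetical placeholder; real role name not given here
          serviceAccountRef:
            name: external-secrets-operator-controller-manager
            namespace: <ESO-NAMESPACE>      # placeholder; a cluster-scoped store must name the namespace
            audiences:
              - vault
```

Unlike the per-tenant store, a ClusterSecretStore is not tied to one namespace, so the path separation has to come from the Vault policy attached to its role.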
htpasswd Secret pattern (Quay, ACS Central)
A small number of platform credentials pre-date Vault availability or bootstrap subsystems that Vault itself depends on. These live inside the cluster as Kubernetes Secrets that hold htpasswd data:
| Secret | Used by | Why not Vault |
|---|---|---|
| central-htpasswd (stackrox namespace) | RHACS Central admin login | needed during ACS init-bundle generation; pre-dates ESO availability on the cluster |
| Quay superuser htpasswd | Quay registry admin | Quay’s own auth subsystem reads htpasswd directly |
| OpenShift IdP htpasswd (if used) | bootstrap user pre-IdP | replaced once a proper IdP is configured (PCI-DSS sub-issue #251) |
When ESO is available and the subsystem supports Vault-backed auth, these get migrated. Until then they live as cluster Secrets, created during bootstrap with oc create secret generic from an htpasswd file. Once created, the value is not extracted from the cluster; the operator workstation may keep the one-shot bootstrap copy in secrets/ for recovery, and any rotation goes through a tracked issue.
The init-bundle TLS material for RHACS Central is also of this shape: generated once via Central’s API, flattened to dockerconfigjson + TLS secret shape, then pushed into Vault for ESO delivery into the stackrox namespace on each secured cluster. The pattern is documented in reference_rhacs_init_bundle_via_api.md.
Local secrets/ directory (bootstrap-only)
The operator workstation keeps local-only secrets in opp-full-plat/secrets/, the canonical location for them. The directory is:
- Git-ignored by an explicit .gitignore rule.
- Mode-restricted (chmod 700 on the dir, 600 on the files).
- Local-only: never synced to remote storage, never copied off the host.
The directory holds a handful of bootstrap-only credentials (Nexus admin and service-account passwords, the GitLab bootstrap PAT, per-cluster kubeadmin passwords, MinIO env files, the git-askpass shim). Exact filenames live in opp-full-plat/connection-details/ and are not enumerated here.
The directory exists so that one-shot bootstrap operations (creating the first Vault root token, generating an htpasswd Secret, recovering from a degraded ESO state) have a known custody location. It is not a substitute for Vault. Once a credential is in Vault and ESO is delivering it, the local file becomes a recovery artifact, not a live credential surface. Stale or rotated files get a .prev or .stale-pre-* suffix and are kept for rollback windows, then removed.
Workspace boundary rule from feedback_workspace_boundary.md: never look in /home/ze/cloud-init/ for credentials. That path is leftover failed-install scrap. Values found there are not authoritative.
What NEVER goes in Git
Hard rule, every category:
- Vault tokens, root tokens, unseal keys, recovery keys, raft snapshot encryption keys.
- Kubeconfigs containing user credentials. (Cluster cert-only kubeconfigs as bootstrap inputs are sometimes acceptable in cluster-build repos; user-credential kubeconfigs are not.)
- htpasswd hashes, plaintext passwords, salted-and-hashed credentials.
- API tokens — Nexus, GitLab PAT, Jenkins API token, RHACM admin token, ACS Central admin token, GitHub PAT.
- Robot pull secrets / dockerconfigjson values. The reference to the Secret name in a workload spec is fine; the contents are not.
- TLS private keys. Public certs and CSRs are fine; private keys are not.
- OAuth client secrets, OIDC client secrets, JWT signing keys.
- Database connection strings with embedded credentials.
- .env files with values. .env.example files with placeholder values are fine.
- Pre-rendered Kubernetes Secret manifests, even with placeholders that look obviously fake.
- Any base64-encoded blob whose plaintext is a credential — base64 is not encryption.
What IS allowed in Git:
- Credential names and Vault paths (secret/apps/<division>/<app>/<env>/<key>).
- ExternalSecret manifests that reference Vault paths (but don’t contain values).
- SecretStore / ClusterSecretStore definitions (which contain server URLs and CA bundles, both public information).
- Pod and Deployment specs that reference Secret names.
- Documentation describing where credentials live and how to access them.
When in doubt: ask whether the file, if leaked, would let someone authenticate as something they shouldn’t. If yes, it doesn’t go in Git.
What NEVER goes in this blog
This site is published on public Cloudflare Pages, so the rules are stricter than for Git:
- No raw internal IPv4 addresses (30.30.x.x is redacted to “the DNS VM,” “the Vault VM,” etc.).
- No internal MAC addresses, hardware serials, license keys, contract numbers.
- No credential values, even masked. “The Vault root token is <redacted>” is still not OK because it confirms shape and presence.
- No screenshots of consoles that include hostnames, IPs, or user metadata.
OK to publish:
- Hostnames under *.apps.sub.comptech-lab.com, pdns.local, mirror-registry.sub.comptech-lab.com (these are part of the documented DNS plane).
- Generic CIDR allocation patterns (e.g., “the lab uses a private /16 with role-banded /24 slices”) without specific addresses.
- Operator versions, image digests at a generic level, public RHACM/OCP/Argo CD config patterns.
When in doubt, redact and add a footnote pointing at the private connection-details/ runbook.
ESO kubernetes provider — the OBC bridge pattern
A specific ESO pattern recurs frequently enough to warrant naming: the OBC → operand secret bridge.
The problem: NooBaa-backed ObjectBucketClaim resources create a Secret with keys AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY plus a ConfigMap with the endpoint, bucket name, and region. But the LokiStack, TempoStack, and Quay operators expect a different Secret shape: lowercase keys (access_key_id, access_key_secret), endpoint and bucket name inside the Secret itself, not split between Secret and ConfigMap.
The bridge is an ExternalSecret using the ESO kubernetes provider (reading from a ClusterSecretStore of kind kubernetes):
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: tempo-storage
  namespace: openshift-tempo
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: kubernetes-local
  target:
    name: tempo-storage  # what the TempoStack operator expects
    template:
      type: Opaque
      data:
        endpoint: "https://s3.<minio>:443"
        bucketnames: "{{ .obc_bucket }}"
        access_key_id: "{{ .access_key_id }}"
        access_key_secret: "{{ .access_key_secret }}"
        region: "{{ .region }}"
  dataFrom:
    - extract:
        key: openshift-tempo/tempo-obc  # the OBC's Secret
    - extract:
        key: openshift-tempo/tempo-obc-cm  # the OBC's ConfigMap (read via k8s provider)
This pattern lives at clusters/spoke-dc-v6/platform-services/tracing/externalsecret-tempo-storage.yaml and is the reference for new operand-secret bridges (Loki backport tracked under issue #233).
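The template in the manifest above refers to variables such as .access_key_id and .access_key_secret, while the OBC Secret carries uppercase AWS_* keys. One way to make that renaming explicit is a per-key data entry for each value. The fragment below is a sketch under assumptions, not the manifest from the repository: it assumes the store resolves Secrets in openshift-tempo, that the OBC-generated Secret is named tempo-obc, and it covers only the two keys the OBC Secret is described as carrying.

```yaml
# Fragment of an ExternalSecret spec (illustrative only).
spec:
  data:
    - secretKey: access_key_id          # name exposed to the template
      remoteRef:
        key: tempo-obc                  # assumed name of the OBC-generated Secret
        property: AWS_ACCESS_KEY_ID     # key inside that Secret
    - secretKey: access_key_secret      # name exposed to the template
      remoteRef:
        key: tempo-obc
        property: AWS_SECRET_ACCESS_KEY
```

With entries like these, {{ .access_key_id }} and {{ .access_key_secret }} in target.template.data resolve to the OBC values under the lowercase names the operand expects.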
An egress note: the Red Hat ESO operator 1.1.0 ships a default-deny NetworkPolicy in the external-secrets namespace, so the reconciler hangs on Vault login until an allow-egress NetworkPolicy targeting the Vault VM is in place. The fix is already in GitOps (project_eso_egress_to_vault.md); restart the operand after policy changes.
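The authoritative policy is the one in GitOps. As a rough sketch of what such an allow rule might look like, assuming the policy selects every pod in the namespace and that Vault is reached on port 8200 as in the SecretStore server URL (the policy name and the address placeholder are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-vault          # hypothetical name
  namespace: external-secrets
spec:
  podSelector: {}                      # assumption: applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: "<VAULT-VM-IP>/32"   # placeholder; the real address stays in the private runbook
      ports:
        - protocol: TCP
          port: 8200
```

Because the store addresses Vault by hostname, the operand may also need DNS egress; whether the shipped policies already allow that is not covered here.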
Robot token rotation (Quay, Nexus)
The per-tenant Quay robot token convention (reference_quay_robot_token_convention.md) is the canonical model for image-push credentials. Rotation steps:
- Generate a new robot token in Quay for the existing robot account (Quay console or API). The old token remains valid until explicitly revoked, so there is no race.
- Write the new token to Vault under secret/apps/<division>/<app>/ci/quay-robot (KV-v2). The Vault write replaces the prior version; KV-v2 retains history for rollback.
- ESO refreshes the Secret on the next reconcile cycle (refreshInterval); the quay-robot-team-<team> Secret in openshift-pipelines updates with the new dockerconfigjson.
- Smoke-test the push pipeline with a no-op build that exercises the new credential.
- Revoke the old robot token in Quay once the smoke test passes.
The same shape works for Nexus credentials (jenkinsbot, future per-division CI accounts): generate, write to Vault, let ESO refresh, smoke-test, revoke.
Never rotate by editing the live Kubernetes Secret in place. That bypasses ESO, which will overwrite it on the next reconcile and leak the broken-rotation state into operator logs.
Break-glass credential access
The handful of cases where credentials are read directly from custody locations (not via ESO):
- First boot of Vault (literally creating the root token); the token goes immediately into a sealed local file and is replaced by short-lived AppRole or Kubernetes-auth credentials within the first day.
- Recovery of a degraded ESO where the controller cannot reach Vault. The recovery operator may need to re-create a cluster Secret manually from the local copy, with a tracked issue and a backport plan.
- kubeadmin one-time access for break-glass cluster operations. The cluster’s kubeadmin password file lives in the local secrets/ directory; using it requires a tracked issue, capture of starting state, and immediate post-action validation.
Each break-glass credential use must produce: a GitHub issue, the cluster/namespace/object touched, the actor and time, the action summary, validation, and a backport or rotation commit. The five-piece audit shape from ADR 0025 applies.
Common credential gotchas
| Symptom | Likely cause | Fix |
|---|---|---|
| ExternalSecret stuck in SecretSyncedError with Vault timeout | ESO operand missing egress allow to Vault VM | apply NetworkPolicy allow-egress targeting Vault VM, restart ESO operand |
| ExternalSecret stuck in SecretSyncedError with “permission denied” | Vault role’s namespace glob doesn’t match tenant ns | re-check apps-<division>-* glob and the actual tenant namespace name |
| LokiStack pods CrashLoopBackOff with S3 auth error | OBC bridge missing or wrong key shape | apply the operand-shape ExternalSecret from project_obc_to_operand_secret_bridge.md |
| Quay push fails after rotation | ESO hasn’t refreshed yet | wait one refreshInterval (typ. 1h) or kubectl annotate externalsecret <name> force-sync=$(date +%s) --overwrite |
| RHACS init-bundle won’t accept on a new spoke | ESO not delivered or wrong Secret shape | check the stackrox ns Secret values; the init-bundle should be split into collector-tls and sensor-tls Secrets |
| Operator pod stuck in ImagePullBackOff | pull-secret Secret not linked to the SA | oc secrets link default <pull-secret> --for=pull -n <ns> |
| Vault vault-apps SecretStore reports cert errors | wrong caBundle (base64 of Vault CA) | copy verbatim from clusters/<cluster>/secrets/eso/clustersecretstore-vault.yaml |
| Tenant pod gets old credential value | ESO refresh interval too long | shorten refreshInterval and consider a force-sync trigger on rotation |
References
- connection-details/vault-app-secrets.md: per-division Vault tenancy
- connection-details/nexus.md: Nexus credential custody (private)
- connection-details/minio.md: MinIO credential custody (private)
- adr/0019-nexus-only-image-supply-chain.md
- adr/0024-openshift-only-platform-gitops-boundary.md
- adr/0025-gitops-only-operations-break-glass.md