Credential Custody Rules

Where credentials live, where they must never go, how the per-division Vault tenancy model works, and the ESO bridge that delivers Vault secrets and OBC-shaped object-store credentials into the right Kubernetes Secret on the right cluster.

This page is the canonical rule set for handling credentials across the CompTech v6 fleet — where they live, where they get delivered to, how they cross trust boundaries, and the small handful of cases where a credential is allowed to exist as a one-shot bootstrap artifact rather than a Vault-managed resource. It applies to platform operators and application developers equally.

The single most important sentence on this page: credentials are never committed to Git, never pasted into issues or chat, never printed in shared logs. Everything else is implementation detail.

The custody map

The map has three custody locations, one delivery mechanism, and one forbidden zone:

  1. Vault (yellow) — the canonical custody for application secrets and most platform secrets. Per-division tenancy under secret/apps/<division>/<app>/.... Sealed at rest, audited, role-scoped per cluster + division.
  2. Cluster htpasswd Secret (gray) — for the small set of admin credentials that pre-date Vault availability or that bootstrap Vault-adjacent systems. Quay admin and ACS Central admin live here today. The Secret name in the cluster is the contract; the value is never extracted to disk except for one-shot bootstrap.
  3. Local secrets/ directory (red, restricted) — one-shot bootstrap material on the operator workstation only. Git-ignored, mode-restricted. Values flow into Vault or into a cluster Secret once during bootstrap and the local file becomes a recovery artifact, not a live credential surface.
  4. ESO (green dashed) — the delivery mechanism. Reads from Vault (or from existing Kubernetes Secrets via the kubernetes provider) and materializes target Secrets in the right namespace on the right cluster. There are two ESO surfaces: a platform-scoped ClusterSecretStore vault-cluster and one per-tenant SecretStore vault-apps inside each tenant namespace.
  5. Git (red, forbidden) — credential values never appear here. Credential names, paths, and Vault path references do appear (they’re configuration, not secrets).

Vault tenancy model

The per-division Vault tenancy model formalized in issue #174 (DEV-OCP-0.4) is the heart of the application-secret story. Three principles:

  1. One Vault path subtree per division. secret/apps/<division>/<app>/<env>/<key>. A leaked role only exposes that division’s subtree.
  2. One ACL policy per division. apps-<division>-read grants read on secret/data/apps/<division>/* and list/read on secret/metadata/apps/<division>/*. The policy is cluster-agnostic.
  3. One role per cluster + division. apps-<cluster>-<division> lives under auth/kubernetes-<cluster>/role/. It binds the policy to a service-account name + namespace glob. The namespace glob apps-<division>-* is the structural lock: only namespaces starting with that prefix can authenticate against the role.

The role JSON shape:

{
  "bound_service_account_names":      ["app-eso"],
  "bound_service_account_namespaces": ["apps-<division>-*"],
  "token_policies":                   ["apps-<division>-read"],
  "token_ttl":                        "1h",
  "token_max_ttl":                    "4h",
  "audience":                         "vault"
}

The role only ever issues short-lived tokens (1-hour TTL, 4-hour max). ESO obtains a fresh token as it reconciles rather than holding one long-lived token in the cluster.
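Because the policy and role names are fully determined by the cluster and division, they can be rendered mechanically instead of typed by hand. The sketch below is a minimal illustration, not the platform's actual tooling. It assumes the KV-v2 mount is named secret, and the helper names division_policy_hcl and division_role are hypothetical.

```python
def division_policy_hcl(division: str) -> str:
    """Render the apps-<division>-read ACL policy (assumes the KV-v2 mount is 'secret')."""
    return (
        f'path "secret/data/apps/{division}/*" {{\n'
        '  capabilities = ["read"]\n'
        "}\n"
        f'path "secret/metadata/apps/{division}/*" {{\n'
        '  capabilities = ["list", "read"]\n'
        "}\n"
    )


def division_role(cluster: str, division: str) -> tuple[str, dict]:
    """Return (Vault API path, JSON body) for the per-cluster, per-division role."""
    path = f"auth/kubernetes-{cluster}/role/apps-{cluster}-{division}"
    body = {
        "bound_service_account_names": ["app-eso"],
        # The namespace glob is the structural lock: only apps-<division>-* can log in.
        "bound_service_account_namespaces": [f"apps-{division}-*"],
        "token_policies": [f"apps-{division}-read"],
        "token_ttl": "1h",
        "token_max_ttl": "4h",
        "audience": "vault",
    }
    return path, body
```

For example, division_role("spoke-dc-v6", "payments") yields the path auth/kubernetes-spoke-dc-v6/role/apps-spoke-dc-v6-payments. The HCL could be loaded with vault policy write apps-payments-read <file>, and the body could be POSTed to /v1/<path> on the Vault HTTP API.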

Per-tenant SecretStore, not ClusterSecretStore

Critically, tenant apps use namespace-scoped SecretStore (kind: SecretStore), not the cluster-wide ClusterSecretStore. Each tenant namespace owns its own store; it cannot read other divisions’ paths because the role’s namespace glob refuses to issue a token outside apps-<division>-*.
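The effect of the namespace glob can be checked offline. This sketch approximates Vault's matching with Python's fnmatchcase, on the assumption that the role patterns only ever use a single trailing * (fnmatch also treats ? and [...] specially, and Vault's glob does not). The helper name role_accepts is hypothetical.

```python
from fnmatch import fnmatchcase


def role_accepts(namespace: str, division: str) -> bool:
    """Approximate whether role apps-<cluster>-<division> would accept a login from namespace.

    Assumption: the bound namespace pattern is exactly 'apps-<division>-*'.
    """
    return fnmatchcase(namespace, f"apps-{division}-*")
```

The trailing hyphen in the pattern matters: apps-pay-* does not match apps-payments-checkout-api. One consequence worth checking is that a division literally named payments-eu would have namespaces such as apps-payments-eu-ledger that fall inside apps-payments-*, so no division name should be a hyphenated prefix of another.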

The canonical per-tenant SecretStore shape:

apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: vault-apps
  namespace: apps-<DIVISION>-<APP>
spec:
  provider:
    vault:
      server: https://vault.sub.comptech-lab.com:8200
      path: secret
      version: v2
      caBundle: <base64 vault CA>
      auth:
        kubernetes:
          mountPath: kubernetes-<CLUSTER>
          role: apps-<CLUSTER>-<DIVISION>
          serviceAccountRef:
            name: app-eso
            audiences:
              - vault

A placeholder copy lives in platform-gitops at clusters/spoke-dc-v6/tenants/_template/secretstore-vault-apps.yaml. The tenant template copies it into each new tenant directory and substitutes <DIVISION>, <APP>, and <CLUSTER>.
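The substitution step is simple enough to sketch. The helper below is hypothetical and is not the real tenant template tooling. It also checks that the resulting namespace is a valid Kubernetes namespace name (a DNS-1123 label of at most 63 characters) and that no unfilled <UPPERCASE> placeholder survives.

```python
import re

# Kubernetes namespace names must be DNS-1123 labels (lowercase alphanumerics and '-').
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
PLACEHOLDER = re.compile(r"<[A-Z_]+>")


def render_tenant_store(template: str, division: str, app: str, cluster: str) -> str:
    """Fill <DIVISION>, <APP>, <CLUSTER> in the SecretStore template (hypothetical helper)."""
    namespace = f"apps-{division}-{app}"
    if len(namespace) > 63 or not DNS1123_LABEL.fullmatch(namespace):
        raise ValueError(f"{namespace!r} is not a valid namespace name")
    rendered = (
        template.replace("<DIVISION>", division)
        .replace("<APP>", app)
        .replace("<CLUSTER>", cluster)
    )
    # Catch placeholders the template may gain later but this helper does not know about.
    leftover = PLACEHOLDER.findall(rendered)
    if leftover:
        raise ValueError(f"unfilled placeholders: {sorted(set(leftover))}")
    return rendered
```

Failing loudly on an invalid namespace matters here because the namespace name is exactly what the Vault role's glob checks.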

An ExternalSecret then materializes the Kubernetes Secret from Vault:

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: apps-payments-checkout-api
spec:
  secretStoreRef:
    kind: SecretStore
    name: vault-apps
  refreshInterval: 1h
  target:
    name: db-credentials       # the actual K8s Secret name
  data:
    - secretKey: username
      remoteRef:
        key: apps/payments/checkout-api/prod/db.username
        property: value
    - secretKey: password
      remoteRef:
        key: apps/payments/checkout-api/prod/db.password
        property: value

The Pod consumes db-credentials like any other Secret (envFrom, an env secretKeyRef, or a volume mount). It doesn't know Vault exists.
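Roughly, for each data entry ESO logs in with the service account token and then reads one field from KV-v2. The sketch below spells out the two HTTP requests involved. It is an illustration of the Vault API shapes, not ESO's code. The helper names are hypothetical, and the response dictionary in the usage is made up for the example.

```python
def kubernetes_login_request(server: str, mount: str, role: str, jwt: str) -> tuple[str, dict]:
    """Kubernetes auth login: POST /v1/auth/<mount>/login with the role and SA token."""
    return f"{server.rstrip('/')}/v1/auth/{mount}/login", {"role": role, "jwt": jwt}


def kv2_read_url(server: str, mount: str, key: str) -> str:
    """KV-v2 reads go to /v1/<mount>/data/<key>; 'data/' is implied by version: v2."""
    return f"{server.rstrip('/')}/v1/{mount}/data/{key}"


def kv2_property(response: dict, prop: str) -> str:
    """A KV-v2 read nests the stored fields under data.data."""
    return response["data"]["data"][prop]
```

With the SecretStore above, the login goes to auth/kubernetes-<CLUSTER>/login using a token for app-eso issued with the vault audience. The username entry is then read from /v1/secret/data/apps/payments/checkout-api/prod/db.username, and its value field becomes the username key of db-credentials.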

Platform-scoped ClusterSecretStore

For cluster-operator credentials (things like RHACS init-bundle TLS, OADP backup repo credentials, Vault root-token-equivalents needed by operators themselves), a single platform-scoped ClusterSecretStore vault-cluster reads from a separate Vault path subtree:

ocp/platform/*           # platform-wide secrets (operator-level)
ocp/<cluster>/*          # per-cluster secrets (cluster-specific operands)

This store is bound to the platform ESO service account (external-secrets-operator-controller-manager) and is the canonical reference for ExternalSecret resources in openshift-* and stackrox namespaces.

The two stores never read each other’s path subtrees. The split is enforced by Vault policy, not just convention.

htpasswd Secret pattern (Quay, ACS Central)

A small number of platform credentials pre-date Vault availability or bootstrap subsystems that Vault itself depends on. These live inside the cluster as ordinary Kubernetes Secrets that carry an htpasswd file (Kubernetes has no dedicated htpasswd Secret type):

| Secret | Used by | Why not Vault |
|---|---|---|
| central-htpasswd (stackrox namespace) | RHACS Central admin login | needed during ACS init-bundle generation; pre-dates ESO availability on the cluster |
| Quay superuser htpasswd | Quay registry admin | Quay’s own auth subsystem reads htpasswd directly |
| OpenShift IdP htpasswd (if used) | bootstrap user pre-IdP | replaced once a proper IdP is configured (PCI-DSS sub-issue #251) |

When ESO is available and the subsystem supports Vault-backed auth, these get migrated. Until then they live as cluster Secrets, created during bootstrap with oc create secret generic <name> --from-file=htpasswd=<file>. After creation the value is never extracted from the cluster; the operator workstation may keep the one-shot bootstrap copy in secrets/ for recovery purposes, and any rotation goes through a tracked issue.

The init-bundle TLS material for RHACS Central follows a related one-shot pattern: generated once via Central’s API, flattened to dockerconfigjson + TLS secret shape, then pushed into Vault for ESO delivery into the stackrox namespace on each secured cluster. The pattern is documented in reference_rhacs_init_bundle_via_api.md.

Local secrets/ directory (bootstrap-only)

The operator workstation holds a local-only secrets/ directory under opp-full-plat/secrets/ (the canonical location for local-only secrets). The directory is:

  • Git-ignored by an explicit .gitignore rule.
  • Mode-restricted (chmod 700 on the dir, 600 on the files).
  • Local-only — never synced to remote storage, never copied off the host.
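The mode rule is easy to audit from the workstation. The sketch below is a hypothetical helper, not an existing script. It assumes subdirectories should also be 700, checks the entries themselves via lstat so symlinks are never followed, and does not cover the .gitignore rule, which needs a separate check.

```python
import os
import stat
from pathlib import Path

DIR_MODE, FILE_MODE = 0o700, 0o600


def custody_violations(root: Path) -> list[str]:
    """List entries under root whose permission bits differ from the custody rule."""
    problems = []
    for path in [root, *root.rglob("*")]:
        st = path.lstat()  # judge the entry itself, never a symlink target
        mode = stat.S_IMODE(st.st_mode)
        if stat.S_ISDIR(st.st_mode) and mode != DIR_MODE:
            problems.append(f"{path}: dir mode {oct(mode)}, expected {oct(DIR_MODE)}")
        elif stat.S_ISREG(st.st_mode) and mode != FILE_MODE:
            problems.append(f"{path}: file mode {oct(mode)}, expected {oct(FILE_MODE)}")
    return problems


def tighten_modes(root: Path) -> None:
    """Apply 700 to directories and 600 to regular files under root; symlinks are skipped."""
    for path in [root, *root.rglob("*")]:
        st = path.lstat()
        if stat.S_ISDIR(st.st_mode):
            os.chmod(path, DIR_MODE)
        elif stat.S_ISREG(st.st_mode):
            os.chmod(path, FILE_MODE)
```

Reporting first and fixing separately keeps the audit read-only, so it is safe to run on a schedule.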

The directory holds a handful of bootstrap-only credentials (Nexus admin and service-account passwords, the GitLab bootstrap PAT, per-cluster kubeadmin passwords, MinIO env files, the git-askpass shim). Exact filenames live in opp-full-plat/connection-details/ and are not enumerated here.

The directory exists so that one-shot bootstrap operations (creating the first Vault root token, generating an htpasswd Secret, recovering from a degraded ESO state) have a known custody location. It is not a substitute for Vault. Once a credential is in Vault and ESO is delivering it, the local file becomes a recovery artifact, not a live credential surface. Stale or rotated files get a .prev or .stale-pre-* suffix and are kept for rollback windows, then removed.

Workspace boundary rule from feedback_workspace_boundary.md: never look in /home/ze/cloud-init/ for credentials. That path is leftover failed-install scrap. Values found there are not authoritative.

What NEVER goes in Git

Hard rule, every category:

  • Vault tokens, root tokens, unseal keys, recovery keys, raft snapshot encryption keys.
  • Kubeconfigs containing user credentials. (Cluster cert-only kubeconfigs as bootstrap inputs are sometimes acceptable in cluster-build repos; user-credential kubeconfigs are not.)
  • htpasswd hashes, plaintext passwords, salted-and-hashed credentials.
  • API tokens — Nexus, GitLab PAT, Jenkins API token, RHACM admin token, ACS Central admin token, GitHub PAT.
  • Robot pull secrets / dockerconfigjson values. The reference to the Secret name in a workload spec is fine; the contents are not.
  • TLS private keys. Public certs and CSRs are fine; private keys are not.
  • OAuth client secrets, OIDC client secrets, JWT signing keys.
  • Database connection strings with embedded credentials.
  • .env files with values. .env.example files with placeholder values are fine.
  • Pre-rendered Kubernetes Secret manifests, even with placeholders that look obviously fake.
  • Any base64-encoded blob whose plaintext is a credential — base64 is not encryption.

What IS allowed in Git:

  • Credential names and Vault paths (secret/apps/<division>/<app>/<env>/<key>).
  • ExternalSecret manifests that reference Vault paths (but don’t contain values).
  • SecretStore / ClusterSecretStore definitions (which contain server URLs and CA bundles — public information).
  • Pod and Deployment specs that reference Secret names.
  • Documentation describing where credentials live and how to access them.

When in doubt: ask whether the file, if leaked, would let someone authenticate as something they shouldn’t. If yes, it doesn’t go in Git.
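Part of this list can be enforced mechanically before a commit lands. The sketch below is illustrative only: the pattern set is a small hypothetical sample (PEM private keys, Secret manifests, credentialed URLs, and the hvs., ghp_-style and glpat- token prefixes) and cannot detect base64 blobs in general. Treat it as a complement to a maintained secret scanner, not a replacement.

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners cover far more credential formats.
PATTERNS = {
    "private key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "Kubernetes Secret manifest": re.compile(r"^[ \t]*kind:[ \t]*Secret[ \t]*$", re.MULTILINE),
    "URL with embedded credentials": re.compile(r"[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@"),
    "Vault service token": re.compile(r"\bhvs\.[A-Za-z0-9_-]{20,}"),
    "GitHub token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}"),
    "GitLab PAT": re.compile(r"\bglpat-[A-Za-z0-9_-]{20,}"),
}


def scan_text(name: str, text: str) -> list[str]:
    """Return one finding per pattern that matches; SecretStore/ExternalSecret kinds do not match."""
    findings = [f"{name}: {label}" for label, rx in PATTERNS.items() if rx.search(text)]
    if Path(name).name == ".env":  # .env.example files are allowed, .env files are not
        findings.append(f"{name}: .env file (commit a .env.example with placeholders instead)")
    return findings


def main(argv: list[str]) -> int:
    """Scan the given paths; return 1 if anything was found, so a hook can block the commit."""
    findings: list[str] = []
    for name in argv:
        try:
            text = Path(name).read_text(errors="ignore")
        except OSError:
            continue  # deleted or unreadable paths are skipped
        findings.extend(scan_text(name, text))
    for finding in findings:
        print(finding)
    return 1 if findings else 0
```

A pre-commit hook would call sys.exit(main(sys.argv[1:])) with the staged file names. A hit is a prompt to apply the "would this let someone authenticate" question, not proof of a leak.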

What NEVER goes in this blog

This blog publishes to a public Cloudflare Pages site, so the rules are stricter than for Git:

  • No raw internal IPv4 addresses (30.30.x.x is redacted to “the DNS VM,” “the Vault VM,” etc.).
  • No internal MAC addresses, hardware serials, license keys, contract numbers.
  • No credential values, even masked. “The Vault root token is <redacted>” is still not OK because it confirms shape and presence.
  • No screenshots of consoles that include hostnames, IPs, or user metadata.
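Redacting addresses before publishing can be partly automated. The hypothetical helper below replaces every dotted quad with a human label or a generic marker. It deliberately redacts all IPv4 addresses instead of only private ones, because a filter based on Python's ipaddress is_private would not catch 30.30.x.x, which is not RFC 1918 space. The examples use the 192.0.2.0/24 documentation range.

```python
import re
from typing import Optional

DOTTED_QUAD = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ipv4(text: str, labels: Optional[dict] = None) -> str:
    """Replace IPv4-looking tokens with labels[addr] or '<redacted-ip>'."""
    labels = labels or {}

    def replace(match):
        candidate = match.group(0)
        # Leave things like "999.1.2.3" alone; any octet <= 255 (leading zeros included) is redacted.
        if any(int(octet) > 255 for octet in candidate.split(".")):
            return candidate
        return labels.get(candidate, "<redacted-ip>")

    return DOTTED_QUAD.sub(replace, text)
```

The labels mapping is where the "the DNS VM" and "the Vault VM" names come from. It lives only on the private side, next to the connection-details/ runbook.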

OK to publish:

  • Hostnames under *.apps.sub.comptech-lab.com, pdns.local, mirror-registry.sub.comptech-lab.com — these are part of the documented DNS plane.
  • Generic CIDR allocation patterns (e.g., “the lab uses a private /16 with role-banded /24 slices”) without specific addresses.
  • Operator versions, image digests at a generic level, public RHACM/OCP/Argo CD config patterns.

When in doubt, redact and add a footnote pointing at the private connection-details/ runbook.

ESO kubernetes provider — the OBC bridge pattern

A specific ESO pattern recurs frequently enough to warrant naming: the OBC → operand secret bridge.

The problem: NooBaa-backed ObjectBucketClaim resources create a Secret with keys AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY plus a ConfigMap with the endpoint, bucket name, and region. But the LokiStack, TempoStack, and Quay operators expect a different Secret shape: lowercase keys (access_key_id, access_key_secret), endpoint and bucket name inside the Secret itself, not split between Secret and ConfigMap.

The bridge is an ExternalSecret that reads through a ClusterSecretStore configured with the ESO kubernetes provider:

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: tempo-storage
  namespace: openshift-tempo
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: kubernetes-local
  target:
    name: tempo-storage      # what the TempoStack operator expects
    template:
      type: Opaque
      data:
        # Template variables must match the keys produced by the extracts below.
        endpoint:          "https://s3.<minio>:443"
        bucket:            "{{ .BUCKET_NAME }}"            # OBC ConfigMap; TempoStack reads "bucket" (LokiStack uses "bucketnames")
        access_key_id:     "{{ .AWS_ACCESS_KEY_ID }}"      # OBC Secret
        access_key_secret: "{{ .AWS_SECRET_ACCESS_KEY }}"  # OBC Secret
        region:            "{{ .BUCKET_REGION }}"          # OBC ConfigMap
  dataFrom:
    - extract:
        key: openshift-tempo/tempo-obc                # the OBC's Secret
    - extract:
        key: openshift-tempo/tempo-obc-cm             # the OBC's ConfigMap (read via k8s provider)

This pattern lives at clusters/spoke-dc-v6/platform-services/tracing/externalsecret-tempo-storage.yaml and is the reference for new operand-secret bridges (Loki backport tracked under issue #233).
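The reshaping that the template performs is a straight key mapping, which helps when writing the next bridge. This sketch is a hypothetical helper that mirrors the idea in Python. The operand's bucket key name is passed in rather than hard-coded: LokiStack uses bucketnames, and other operands should be checked against their own storage-secret documentation. The sample values in the usage are made up.

```python
def operand_storage_secret(obc_secret: dict, obc_configmap: dict,
                           endpoint: str, bucket_key: str) -> dict:
    """Map OBC Secret + ConfigMap keys onto the lowercase operand storage-secret shape.

    Missing OBC keys raise KeyError on purpose, so a half-provisioned OBC fails loudly.
    """
    return {
        "endpoint": endpoint,                                   # external endpoint, set per bridge
        bucket_key: obc_configmap["BUCKET_NAME"],               # OBC ConfigMap
        "access_key_id": obc_secret["AWS_ACCESS_KEY_ID"],       # OBC Secret
        "access_key_secret": obc_secret["AWS_SECRET_ACCESS_KEY"],
        "region": obc_configmap.get("BUCKET_REGION", ""),       # often empty for NooBaa OBCs
    }
```

Keeping the mapping explicit makes the Loki backport mostly a matter of changing bucket_key and the target Secret name.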

An egress note: the Red Hat ESO operator 1.1.0 ships a default-deny NetworkPolicy in the external-secrets namespace, so the reconciler hangs on Vault login until an allow-egress NetworkPolicy targeting the Vault VM is in place. The fix is already in GitOps (project_eso_egress_to_vault.md); restart the operand after any policy change.

Robot token rotation (Quay, Nexus)

The per-tenant Quay robot token convention (reference_quay_robot_token_convention.md) is the canonical model for image-push credentials. Rotation steps:

  1. Generate a new robot token in Quay for the existing robot account (Quay console or API). The old token remains valid until explicitly revoked, so there is no race.
  2. Write the new token to Vault under secret/apps/<division>/<app>/ci/quay-robot (KV-v2). The write creates a new secret version rather than overwriting in place; KV-v2 retains prior versions for rollback.
  3. ESO refreshes the Secret on the next reconcile cycle (refreshInterval); the quay-robot-team-<team> Secret in openshift-pipelines updates with the new dockerconfigjson.
  4. Smoke-test the push pipeline with a no-op build that exercises the new credential.
  5. Revoke the old robot token in Quay once the smoke test passes.
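Steps 3 and 4 are easier to verify when the delivered credential can be checked without printing it. This sketch builds the dockerconfigjson shape that kubectl create secret docker-registry produces (username, password, and base64 auth) and derives a short SHA-256 fingerprint of the token so the live Secret can be compared against the expected one. Whether the JSON is rendered by an ESO template or stored whole in Vault depends on the convention document. The helper names, the registry host, and the robot name in the usage are hypothetical.

```python
import base64
import hashlib
import json


def robot_vault_path(division: str, app: str) -> str:
    """Vault location of the robot token under the per-division tenancy layout."""
    return f"secret/apps/{division}/{app}/ci/quay-robot"


def robot_dockerconfigjson(registry: str, robot: str, token: str) -> str:
    """Render a .dockerconfigjson payload for a Quay robot account."""
    auth = base64.b64encode(f"{robot}:{token}".encode()).decode()
    return json.dumps({"auths": {registry: {"username": robot, "password": token, "auth": auth}}})


def token_fingerprint(dockerconfigjson: str, registry: str) -> str:
    """Short SHA-256 prefix of the token, safe to paste into a rotation issue."""
    entry = json.loads(dockerconfigjson)["auths"][registry]
    _, token = base64.b64decode(entry["auth"]).decode().split(":", 1)
    return hashlib.sha256(token.encode()).hexdigest()[:12]
```

After ESO refreshes, feed the decoded .dockerconfigjson from the quay-robot-team-<team> Secret to token_fingerprint locally and compare it with the fingerprint of the new token. A match confirms the refresh landed before the old token is revoked.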

The same shape works for Nexus credentials (jenkinsbot, future per-division CI accounts): generate, write to Vault, let ESO refresh, smoke-test, revoke.

Never rotate by editing the live Kubernetes Secret in place. That bypasses ESO: the next reconcile overwrites the edit with the value still in Vault, silently reverting the rotation and leaving the resulting auth failures in operator logs.

Break-glass credential access

The handful of cases where credentials are read directly from custody locations (not via ESO):

  • First boot of Vault (literally creating the root token); the token goes immediately into a mode-restricted file in the local secrets/ directory and is replaced by short-lived AppRole or Kubernetes-auth credentials within the first day.
  • Recovery of a degraded ESO where the controller cannot reach Vault. The recovery operator may need to re-create a cluster Secret manually from the local copy, with a tracked issue and a backport plan.
  • kubeadmin one-time access for break-glass cluster operations. The cluster’s kubeadmin password file lives in the local secrets/ directory; using it requires a tracked issue, capture of starting state, and immediate post-action validation.

Each break-glass credential use must produce a GitHub issue recording five pieces: the cluster/namespace/object touched, the actor and time, the action summary, validation, and a backport or rotation commit. This is the five-piece audit shape from ADR 0025.
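A small record type makes it hard to file an incomplete break-glass issue. The sketch below is hypothetical and is not tooling from ADR 0025. It holds the five pieces, refuses to render while any text field is empty, and normalizes the timestamp to UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

TEXT_FIELDS = ("target", "actor", "action", "validation", "followup_commit")


@dataclass
class BreakGlassRecord:
    target: str           # cluster/namespace/object touched
    actor: str
    when: datetime        # naive datetimes are treated as local time by astimezone()
    action: str
    validation: str
    followup_commit: str  # backport or rotation commit

    def missing(self) -> list[str]:
        """Names of the text fields that are still empty."""
        return [name for name in TEXT_FIELDS if not getattr(self, name).strip()]

    def issue_body(self) -> str:
        """Render the GitHub issue body; raise if any piece is missing."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"break-glass record incomplete: {', '.join(gaps)}")
        return "\n".join([
            f"Target (cluster/namespace/object): {self.target}",
            f"Actor and time: {self.actor} at {self.when.astimezone(timezone.utc).isoformat()}",
            f"Action: {self.action}",
            f"Validation: {self.validation}",
            f"Backport/rotation commit: {self.followup_commit}",
        ])
```

The rendered body could then be filed with gh issue create --body-file - so the audit text never passes through a chat window.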

Common credential gotchas

| Symptom | Likely cause | Fix |
|---|---|---|
| ExternalSecret stuck in SecretSyncedError with Vault timeout | ESO operand missing egress allow to the Vault VM | apply an allow-egress NetworkPolicy targeting the Vault VM, then restart the ESO operand |
| ExternalSecret stuck in SecretSyncedError with “permission denied” | Vault role’s namespace glob doesn’t match the tenant namespace | re-check the apps-<division>-* glob against the actual tenant namespace name |
| LokiStack pods in CrashLoopBackOff with S3 auth error | OBC bridge missing or wrong key shape | apply the operand-shape ExternalSecret from project_obc_to_operand_secret_bridge.md |
| Quay push fails after rotation | ESO hasn’t refreshed yet | wait one refreshInterval (typ. 1h) or force a sync: kubectl annotate externalsecret <name> force-sync=$(date +%s) --overwrite |
| RHACS init-bundle not accepted on a new spoke | ESO hasn’t delivered it, or wrong Secret shape | check the Secret values in the stackrox namespace; the init bundle should be split into collector-tls, sensor-tls, and admission-control-tls Secrets |
| Operator pod stuck in ImagePullBackOff | pull-secret Secret not linked to the pod’s service account | oc secrets link <service-account> <pull-secret> --for=pull -n <ns> |
| vault-apps SecretStore reports certificate errors | wrong caBundle (must be the base64 of the Vault CA) | copy verbatim from clusters/<cluster>/secrets/eso/clustersecretstore-vault.yaml |
| Tenant pod sees an old credential value | ESO refresh interval too long; pods reading the Secret through env vars also keep the old value until they restart | shorten refreshInterval, consider a force-sync trigger on rotation, and restart pods that consume the Secret as env vars |

References

  • connection-details/vault-app-secrets.md — per-division Vault tenancy
  • connection-details/nexus.md — Nexus credential custody (private)
  • connection-details/minio.md — MinIO credential custody (private)
  • adr/0019-nexus-only-image-supply-chain.md
  • adr/0024-openshift-only-platform-gitops-boundary.md
  • adr/0025-gitops-only-operations-break-glass.md

Last reviewed: 2026-05-11