Security with RHACS
Wire Red Hat Advanced Cluster Security into your ACM fleet so every managed cluster is registered with Central, scanned, and watched at runtime.
ACM lifecycles your clusters. It does not secure them. The product that secures them (runtime behaviour, image CVEs, admission rules, compliance scans) is Red Hat Advanced Cluster Security (RHACS), the StackRox-derived sibling that ships alongside ACM in most Red Hat fleets.
This module is the integration story. You will not learn RHACS from scratch; you will learn how to wire it into a fleet that ACM already manages, where the policies fan out automatically and the secured-cluster Application lands on every spoke without manual roxctl runs.
Two products, one fleet
ACM and RHACS overlap on the word “policy,” and that overlap is the source of every confused first conversation. Keep them straight:
| Concern | ACM | RHACS |
|---|---|---|
| Cluster lifecycle | Yes | No |
| Declarative config policy (“every ns has a NetworkPolicy”) | Yes | No |
| Runtime detection (“a shell spawned in a payments pod”) | No | Yes |
| Image CVE scanning | No | Yes |
| Build-time policies enforced at admission | No | Yes |
| Compliance reports against PCI, HIPAA, NIST | Some | Yes (deeper) |
| Fan-out to managed clusters | Yes (Policy framework, ApplicationSet) | Indirectly — via ACM |
The integration is intentionally loose. ACM does not call the RHACS API; RHACS does not import ACM CRs. What they share is a fleet topology: ACM provisions and labels managed clusters, and RHACS treats those same clusters as secured-cluster targets for its sensor.
In practice that means RHACS fans out as Argo CD Applications: Central is one Argo CD Application targeting the hub, and SecuredCluster is one ApplicationSet with a cluster generator that lands an instance on every spoke. The result is that adding a new managed cluster to ACM also gives you a secured cluster a few minutes later, with no roxctl on a laptop in sight.
RHACS architecture
Reading the diagram:
- Central runs in one place — almost always the hub OpenShift cluster, in its own namespace. It owns the UI, the policy engine, the violations database, and the central authentication endpoint.
- Scanner V4 sits next to Central. It scans images at push time (via registry webhooks) and on demand when admission asks for a verdict. V4 replaced the older StackRox Scanner in late 2025; it ships an embedded vulnerability DB and supports OCI artifacts directly.
- SecuredCluster is the per-managed-cluster footprint. It contains three components: a sensor Deployment (single replica; talks to Central over mutual TLS), a collector DaemonSet (one pod per node, reads syscalls via eBPF or a kernel module), and an admission webhook (validates pod specs before they are created).
- Solid black = local intra-cluster traffic. Dashed green = spoke-initiated mTLS connections out to Central, and the scanner reaching back to the image registry.
The deployment looks fat — three workloads per spoke — but the resource footprint is modest. A sensor needs ~500 MiB, the collector ~200 MiB per node, and the admission webhook is a couple of small replicas. Most lab clusters absorb it without any node sizing changes.
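As a rough sketch of what that per-spoke footprint looks like declaratively, the SecuredCluster CR below is about the smallest one that points a sensor at Central. The cluster name is the lab spoke mentioned later on this page; the Central hostname is a placeholder, and every field not listed is left to the operator's defaults.

```yaml
apiVersion: platform.stackrox.io/v1alpha1
kind: SecuredCluster
metadata:
  name: stackrox-secured-cluster-services
  namespace: stackrox
spec:
  # Name under which this cluster registers in Central.
  clusterName: spoke-dc-v6
  # Placeholder hostname (assumption); sensor dials this endpoint over mutual TLS.
  centralEndpoint: central-stackrox.apps.hub.example.com:443
```

The operator expands this into the sensor Deployment, the collector DaemonSet, and the admission webhook described above.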
Bootstrap: the init-bundle
The interesting wrinkle in RHACS is how the sensor authenticates to Central. There is no API token. There is no kubeconfig federation. Each SecuredCluster receives an init-bundle — a small bag of TLS Secrets (the sensor’s certificate, the collector’s certificate, an admission-controller certificate) signed by Central’s internal CA. The sensor uses its certificate to mutual-TLS into Central; Central recognises the certificate and registers the cluster.
You generate the bundle once per managed cluster, apply it to the SecuredCluster’s namespace before the operator runs, and from then on the sensor is on its own. The bundle has a default expiry — by convention, one year — and rotation is its own dance.
In a lab without automation, you generate the bundle by running roxctl on a laptop, downloading a YAML file, and oc apply-ing it onto the spoke. That is fine for a demo. It is not fine for a fleet, because at 30 clusters you have 30 YAML files containing private keys floating around someone’s ~/Downloads.
The pattern this lab uses instead — see /docs/openshift-platform/openshift-platform/security/init-bundle-via-eso/ — is:
- A scheduled job calls Central’s REST API directly: POST /v1/cluster-init/init-bundles with {"name": "<cluster-name>"}. The response is a JSON document containing all the bundle’s Secret materials.
- The job flattens the response into Vault under secret/ocp/<cluster>/rhacs/init-bundle.
- Each managed cluster runs External Secrets Operator (ESO) with a ClusterSecretStore pointing at Vault. ExternalSecrets materialise the bundle’s three Secrets into the stackrox namespace.
- The SecuredCluster operator finds the Secrets at startup, sensor talks to Central, cluster registers.
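The ESO step in that list might look roughly like the following for one of the three Secrets. Treat it as a sketch under assumptions: the ClusterSecretStore name vault-backend is hypothetical, the KV mount is assumed to be configured on the store (so the key omits the secret/ prefix), and the Vault property names depend entirely on how the job flattened the bundle. The apiVersion also varies with the installed ESO release.

```yaml
apiVersion: external-secrets.io/v1beta1   # adjust to the version your ESO serves
kind: ExternalSecret
metadata:
  name: sensor-tls
  namespace: stackrox
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend                   # hypothetical store name
  target:
    name: sensor-tls                      # Secret name the sensor expects
    creationPolicy: Owner
  data:
    - secretKey: ca.pem
      remoteRef:
        key: ocp/spoke-dc-v6/rhacs/init-bundle
        property: ca_pem                  # hypothetical Vault property
    - secretKey: sensor-cert.pem
      remoteRef:
        key: ocp/spoke-dc-v6/rhacs/init-bundle
        property: sensor_cert_pem         # hypothetical Vault property
    - secretKey: sensor-key.pem
      remoteRef:
        key: ocp/spoke-dc-v6/rhacs/init-bundle
        property: sensor_key_pem          # hypothetical Vault property
```

The collector-tls and admission-control-tls Secrets would follow the same shape with their own certificate and key entries.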
No roxctl on anyone’s machine, no Secrets in git, and rotation is a single Vault write away.
Policies in RHACS vs Policies in ACM
The biggest source of confusion in this product pair is the word “policy.” They mean different things.
- An ACM Policy is a declarative configuration rule. It says “every namespace on this cluster set must have a NetworkPolicy named default-deny.” If the rule is violated, ACM either reports it (inform) or fixes it (enforce). The unit of work is a Kubernetes object that should or should not exist.
- An RHACS policy is a security rule evaluated against runtime behaviour, image content, or admission requests. It says “an image with a CVSS >= 7 vulnerability must not run on production-labelled clusters.” If the rule is violated, RHACS either alerts (inform), blocks the deploy at admission (enforce-on-create), or kills the running pod (enforce-on-runtime). The unit of work is a behaviour or attribute.
You need both. Network policy in every namespace is not a security rule; it is a baseline-config rule. A /bin/bash shell spawned inside a pod at runtime is not a config issue; it is a security event. They are complementary, not competing.
For the lab’s per-tenant policy set — for example the platform-team Policy that enforces a default-deny NetworkPolicy across every PCI-scoped namespace — see /docs/openshift-platform/openshift-platform/security/app-team-policy-set/. For the RHACS side, the default policy library that ships with Central is the starting point; almost every operator adds a few of their own.
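For the ACM side, a default-deny rule of the kind described above could be sketched as the Policy below. It is illustrative only and is not the lab's actual policy set: the policies namespace and the pci-* namespace pattern are hypothetical, and the Placement and PlacementBinding that select target clusters are omitted.

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: default-deny-networkpolicy
  namespace: policies                  # hypothetical hub namespace
spec:
  remediationAction: inform            # report only; switch to enforce to auto-create
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: default-deny-networkpolicy
        spec:
          remediationAction: inform
          severity: medium
          namespaceSelector:
            include: ["pci-*"]         # hypothetical naming convention
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: networking.k8s.io/v1
                kind: NetworkPolicy
                metadata:
                  name: default-deny
                spec:
                  podSelector: {}      # selects every pod in the namespace
                  policyTypes:
                    - Ingress
                    - Egress
```

Note that the unit of work here is an object that must exist, which is exactly the distinction from an RHACS policy drawn above.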
Per-tenant policy exceptions
Real fleets need exceptions. A tenant’s base image has a known CVE; the upstream maintainer has not pushed a fixed image yet; the tenant cannot stop shipping. The right answer is a time-bounded exception with a written justification, not a permanent disable of the policy.
RHACS supports this natively: each policy violation can be deferred or marked as a false positive, with an expiry and a comment. The lab wraps that in a documented workflow — the tenant opens a ticket, the platform team reviews, the exception is applied with a 30-day clock, and a recurring report surfaces exceptions that are about to expire. See /docs/openshift-platform/application-delivery/tenant-onboarding/rhacs-tenant-exception-process/.
The mistake to avoid is silent permanent exceptions. The exception expires, the team rotates, nobody remembers what the original justification was, and a year later the CVE is still there and nobody can explain why.
Image scanning at admission
Scanner V4 sits between two surfaces: the registry and the kube-apiserver.
- At push time, the registry’s webhook posts to Scanner; Scanner pulls the manifest, layers, and SBOM, computes vulnerabilities, and stores the result against the image digest.
- At admission time, the admission webhook on each spoke calls Sensor; Sensor asks Central for the policy verdict on the image digests in the incoming pod spec; Central pulls the latest scan result (or triggers a scan if missing) and returns allow/deny.
The mistake everyone makes once is setting the admission webhook to enforce without first running it in inform. The first day in enforce mode, platform workloads themselves fail to roll out: the pods their controllers try to create are rejected at admission because some openshift- namespace image has a high-severity CVE you did not know about. Run in inform for at least a week; sort out the violations; then flip to enforce. For the lab’s policy set and the cluster-scope rules, see /docs/openshift-platform/openshift-platform/security/scanner-v4-and-image-policies/.
A useful trick: an admission policy can exempt the openshift-* namespaces (Red Hat images, which are scanned and patched by Red Hat) and enforce only against tenant namespaces. That is the practical compromise — most of the noise is in the openshift- namespaces; most of the actual risk is in tenant images.
Central admin password rotation via Vault + ESO
A pragmatic operational concern. The first Central admin password is generated by the RHACS Operator at install time. That password is what every administrator who logs in to the UI uses; it is also what the lab’s init-bundle generator uses to call the Central API. Rotating it is therefore non-trivial.
The lab’s pattern — captured in /docs/openshift-platform/operations/routine-tasks/rotate-rhacs-central-admin/ — is:
- Generate a new password in Vault: vault kv put secret/ocp/platform/rhacs-admin password=<new>.
- ESO sees the change and updates the Kubernetes Secret central-admin-password in stackrox.
- The Central CR’s adminPasswordSecret field points at that Secret.
- Restart Central: oc rollout restart deploy/central. This is the load-bearing step. Central reads the htpasswd file once at startup and caches it in memory; without a restart, the new password sits in the Secret but does not work.
The lab does not populate central-htpasswd.password — that field is empty by design, because the password lives in central-admin-password instead. The init-bundle automation reads the same Vault path, so a rotation propagates everywhere in one step.
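The relevant part of the Central CR is small. The fragment below is a sketch that assumes the conventional CR name stackrox-central-services; the referenced Secret is expected to carry the password under a key named password, which matches the Vault write shown in the rotation steps.

```yaml
apiVersion: platform.stackrox.io/v1alpha1
kind: Central
metadata:
  name: stackrox-central-services      # conventional name; yours may differ
  namespace: stackrox
spec:
  central:
    adminPasswordSecret:
      # ESO-materialised Secret; must contain a `password` data item.
      name: central-admin-password
```

Because the CR only references the Secret by name, a rotation never touches the CR itself; only the Secret content and the Central pod change.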
RHACS integrations — SIEM, ticketing, image scanners
Central, the Scanner V4, and the per-spoke SecuredCluster footprints all produce one kind of output: violations. A violation is the runtime event that some policy was tripped — a deploy with a high-severity CVE, a shell spawned in a production pod, an admission request that asks for a privileged container. Without somewhere to send those violations, they sit in the Central UI and depend on someone logging in to see them. The integration layer is how violations reach the on-call channel, the SIEM, the ticketing system, and the audit trail.
Integrations are not optional in a serious deployment. They are the load-bearing path between “Central detected something” and “an engineer is fixing it.”
Integration categories
Five categories cover everything Central can talk to today:
Notification targets. Email, Slack and Microsoft Teams via incoming webhook, JIRA and ServiceNow as ticketing systems, generic webhook as the escape hatch. The Slack and Teams integrations are best for low-volume awareness channels; JIRA and ServiceNow are best for high-severity violations where a ticket needs to exist for the audit trail.
SIEM forwarders. Splunk HEC (HTTP Event Collector), Elasticsearch, IBM QRadar, and generic syslog. These are the integrations the security team usually cares about most — every violation event flows as a structured log entry into the SIEM, where it correlates with other security signals (IDS, EDR, network telemetry) for incident response.
Image-scanner integrations. Scanner V4 ships inside RHACS and is the default, but Central can also consume scan results from external scanners: Quay’s built-in scanner, JFrog Xray, Sonatype Nexus IQ, Snyk, Anchore, and registry-side scanners on Docker Trusted Registry. Findings from any configured scanner merge into the policy engine; a policy that says “no critical CVEs” applies regardless of which scanner found them.
Cloud-provider security platforms. AWS Security Hub, GCP Security Command Center, Azure Sentinel. Violations from RHACS appear as findings in the cloud provider’s central security view alongside their native signals. Useful when the platform is multicloud and the security team uses the cloud provider’s pane of glass rather than RHACS’s.
Image-signature verifiers. cosign and sigstore for verifying image signatures at admission time. A signed-image policy can require that production deploys only run images signed by an approved key; cosign’s verifier integration is how Central checks signatures at policy evaluation.
The integration object shape
In the RHACS Operator world, integrations don’t all use the same CR — different ones live under different APIs. The newer pattern uses integration.stackrox.io/v1 CRs that the operator reconciles into the running Central; the older pattern (still widely deployed) configures integrations via the Central API directly and stores them in Central’s database.
For each integration, RHACS surfaces two adjacent CRs:
- Notifier is the integration configuration: endpoint URL, credentials secret reference, retry settings, format options.
- IntegrationHealth is a read-only status surface that says “this integration is healthy, last successful call N seconds ago” or “this integration is failing with error X.” The on-call should monitor this; a silently-broken Splunk integration is a security gap that doesn’t trigger any built-in alarm.
Authentication is per-integration. Splunk needs an HEC token; JIRA needs basic auth or an API token; webhooks need whatever secret you configure on the receiving end; Slack needs the incoming-webhook URL. Each integration’s secret material is referenced by name in the Notifier CR (the credentialsSecret field in the Splunk example); the secret itself is materialised separately.
Operational pattern
The mistake to avoid is clicking integrations into existence through the Central UI. The UI is fine for a single integration during evaluation; it does not scale beyond a handful, and worse, it leaves the integration state in Central’s database rather than in Git. A year later, nobody remembers why the Splunk HEC token is what it is, and rotation requires re-clicking a path that nobody documented.
The lab’s pattern: integration configuration is GitOps-managed (Notifier CRs in platform-gitops), and integration credentials follow the standard secret-custody flow:
- Credentials live in Vault under secret/ocp/stackrox/integrations/<name>.
- External Secrets Operator (ESO) materialises them as Kubernetes Secrets in the stackrox namespace on the hub.
- The Notifier CR references the materialised Secret by name.
- Rotation is a vault kv put; ESO sees the change and updates the Secret; Central picks it up on next reconcile.
This is the same custody pattern Module 09 already documented for the init-bundle and the Central admin password. Use it for every Notifier, every external scanner, every cloud-provider integration. Don’t make integrations the place where the secret-custody discipline breaks down.
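For integration credentials, one way to express the ESO step is to extract every property at the Vault path into a single Secret, so adding a field in Vault needs no manifest change. The sketch below assumes a hypothetical ClusterSecretStore called vault-backend with the KV mount configured on the store, and uses splunk-soc as the path's name component purely for illustration.

```yaml
apiVersion: external-secrets.io/v1beta1   # adjust to the version your ESO serves
kind: ExternalSecret
metadata:
  name: splunk-hec-token
  namespace: stackrox
spec:
  refreshInterval: 15m
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend                   # hypothetical store name
  target:
    name: splunk-hec-token                # Secret name the integration references
    creationPolicy: Owner
  dataFrom:
    # Copy every key/value stored at this Vault path into the Secret.
    - extract:
        key: ocp/stackrox/integrations/splunk-soc
```

The refresh interval bounds how long a rotated token takes to reach the cluster, so pick it with your rotation procedure in mind.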
A worked Splunk HEC Notifier
apiVersion: integration.stackrox.io/v1
kind: Notifier
metadata:
  name: splunk-soc
  namespace: stackrox
spec:
  type: splunk
  splunk:
    httpEndpoint: https://splunk-hec.security.example.com/services/collector/event
    source: rhacs
    sourceType: _json
    truncate: 10000
    insecureSkipVerify: false
  credentialsSecret:
    name: splunk-hec-token # ESO-materialised from Vault
The credentialsSecret is the ESO-materialised Secret containing the HEC token. The Notifier itself is a small, declarative resource; the secret material is the only thing that needs rotation.
The throttling problem
A bad release on a high-volume cluster can generate hundreds of violations per minute — every new pod trips the same policy, every image scan finds the same CVE, every admission request loses the same gate. Without rate-limiting at the Notifier, you’ll flood your SIEM and your on-call channel in the same minute, and the signal-to-noise on the actual problem will be terrible.
Two patterns to apply, in order:
Per-policy notifier selection. RHACS lets you attach notifiers to specific policies. Noisy policies (Image Age, default deny-by-default rules) go to a low-priority audit channel; high-severity policies (privileged container, secret-in-environment) go to the on-call channel and JIRA. Don’t send every policy to every notifier.
Notifier-side throttling. For the SIEM forwarders that take volume well (Splunk, Elasticsearch), let the signal flow — the SIEM is built for it. For human-facing channels (Slack, Teams, PagerDuty), enable Central’s deduplication so the same violation on the same resource doesn’t fire twice within a window. The Slack integration in particular gets unusable past ~20 messages/minute.
The deeper fix is upstream: if you’re generating hundreds of violations per minute, the release is genuinely broken, and what you want is to fail the rollout early — admission enforce mode on the high-severity policy, an Argo Rollouts AnalysisRun against the RHACS violation count metric. Notifications are how you find out; admission gates are how you avoid finding out.
References
- docs.redhat.com — RHACS integrating with other tools
- docs.redhat.com — RHACS Slack integration
- docs.redhat.com — RHACS Splunk integration
- docs.redhat.com — RHACS JIRA integration
- docs.redhat.com — RHACS image-scanner integrations
The hub-and-spoke pattern
Where do you put Central?
The default answer is on the hub OpenShift cluster, in a dedicated stackrox namespace. The hub is the natural home: it is the cluster everyone already trusts, it has the operator catalogue, and it does not run customer workload. Central is itself just an Argo CD Application on the hub — declared in the GitOps repo, owned by the hub’s Argo instance, sync wave 10 so the operator installs before the CR.
The fan-out side is an ApplicationSet on the hub with a cluster generator. For every ManagedCluster registered to ACM, the generator produces one SecuredCluster Application, each parameterised by the cluster’s name. As ACM imports a new spoke, the generator produces a new Application; Argo CD pushes the SecuredCluster manifest at the spoke; ESO materialises the init-bundle from Vault; sensor reports to Central.
In the lab today, Central runs on hub-dc-v6 and SecuredCluster runs on spoke-dc-v6. Adding a second spoke is a labelling change on the ManagedCluster, nothing else.
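The fan-out described above can be sketched as an ApplicationSet like the one below. Several things here are assumptions rather than the lab's actual manifests: the repository URL and path, the rhacs.example.com/secured label, and the idea that the path is a Helm chart accepting a clusterName parameter. It also assumes the managed clusters are already registered with Argo CD as cluster Secrets carrying that label; how the label gets from the ManagedCluster to the Argo CD cluster Secret is outside this sketch.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: rhacs-secured-cluster
  namespace: openshift-gitops
spec:
  generators:
    # One Application per Argo CD cluster Secret matching the selector.
    - clusters:
        selector:
          matchLabels:
            rhacs.example.com/secured: "true"    # hypothetical label
  template:
    metadata:
      name: 'rhacs-secured-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://git.example.com/platform/platform-gitops.git   # placeholder
        targetRevision: main
        path: rhacs/secured-cluster              # hypothetical chart path
        helm:
          parameters:
            - name: clusterName                  # hypothetical chart value
              value: '{{name}}'
      destination:
        server: '{{server}}'
        namespace: stackrox
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

The {{name}} and {{server}} placeholders are filled in by the cluster generator, which is what makes a newly imported spoke pick up its SecuredCluster without a per-cluster manifest.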
Try this
- Read the SecuredCluster CR on a managed cluster.

    oc --context spoke get securedcluster -n stackrox -o yaml

  Note the operator version, the centralEndpoint, and the imagePullSecrets. The CR is small; the heavy lifting is in the operator’s reconciler.
- Add an RHACS policy in inform mode. In the Central UI, clone the built-in “90-Day-Image-Age” policy, set the scope to your spoke’s cluster, leave enforcement at inform. After 5 minutes, check the violations page: every staging image older than 90 days lights up. Do not flip it to enforce yet.
- Generate a fresh init-bundle via the Central API. From a node that can reach Central:

    curl -sk -u admin:"$RHACS_ADMIN" \
      -X POST https://central-stackrox.apps.HUB/v1/cluster-init/init-bundles \
      -H 'Content-Type: application/json' \
      -d '{"name":"spoke-test"}' | jq .

  The response is the full bundle as JSON. In a real workflow it goes to Vault; for this exercise just inspect the shape: three Secret blobs, plus the kubectlBundle for manual installs.
Common failure modes
SecuredCluster never reports to Central. The first thing to check is the three init-bundle Secrets in stackrox on the spoke: sensor-tls, collector-tls, admission-control-tls. If any are missing or empty, ESO did not deliver them; check the ExternalSecret’s status. If the Secrets exist but sensor is in CrashLoopBackOff, the bundle has expired — generate a new one. The second thing to check is network: sensor reaches Central on :443, and on a NetworkPolicy-tight cluster that egress has to be explicitly allowed.
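On a cluster with default-deny egress, the allowance for sensor might be sketched as follows. The app: sensor pod label is an assumption to verify against the running Deployment, and the rule deliberately leaves the destination open on TCP 443 because Central is reached through the hub's external endpoint; tighten it with an ipBlock if the address is stable.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-sensor-to-central
  namespace: stackrox
spec:
  podSelector:
    matchLabels:
      app: sensor            # assumed label; check the sensor pods
  policyTypes:
    - Egress
  egress:
    # Permit outbound TLS so sensor can reach Central.
    - ports:
        - protocol: TCP
          port: 443
```

Once any egress policy selects the sensor pod, name resolution also needs its own egress rule toward the cluster DNS service, otherwise the Central hostname never resolves.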
Admission webhook times out and blocks deploys. Scanner V4 is sometimes slow on the first scan of a new image; if admission cannot get a verdict in time, the policy might fail-open or fail-closed depending on configuration. The fix is either to bump Scanner V4 replicas (Central CR’s scanner.replicas) or to enable admission caching so repeat checks of the same digest are instantaneous. Do not set the admission webhook to failurePolicy: Ignore to “fix” the symptom — that just means a slow scanner becomes a security gap.
Collector OOMKilled on a fleet of mixed kernels. Collector tries eBPF first and falls back to a kernel module. On older RHEL kernels the eBPF program can fail to load and the collector restarts in a loop. The fix is to set the collector’s collectionMethod to KERNEL_MODULE on those clusters, or to upgrade the node OS. The DaemonSet’s logs are the place to look.
Cross-link
ACM Governance gives you the declarative-config side of the fleet’s security posture; RHACS gives you the runtime-and-image side. Together they are the defence-in-depth that compliance audits actually expect to see. For the lab’s integrated view — how Policy reports and RHACS violations feed the same compliance dashboard — see /docs/openshift-platform/openshift-platform/security/rhacs-overview/.
References
- Red Hat ACS documentation: https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_security_for_kubernetes/
- StackRox upstream: https://www.stackrox.io/
- RHACS API reference (/v1/cluster-init/init-bundles): https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_security_for_kubernetes/latest/html/api_reference/
- Red Hat ACM Policy framework: https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/latest/html/governance/governance
- Open Cluster Management policy framework upstream: https://open-cluster-management.io/getting-started/integration/policy-framework/