Roll out a policy

Taking a control from compliance-scan finding (or PolicySet update) to enforced state across the fleet: ACM Policy authoring, Placement, PolicyTemplate, remediationAction, canary, full rollout.

This task covers the policy-rollout loop in ACM: take a compliance-scan finding or an upstream PolicySet update, author the Policy CR with the right Placement and PolicyTemplate, dry-run as inform, canary on one cluster, then promote to enforce fleet-wide.

The procedure is mature in ACM — the lab has used it for the PCI-DSS baseline phase chain (PCI-0 through PCI-5 plus PCI-1.13). This page distills the recurring shape.

When this task runs

Compliance scan finding. PCI-DSS scan, CIS Benchmark, FedRAMP, or internal control set flags a non-compliant cluster.
Upstream PolicySet update. Red Hat or the upstream community ships a new PolicySet version, which is vendored into the lab’s policy catalog under policy-templates/.
New control after an incident. A postmortem proposes a new automated guardrail that was previously a runbook step.
Tenant onboarding. A new tenant’s namespace requires the platform’s default policies attached.

What is in scope

This page covers ACM Policy resources delivered via Placement and PlacementBinding, evaluated by the spoke-side policy controller and surfaced in the hub’s governance pillar.

Out of scope:

OPA / Gatekeeper policies (the gatekeeper-operator-product is in the planned-operator queue; when installed, it gets its own page).
ValidatingAdmissionPolicy (VAP) — the lab uses these for tenant exclusions (vap-tenant-exclusions.md); they are governed differently from ACM Policy.
RHACS image / deployment / runtime policies — different surface, owned by connection-details/rhacs-app-policy.md.

Pre-checks

Identify the control. Is it a compliance control (PCI-DSS 1.2.1, CIS 5.1.6), an internal policy (“no namespace without NetworkPolicy”), or an ADR-mandated guardrail? Capture the control text verbatim in the issue.
Identify the affected cluster set: “all spokes”, “spokes labelled env=prod”, or “the new spoke only”. The Placement selector encodes this.
Identify the remediation. A policy can inform (report only) or enforce (actively converge). Always start with inform; the promotion to enforce is a second MR after canary evidence.
Identify the PolicyTemplate type. The lab uses three:

Template kind	What it checks
`ConfigurationPolicy`	Object-existence / object-spec checks (most policies)
`OperatorPolicy`	Operator install state
`CertificatePolicy`	TLS-cert expiry
Open the GitHub issue. Branch prefix policy/<control-id>-<short-slug>.
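
The two less common template kinds from the table above take a different spec from ConfigurationPolicy. The fragment below is a minimal sketch of how each might appear under policy-templates. Every name, namespace, channel, and duration in it is a hypothetical placeholder, and the field set should be checked against the CRDs installed on the hub before use.

```yaml
# Sketch only: all names, namespaces, and values are hypothetical.
policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: CertificatePolicy
      metadata:
        name: example-cert-expiry          # hypothetical name
      spec:
        remediationAction: inform          # certificate checks only report
        severity: low
        namespaceSelector:
          include: ["openshift-ingress"]   # hypothetical namespace choice
        minimumDuration: 300h              # flag certs with less than 300h left
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1beta1
      kind: OperatorPolicy
      metadata:
        name: example-operator-install     # hypothetical name
      spec:
        remediationAction: inform
        severity: medium
        complianceType: musthave
        subscription:
          name: example-operator           # hypothetical package name
          namespace: example-operator-ns   # hypothetical namespace
          channel: stable
          source: redhat-operators
          sourceNamespace: openshift-marketplace
        upgradeApproval: None              # do not auto-approve upgrades
```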

The change

MR 1 — `inform`-only policy

In clones/platform-gitops, add under clusters/hub-dc-v6/platform/policies/<control-id>/:

apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: <control-id>-<slug>
  namespace: <policy-ns>
  annotations:
    policy.open-cluster-management.io/standards: "<framework>"
    policy.open-cluster-management.io/categories: "<category>"
    policy.open-cluster-management.io/controls: "<control-id>"
spec:
  remediationAction: inform        # always start inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: <control-id>-<slug>-cfg
        spec:
          remediationAction: inform
          severity: medium
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: networking.k8s.io/v1
                kind: NetworkPolicy
                metadata:
                  name: default-deny
                  namespace: '{{hub-template-functions: ...}}'
                spec:
                  podSelector: {}
                  policyTypes: [Ingress, Egress]
---
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: <control-id>-<slug>-placement
  namespace: <policy-ns>
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            vendor: OpenShift
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: <control-id>-<slug>-binding
  namespace: <policy-ns>
placementRef:
  apiGroup: cluster.open-cluster-management.io
  kind: Placement
  name: <control-id>-<slug>-placement
subjects:
  - apiGroup: policy.open-cluster-management.io
    kind: Policy
    name: <control-id>-<slug>
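
A Placement only selects from ManagedClusterSets that are bound to its own namespace, so the Placement above matches nothing if <policy-ns> has no binding yet. The following is a minimal sketch of such a binding, assuming the built-in global cluster set is an acceptable scope and that the hub serves the v1beta2 API; a narrower cluster set may be more appropriate in practice.

```yaml
# Sketch under assumptions: binds the "global" cluster set into the policy namespace.
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSetBinding
metadata:
  name: global              # must be identical to spec.clusterSet
  namespace: <policy-ns>
spec:
  clusterSet: global
```

If the namespace already carries a binding from an earlier policy, no additional one is needed.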

Watch the compliance report

After Argo applies and the spoke policy controllers evaluate, the hub aggregates compliance status:

K_HUB=/home/ze/.kube/configs/hub-dc-v6.kubeconfig
oc --kubeconfig "$K_HUB" -n <policy-ns> get policy <control-id>-<slug> \
  -o jsonpath='{.status.status}{"\n"}'

Output is a list of per-cluster compliance entries:

[map[clustername:spoke-dc-v6 clusternamespace:spoke-dc-v6 compliant:NonCompliant]]

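Read the per-cluster details. The root Policy on the hub carries only the compliance summary; the per-template details and history live on the replicated copy in each managed cluster's namespace, named <policy-ns>.<control-id>-<slug>:

oc --kubeconfig "$K_HUB" -n <cluster-name> get policy <policy-ns>.<control-id>-<slug> \
  -o jsonpath='{range .status.details[*]}{.templateMeta.name}{"\t"}{.compliant}{"\t"}{.history[0].message}{"\n"}{end}'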

inform evidence is the input to the enforce decision: the report shows which clusters are out of compliance and what the offending object looks like. Capture the report in the issue.
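
For orientation, the fields that the details query reads are laid out roughly as follows on the policy copy in a managed cluster's namespace. This is an illustrative layout with placeholders rather than captured output; the actual message text comes from the spoke-side controller.

```yaml
# Illustrative layout only, not captured output.
status:
  compliant: NonCompliant
  details:
    - templateMeta:
        name: <control-id>-<slug>-cfg     # one entry per policy template
      compliant: NonCompliant
      history:
        - lastTimestamp: <timestamp>
          message: <violation message from the controller>
```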

MR 2 — Promote to `enforce` (if appropriate)

After review of the inform evidence, promote:

spec:
  remediationAction: enforce    # the change
  ...
  policy-templates:
    - objectDefinition:
        ...
        spec:
          remediationAction: enforce
          ...

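When set to enforce, the policy controller actively converges the spoke to the policy. What that means depends on the complianceType. For musthave, the controller creates the object if it is missing and patches in any fields the template specifies. For mustnothave, it deletes objects that match the template. For mustonlyhave, it rewrites the object so that it matches the template exactly, dropping values the template does not list.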
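
The difference between the destructive compliance types is easiest to see side by side. The object-templates fragment below is a sketch with hypothetical objects (a stray Pod and a reader Role); it is not one of the lab's policies.

```yaml
# Sketch only: both objects are hypothetical examples.
object-templates:
  # mustnothave: an object matching this definition is a violation.
  # Under enforce, the controller deletes it.
  - complianceType: mustnothave
    objectDefinition:
      apiVersion: v1
      kind: Pod
      metadata:
        name: debug-shell          # hypothetical object
        namespace: default
  # mustonlyhave: the object must match this definition exactly.
  # Under enforce, extra rules on the Role are removed.
  - complianceType: mustonlyhave
    objectDefinition:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: example-reader       # hypothetical object
        namespace: default
      rules:
        - apiGroups: [""]
          resources: ["pods"]
          verbs: ["get", "list", "watch"]
```

Because both types remove state on the spoke, they are the policies for which inform evidence and a canary matter most.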

This MR is the production rollout. Wait for compliant: Compliant on every spoke before closing the issue.

Canary first

For high-blast-radius policies (deletes objects, rewrites fields), canary by Placement scope rather than going fleet-wide on MR 2:

# MR 2a — canary
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            env: canary

# MR 2b — full rollout after canary evidence
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            vendor: OpenShift

Label one spoke's ManagedCluster on the hub env=canary (or use an existing canary marker), let MR 2a converge, validate, then land MR 2b for the full rollout.
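
Placement label selectors match labels on the hub's ManagedCluster resources, so the canary marker belongs there. The fragment below sketches where the label sits, assuming spoke-dc-v6 is the chosen canary; that choice is an assumption for illustration only.

```yaml
# Fragment of the existing ManagedCluster on the hub. Merge the label in;
# do not apply this as a complete manifest.
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: spoke-dc-v6          # assumed canary spoke
  labels:
    env: canary              # matched by the MR 2a Placement
```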

Validation

A policy rollout is complete when all of the following are true:

The Policy resource is Compliant on every spoke in the Placement scope.
The on-cluster effect of the policy is visible — for “namespace must have NetworkPolicy”, every namespace has the NetworkPolicy; for “etcd encryption at rest”, apiservers/cluster shows encryption.type: aescbc.
The hub governance dashboard shows the policy with a 100% Compliant ratio.
No new failures appeared in adjacent compliance scans (the new policy did not break another control).
argocd-platform-extensions ClusterRole on the spoke includes any new API groups the policy reads/writes (see Spoke RBAC extension).
The session report captures the inform evidence, the enforce decision, the canary evidence, and the full-rollout validation.
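
For the RBAC item above, the extension usually amounts to one extra rule per new API group. The fragment below is a hypothetical rule for the NetworkPolicy example; the verb list is an assumption and should be trimmed to what the policy actually needs.

```yaml
# Hypothetical rule to add to the argocd-platform-extensions ClusterRole.
rules:
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```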

Prevention

inform before enforce. Always. Even if the policy looks safe. The inform evaluation surfaces objects you did not expect to be in scope.
Use Placement labels, not cluster-name selectors. Hardcoding cluster names (name: spoke-dc-v6) does not scale; label-based selection (env=prod) does.
Vendor PolicySets from the upstream policy-templates catalog when possible. Authoring custom policies is fine but they need maintenance; PolicySets vendored from upstream get free updates.
Tag every policy with the framework + control ID in the annotation set above. The hub’s PolicySet aggregation builds compliance bundles from these annotations.
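
A PolicySet groups related policies so they can be placed and reported as one unit. The sketch below assumes a hypothetical set name and reuses the placeholder names from the manifests earlier on this page; a PlacementBinding can then list the PolicySet as its subject instead of each Policy.

```yaml
# Sketch only: the set name and description are hypothetical.
apiVersion: policy.open-cluster-management.io/v1beta1
kind: PolicySet
metadata:
  name: example-baseline
  namespace: <policy-ns>
spec:
  description: Hypothetical grouping of related controls
  policies:
    - <control-id>-<slug>
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: example-baseline-binding
  namespace: <policy-ns>
placementRef:
  apiGroup: cluster.open-cluster-management.io
  kind: Placement
  name: <control-id>-<slug>-placement
subjects:
  - apiGroup: policy.open-cluster-management.io
    kind: PolicySet            # bind the set rather than a single Policy
    name: example-baseline
```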

Forbidden actions

Landing an enforce policy without prior inform evidence.
Using an evaluationInterval of 0s (for either the compliant or the noncompliant setting). That loops the controller and pegs CPU. The default of roughly 10s is right for almost everything.
Authoring a policy that conflicts with another policy in scope. The hub does not resolve conflicts; both controllers will fight.
Rolling out a policy across the fleet without canary on at least one spoke.
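
When a policy genuinely needs a different cadence, evaluationInterval on a ConfigurationPolicy takes separate durations for the compliant and noncompliant states. The values below are hypothetical and only illustrate slowing evaluation down rather than speeding it up.

```yaml
# Sketch only: interval values are hypothetical.
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: <control-id>-<slug>-cfg
spec:
  remediationAction: inform
  severity: medium
  evaluationInterval:
    compliant: 30m         # re-check less often once compliant
    noncompliant: 45s      # re-check sooner while a violation is open
```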

References

ACM Policy Framework documentation
opp-full-plat/connection-details/compliance-implementor-handbook.md (the PCI-DSS phase chain — example of multi-policy rollout)
opp-full-plat/connection-details/vap-tenant-exclusions.md (the adjacent VAP surface)
ADRs: 0018 (pull-model), 0020 (PCI-DSS baseline), 0026 (IPv6-for-OVN-K amendment)
Issues: #108-#113 (PCI-0 through PCI-5), #135 (PCI-1.13), #229 (this section)
Blog post: RHACM: managing OpenShift fleets at scale §“Governance and policy”