~60 min read · updated 2026-05-12

Cluster lifecycle: import, provision, upgrade, destroy

How clusters join the hub, what Hive does for greenfield provisioning, how ACM orchestrates upgrades it doesn't own, and the failure modes the wizard hides.

A managed cluster is just an OpenShift or Kubernetes cluster the hub knows about. The knowing is encoded in two CRs on the hub (ManagedCluster and KlusterletAddonConfig) and a small agent on the cluster (klusterlet). Once those exist and have shaken hands, ACM can target the cluster with policies, applications, and observability — everything later in this track is built on this primitive.

This module is about how that handshake happens, the three ways to start it, and what to do when it doesn’t.

The three ways a cluster joins the hub

The diagram has three top-row entry points and one common bottom-row outcome. ACM doesn’t care which path you took to get the klusterlet running — once it’s there and the CSR has been approved, the cluster is a managed cluster.

The three paths:

  • Import an existing cluster. You already have a running cluster. ACM generates a YAML bundle on the hub; you oc apply it on the target. The klusterlet boots, registers, and the cluster appears in the ACM console.
  • Provision a new cluster via Hive. You give ACM the infrastructure credentials and the install config; ACM does the OpenShift install for you and then imports the result. End-to-end greenfield.
  • Auto-import. You create an auto-import-secret in the spoke’s hub-side namespace, holding a kubeconfig (or a token and API server URL) for the target, and the hub’s import controller uses it to apply the klusterlet manifests on the target for you. Used when you don’t want to push YAML to the target manually.
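
For the third path, the Secret on the hub might look roughly like the sketch below. It assumes token-based access to the target; the token and server values are placeholders, and spoke-dc-v6 is reused only as an example namespace (in the lab that cluster was imported by hand, not this way).

```yaml
# Sketch only: an auto-import-secret in the cluster's hub-side namespace.
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: spoke-dc-v6          # must match the ManagedCluster name
type: Opaque
stringData:
  autoImportRetry: "5"            # number of import attempts before giving up
  token: <target-api-token>       # placeholder: a token valid on the target
  server: https://api.<spoke>.<domain>:6443   # placeholder: target API URL
```

A kubeconfig key can be supplied instead of the token and server pair. Either way the Secret holds credentials for the target, so treat it accordingly.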

In the lab’s v6 fleet, spoke-dc-v6 joined via the first path: it was installed independently as a workload cluster, then imported into hub-dc-v6 once both were stable. See /docs/openshift-platform/openshift-platform/acm-multicluster/managedcluster-registration/ for the worked example.

Import an existing cluster — the walkthrough

The shape of a ManagedCluster is small. Almost everything that matters is in the labels:

apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: spoke-dc-v6
  labels:
    name: spoke-dc-v6
    vendor: OpenShift
    cloud: BareMetal
    env: dc
    role: spoke
    gitops-managed: "true"
    cluster.open-cluster-management.io/clusterset: default
spec:
  hubAcceptsClient: true

hubAcceptsClient: true is the consent flag — the hub will accept the klusterlet’s incoming registration handshake. env, role, and gitops-managed are the lab’s three Placement-driving labels (see the Foundations module for the labels-as-API philosophy).

When you create this CR on the hub, ACM does two things. First, it expects a hub-side namespace whose name matches the ManagedCluster name (spoke-dc-v6); without that namespace, registration never completes. Second, it generates an import bundle — two YAML documents you copy to the target:

  • klusterlet-crds.yaml — the CRDs the klusterlet needs (Klusterlet, KlusterletAddon, ManifestWork CRDs).
  • import.yaml — the klusterlet Deployment, ServiceAccount, ClusterRole, bootstrap-hub-kubeconfig Secret, and a Klusterlet CR that wires everything together.

You then run, on the target:

oc apply -f klusterlet-crds.yaml
oc apply -f import.yaml

What happens next is the load-bearing part. The klusterlet’s registration-agent Pod starts, reads the bootstrap-hub-kubeconfig (which has a short-lived token good only for submitting a CSR), and submits a CertificateSigningRequest to the hub’s API server with a subject like system:open-cluster-management:spoke-dc-v6. The hub’s registration controller sees the CSR. If hubAcceptsClient: true is set on the corresponding ManagedCluster, the controller auto-approves it. The signed certificate is materialized back into a Secret on the spoke (hub-kubeconfig-secret), and the klusterlet rolls out its work-agent Pod using the long-lived credentials. From here on, all spoke→hub traffic is outbound, mTLS, and authenticated with that client certificate rather than with a ServiceAccount token.
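
Once the handshake completes, the hub-side ManagedCluster status reflects each stage. The excerpt below is an illustrative sketch of the condition types you would expect to see, trimmed of timestamps, reasons, and messages; it is not captured from the lab.

```yaml
# Illustrative excerpt of the ManagedCluster status on the hub (not real output).
status:
  conditions:
    - type: HubAcceptedManagedCluster         # the hub honoured hubAcceptsClient: true
      status: "True"
    - type: ManagedClusterJoined              # CSR approved, klusterlet registered
      status: "True"
    - type: ManagedClusterConditionAvailable  # klusterlet is reporting in to the hub
      status: "True"
```

If the first condition is missing, look at hubAcceptsClient; if the second is missing, look at the CSR; if only the third is not True, look at connectivity.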

The mistake to avoid is treating the bootstrap-hub-kubeconfig like a long-lived token. It’s a one-shot — good only for the CSR exchange. If you re-run import.yaml on a different cluster, the registration controller will register that cluster under the same name and you’ll lose visibility into the original.

Provision with Hive

Hive is an upstream operator that installs OpenShift clusters on a target infrastructure. ACM ships Hive and orchestrates it. The CR triad:

  • ClusterDeployment: “please make me a cluster”. Points at a ClusterImageSet (the OpenShift version), an InstallConfig secret (the install-config.yaml the OpenShift installer would normally read), and an infrastructure credentials Secret (AWS access key, vSphere credentials, baremetal BMC details).
  • ClusterImageSet: a named reference to an OpenShift release image. Reusable across many ClusterDeployments. E.g., img-4.17.10-x86-64 pointing at quay.io/openshift-release-dev/ocp-release@sha256:....
  • MachinePool (optional): describes the worker nodes; if absent, Hive uses the InstallConfig’s defaults.

When you oc apply the ClusterDeployment, Hive runs the OpenShift installer in a Pod on the hub, watches the install to completion, and stores the new cluster’s kubeconfig as a Secret. ACM then automatically imports the freshly-installed cluster — you don’t run import.yaml separately, because Hive already has admin credentials for the target and can apply the klusterlet manifests directly.
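
A minimal sketch of the first two CRs, assuming an AWS target. The cluster name demo-aws, the base domain, the region, and all three Secret names are hypothetical, and the release image is written with a version tag only because the digest is elided in the text above.

```yaml
# Sketch only: a ClusterImageSet and a ClusterDeployment that references it.
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: img-4.17.10-x86-64
spec:
  # Assumption: tag form used for illustration; pin by digest in practice.
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.17.10-x86_64
---
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: demo-aws                  # hypothetical cluster name
  namespace: demo-aws             # hub-side namespace for this cluster
spec:
  clusterName: demo-aws
  baseDomain: example.com         # hypothetical
  platform:
    aws:
      region: us-east-1           # hypothetical
      credentialsSecretRef:
        name: demo-aws-creds      # hypothetical Secret with the AWS keys
  provisioning:
    imageSetRef:
      name: img-4.17.10-x86-64    # the ClusterImageSet above
    installConfigSecretRef:
      name: demo-aws-install-config   # hypothetical Secret holding install-config.yaml
  pullSecretRef:
    name: demo-aws-pull-secret    # hypothetical
```

Keeping the ClusterImageSet separate is what lets a fleet of ClusterDeployments move to a new release by changing one reference.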

Use Hive when you want programmatic, infrastructure-as-code cluster provisioning — fleets of identical clusters, ephemeral CI clusters, ClusterPool-backed dev environments. Don’t use Hive for clusters you’ve already installed by hand, or for clusters whose install was driven by an out-of-band process (assisted-installer, agent-based installer that the team operates separately, etc.). For those, import.

The ACM console import wizard

Half the field will use the console first. The wizard at Infrastructure → Clusters → Import cluster asks for the cluster name, a few labels, a target namespace, and then generates the same klusterlet-crds.yaml + import.yaml we just walked through. By default it runs nothing on the target: it gives you a copy-pasteable oc apply command. Optionally, you can hand it a kubeconfig for the target and let it apply the bundle for you.

What the wizard hides is the shape of the CR underneath. Once the wizard finishes, the resulting ManagedCluster is ordinary YAML you can oc edit like any other Kubernetes object. The wizard is convenient for the first cluster; everything after should be in Git.

Labels and ManagedClusterSets

Labels on a ManagedCluster are not metadata — they’re an API. Two things consume them:

  • Placement selectors — when you target a Policy, an ApplicationSet, or an ObservabilityAddon at a subset of clusters, you do it by label selector. The lab’s gitops-managed Placement selects env=dc, role=spoke, gitops-managed=true, which today matches exactly spoke-dc-v6 and tomorrow would match any new spoke that gets the same labels.
  • ManagedClusterSet membership — a ManagedClusterSet is a named group of managed clusters, used for RBAC. You bind a ManagedClusterSet into a namespace with ManagedClusterSetBinding, and users with RoleBindings in that namespace can target the set’s clusters with applications and policies. The default set is, predictably, named default; new sets exist to scope team-A’s policies away from team-B’s clusters.
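
Both consumers meet in a Placement. The sketch below shows a label-driven Placement using the three lab labels; the namespace is an assumption (the lab's real manifest lives in the registration doc), and it presumes the default set has been bound into that namespace.

```yaml
# Sketch only: a Placement that selects clusters by set membership and labels.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: gitops-managed
  namespace: openshift-gitops     # assumption: any namespace with a binding to the set
spec:
  clusterSets:
    - default                     # only sets bound into this namespace are usable
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            env: dc
            role: spoke
            gitops-managed: "true"   # quoted: label values are strings
```

Any new cluster that lands in the set with the same three labels is picked up without touching the Placement.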

If your fleet has more than a handful of clusters and more than one ops team, write the ManagedClusterSet boundaries first. Retrofitting RBAC across an established fleet is painful — you have to move clusters between sets, update PlacementBindings, and reconcile ApplicationSet ownership all at once.

Upgrades

ACM does not upgrade managed clusters itself. Each OpenShift cluster’s own Cluster Version Operator (CVO) runs the upgrade. ACM’s role is to be the conductor — to tell the CVO which version to upgrade to and when.

The CR is ClusterCurator. A minimal upgrade looks like:

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: ClusterCurator
metadata:
  name: spoke-dc-v6
  namespace: spoke-dc-v6
spec:
  desiredCuration: upgrade
  upgrade:
    desiredUpdate: 4.17.10
    channel: stable-4.17
    monitorTimeout: 120

ACM picks this up, patches the spoke’s ClusterVersion resource to set the desired update, and watches the CVO complete the upgrade. You get fleet-wide upgrade visibility in the ACM console — cluster A is at 4.16, cluster B is upgrading 4.16 → 4.17, cluster C is 4.17 — without losing the property that each cluster owns its own reconcile.
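
On the spoke, the visible effect is a change to the cluster's own ClusterVersion resource. A sketch of the relevant fields after the patch, assuming the same channel and version as the ClusterCurator; all other spec and status fields are omitted.

```yaml
# Sketch only: the fields of the spoke's ClusterVersion that an upgrade touches.
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version                   # the singleton ClusterVersion is named "version"
spec:
  channel: stable-4.17
  desiredUpdate:
    version: 4.17.10              # the CVO reconciles the cluster toward this release
```

These are the same fields a break-glass edit on the spoke would change, which is why the hub can only report on them and not lock them.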

The implication: you can still write to a managed cluster’s ClusterVersion directly (break-glass) and ACM will reflect the change. ACM is the conductor, not the gatekeeper. For policy-enforced upgrade gates (“no spoke may be more than one minor behind hub”), wrap the desired state in a Policy bound to a fleet-wide Placement — covered in Module 04.

Destroying a cluster

This is where the wizard betrays you most often. oc delete managedcluster <name> on the hub does not tear down the underlying cluster. It removes the hub-side state, revokes the klusterlet’s credentials, and stops shipping ManifestWork — the cluster keeps running, just no longer managed. This is called “detaching.”

For Hive-provisioned clusters, deleting the ClusterDeployment is what actually destroys the cluster. Hive runs the OpenShift installer’s destroy flow — releases cloud resources, deletes machines, cleans up DNS. The ClusterDeployment finalizer guarantees Hive completes that flow before the CR disappears.

If the cluster was imported rather than provisioned by Hive, deleting a ClusterDeployment will not help (for example, because you misremembered the cluster as Hive-managed): Hive only destroys clusters whose lifecycle it started. After oc delete managedcluster, you’ll need to run the OpenShift installer’s destroy flow by hand against the install dir, or tear down infrastructure manually.

ClusterPool flows complicate this further: a cluster taken from a pool via ClusterClaim is destroyed when the claim is deleted. Pools and claims are out of scope here; we cover them in Module 08.

Access control across the fleet

Once you have more than one managed cluster, who can do what to which one stops being a per-cluster question. RHACM adds an authorization layer on top of the per-cluster ones, and the two layers are easy to confuse.

Two layers, not one

RHACM does not replace per-cluster RBAC. Each managed cluster still has its own kube-apiserver, its own OAuth or OIDC identity provider, and its own RoleBindings. When a user logs into the spoke directly with oc login, none of the hub’s RBAC is in the path: the spoke authenticates and authorises that request the same way it did before it was managed.

What RHACM adds is a fleet-wide authorisation layer for operations that go through the hub. Creating an ApplicationSet that targets ten clusters, applying a Policy that fans out to a subset, opening the All-Clusters view in the console, running a Search query across the fleet — every one of those is an operation against the hub API, and the hub’s RBAC decides who’s allowed to do them. Per-cluster RBAC then decides what the addon agents are permitted to apply on the target.

You need both layers correct. A user with hub-side policy:admin and no spoke-side identity can still author and propagate policies; their identity stops at the hub. A user with cluster-admin on one spoke but no hub binding can’t see that spoke through the ACM console at all.

ManagedClusterSet and ManagedClusterSetBinding

The hub-side grouping primitive is ManagedClusterSet. A set is a named collection of ManagedClusters, either chosen explicitly or matched by a clusterSelector. Out of the box you get default (everything that doesn’t belong to a named set lands here) and global (every managed cluster, always).

A ManagedClusterSetBinding is what scopes a set into a particular hub-side namespace. Without a binding in your namespace, your Placement resources cannot target the set — the PlacementDecision will come back empty no matter how good your label selectors are. The binding is the gate that stops team-a in namespace app-team-a-policies from accidentally targeting the prod-eu set that belongs to team-b.

The mental model: the set is the grouping, the binding is the permission to consume the grouping from a specific namespace.

The bind / admin / view ClusterRole pattern

ACM ships three ClusterRoles per set, named with a predictable pattern:

ClusterRole | What the holder can do
open-cluster-management:managedclusterset:view:<name> | See the set and its members; read-only.
open-cluster-management:managedclusterset:bind:<name> | Create a ManagedClusterSetBinding referencing this set from any namespace the user has write access to.
open-cluster-management:managedclusterset:admin:<name> | Full control over the set and the ManagedCluster, ClusterDeployment, and ClusterClaim resources that carry the matching set label.

The bind role is the load-bearing one and the one people forget. A user with admin on a set still cannot make the set usable from their namespace unless they also have bind — without it, the binding-creation request is rejected. In a tenant onboarding flow, bind is the role you actually delegate; admin stays with the platform team.

How identities reach the spoke

A second misconception: ACM does not project user identities into managed cluster RBAC by default. The hub knows that Alice is alice@corp with hub-side permissions; the spoke has no idea Alice exists unless you’ve solved the identity problem separately.

Two patterns work:

  • Federated identity provider. Every cluster — hub and spokes — uses the same OAuth/OIDC source, so the same user logs into all of them with the same identity. Spoke-side ClusterRoleBindings reference the same alice@corp username (or group claim) that the hub uses. The lab takes this path with Keycloak: see /docs/openshift-platform/lab-infrastructure/other-platform-vms/keycloak-oidc. One IdP, every cluster trusts it, identities line up by name.
  • Service-account projection per cluster. Where federation isn’t available — disconnected sites, third-party tenants — ACM addons run as their own ServiceAccount on the spoke, and you map those SAs to spoke-side ClusterRoles via the cluster-permission component. This is how RHACS, GitOps, and other addons get whatever spoke privileges they need without bringing user identities into the loop.
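
With a federated IdP, the spoke-side half is ordinary Kubernetes RBAC. A minimal sketch, assuming the IdP presents Alice under the username alice@corp and that read-only access through the built-in view ClusterRole is what you want; the binding name is hypothetical.

```yaml
# Sketch only: applied on the spoke, not on the hub.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alice-view                # hypothetical name
subjects:
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: alice@corp              # must match the username claim the IdP emits
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view                      # built-in read-only ClusterRole
```

Binding a group claim instead of individual users keeps this manifest stable as people join and leave.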

Most labs and most enterprises end up with the first model. The second is the fallback when policy or topology forbids a shared IdP.

A worked example

A small but complete set of resources that scopes a team’s reach to a single environment:

apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSet
metadata:
  name: production
spec:
  clusterSelector:
    selectorType: LabelSelector
    labelSelector:
      matchLabels:
        env: prod
---
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSetBinding
metadata:
  name: production
  namespace: app-team-foo
spec:
  clusterSet: production

Now grant app-team-foo the bind role on the set. The team also needs the usual namespace admin for their own policies and applications; that is an ordinary RoleBinding and is not shown here:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-team-foo-bind-production
  namespace: app-team-foo
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: app-team-foo
roleRef:
  kind: ClusterRole
  name: open-cluster-management:managedclusterset:bind:production
  apiGroup: rbac.authorization.k8s.io

With these three resources, anyone in the app-team-foo group can author Placement, ApplicationSet, and Policy resources in their namespace that target the production set — and only that set. Their PlacementDecisions will list cluster names from the set; nothing else.

klusterlet-work-sa — the audit identity on the spoke

When a hub-side Application or Policy lands a Deployment on a spoke, the request to the spoke kube-apiserver is not authenticated as the user who created the hub-side resource. It’s authenticated as the klusterlet-work-sa ServiceAccount, which lives in open-cluster-management-agent on the spoke. Every action ACM takes on a spoke shows up in the spoke’s audit log as this identity.

Auditors notice this. The mitigation is twofold: keep the klusterlet-work-sa scoped tightly via the addon’s RBAC, and make sure the hub-side audit log captures the user-identity-to-Application-creation chain — so you can reconstruct Alice asked for X, ACM applied X as klusterlet-work-sa on cluster Y.

Try this

A short tour you can run on a hub with at least one managed cluster:

oc get managedclusterset
oc get managedclustersetbinding -A
oc -n <your-namespace> auth can-i create placement

The first two commands list every set on the hub and every namespace that has a binding. The third tells you whether your identity has RBAC permission to create a Placement in <your-namespace>. It does not check for a ManagedClusterSetBinding, so read it together with the second command before you spend ten minutes writing a Placement that selects nothing.

Common failure modes

A user has admin on a set but cannot place anything. Almost always the ManagedClusterSetBinding is missing in their namespace. admin is necessary but not sufficient; without a binding, Placement resources in that namespace see zero clusters.

A Policy fans out to nobody. The Placement looks correct, the labels look correct, but PlacementDecision is empty. Check that the Placement’s clusterSets field references a set that’s bound into the same namespace as the Policy. A typo in the set name or a missing binding produces the same symptom.

A user can see clusters in the console but operations fail on the spoke. They have hub-side view, but no projected identity on the spoke. Either wire the federated IdP, or accept that this user is a hub-only operator and route any direct-to-spoke work through the addon SA.

The lab’s actual flow

spoke-dc-v6 was imported, not provisioned. It was installed standalone (compact-3-AIO + 3 baremetal workers via agent-based installer), then once the install was stable, six manifests in platform-gitops landed on hub-dc-v6 to register it — the namespace, the ManagedCluster, the KlusterletAddonConfig, a ManagedClusterSetBinding, a Placement, and a GitOpsCluster. The bootstrap import bundle was applied once on the spoke as a break-glass operation; everything after is GitOps.
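
The last of those six manifests is the one readers are least likely to have seen. A sketch of a GitOpsCluster that registers whatever a Placement selects with an Argo CD instance on the hub; every name and namespace here is an assumption, and the lab's actual manifest is in the registration doc.

```yaml
# Sketch only: ties a Placement's decisions to an Argo CD instance on the hub.
apiVersion: apps.open-cluster-management.io/v1beta1
kind: GitOpsCluster
metadata:
  name: gitops-cluster            # hypothetical
  namespace: openshift-gitops     # assumption: where Argo CD runs on the hub
spec:
  argoServer:
    cluster: local-cluster        # the hub itself hosts Argo CD
    argoNamespace: openshift-gitops
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: gitops-managed          # assumption: the Placement described in this module
    namespace: openshift-gitops
```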

The Placement that activates work on spoke-dc-v6 selects env=dc, role=spoke, gitops-managed=true. The naming reflects the pre-v6 → v6 transition: pre-v6 had four clusters (hub-dc, spoke-dc, hub-dr, spoke-dr); v6 retained the dc and spoke labels so the Placement shape stays stable when (if) a DR pair is built later. See /docs/openshift-platform/openshift-platform/acm-multicluster/managedcluster-registration/ for the full manifest set.

Try this

Three exercises with rising difficulty.

1. Read the live ManagedCluster CR. On a cluster with ACM, run:

oc get managedcluster -o yaml
oc get managedcluster <name> -o jsonpath='{.metadata.labels}{"\n"}'
oc get managedcluster <name> \
  -o jsonpath='{.status.conditions[?(@.type=="ManagedClusterConditionAvailable")].status}{"\n"}'

Inspect the labels. Inspect .status.conditions. The ManagedClusterConditionAvailable=True condition is the one that says “the work-agent is reaching back to the hub.” Anything else means the spoke can’t talk to the hub right now.

2. Write a ManagedClusterSet and ManagedClusterSetBinding. Imagine your ops team needs to target every cluster labelled tier=prod with their internal-baseline policies. Sketch the YAML:

  • A ManagedClusterSet named prod-fleet.
  • A ManagedClusterSetBinding that exposes prod-fleet into the ops-policies namespace.
  • A RoleBinding giving the ops group the open-cluster-management:managedclusterset:bind:prod-fleet ClusterRole in ops-policies.

Without all three pieces, the ops team’s Placements in ops-policies will return zero decisions.

3. Sketch a compliance-narrowing Placement. Write the Placement YAML that targets only managed clusters labelled compliance=pci-dss, tolerating the unreachable taint so transient network blips don’t drop clusters from the decision (the lab’s pattern — see the registration doc). Don’t bind it to anything; just write the Placement.

Common failure modes

The CSR auto-approve loop stalls. Symptom: ManagedCluster.status.conditions[?(@.type=="ManagedClusterConditionAvailable")].status stays Unknown. Most often the spoke’s clock is more than 5 minutes off the hub’s. The CSR’s signed certificate has a NotBefore from the hub’s perspective; the klusterlet rejects it as not-yet-valid. Fix: chrony working on both sides. Less commonly: the registration controller is misconfigured to not auto-approve (hubAcceptsClient: false or the CSR-approver controller crashed).

The klusterlet bootstrap fails silently. Symptom: you ran oc apply -f import.yaml on the spoke, the registration-agent Pod started, then nothing. The hub-side ManagedCluster never flips to Available. Almost always a network reachability issue — the spoke can’t reach api.<hub>.<domain>:6443. Check from the spoke: curl -kv https://api.<hub>.<domain>:6443/version. If that fails, fix DNS, firewall rules, or proxy config first; the klusterlet’s logs will only say “context deadline exceeded” with no hint about why.

The import command succeeds but no add-ons appear. Symptom: ManagedCluster is Available=True, the console shows the cluster green, but Policy, ApplicationSet, and observability all behave as if the cluster doesn’t exist. Cause: you forgot to create the KlusterletAddonConfig for the cluster. Without it, ACM doesn’t ship application-manager, policy-controller, or cert-policy-controller to the spoke. Fix: add the KlusterletAddonConfig (same name as the ManagedCluster, in the matching namespace) and watch the addons land within a minute.
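
A minimal sketch of the missing resource, assuming you want the application, policy, search, and certificate-policy add-ons enabled; adjust the list to what your hub actually runs.

```yaml
# Sketch only: created on the hub, in the cluster's own namespace.
apiVersion: agent.open-cluster-management.io/v1
kind: KlusterletAddonConfig
metadata:
  name: spoke-dc-v6               # same name as the ManagedCluster
  namespace: spoke-dc-v6          # the matching hub-side namespace
spec:
  applicationManager:
    enabled: true
  policyController:
    enabled: true
  searchCollector:
    enabled: true
  certPolicyController:
    enabled: true
```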

You imported a cluster and now want to delete it forever. Detach with oc delete managedcluster, then either (a) for Hive-provisioned, oc delete clusterdeployment to tear down the infrastructure, or (b) for imported clusters, run the OpenShift installer’s destroy flow against the original install directory or decommission the underlying VMs / hardware manually. ACM doesn’t track the install metadata for imported clusters — only the cluster admin who installed it does.

Where this is heading

You can now make a managed cluster appear, give it labels that mean something, and watch it disappear cleanly. Everything in Module 04 is about expressing what should be true on every cluster the Placement selects, and Module 05 is about shipping workloads the same way.

Next: Module 04 — Policies — write a Policy, propagate it via PlacementBinding, watch compliance flow back, and flip from inform to enforce without breaking production.

References