Application lifecycle across the fleet

Ship one declared workload to many clusters with a single ApplicationSet on the hub — generators, Placements, sync waves, and the lab's actual flow.

Module 05 set up GitOps and the pull model. Now we use it. The job: take one declared workload — a Helm chart, a kustomize overlay, a directory of YAML — and ship it to many clusters without copy-pasting an Argo Application per cluster.

ACM gives you two ways. ApplicationSet (Argo CD-native) is the default for most new work. Subscription (ACM’s pre-Argo model) is still supported and worth a paragraph because some old manifests still wire it in. This module spends most of its time on ApplicationSet.

The two models, again

Model	Engine	When to pick
ApplicationSet	Argo CD	Default. Git is the source, Argo is the engine, ACM provides the cluster inventory and placement.
Subscription	ACM (pre-Argo)	Vanilla-Kubernetes managed clusters with no Argo, or maintaining legacy `application.app.k8s.io/v1beta1` Applications already wired in.

Both are “actively supported” in ACM 2.x. New work should pick ApplicationSet unless you have a concrete reason — and “we already wrote a Subscription five years ago” doesn’t count.

ApplicationSet generators that matter for multicluster

An ApplicationSet is a template plus one or more generators. The generator emits parameters; the template fills in those parameters and produces one Application per emission. The Argo CD documentation lists nine generators; four matter for multicluster work, along with the matrix generator that combines them.

Cluster generator

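The primary fan-out tool. It emits one parameter set per cluster registered in Argo CD, optionally filtered by a label selector on the cluster Secret. In ACM, the Placement-driven equivalent is the cluster decision resource generator (clusterDecisionResource), which reads the matched clusters from a PlacementDecision. Either way: one ApplicationSet, one Application per matched cluster, identical manifests applied everywhere.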

List generator

A fixed, hand-written list of parameter sets. Useful for ad-hoc environments where the cluster list is short and human-curated (“deploy to staging-1, staging-2, perf-eu”) and you don’t want a label-driven Placement.
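
As a rough illustration, a list generator for that short, human-curated case might look like the sketch below. The three cluster names come from the example above; the ApplicationSet name, the API server URLs, the project, and the overlay path are hypothetical placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: web-adhoc                 # hypothetical name
  namespace: openshift-gitops
spec:
  generators:
    - list:
        elements:
          # Each element is one parameter set; the keys are free-form.
          - cluster: staging-1
            url: https://api.staging-1.example:6443   # placeholder URL
          - cluster: staging-2
            url: https://api.staging-2.example:6443   # placeholder URL
          - cluster: perf-eu
            url: https://api.perf-eu.example:6443     # placeholder URL
  template:
    metadata:
      name: 'web-{{cluster}}'
    spec:
      project: default            # assumption: the default Argo CD project
      destination:
        server: '{{url}}'
        namespace: web
      source:
        repoURL: https://gitlab.example/apps/web-tenant.git
        targetRevision: HEAD
        path: overlays/stg        # hypothetical path
```

Adding or removing a target is a one-element edit to the list, which is the appeal and also the limit: nothing here reacts to cluster labels.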

Git generator (directory or file)

Emits one parameter set per matching directory or YAML file in a Git repo. Useful when “the list of things to deploy” lives in Git as folders — for example, one folder per tenant. Combine with the cluster generator via the matrix generator to get “for every cluster, for every overlay, make an Application.”

Pull-request generator

Emits one parameter set per open pull request matching a filter. The review-app pattern: every PR gets a temporary deploy, every merge cleans it up. Rare in platform work, common in product engineering.
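
A minimal sketch of the review-app pattern, assuming a self-hosted GitLab and a token stored in a Secret. The project ID, Secret name, label, and overlay path are all hypothetical; the `{{number}}` and `{{head_sha}}` parameters are the ones the pull-request generator emits.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: web-review-apps           # hypothetical name
  namespace: openshift-gitops
spec:
  generators:
    - pullRequest:
        gitlab:
          project: "1234"               # hypothetical GitLab project ID
          api: https://gitlab.example/  # self-hosted GitLab base URL
          tokenRef:
            secretName: gitlab-token    # hypothetical Secret holding an API token
            key: token
          labels:
            - preview                   # only requests carrying this label
          pullRequestState: opened
        requeueAfterSeconds: 300        # how often to poll for changes
  template:
    metadata:
      name: 'web-pr-{{number}}'
    spec:
      project: default                  # assumption
      destination:
        server: https://kubernetes.default.svc
        namespace: 'web-pr-{{number}}'
      source:
        repoURL: https://gitlab.example/apps/web-tenant.git
        targetRevision: '{{head_sha}}'  # deploy exactly the commit under review
        path: overlays/dev              # hypothetical path
      syncPolicy:
        automated:
          prune: true
        syncOptions:
          - CreateNamespace=true
```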

A matrix example

This deploys three overlays (dev, stg, prd) of one Helm chart across every spoke labeled env=dc:

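apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: web-tenant
  namespace: openshift-gitops
spec:
  generators:
    - matrix:
        generators:
          # Dimension 1: every registered cluster labeled env=dc.
          - clusters:
              selector:
                matchLabels:
                  env: dc
          # Dimension 2: every directory under overlays/.
          - git:
              repoURL: https://gitlab.example/apps/web-tenant.git
              revision: HEAD          # required by the git generator; HEAD is an assumption
              directories:
                - path: overlays/*
  template:
    metadata:
      # Both dimensions appear in the name, so generated names cannot collide.
      name: 'web-{{name}}-{{path.basename}}'
    spec:
      project: default                # required field; "default" is an assumption
      destination: { server: '{{server}}', namespace: 'web' }
      source:
        repoURL: https://gitlab.example/apps/web-tenant.git
        targetRevision: HEAD          # assumption, matches the generator revision
        path: '{{path}}'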

For two spokes that’s six Applications generated. Add a third cluster, you get nine — no edits.

Placement and labels

Placement is ACM’s cluster-selector primitive. Its job is producing a PlacementDecision: a list of ManagedCluster names that match the predicate. ApplicationSet’s cluster decision resource generator (clusterDecisionResource) can consume that decision directly via the ACM integration.

Predicates are label selectors. Common shapes:

env=dc, role=spoke — all production data-centre spokes.
compliance=pci-dss — the regulated subset.
region in (eu-west, eu-central) — geo fan-out.
vendor=OpenShift, openshiftVersion-major-minor in (4.18, 4.19) for version-gated rollouts (label selectors have no >= operator, so list the versions you accept).

ACM auto-labels managed clusters with vendor, cloud, region, openshiftVersion, and a few others. You add your own (env, role, compliance, feature=experimental) as flat-key labels on the ManagedCluster resource.
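
To make the predicate shapes concrete, here is a sketch of a Placement that combines an exact-match selector with a set-based one. The Placement name is hypothetical, and it assumes the namespace already has a ManagedClusterSetBinding for the cluster set the spokes belong to; without one the Placement selects nothing.

```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: pci-dc-spokes             # hypothetical name
  namespace: openshift-gitops     # assumes a ManagedClusterSetBinding exists here
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          # Exact matches: env=dc, role=spoke
          matchLabels:
            env: dc
            role: spoke
          # Set-based match: compliance in (pci-dss)
          matchExpressions:
            - key: compliance
              operator: In
              values: [pci-dss]
```

Everything inside one predicate is ANDed, so this selects only clusters that carry all three labels.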

The lab today runs one spoke, so its primary ApplicationSet has a trivial Placement that matches cluster=spoke-dc-v6 and emits exactly one decision. See /docs/openshift-platform/openshift-platform/gitops-operating-model/application-and-applicationset/ for the actual definition. The shape is identical to a fleet of fifty — the predicate just gets more selective.

Diagram (nodes in flow order): app-repo (kustomize overlays); ApplicationSet (cluster generator); Placement (label predicate); PlacementDecision (matched clusters); Application for spoke-dc-v6 and Application for spoke-dr-v6; spoke-dc-v6 Argo CD and spoke-dr-v6 Argo CD.

Reading the diagram: the Placement and PlacementDecision live on the hub; the ApplicationSet templates one Application per matched cluster; each spoke’s local Argo (the pull-model agent) reaches up and reconciles its assigned Application against the Git repo.
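
The hub-side wiring described above can be sketched as follows. This is not the lab's definition: the ApplicationSet name, the Placement name `web-placement`, the project, and the overlay path are hypothetical, and `acm-placement` is assumed to be the ConfigMap that tells Argo CD how to read PlacementDecision resources. Pull-model setups typically add extra labels and annotations to the template, which are left out here.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: web-fleet                 # hypothetical name
  namespace: openshift-gitops
spec:
  generators:
    - clusterDecisionResource:
        configMapRef: acm-placement   # assumption: ConfigMap describing PlacementDecision
        labelSelector:
          matchLabels:
            # Selects the PlacementDecisions produced by one Placement.
            cluster.open-cluster-management.io/placement: web-placement
        requeueAfterSeconds: 180      # re-read the decision every three minutes
  template:
    metadata:
      name: 'web-{{name}}'            # one Application per decided cluster
    spec:
      project: default                # assumption
      destination:
        server: '{{server}}'
        namespace: web
      source:
        repoURL: https://gitlab.example/apps/web-tenant.git
        targetRevision: HEAD
        path: overlays/prd            # hypothetical path
```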

The Subscription model, briefly

A Subscription on the hub points at a Helm chart or a Git path; a Channel resource names the repository where that source lives; a Placement (PlacementRule in older manifests) controls which clusters the Subscription deploys to. ACM aggregates the rolled-up deployments into an application.app.k8s.io/v1beta1 Application for the console view.

It predates ACM’s Argo CD integration and still works fine. The cases where it earns its keep:

Managed clusters running plain Kubernetes (no Argo CD operator installed).
Existing operational habits and dashboards built around the old model.
Helm-chart fan-out where the chart already lives in a Helm repo and re-wrapping it in an ApplicationSet feels like overhead.

If you’re starting fresh on OpenShift, pick ApplicationSet. The console even surfaces ApplicationSet-generated Applications inside the same application.app.k8s.io view, so the UX gap is small.

The promotion problem

ApplicationSet does not have promotion semantics. It fans out the same Application — pointed at the same Git ref — to every matched cluster. There is no built-in “promote from dev to staging” knob.

The pattern that works: your app-repo holds kustomize overlays per environment, each pinned to a specific image digest. The ApplicationSet’s path parameter changes per environment, so env=dev clusters get overlays/dev, env=prd clusters get overlays/prd. Promotion = a PR to the app repo that bumps the prd overlay to the digest currently running in stg.

See /docs/openshift-platform/application-delivery/build-paths/build-once-promote-by-digest/ for how the lab wires this. The short version: CI builds once, records the resulting image digest, opens a PR to bump the next env’s overlay to that digest, a human approves the PR, Argo CD reconciles.
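
A sketch of the two pieces involved, under stated assumptions: the env label is present on the Argo CD cluster Secret (so the cluster generator can read it as `{{metadata.labels.env}}`), the repo has a `base` directory next to `overlays`, and the image name matches the one used elsewhere in this module. The digest is left as a placeholder on purpose.

```yaml
# ApplicationSet fragment: the overlay path follows each cluster's env label.
spec:
  generators:
    - clusters:
        selector:
          matchExpressions:
            - key: env
              operator: In
              values: [dev, stg, prd]
  template:
    metadata:
      name: 'web-{{name}}'
    spec:
      project: default              # assumption
      destination: { server: '{{server}}', namespace: web }
      source:
        repoURL: https://gitlab.example/apps/web-tenant.git
        targetRevision: HEAD
        path: 'overlays/{{metadata.labels.env}}'
---
# overlays/prd/kustomization.yaml: a promotion PR changes only the digest line.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                      # hypothetical layout
images:
  - name: registry.lab/web
    digest: sha256:<digest-currently-running-in-stg>
```

Because the ApplicationSet never changes during a promotion, the Git history of the prd overlay doubles as the production release log.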

Sync waves and dependencies

Argo’s argocd.argoproj.io/sync-wave annotation orders manifests within a single Application sync. Lower wave numbers go first. Managed-cluster Applications inherit this — when the spoke’s Argo reconciles, it respects the same waves.

The lab’s documented convention (/docs/openshift-platform/openshift-platform/gitops-operating-model/sync-wave-conventions/) reserves:

Wave 1 — cert-manager, namespace setup.
Wave 2 — External Secrets Operator (ESO).
Wave 3 — ClusterSecretStore and tenant SecretStore resources.
Wave 10 — operator installs and CR roll-outs.
Wave 20 — application workloads that need the operators above.

You almost never see waves 4–9 used in the lab; the gap is room to wedge in new platform layers without renumbering.
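
For illustration, here is how two resources in the same Application would be pinned to waves 1 and 20. The resources themselves are hypothetical; only the annotation key comes from Argo CD. Note that the value is quoted, because annotation values must be strings.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: web
  annotations:
    argocd.argoproj.io/sync-wave: "1"     # namespace setup goes early
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: web
  annotations:
    argocd.argoproj.io/sync-wave: "20"    # workload waits for the platform layers
spec:
  replicas: 2
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      containers:
        - { name: web, image: registry.lab/web:GIT_SHA }
```

Resources without the annotation land in wave 0, so an unannotated manifest syncs before everything in the convention above.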

Argo Rollouts — progressive delivery across clusters

ApplicationSet says where a workload runs. Argo Rollouts says how a new version of that workload replaces the old one. The two are independent — you can run Rollouts without ApplicationSet, or vice versa — but the interesting wins show up when you compose them across a fleet.

For a single cluster, the question is well-understood: a blue/green swap, or a canary that shifts 10% of traffic to the new version, observes some metric, and either promotes or rolls back. The multicluster questions are different and more interesting:

Can you canary a release on one cluster while the rest of the fleet stays on the previous version? (Yes — that’s per-cluster blast radius.)
Can you stagger a fleet-wide rollout in waves, gating each wave on the previous wave’s health? (Yes — combine with Placement waves.)
Can a hub-side metrics endpoint gate promotion on every spoke’s canary? (Yes — point the AnalysisTemplate at hub Thanos.)

Each of these is what Argo Rollouts buys you on top of ApplicationSet’s fan-out.

The primitives

Argo Rollouts is a separate controller from Argo CD. On OpenShift it is installed by the Red Hat OpenShift GitOps operator and configured via a RolloutManager CR. Once installed, three resource kinds matter:

CR	Replaces	Role
`Rollout`	`Deployment`	A managed ReplicaSet pair (stable + canary) plus a strategy that says how to swap traffic between them.
`AnalysisTemplate` + `AnalysisRun`	(nothing)	A reusable specification for “query metric X, succeed if <= threshold, fail if > threshold.” A `Rollout` step references the template; the controller creates an `AnalysisRun` to evaluate it.
`Experiment`	(nothing)	Spins up a short-lived parallel ReplicaSet with its own traffic share for A/B testing without committing to a rollout.

A Rollout looks like a Deployment but with a strategy block that’s structured rather than free-form — blueGreen or canary, each with its own ergonomics.

Strategies

Blue/green with previewService and activeService. The Rollout maintains two Services; the active one points at the stable ReplicaSet, the preview points at the new one. You verify the new version against the preview Service (an in-cluster test rig, a manual smoke), then promote, which swaps the selector on the active Service. Traffic shifts in one step.

Canary with traffic-shifting. The Rollout walks through declared steps: setWeight: 10, pause: {duration: 60s}, setWeight: 25, pause: {duration: 5m}, and so on. Traffic shaping happens via an integrated traffic provider: OpenShift Routes, Istio, AWS ALB, NGINX, or SMI. The pause can be timed (pause: {duration: 60s}) or indefinite (pause: {}). An indefinite pause waits for a human to run kubectl argo rollouts promote.

Header- or cookie-based dark-launch. Through the traffic provider, you can route a specific cookie or header to the canary while everyone else still hits stable. Useful for letting your own engineers hit the new build before any production user does.
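
As a point of comparison with the canary shown later in this module, a blue/green Rollout might be sketched like this. The two Service names are hypothetical and are assumed to exist already; the Rollout does not create them.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata: { name: web }
spec:
  replicas: 6
  strategy:
    blueGreen:
      activeService: web-active       # hypothetical; carries production traffic
      previewService: web-preview     # hypothetical; points at the new ReplicaSet
      autoPromotionEnabled: false     # hold at preview until someone promotes
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      containers:
        - { name: web, image: registry.lab/web:GIT_SHA }
```

With autoPromotionEnabled set to false, the new version sits behind the preview Service until kubectl argo rollouts promote web is run.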

Multicluster patterns with ApplicationSet

The natural pairing is to wrap the Rollout (and its Services, the AnalysisTemplate) in an ApplicationSet whose cluster generator fans out across spokes. Two distinct patterns sit on top:

Per-cluster independent rollouts. Every matched cluster runs its own Rollout, each driven by Argo Rollouts on that spoke. If you canary, each spoke independently walks through the steps with its own AnalysisRun querying the spoke’s local Prometheus. This is the simplest pattern; the cost is that “is the rollout going well?” is answered per cluster, not fleet-wide.

Shared analysis against hub metrics. The Rollout CR is still per cluster, but the AnalysisTemplate’s metric provider points at the hub’s Thanos query endpoint. Every spoke’s canary blocks on the same metric, queried with a cluster label. The benefit is a single SLO that gates rollouts everywhere; the cost is that the hub becomes load-bearing for promotion — if Thanos query is down, no spoke promotes.
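
A sketch of the shared-analysis variant, assuming the hub exposes a Thanos query endpoint at a hypothetical address and that metrics carry a cluster label. The template name and the argument name are illustrative; each spoke's Rollout passes its own cluster name as the argument.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata: { name: error-rate-hub }    # hypothetical name
spec:
  args:
    - name: cluster                   # supplied by each spoke's Rollout
  metrics:
    - name: error-rate
      interval: 30s
      count: 10                       # ten measurements, roughly a five-minute window
      successCondition: result[0] <= 0.01
      provider:
        prometheus:
          address: https://thanos-query.hub.example:9091   # hypothetical hub endpoint
          query: |
            sum(rate(http_requests_total{app="web",cluster="{{args.cluster}}",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{app="web",cluster="{{args.cluster}}"}[5m]))
---
# Canary step fragment in the Rollout on spoke-dc-v6:
- analysis:
    templates: [{ templateName: error-rate-hub }]
    args:
      - name: cluster
        value: spoke-dc-v6
```

The template is identical on every spoke; only the argument value differs, which keeps the SLO definition in one place.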

For staggered fleet rollout, the lever is Placement rather than the Rollout itself. Wave 1 of your Placement selects env=canary clusters only; the ApplicationSet generates Rollouts there first. After a soak period, you flip the Placement to include env=prod-eu; the ApplicationSet generates Rollouts on those next, picking up the same already-promoted image digest. This is the multicluster equivalent of canary-cluster-then-fleet.

A worked Rollout

A canary that pauses at 10% for human approval, then auto-walks to 100% only if the error rate stays sane:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata: { name: web }
spec:
  replicas: 6
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {}                 # wait for manual promote
        - setWeight: 50
        - analysis:
            templates: [{ templateName: error-rate-p99 }]
        - setWeight: 100
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      containers:
        - { name: web, image: registry.lab/web:GIT_SHA }

The companion AnalysisTemplate:

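apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata: { name: error-rate-p99 }
spec:
  metrics:
    - name: error-rate
      interval: 30s
      # Assumption: 10 measurements, roughly a five-minute window. With an
      # interval and no count the metric runs indefinitely, so an inline
      # analysis step would never complete.
      count: 10
      successCondition: result[0] <= 0.01
      provider:
        prometheus:
          address: http://thanos-query.observability.svc:9090
          query: |
            sum(rate(http_requests_total{app="web",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{app="web"}[5m]))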

If the 5xx rate stays under 1% for the analysis window, the rollout proceeds to setWeight: 100. If it breaches, the AnalysisRun fails and the Rollout aborts, shifting traffic back to stable.

The gotcha for banking

For a payment service you almost never want fully-automatic promotion. The CAB-grade pattern is pause: {} (indefinite) at 10%, with a human running kubectl argo rollouts promote web after the change-board sign-off. You still get automated traffic-shifting and automated rollback on metric breach; you just don’t surrender the promotion decision to an interval timer. Argo Rollouts is comfortable with that pattern — the manual gate is the default for high-stakes services in regulated environments.

For lower-stakes internal services, fully-automated promotion with a metric gate is fine. Don’t apply the same gating ceremony everywhere; pick the gate that matches the blast radius.

Failure-mode shape

Three states tell you what happened.

Rollout Paused. Either intentional (a step has pause: {} waiting on a human) or accidental (nobody clicked promote). kubectl argo rollouts get rollout web shows which step it’s parked on; the UI gives you a button if you’d rather click.

Rollout Degraded. Either an AnalysisRun’s metric threshold breached and the rollout aborted to stable, or the new ReplicaSet failed health checks and the controller refused to advance. The AnalysisRun’s status tells you which metric and which value tripped it; the ReplicaSet’s pod events tell you the readiness story.

Rollout Healthy but the app is broken. The most painful failure mode. The metric you chose for promotion gating didn’t capture the regression — say, you gated on 5xx rate but the bug is a 200-OK response with the wrong value. Argo Rollouts can’t catch what you didn’t tell it to measure. The fix is on the AnalysisTemplate side: better SLIs, more of them, or a synthetic transaction check alongside the volume metrics.

Drift handling

Argo’s selfHeal: true reverts manual edits. prune: true removes resources that were dropped from Git. Both are typical for managed-cluster Applications, because the whole point of pull-model GitOps is that the cluster matches Git, not the operator’s last hotfix.

There are a few cases where you deliberately tolerate drift with ignoreDifferences:

Fields the API server or a controller fills in after creation (for example, spec.clusterIP on a Service).
Fields a mutating webhook rewrites — for example, a service mesh injecting sidecar annotations.
Replica counts when a HorizontalPodAutoscaler owns them.

The lab’s pattern: write the ignoreDifferences block once in a shared base, not per Application. Drift exceptions that proliferate are a smell — usually it means a controller is fighting GitOps and you should pull that controller’s config into Git instead.
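
A sketch of what such a shared block could contain, written as the fragment of an Application spec that an ApplicationSet template would carry. The HPA case from the list above is used as the example; this is an illustration, not the lab's actual base.

```yaml
# Fragment of an Application spec (for example inside an ApplicationSet template).
syncPolicy:
  automated:
    selfHeal: true      # revert manual edits
    prune: true         # delete resources removed from Git
  syncOptions:
    # Also honour ignoreDifferences during sync, not only when computing the diff.
    - RespectIgnoreDifferences=true
ignoreDifferences:
  # Replica count is owned by a HorizontalPodAutoscaler, not by Git.
  - group: apps
    kind: Deployment
    jsonPointers:
      - /spec/replicas
```

Without the RespectIgnoreDifferences option, the field is hidden from the diff but a sync still writes the Git value back over whatever the autoscaler set.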

The lab’s actual flow

On hub-dc-v6, the openshift-gitops instance hosts a small set of ApplicationSets in clusters/hub-dc-v6/gitops-control/ (in the platform-gitops repo):

platform-bootstrap — the day-one stack for any spoke (operators, ESO, cert-manager, monitoring addons).
app-tenants — generates one tenant scaffolding Application per tenant directory.
placementdecision-rbac — issues the spoke-side ClusterRole bindings that the pull-model Argo extension needs.

The spoke runs its own openshift-gitops (the pull-model agent). It receives Applications generated on the hub via the gitops-addon, then reconciles each one against Git directly — no manifest ever flows through the hub at runtime. ACM’s job stops at “here is the Application you should be reconciling”; Argo on the spoke does the rest.

Try this

Read the lab’s ApplicationSet definitions for app-tenants and placementdecision-rbac in clusters/hub-dc-v6/gitops-control/ of platform-gitops. Trace one tenant from the generator to the spoke Application it produces.
Add a label feature=experimental to one spoke (oc label managedcluster spoke-dc-v6 feature=experimental). Write a Placement that selects only feature=experimental and watch a PlacementDecision get produced. Remove the label, see the decision empty out.
Build a matrix-generator ApplicationSet that deploys three overlays of one Helm chart across two clusters. Compare the count of generated Applications to what you predicted on paper.

Common failure modes

Generator generates, but the templated Application stays Unknown / OutOfSync. The spoke’s Argo lacks RBAC for the resource kind. See Module 05 — the argocd-platform-extensions ClusterRole consolidates the API groups the spoke needs. Add the missing API group, restart the spoke’s Argo application-controller.
PlacementDecision is empty. Your predicate matches nothing. Run oc get managedcluster --show-labels and compare against the selector.
An old DeployableSubscription shows up alongside an Application. Pre-Argo subscription path is still wired in some old manifests. Decide: rewrite the manifest as an ApplicationSet, or leave the Subscription alone — but don’t run both for the same workload, they will fight.
metadata.name collisions in the template. Matrix generators can produce duplicate names if you forget to interpolate every dimension into the name. Always include {{name}} (cluster) and {{path.basename}} (overlay) or you’ll get one Application overwriting another.
selfHeal reverts something a human needs to set live. That’s the whole point of selfHeal. Either move the setting into Git, mark the field ignoreDifferences, or accept that selfHeal will keep undoing the change.

References

ACM application lifecycle docs — https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/
Argo CD ApplicationSet generators — https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/Generators/
Argo CD sync waves and resource hooks — https://argo-cd.readthedocs.io/en/stable/user-guide/sync-waves/
Open Cluster Management Placement API — https://open-cluster-management.io/concepts/placement/
Lab — /docs/openshift-platform/openshift-platform/gitops-operating-model/application-and-applicationset/
Lab — /docs/openshift-platform/openshift-platform/gitops-operating-model/sync-wave-conventions/
Lab — /docs/openshift-platform/application-delivery/build-paths/build-once-promote-by-digest/

Next: Module 07 — Observability and Search covers what to do once Applications are landing on many clusters and you want one screen to see them all.