OpenShift Service Mesh 3 — ambient mode

How OSSM 3 is installed on spoke-dc-v6 using the sail-operator's openshift-ambient profile (Istio + IstioCNI + ZTunnel), how ambient differs from sidecar mode, the bank-employees-jboss-chat pilot, and why bank-payment was rolled back.

OpenShift Service Mesh 3 is Red Hat’s productised distribution of upstream Istio. On spoke-dc-v6 it runs in ambient mode: there are no per-pod sidecars. Instead, an L4 proxy DaemonSet (ztunnel) and an optional per-namespace L7 proxy (waypoint) carry the traffic. This page covers the operator/operand layout, why the lab runs ambient, and how workloads opt in.

Why ambient mode

Sidecar mode (the classic Istio data plane) injects an Envoy sidecar into every pod. It is mature and well understood, but it has two persistent costs:

  • Memory/CPU per pod. Every workload pod carries an Envoy. On a busy cluster that adds up to gigabytes of memory and noticeable extra CPU.
  • Operational friction. Sidecar injection is a mutating webhook; namespaces or pods need labels; existing apps must be restarted to receive a sidecar, and every Istio upgrade means restarting workloads to pick up the new proxy.

Ambient (Istio 1.22+) splits the data plane:

  • ztunnel — a per-node DaemonSet that handles L4 mTLS between pods.
  • Waypoint proxy — a per-namespace (or per-service-account) Envoy Pod for L7 features (HTTP routing, authorisation policies, retries, fault injection).
  • No sidecar. Workload pods are unmodified.

Apps that need only L4 mTLS pay only the ztunnel cost: one pod per node, shared by every workload on that node. Apps that need L7 also pay for a waypoint, but only in the namespaces that deploy one. Upgrades roll ztunnel or the waypoints, not every workload.
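The cost argument above is plain arithmetic: sidecar overhead scales with pods, ambient overhead with nodes plus waypoints. A minimal sketch of that comparison, where the helper names are hypothetical and every per-proxy memory figure is an input you supply (nothing below was measured on spoke-dc-v6):

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the lab repo.
# All MiB-per-proxy values are caller-supplied assumptions, not measurements.

# sidecar_overhead_mib PODS SIDECAR_MIB
# Sidecar mode: every mesh pod carries one Envoy.
sidecar_overhead_mib() {
  local pods=$1 sidecar_mib=$2
  echo $(( pods * sidecar_mib ))
}

# ambient_overhead_mib NODES ZTUNNEL_MIB [WAYPOINTS WAYPOINT_MIB]
# Ambient mode: one ztunnel per node, plus one Envoy per waypoint
# (only namespaces that opted into L7 have waypoints).
ambient_overhead_mib() {
  local nodes=$1 ztunnel_mib=$2 waypoints=${3:-0} waypoint_mib=${4:-0}
  echo $(( nodes * ztunnel_mib + waypoints * waypoint_mib ))
}
```

For example, with a hypothetical 60 MiB per Envoy, 400 mesh pods give `sidecar_overhead_mib 400 60` = 24000 MiB, while 6 nodes at a hypothetical 50 MiB per ztunnel plus two 60 MiB waypoints give `ambient_overhead_mib 6 50 2 60` = 420 MiB. The per-pod versus per-node shape is the point; the exact figures depend on your workloads.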

The trade-offs are real:

Ambient pro | Sidecar pro
Lower per-pod overhead | Workload-attributed CPU/memory accounting
No app restart for upgrades | More mature security profile (longer in production)
No mutating webhook gotcha | Per-pod traffic control independent of namespace
L4-only is a real free tier | Some advanced Envoy features only in sidecars

The lab is on ambient. The platform-gitops files are under platform-services/service-mesh/.

Architecture

Reading the diagram:

  • The sail-operator is the OSSM 3 operator. It owns three CRs: Istio, IstioCNI, ZTunnel.
  • Istio → istiod, the control plane.
  • IstioCNI → the istio-cni DaemonSet, which runs the CNI plugin that redirects pod traffic into ztunnel.
  • ZTunnel → the ztunnel DaemonSet, the per-node L4 proxy.
  • Workload pods in labelled namespaces have their traffic redirected by the CNI to the local ztunnel, which encrypts outbound connections and terminates inbound mTLS.
  • A Waypoint is optional; it provides L7 features for namespaces or service accounts that need them.

The operand CRs

Istio

apiVersion: sailoperator.io/v1
kind: Istio
metadata:
  name: default
  annotations:
    argocd.argoproj.io/sync-wave: "26"   # after IstioCNI
spec:
  namespace: istio-system
  profile: openshift-ambient
  version: v1.28-latest
  updateStrategy:
    type: InPlace
    inactiveRevisionDeletionGracePeriodSeconds: 30
Field | Why this value
namespace: istio-system | Conventional namespace for the Istio control plane.
profile: openshift-ambient | OSSM-flavoured ambient profile: sets OCP SCC handling, ambient defaults, and the Istio settings Red Hat supports.
version: v1.28-latest | Pins to the 1.28 minor; -latest follows patch releases within 1.28.
updateStrategy.type: InPlace | Upgrade in place, as opposed to RevisionBased, which keeps two revisions side by side.
inactiveRevisionDeletionGracePeriodSeconds: 30 | If we ever switch to RevisionBased, this is the cleanup window for the old revision.

IstioCNI

apiVersion: sailoperator.io/v1
kind: IstioCNI
metadata:
  name: default
  annotations:
    argocd.argoproj.io/sync-wave: "25"
spec:
  namespace: istio-cni
  profile: openshift-ambient
  version: v1.28-latest

The CNI DaemonSet handles the iptables/eBPF redirect that captures pod traffic for ambient mode. It needs the OCP-tuned profile because OVN-Kubernetes has specific config-directory conventions for CNI chaining.

ZTunnel

apiVersion: sailoperator.io/v1
kind: ZTunnel
metadata:
  name: default
  annotations:
    argocd.argoproj.io/sync-wave: "27"   # after Istio
spec:
  namespace: ztunnel
  version: v1.28-latest

Three CRs produce one Deployment (istiod) and two DaemonSets (istio-cni, ztunnel) in three namespaces. The split lets you upgrade ztunnel independently of istiod when only a data-plane patch is needed.

How a namespace joins the mesh

Three options:

Mode | How to opt in | Effect
Ambient (L4) | kubectl label ns my-ns istio.io/dataplane-mode=ambient | Pods get L4 mTLS via ztunnel. No sidecar.
Ambient + waypoint (L7) | Add a Gateway with gatewayClassName istio-waypoint in the ns, then label the ns istio.io/use-waypoint=<gateway-name> | Adds a per-ns waypoint Pod for L7.
Sidecar (legacy) | kubectl label ns my-ns istio-injection=enabled | Standard Istio injection. Works alongside ambient on the same cluster.

The lab convention is: ambient by default; waypoints opt-in per namespace; sidecar reserved for workloads that need Envoy-specific features not exposed by ambient.

# Opt-in a tenant namespace to ambient
oc label namespace apps-team-x istio.io/dataplane-mode=ambient

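# Verify the redirect is wired (requires ss in the container image)
oc -n apps-team-x exec -it deploy/sample -- ss -tlnp
# Look for listeners on ports 15001, 15006 and 15008, which ztunnel opens inside the pod's network namespace.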

No pod restarts are required to join ambient: the CNI node agent configures redirection for running pods in place, and their new connections go through ztunnel from then on.
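To spot pods the CNI has not enrolled, a minimal sketch that filters a pod listing. It assumes the `ambient.istio.io/redirection` pod annotation, which the Istio CNI node agent sets to `enabled` once redirection is in place; confirm it appears on your OSSM 3 version before relying on it. The helper name is hypothetical, and it only parses text, so it can be fed from `oc` or from a saved file:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# unenrolled_pods: read "POD|ANNOTATION" lines on stdin and print every pod
# whose ambient redirection annotation is anything other than "enabled".
# Example input source (annotation name is an assumption, see lead-in):
#   oc -n apps-team-x get pods -o jsonpath='{range .items[*]}{.metadata.name}{"|"}{.metadata.annotations.ambient\.istio\.io/redirection}{"\n"}{end}'
unenrolled_pods() {
  # NF skips blank lines; a missing annotation yields an empty second field.
  awk -F'|' 'NF && $2 != "enabled" { print $1 }'
}
```

An empty result means every pod in the listing reports redirection as enabled.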

mTLS

Ambient automatically uses mTLS (carried over HBONE) between ambient-enrolled pods; there is no per-workload opt-in. By default, though, a workload still accepts plaintext from non-mesh clients (PERMISSIVE); that only stops once a PeerAuthentication sets mode STRICT. Traffic to non-ambient destinations falls back to plaintext.

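For ambient policy enforcement you write AuthorizationPolicy CRs in istio-system (the root namespace, so mesh-wide) or in the workload namespace. ztunnel enforces the L4 parts (source principal, namespace, IP, port); rules that match HTTP attributes such as paths, methods or headers need a waypoint.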
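As an illustration of an L4-only policy that ztunnel can enforce without a waypoint, a minimal sketch that renders one. The helper name, the generated policy name, the `app` label selector and the default `cluster.local` trust domain are all assumptions for illustration, not lab conventions:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# render_l4_allow NS APP SRC_NS SRC_SA PORT
# Prints an ALLOW AuthorizationPolicy letting one ServiceAccount reach one port
# of the pods labelled app=APP. Only L4 attributes (principal, port) are used.
# Assumes the default trust domain "cluster.local".
render_l4_allow() {
  local ns=$1 app=$2 src_ns=$3 src_sa=$4 port=$5
  cat <<EOF
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-${src_sa}-to-${app}
  namespace: ${ns}
spec:
  selector:
    matchLabels:
      app: ${app}
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/${src_ns}/sa/${src_sa}"]
      to:
        - operation:
            ports: ["${port}"]
EOF
}
```

For example, `render_l4_allow apps-team-x backend apps-team-x frontend 8080 | oc apply -f -`, where the `backend`/`frontend` names and port 8080 are hypothetical.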

The lab’s tenant convention is to additionally pin PeerAuthentication at the namespace level:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: <tenant-ns>
spec:
  mtls:
    mode: STRICT

STRICT rejects any plaintext that somehow reaches a workload pod, even from inside the same namespace. Ambient already encrypts pod-to-pod via ztunnel, but PeerAuthentication is the explicit declaration that a tenant accepts mTLS-only — it’s what an auditor will look for in the cluster, not a label on a namespace.
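Since the PeerAuthentication is what an auditor checks, it is worth confirming that every ambient namespace actually has one. A minimal sketch that compares two listings; the helper name is hypothetical, it assumes the lab convention of a policy named `default` per tenant, and the `oc` commands in the comments are one way (an assumption) to produce its inputs:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# missing_strict AMBIENT_NS_FILE PEERAUTH_FILE
# AMBIENT_NS_FILE: one namespace per line, e.g. from
#   oc get ns -l istio.io/dataplane-mode=ambient -o name | sed 's|^namespace/||'
# PEERAUTH_FILE: "NAMESPACE|NAME|MODE" lines, e.g. from
#   oc get peerauthentication -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"|"}{.metadata.name}{"|"}{.spec.mtls.mode}{"\n"}{end}'
# Prints (sorted) every ambient namespace lacking a STRICT PeerAuthentication named "default".
missing_strict() {
  awk -F'|' '
    FILENAME == ARGV[1] { if (NF) ambient[$1] = 1; next }  # first file: ambient namespaces
    $2 == "default" && $3 == "STRICT" { strict[$1] = 1 }   # second file: compliant policies
    END { for (ns in ambient) if (!(ns in strict)) print ns }
  ' "$1" "$2" | sort
}
```

Any namespace it prints is enrolled in ambient but not pinned to STRICT.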

ztunnel workload-identity RBAC

For ztunnel to mint SPIFFE certs on behalf of a tenant’s ServiceAccounts, it needs serviceaccounts/token create + serviceaccounts impersonate on that namespace. This is namespace-scoped — not cluster-wide — so each tenant that joins ambient ships its own RoleBinding to the ztunnel SA:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ztunnel-workload-identity
  namespace: <tenant-ns>
rules:
  - apiGroups: [""]
    resources: ["serviceaccounts/token"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["serviceaccounts"]
    verbs: ["impersonate"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ztunnel-workload-identity
  namespace: <tenant-ns>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ztunnel-workload-identity
subjects:
  - kind: ServiceAccount
    name: ztunnel
    namespace: ztunnel

Missing this RoleBinding produces the most painful failure in ambient: istiod’s CSR endpoint returns request impersonation authentication failure on every workload-identity request. Pods come up, but every kubelet probe fails because ztunnel cannot obtain a certificate for the workload’s identity.
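Because a missing RoleBinding only shows up after the namespace is labelled, a pre-flight check before labelling is cheap. A minimal sketch using `oc auth can-i` while impersonating the ztunnel ServiceAccount named in the RoleBinding subject above. The helper name is hypothetical, and running it assumes your own user may impersonate ServiceAccounts:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# ztunnel_rbac_ok NS
# Returns 0 if system:serviceaccount:ztunnel:ztunnel holds both permissions the
# ztunnel-workload-identity Role grants in NS; reports what is missing on stderr.
ztunnel_rbac_ok() {
  local ns=$1 sa=system:serviceaccount:ztunnel:ztunnel rc=0
  if [ "$(oc auth can-i create serviceaccounts --subresource=token --as="$sa" -n "$ns" 2>/dev/null)" != "yes" ]; then
    echo "missing: create serviceaccounts/token in $ns" >&2
    rc=1
  fi
  if [ "$(oc auth can-i impersonate serviceaccounts --as="$sa" -n "$ns" 2>/dev/null)" != "yes" ]; then
    echo "missing: impersonate serviceaccounts in $ns" >&2
    rc=1
  fi
  return "$rc"
}
```

Used as `ztunnel_rbac_ok <tenant-ns> && oc label namespace <tenant-ns> istio.io/dataplane-mode=ambient`, the label is only applied when both grants are in place.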

Live tenants

Tenant ns | Mode | Notes
bank-employees-jboss-chat | Ambient L4 mTLS | Pilot; MRs !130/!131/!132. ztunnel metrics confirm connection_security_policy="mutual_tls" for BFF→chat-backend and BFF→eap-domain-group-a/b. No waypoint.
ossm3-demo | Ambient L4 mTLS | First mesh demo on the cluster; proves the ambient enrollment path.
bank-payment | Rolled back | Attempted on feature/bank-payment-ambient-mesh, then reverted on feature/bank-payment-mesh-rollback. See below.

bank-employees-jboss-chat — the working ambient pilot

The change footprint was small:

  • namespace.yaml adds istio.io/dataplane-mode=ambient.
  • A ztunnel-workload-identity namespace-scoped Role + RoleBinding.
  • NetworkPolicy adjustments to permit intra-namespace mesh traffic (including the HBONE port 15008) plus the app service ports.
  • App-side: probes converted to exec probes against localhost so they are not ambient-redirected, and the demo Deployments that use mutable image tags switched to the Recreate strategy.
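The NetworkPolicy change is the easiest to get wrong, because ambient traffic arrives at the destination pod on the HBONE port 15008 rather than only on the app port. A minimal sketch that renders an intra-namespace allow policy; the helper name, the policy name and the example ports are placeholders, not the pilot’s actual manifest:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# render_mesh_netpol NS PORT...
# Prints a NetworkPolicy allowing ingress from any pod in NS on the HBONE
# port 15008 plus each listed application port (all TCP).
render_mesh_netpol() {
  local ns=$1; shift
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-intra-ns-mesh
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 15008
EOF
  # Append one port entry per application port, at the same indentation.
  local p
  for p in "$@"; do
    printf '        - protocol: TCP\n          port: %s\n' "$p"
  done
}
```

For example, `render_mesh_netpol apps-team-x 8080 | oc apply -f -` allows HBONE plus a hypothetical app port 8080.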

Validation evidence — ztunnel metrics scraped from spoke-dc-v6-worker-1 showed:

istio_tcp_connections_opened_total{
  reporter="source",
  source_workload="bff",
  destination_service="chat-backend.bank-employees-jboss-chat.svc.cluster.local",
  connection_security_policy="mutual_tls"
} 14

…plus the same connection_security_policy="mutual_tls" for BFF → eap-domain-group-a and BFF → eap-domain-group-b. No waypoint is deployed; L7 features (HTTPRoute, AuthorizationPolicy at L7, retries) are not in scope for this pilot.
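The same evidence can be checked mechanically. A minimal sketch that scans Prometheus text-format output and flags any `istio_tcp_connections_opened_total` series not reported as `mutual_tls`. The helper name is hypothetical, and it assumes one series per line, as in the standard exposition format rather than the wrapped layout shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# non_mtls_series: read Prometheus exposition text on stdin and print every
# istio_tcp_connections_opened_total series whose connection_security_policy
# label is not "mutual_tls". Exit status: 0 if none found, 1 otherwise.
non_mtls_series() {
  awk '
    /^istio_tcp_connections_opened_total[{]/ && !/connection_security_policy="mutual_tls"/ {
      print; bad = 1
    }
    END { exit bad }
  '
}
```

One way to feed it, assuming ztunnel serves metrics on port 15020 at `/metrics`: `oc -n ztunnel port-forward pod/<ztunnel-pod> 15020:15020 &` and then `curl -s localhost:15020/metrics | non_mtls_series`.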

bank-payment — deliberately rolled back

bank-payment was enrolled on feature/bank-payment-ambient-mesh (commit 08f14fc) and reverted on feature/bank-payment-mesh-rollback (commit ad0c652) the same day. The commit message records the cause:

ztunnel cannot fetch SPIFFE certs for workloads in bank-payment: istiod returns request impersonation authentication failure on the CSR. The mesh-level RBAC grant that lets ztunnel issue certs for tenant ServiceAccounts has not been wired up on this cluster yet, so adding the namespace to ambient just breaks every pod’s kubelet probes.

In short: ambient enrollment depends on the namespace-scoped ztunnel-workload-identity Role + RoleBinding shipped from the bank-employees-jboss-chat pattern. Without it, the dataplane-mode label is a foot-gun: every pod in the namespace comes up, ztunnel intercepts the kubelet probe, the CSR fails, the probe times out, and the pod restarts. The rollback was deliberate, not a regression; bank-payment re-enrollment will land once the workload-identity pattern is verified one more time on a non-production tenant.
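One way to encode that lesson is a guard that refuses to apply the label unless the workload-identity RoleBinding already exists in the namespace. A minimal sketch; the helper name is hypothetical, and it only checks that a RoleBinding named `ztunnel-workload-identity` is present, not that the bound Role grants the right verbs:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# enroll_ambient NS
# Labels NS for ambient only if the ztunnel-workload-identity RoleBinding is
# already there; otherwise refuses and leaves the namespace untouched.
enroll_ambient() {
  local ns=$1
  if ! oc -n "$ns" get rolebinding ztunnel-workload-identity >/dev/null 2>&1; then
    echo "refusing: no ztunnel-workload-identity RoleBinding in $ns" >&2
    return 1
  fi
  oc label namespace "$ns" istio.io/dataplane-mode=ambient --overwrite
}
```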

L7 with Waypoints

A waypoint is a Kubernetes Gateway API Gateway whose gatewayClassName is istio-waypoint:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: waypoint
  namespace: apps-team-x
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
    - name: mesh
      port: 15008
      protocol: HBONE

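Applying the Gateway deploys the waypoint, but traffic only flows through it once the namespace (or an individual Service) is labelled istio.io/use-waypoint=waypoint. From then on, L7 features (HTTP routing, retries, faults, request/response transformations, authorization at L7) can be applied via HTTPRoute or Istio’s VirtualService, and via AuthorizationPolicy targeting the waypoint. The waypoint runs as a single Envoy Pod (Deployment), not a sidecar.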

Per-namespace waypoint is the lab default. Per-service-account waypoints are an option for workloads that need different L7 policy by SA.
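Because a waypoint Gateway without the `istio.io/use-waypoint` label carries no traffic, attaching a namespace can be made a single guarded step. A minimal sketch that labels the namespace only after the Gateway exists; the helper name is hypothetical, and the default Gateway name `waypoint` matches the manifest above:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# attach_waypoint NS [GATEWAY]
# Labels NS with istio.io/use-waypoint=GATEWAY (default "waypoint") once a
# Gateway API Gateway of that name exists in NS.
attach_waypoint() {
  local ns=$1 gw=${2:-waypoint}
  # Fully qualified resource name avoids matching networking.istio.io Gateways.
  if ! oc -n "$ns" get gateways.gateway.networking.k8s.io "$gw" >/dev/null 2>&1; then
    echo "no Gateway $gw in $ns; apply the waypoint manifest first" >&2
    return 1
  fi
  oc label namespace "$ns" "istio.io/use-waypoint=$gw" --overwrite
}
```

Removing the label again (`oc label namespace <ns> istio.io/use-waypoint-`) detaches the namespace without deleting the waypoint.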

Validation

K=/home/<user>/.kube/configs/spoke-dc-v6.kubeconfig

# Operator
oc --kubeconfig "$K" -n openshift-operators get sub,csv | grep -i servicemesh

# CRs
oc --kubeconfig "$K" -n istio-system get istio default
oc --kubeconfig "$K" -n istio-cni    get istiocni default
oc --kubeconfig "$K" -n ztunnel      get ztunnel default

# Workloads
oc --kubeconfig "$K" -n istio-system get deploy/istiod
oc --kubeconfig "$K" -n istio-cni    get ds
oc --kubeconfig "$K" -n ztunnel      get ds

# An ambient namespace
oc --kubeconfig "$K" get ns apps-team-x -o jsonpath='{.metadata.labels.istio\.io/dataplane-mode}{"\n"}'

Expected:

  • Istio, IstioCNI, ZTunnel all Ready=True.
  • istiod Deployment 1/1; istio-cni and ztunnel DaemonSets at full readiness.
  • Ambient namespaces show ambient for their dataplane-mode label.
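The "full readiness" check on the DaemonSets can be scripted by comparing desired and ready counts. A minimal sketch over `NAME|DESIRED|READY` lines; the helper name is hypothetical, and the `jsonpath` in the comment reads the standard DaemonSet status fields `desiredNumberScheduled` and `numberReady`:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# ds_not_ready: read "NAME|DESIRED|READY" lines on stdin, print "NAME READY/DESIRED"
# for each DaemonSet below full readiness, and exit 1 if there was any.
# Example input source:
#   oc --kubeconfig "$K" -n ztunnel get ds -o jsonpath='{range .items[*]}{.metadata.name}{"|"}{.status.desiredNumberScheduled}{"|"}{.status.numberReady}{"\n"}{end}'
ds_not_ready() {
  awk -F'|' '
    NF == 3 && ($3 + 0) < ($2 + 0) { print $1 " " $3 "/" $2; bad = 1 }
    END { exit bad }
  '
}
```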

Failure modes

Symptom | Root cause | Fix | Prevention
Traffic between two ambient pods is plaintext. | CNI plugin not active on the node, or namespace not labelled. | Verify oc get ds -n istio-cni; relabel the ns. | Lint workloads’ namespaces; require the ambient label on app namespaces.
Service Mesh upgrade rolls everything. | updateStrategy: InPlace. | Expected behaviour for InPlace. For zero-downtime upgrades, switch to RevisionBased. | Plan upgrades during low-traffic windows.
L7 policies have no effect. | Namespace has no waypoint Gateway, or is not labelled istio.io/use-waypoint. | Add a Gateway of class istio-waypoint and label the ns istio.io/use-waypoint. | Onboarding template includes the waypoint if the tenant requests L7.
istiod cannot push xDS. | A NetworkPolicy in istio-system blocks ztunnel and waypoints from reaching istiod (they dial istiod on port 15012 for xDS and certificates). | oc get networkpolicy -n istio-system; allow ingress to istiod on 15012 from ztunnel and waypoint pods. | Don’t change istio-system policies without rerunning the conformance set.
Sidecar-mode workload coexists badly. | Mixed modes in the same ns. | Pick one mode per ns. | Document the convention; lint catches mixed labels.
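The "lint catches mixed labels" prevention can start as small as this. A minimal sketch over a `NAME|DATAPLANE_MODE|INJECTION` namespace listing that flags namespaces carrying both `istio.io/dataplane-mode=ambient` and `istio-injection=enabled`. The helper name is hypothetical, the listing command in the comment is an assumption about how the data is gathered, and a real lint would also consider revision labels (`istio.io/rev`), which this sketch ignores:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the lab repo.
# mixed_mode_ns: read "NAME|DATAPLANE_MODE|INJECTION" lines on stdin and print
# namespaces labelled for both ambient and sidecar injection.
# Example input source:
#   oc get ns -o jsonpath='{range .items[*]}{.metadata.name}{"|"}{.metadata.labels.istio\.io/dataplane-mode}{"|"}{.metadata.labels.istio-injection}{"\n"}{end}'
mixed_mode_ns() {
  # "|" keeps empty label values in their own column.
  awk -F'|' '$2 == "ambient" && $3 == "enabled" { print $1 }'
}
```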

Operator versions on spoke-dc-v6

Operator | Channel / version | Sync-wave | Notes
servicemeshoperator3 | stable | 8 | The sail-operator that owns Istio/IstioCNI/ZTunnel
kiali-ossm | OSSM v2.22.2 | 10 | Kiali bound to OpenShift monitoring; see §14.7 Kiali

Both operators are mirrored into the cluster’s app-catalog and pinned by version; the sync-waves keep the operator install before the operand CRs.

References

  • Issue #278 — APP-JBOSS-CHAT8 ambient pilot; bank-employees-jboss-chat MRs !130–!132 + app repo MRs !31/!33.
  • Issue OSSM3-DEMO1 — ztunnel workload-identity RBAC reference.
  • DEV-OCP-1.4 #178 (Istio + Kiali OSSM 3 ambient).
  • platform-gitops/clusters/spoke-dc-v6/platform-services/service-mesh/.
  • platform-gitops/clusters/spoke-dc-v6/tenants/ossm3-demo/ztunnel-workload-identity-rbac.yaml — the namespace-scoped pattern.
  • feature/bank-payment-ambient-mesh (enrol) and feature/bank-payment-mesh-rollback (revert) — the rollback branches.
  • Sail Operator docs: Istio, IstioCNI, ZTunnel v1.
  • Istio ambient docs: dataplane-mode, Waypoint, HBONE.

Last reviewed: 2026-05-12