BRAC POC — bank-payment app
End-to-end payment workflow on OpenShift — designed, built and shipped with the bank's existing pipeline. Single sign-on, canary release, browser-trusted TLS at the edge, mTLS in the cluster, live health view, and standard web hardening — all delivered through GitOps.
Demo asset for the BRAC engagement. A working payment workflow with customer and approver roles, single sign-on, a canary release of the customer page, browser-trusted TLS at the edge, mutual TLS between the internal components, a real-time health view of the backend services, and three of the standard web-hardening controls every audit asks for. Built end-to-end on top of the bank’s existing platform — no new vendor, no new toolchain.
At a glance
| | |
|---|---|
| Where it runs | OpenShift cluster spoke-dc-v6 |
| Customer page | https://payments.apps.spoke-dc-v6.sub.comptech-lab.com |
| Approver page | https://approver.apps.spoke-dc-v6.sub.comptech-lab.com |
| Health view | https://payments.apps.spoke-dc-v6.sub.comptech-lab.com/obs/ |
| Login | WSO2 Identity Server, tenant bank-payment |
| Browser-trusted TLS | Public Let’s Encrypt wildcard, auto-renewed |
| Internal TLS | mTLS between the in-cluster proxy and the backends, certs issued by the lab CA |
| Observability | Two parallel surfaces — OpenShift native (Tempo + user-workload Prometheus + LokiStack + Perses dashboard) and SigNoz on the lab VM (traces, metrics, logs, APM, alerts, SLO) |
| Delivery | One Git repo → Jenkins → Nexus → Argo CD → OpenShift |
What got built
Four pieces in one namespace:
| Piece | Role | Who uses it |
|---|---|---|
| payment-client-svc | Customer payment API. Owns the accounts and the transfer history. | All users (under the hood) |
| payment-approver-svc | Admin API. Reviews and decides on pending transfers. | Admin only |
| client-frontend | Customer web page. Balance, send-payment form, transaction history. | zahid, shaikat |
| approver-frontend | Admin web page. Pending queue, approve / decline, audit log. | admin |
Who logs in
All three users share one demo password: Hkj38djf&&&.
| User | Account | Balance | Role |
|---|---|---|---|
| zahid | 5005-0001-23 | 50,000 BDT | customer |
| shaikat | 5005-0004-56 | 75,000 BDT | customer |
| admin | — | — | approver |
Login is single sign-on through WSO2 — the bank doesn’t manage usernames or passwords inside the app itself.
Technology choices
Every layer of the app uses something the bank’s operations team already runs in production. Nothing exotic, nothing that needs a new vendor relationship to support.
| Layer | What it uses | Why this choice |
|---|---|---|
| Backend services | Open Liberty (Java application server, MicroProfile 6.1) | Same runtime the bank’s existing platform already runs. Long-term enterprise support from IBM / Red Hat. Built-in health and metrics endpoints — no extra agent required. |
| Web frontends | React (single-page apps), built with Vite, styled with Tailwind | Industry-standard frontend stack. Light, fast to load, easy to hand off to any frontend team. |
| Reverse proxy + canary + hardening | NGINX | The de-facto edge proxy. Used here for header-based canary routing, rate limiting, branded error pages, and the security headers. Zero learning curve for ops teams. |
| Identity & SSO | WSO2 Identity Server | The bank’s existing identity platform. Two service-provider entries, one tenant, regular OpenID Connect — no custom auth code in the apps. |
| Container platform | OpenShift (Kubernetes-based, by Red Hat) | The bank’s existing container platform. Standard images, standard deployment objects. |
| Public TLS certificate | Let’s Encrypt via cert-manager | Globally-trusted certificate, refreshed automatically — no procurement cycle, no manual rotation. |
| Internal TLS | Lab Certificate Authority via cert-manager | Signs the in-cluster mTLS certs. The CA root is already trusted by every Java pod and every node. |
| Source control | GitLab | The bank’s existing development pipeline. |
| Build pipeline | Jenkins | The bank’s existing CI/CD. |
| Image registry | Nexus Repository | The bank’s existing artifact store. Same registry the rest of the platform pulls from. |
| Deployment pipeline | Argo CD (GitOps) | One commit to the Git repository becomes one change on the cluster, automatically. No engineer logs in to the running environment to make a change. |
| Tracing | Red Hat build of OpenTelemetry + Tempo | The OpenTelemetry collector receives spans over OTLP and stores them in Tempo. Every distributed payment flow is one trace. |
| Metrics | OpenShift user-workload Prometheus | Scrapes Liberty’s /metrics directly via a ServiceMonitor. |
| Logs | OpenShift Logging + LokiStack | Vector collects pod stdout; Loki stores and indexes it. |
| Dashboards | Cluster Observability Operator + Perses | Dashboards declared as YAML, rendered inside the console at Observe → Dashboards. |
| APM / parallel observability | SigNoz (on a lab VM) | Receives the same spans / metrics / logs via OTLP from an in-cluster fan-out collector. Adds APM (service map, RED metrics, exceptions), trace ↔ log correlation, alerts, SLO tracking. Used side-by-side with the OpenShift native surface — same data, richer querying. |
How the app was designed
A small set of decisions, made up front, that the rest of the implementation followed:
| Decision | What it means in practice |
|---|---|
| Two backend services, one customer, one approver | The customer service is the system of record for accounts and transfers. The approver service is a thin admin facade — it reviews and decides, but doesn’t own data. Separation makes auditing simpler: every approver decision is recorded on the customer service with the admin’s identity. |
| Two separate web frontends, one per role | The customer and admin experiences are different products with different audiences, different navigation, different theming. Splitting them eliminates accidental cross-exposure of admin features to customers. |
| Reserve-on-create, complete-on-approve | When a customer submits a payment, the funds leave their visible balance immediately. The transfer can’t be double-spent during review. On Approve the funds land in the recipient; on Decline they’re refunded. |
| No service can speak for the user | Every API call carries the user’s own bearer token. When the approver service calls the customer service to settle a transfer, it forwards the admin’s token, not a service account. The audit trail captures the actual deciding human. |
| No new vendor, no new toolchain | Every layer (runtime, build, identity, registry, ingress) maps to something the bank already runs. |
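The "no service can speak for the user" rule can be sketched as a small helper that builds the outbound headers for the approver-to-customer call. This is an illustrative Python sketch, not the Open Liberty code from the repo; the function name and the exception type are hypothetical.

```python
class MissingUserToken(Exception):
    """Raised when an inbound request carries no usable bearer token."""


def forward_user_token(inbound_headers: dict) -> dict:
    """Build outbound headers that reuse the caller's own bearer token.

    No service-account credential is ever substituted: if the caller did
    not present a bearer token, the downstream call is refused.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {k.lower(): v for k, v in inbound_headers.items()}
    auth = normalised.get("authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise MissingUserToken("no user bearer token to forward")
    return {"Authorization": f"Bearer {token.strip()}"}
```

Because the helper has no fallback credential, a request without a user token fails instead of being settled under a machine identity, which is what keeps the deciding human on the audit trail.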
How the app is built
A single push to GitLab triggers everything else.
| Step | What happens | Where |
|---|---|---|
| 1. Developer pushes to main | GitLab webhook fires | gitlab.apps.sub.comptech-lab.com/divisions/payment/bank-payment |
| 2. Jenkins runs the pipeline | Defined in Jenkinsfile at the root of the repo | Jenkins (lab VM) |
| 3. Four container images built in parallel | One per app — backend services and frontends | Jenkins build agent (ct-shared-build) |
| 4. Each image scanned for high / critical vulnerabilities | Trivy. Build fails if any HIGH or CRITICAL CVE is found. | Same agent |
| 5. Images tagged twice and pushed | :latest and :<short-commit-sha> | Nexus (app-registry.apps.sub.comptech-lab.com) |
| 6. GitOps repo updated with the new digest | A small script writes the immutable digest into the Argo overlay and commits back | Same pipeline |
| 7. Argo CD reconciles the new digest onto the cluster | Picks up the commit, rolls the deployment | OpenShift |
The whole sequence takes four to six minutes end-to-end. The build fails closed: a single HIGH or CRITICAL CVE blocks the push to Nexus, which in turn blocks the deployment. No image lands on the cluster without having been scanned.
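The fail-closed gate and the double tagging from steps 4 and 5 can be sketched in a few lines of Python. This is a model of the logic only, not the Jenkinsfile: the finding format is a simplified stand-in rather than Trivy's real report schema, and the eight-character short SHA is an assumption.

```python
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def image_tags(commit_sha: str, short_len: int = 8) -> list:
    """Return the two tags pushed for every build: latest and the short SHA.

    short_len is an assumption; the document does not state the length.
    """
    return ["latest", commit_sha[:short_len]]


def gate(findings: list) -> bool:
    """Fail closed: return True only if no finding has a blocking severity.

    A finding without a severity field is treated as blocking, so a
    malformed report cannot let an image through.
    """
    for finding in findings:
        severity = str(finding.get("severity", "CRITICAL")).upper()
        if severity in BLOCKING_SEVERITIES:
            return False
    return True
```

The push to the registry would run only when `gate(...)` returns `True`, so a blocked scan also blocks everything downstream of it.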
GitOps deployment
The cluster’s state is a rendered copy of two Git repositories.
No engineer ever runs oc apply against the running environment.
| Repo | What lives there | Who can write to it |
|---|---|---|
| divisions/payment/bank-payment | The four app sources, the Containerfiles, the kustomize manifests (Deployments, Services, Routes, NGINX config, mTLS Certificates), the Jenkinsfile | The payment-division developers |
| comptech-platform/openshift-ops/openshift-platform-gitops | Tenant scaffolding (namespace, quotas, RBAC, network policy, Argo Application) | Platform admins |
On the cluster, Argo CD watches both repositories and continuously reconciles their contents onto OpenShift. Two important properties fall out of this:
- Self-healing. If anything drifts from what’s in Git — someone patches a Service directly, a resource gets deleted by mistake — Argo reverts it within seconds.
- Auditable. Every change to the running app is a Git commit, with an author, a timestamp, and a diff. Compliance gets the audit trail for free.
The deployment itself uses two-side reconciliation: the cluster
declares it wants the image at digest sha256:abc…, the image already
exists in Nexus, OpenShift pulls it, and rolls the deployment. The
whole sequence after a Jenkins build is typically under a minute.
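The self-healing behaviour described in this section boils down to a diff between desired state (Git) and live state (cluster), applied repeatedly. The Python sketch below models that loop with plain dictionaries; it is not Argo CD code, and the resource names in the usage example are hypothetical. Deleting resources that are absent from Git corresponds to pruning, which is a sync option in Argo CD; whether it is enabled here is not stated in this document.

```python
def diff(desired: dict, live: dict) -> dict:
    """Compare desired state (from Git) with live state (on the cluster).

    Returns a plan mapping resource name to create, update, or delete.
    """
    plan = {}
    for name, spec in desired.items():
        if name not in live:
            plan[name] = "create"
        elif live[name] != spec:
            plan[name] = "update"
    for name in live:
        if name not in desired:
            plan[name] = "delete"
    return plan


def reconcile(desired: dict, live: dict) -> dict:
    """Apply the plan in place so that live ends up equal to desired."""
    for name, action in diff(desired, live).items():
        if action == "delete":
            del live[name]
        else:
            live[name] = dict(desired[name])
    return live
```

A hand-patched Service shows up as an `update` entry on the next pass and is overwritten with the Git version; once the two sides match, the plan is empty and nothing is touched.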
Traffic flow — browser to pod
A request from zahid’s browser passes through six named hops,
the last of which is the application pod:
| Hop | What it does | Where it lives |
|---|---|---|
| 1. Cloudflare DNS | Authoritative DNS for comptech-lab.com. Returns NS records that delegate the sub.comptech-lab.com subdomain to the lab’s own DNS server. | Public, managed by Cloudflare |
| 2. PowerDNS | Authoritative DNS for the sub.comptech-lab.com zone. Returns the public IP of the lab’s edge load balancer for any *.apps.spoke-dc-v6.sub.comptech-lab.com query. | Lab VM (pdns) |
| 3. HAProxy | Edge TCP / SNI load balancer for the whole lab. Reads the SNI hostname from the inbound TLS ClientHello and forwards the connection to the right backend, in this case the spoke OpenShift cluster’s ingress. TLS is passed through untouched; HAProxy does not terminate it. | Lab VM (haproxy at 30.30.30.1) |
| 4. OpenShift Router | The cluster’s ingress. Terminates the public Let’s Encrypt TLS, looks up the matching Route, forwards the request as plain HTTP to the in-cluster Service. | Cluster (openshift-ingress namespace) |
| 5. NGINX reverse proxy | The in-cluster gateway for the customer page. Applies header-based canary routing, per-IP rate limiting, branded error pages, and the security response headers. For the live health view, also presents a client certificate over mutual TLS to the backend. | Cluster (bank-payment namespace) |
| 6. App pod | The actual web frontend or backend service — Open Liberty for the APIs, httpd for the static SPA bundles. | Cluster (bank-payment namespace) |
The public TLS terminates at step 4. From step 5 onward, the in-cluster hop between the NGINX proxy and the backend health-view endpoints is also encrypted, using mTLS (see Internal mTLS below).
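The SNI-based forwarding at hop 3 can be pictured as a pattern lookup on the hostname, done before any TLS is terminated. The Python sketch below is a model of that decision, not HAProxy configuration; the backend name `spoke-dc-v6-ingress` is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: SNI pattern -> backend name.
SNI_ROUTES = [
    ("*.apps.spoke-dc-v6.sub.comptech-lab.com", "spoke-dc-v6-ingress"),
]


def pick_backend(sni: str, routes=SNI_ROUTES, default=None):
    """Return the backend for the first pattern that matches the SNI name.

    Hostnames are case-insensitive, so the name is lowercased first.
    Note that the fnmatch wildcard also matches dots, so this sketch
    accepts multi-label prefixes as well.
    """
    name = sni.lower().rstrip(".")
    for pattern, backend in routes:
        if fnmatchcase(name, pattern):
            return backend
    return default
```

Both the customer and the approver hostnames fall under the same wildcard, so they reach the same cluster ingress and are only told apart at hop 4, by Route.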
System view — interactive
Hint: drag any box to rearrange. Click a box to flip its colors (useful for highlighting a path you’re walking through). The lines are live — the animation shows the direction traffic flows.
Reading the diagram:
| Line colour | What it means |
|---|---|
| Solid black | User traffic (HTTPS) — DNS → edge → in-cluster |
| Solid amber | The canary hop (header-triggered) and the mTLS hop (/obs/* from NGINX to Liberty) |
| Solid green | Cross-service call (approver-svc → client-svc with the admin’s bearer forwarded) |
| Dashed grey | DNS resolution |
| Dashed magenta | Identity-related — sign-in redirect, JWKS verify, cert distribution from cert-manager |
| Dashed amber | Build pipeline (GitLab → Jenkins → Trivy → Nexus → digest pin) |
| Dashed green | GitOps reconcile (Argo watching the repos, pulling images, rolling Deployments) |
How a payment moves
- Zahid logs in on the customer page. His browser is redirected to WSO2, he signs in, and he lands back on the page authenticated.
- He picks a recipient from the dropdown — accounts are listed by name and account number; balances are not exposed across customers.
- He submits a payment for an amount. Funds are reserved against his balance immediately.
- Admin logs in on the approver page. The pending queue auto-refreshes every five seconds.
- Admin clicks Approve. Funds move into the recipient’s balance, the transfer is logged with the admin’s name on the audit trail.
- Decline would refund the sender. Either decision is one-shot — the app won’t let the same transfer be settled twice.
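The steps above amount to a small state machine: reserve on create, then exactly one settlement. The Python sketch below models it in memory. It is not the service's actual code; the class name and the `APPROVED` / `DECLINED` status names are assumptions (only `PENDING` appears in this document).

```python
from decimal import Decimal


class Ledger:
    """In-memory model of reserve-on-create, complete-on-approve."""

    def __init__(self, balances: dict):
        self.balances = {user: Decimal(amount) for user, amount in balances.items()}
        self.transfers = {}
        self._next_id = 1

    def create(self, sender: str, recipient: str, amount) -> int:
        amount = Decimal(amount)
        if amount <= 0 or self.balances[sender] < amount:
            raise ValueError("invalid amount or insufficient funds")
        # Reserve: the funds leave the sender's visible balance immediately.
        self.balances[sender] -= amount
        transfer_id = self._next_id
        self._next_id += 1
        self.transfers[transfer_id] = {
            "from": sender, "to": recipient, "amount": amount,
            "status": "PENDING", "decided_by": None,
        }
        return transfer_id

    def settle(self, transfer_id: int, decision: str, decided_by: str) -> None:
        transfer = self.transfers[transfer_id]
        # One-shot: a transfer that is no longer pending cannot be settled again.
        if transfer["status"] != "PENDING":
            raise ValueError("transfer already settled")
        if decision == "approve":
            self.balances[transfer["to"]] += transfer["amount"]
            transfer["status"] = "APPROVED"
        elif decision == "decline":
            self.balances[transfer["from"]] += transfer["amount"]
            transfer["status"] = "DECLINED"
        else:
            raise ValueError("decision must be approve or decline")
        transfer["decided_by"] = decided_by
```

With the demo balances, a 2,500 BDT payment from zahid drops his balance to 47,500 at submission; approval raises shaikat to 77,500, and a second attempt to settle the same transfer raises an error.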
Canary release of the customer page
Two versions of the customer page run side-by-side. Visitors get the stable v1 unless they explicitly ask for v2.
| | v1 (stable) | v2 (canary) |
|---|---|---|
| Appearance | White background | Light-yellow background, “v2 (canary)” tag next to the title |
| Who reaches it | Everyone by default | Only requests that carry an explicit X-Canary: v2 opt-in header |
| Replicas | 2 | 1 |
| Switching is | Instant on the next page load, no redeploy | Instant on the next page load |
To show v2 in a demo: install the ModHeader browser extension,
add a single rule (Name: X-Canary, Value: v2), reload. Disable the
rule to flip back to v1.
The routing decision is visible in the x-served-by response
header — open Chrome’s dev tools → Network → click the page request →
Headers → Response Headers — to see which version served you.
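The routing rule itself is a single header check. The Python sketch below models the decision the proxy makes; it is not the NGINX configuration from the repo, the upstream names are hypothetical, and exact-match comparison of the header value is an assumption.

```python
STABLE = "client-frontend-v1"   # hypothetical upstream name
CANARY = "client-frontend-v2"   # hypothetical upstream name


def choose_upstream(request_headers: dict) -> str:
    """Send a request to the canary only on an explicit X-Canary: v2 opt-in."""
    # Header names are case-insensitive; values are compared exactly here.
    normalised = {k.lower(): v for k, v in request_headers.items()}
    if normalised.get("x-canary", "").strip() == "v2":
        return CANARY
    return STABLE


def response_headers(request_headers: dict) -> dict:
    """Expose the routing decision in the way the x-served-by header does."""
    return {"x-served-by": choose_upstream(request_headers)}
```

Any request without the header, or with any other value, stays on v1, which is why disabling the ModHeader rule flips the page back on the next load.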
Internal mTLS
Public traffic to the app uses a browser-trusted Let’s Encrypt certificate; that part is unchanged. The hop inside the cluster between the NGINX reverse proxy and the health-view endpoints on the backend services now uses mutual TLS — both sides authenticate with certificates, both sides verify each other’s certificate against the lab certificate authority.
| | Client (NGINX proxy) | Server (Liberty backend) |
|---|---|---|
| Identity | Cert issued by the lab CA, CN = client-frontend-canary-proxy.bank-payment.svc.cluster.local | Cert issued by the lab CA, CN = payment-client-svc.bank-payment.svc.cluster.local |
| Cert lifetime | 90 days, rotated automatically 30 days before expiry by cert-manager | Same |
| What happens without a cert | The Liberty backend refuses the connection at the TLS handshake | — |
To prove mTLS is doing its job, two commands from a terminal:
```shell
# 1. Without the client cert — handshake rejected
oc -n bank-payment exec deploy/client-frontend-canary-proxy -- \
  curl -k --max-time 5 https://payment-client-svc:9445/health/live
# → SSL_ERROR_ZERO_RETURN

# 2. With the client cert — accepted, returns 200 + JSON
oc -n bank-payment exec deploy/client-frontend-canary-proxy -- \
  curl --cert /etc/nginx-mtls/tls.crt --key /etc/nginx-mtls/tls.key \
    --cacert /etc/nginx-mtls/ca.crt \
    https://payment-client-svc.bank-payment.svc.cluster.local:9445/health/live
# → {"status":"UP",...}
```
The plain-HTTP port on the backend (9080) stays open for the
OpenShift platform’s own probes and for the public-facing API Routes,
which are already protected by the bearer token from WSO2.
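What "both sides verify each other" means in terms of TLS settings can be shown with Python's standard `ssl` module. This is an illustration of the two roles under that module's API, not the NGINX or Liberty configuration; the function names are hypothetical, and the certificate paths are optional so the sketch can be run without any files.

```python
import ssl


def server_context(cert=None, key=None, ca=None) -> ssl.SSLContext:
    """Server side of mTLS: a client certificate is mandatory."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Without this line the server would accept clients that present no cert.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    if ca:
        ctx.load_verify_locations(cafile=ca)  # CA that signed the client certs
    return ctx


def client_context(cert=None, key=None, ca=None) -> ssl.SSLContext:
    """Client side of mTLS: verify the server and present our own cert."""
    # PROTOCOL_TLS_CLIENT enables server verification and hostname checks.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    if ca:
        ctx.load_verify_locations(cafile=ca)  # CA that signed the server cert
    return ctx
```

On the proxy side the arguments would correspond to the files used in the second command above (`/etc/nginx-mtls/tls.crt`, `/etc/nginx-mtls/tls.key`, `/etc/nginx-mtls/ca.crt`); a client that skips `load_cert_chain` is the first command's case and is rejected during the handshake.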
Egress controls — what the pods can talk to
By default the pods in the bank-payment namespace cannot reach
anything outside their own namespace, not even the wider cluster.
Every outbound destination is opened by an explicit allow-rule. This
limits the blast radius of a compromised pod — a leaked credential
can’t be exfiltrated to a public server, the pod can’t pivot into
other tenant namespaces, and the only external destination it can
reach is the bank’s own identity service.
| Destination | Why it’s needed | How it’s allowed |
|---|---|---|
| In-cluster DNS (openshift-dns) | Resolving Service names like payment-client-svc.bank-payment.svc.cluster.local | NetworkPolicy allow-egress-dns — UDP/TCP 53 |
| Cluster monitoring (openshift-monitoring) | Prometheus scrape from the platform’s monitoring stack | NetworkPolicy allow-egress-cluster-monitoring — TCP 9091-9093 |
| WSO2 Identity Server (lab VM, 160.30.63.134 and 160.30.63.138, port 443) | Liberty fetches the JWKS signing keys here to validate every incoming user token | NetworkPolicy allow-egress-wso2-is — restricted to those exact two IPs on TCP 443 |
| Other pods in the same namespace | The approver service calling the customer service, the NGINX proxy calling the backends, etc. | NetworkPolicy allow-intra-namespace |
| The public internet | — | Not allowed. No NetworkPolicy opens this. |
| Other tenant namespaces | — | Not allowed. Pods in bank-payment cannot speak to pods in any other tenant. |
The full set of NetworkPolicies for this tenant lives in the platform GitOps repository — adding a new outbound destination is a small Git commit, reviewed and audited like everything else.
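The default-deny model in the table can be expressed as an allow-list lookup. The Python sketch below is a model of the decision, not the NetworkPolicy YAML from the platform repo. It ignores the TCP/UDP distinction, treats the intra-namespace rule as open on any port, and reads the WSO2 entry as two single addresses; all three are simplifying assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllowRule:
    name: str
    ports: Optional[frozenset] = None   # None means any port
    namespace: Optional[str] = None     # destination namespace, if any
    cidrs: tuple = ()                   # destination networks, if any


RULES = [
    AllowRule("allow-egress-dns", frozenset({53}), namespace="openshift-dns"),
    AllowRule("allow-egress-cluster-monitoring", frozenset(range(9091, 9094)),
              namespace="openshift-monitoring"),
    AllowRule("allow-egress-wso2-is", frozenset({443}),
              cidrs=("160.30.63.134/32", "160.30.63.138/32")),
    AllowRule("allow-intra-namespace", namespace="bank-payment"),
]


def egress_allowed(rules, port: int, namespace: Optional[str] = None,
                   ip: Optional[str] = None) -> Optional[str]:
    """Return the name of the rule that allows the connection, else None."""
    for rule in rules:
        if rule.ports is not None and port not in rule.ports:
            continue
        if namespace is not None and rule.namespace == namespace:
            return rule.name
        if ip is not None and any(
            ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
            for cidr in rule.cidrs
        ):
            return rule.name
    return None  # default deny: nothing matched
```

Anything that falls through the list, such as a public IP or another tenant's namespace, returns `None`, which mirrors the two "Not allowed" rows.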
Observability — two parallel surfaces
Every signal the app emits (traces, metrics, logs) is shipped to two independent observability backends in parallel — the OpenShift-native stack (in the cluster console, owned by the platform team) and SigNoz running on a lab VM (a full observability platform with APM, alerts, SLO tracking).
The split is intentional. The platform team uses OpenShift’s native view because it lives in the same console they already operate the cluster from. App developers and the on-call rotation use SigNoz because it has richer per-endpoint analytics and a side-by-side trace / log / metric view. Both surfaces show the same data — the app pushes once, an in-cluster collector fans out.
Where each signal lands
| Signal | OpenShift native | SigNoz (lab VM) |
|---|---|---|
| Traces | OpenTelemetry collector → Tempo in the tracing namespace. View at console Observe → Traces. | Same span stream is also shipped over OTLP to SigNoz, which stores it in ClickHouse for richer querying. |
| Metrics | User-workload Prometheus scrapes Liberty’s /metrics every 30s. View at console Observe → Metrics. | Liberty also pushes the same metric set over OTLP; SigNoz stores it in ClickHouse and exposes it for dashboards / alerts. |
| Logs | OpenShift Logging’s Vector ships pod stdout to LokiStack. View at console Observe → Logs. | Liberty’s MicroProfile Telemetry log handler pushes every message via OTLP with the trace_id / span_id stamped on each line, so SigNoz can jump straight from a trace to the matching log lines. |
Tail sampling — keep the interesting traces, drop the noise
The in-cluster fan-out collector applies tail-based sampling to traces before forwarding to SigNoz. Sampling decisions are made on the complete trace:
| Trace shape | Kept? |
|---|---|
| Contains any errored span | Always kept |
| Total duration > 500 ms | Always kept |
| All-healthy under 500 ms | 10% kept (probabilistic) |
Net effect: anomalies survive at full fidelity, healthy fast traces cost roughly a tenth as much to store, and the trace storage bill scales with meaningful traffic instead of raw volume.
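The three rows of the sampling table translate into one decision function evaluated on the complete trace. The Python sketch below models that decision; it is not the collector configuration, and the span field names (`error`, `start_ms`, `end_ms`) are hypothetical. The 10% cut is made by hashing the trace ID so the same trace always gets the same answer, which is one common way to sample deterministically and is an assumption here.

```python
import hashlib


def keep_trace(trace_id: str, spans: list,
               slow_ms: float = 500.0, healthy_ratio: float = 0.10) -> bool:
    """Tail-sampling decision made once all spans of a trace are known."""
    if not spans:
        return False
    # Rule 1: any errored span keeps the whole trace.
    if any(span.get("error") for span in spans):
        return True
    # Rule 2: total duration above the threshold keeps the whole trace.
    start = min(span["start_ms"] for span in spans)
    end = max(span["end_ms"] for span in spans)
    if end - start > slow_ms:
        return True
    # Rule 3: healthy and fast, keep a fixed fraction chosen by trace ID hash.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    return bucket < healthy_ratio
```

The decision has to wait for the whole trace because an error or a slow span can arrive late; that is the difference from head sampling, which decides at the first span.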
Business-aware traces
Spans on the transfer and approve / decline endpoints carry the business context as searchable attributes — not just generic HTTP metadata:
| Attribute | Example |
|---|---|
| transfer.id | 1a2b-3c4d-… |
| transfer.from · transfer.to | zahid · shaikat |
| transfer.amount · transfer.status | 2500.00 · PENDING |
| decision (on approver path) | approve / decline |
| bank.user · bank.user.role | admin · admin |
In SigNoz, you can paste a transfer ID into the trace search and find that exact customer payment — including the cross-service hop to the approver, the ledger write, every Liberty handler — in one waterfall.
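Searching by a business attribute works because the attribute sits on at least one span of the trace, and the trace ID then pulls in every other span. The Python sketch below models that lookup over plain dictionaries; it is not the SigNoz query API, and the span layout and the transfer ID used in the example are hypothetical.

```python
def find_by_transfer_id(spans: list, transfer_id: str) -> list:
    """Return every span of each trace that mentions the given transfer ID."""
    # Step 1: spans that carry the business attribute directly.
    hits = [
        span for span in spans
        if span.get("attributes", {}).get("transfer.id") == transfer_id
    ]
    # Step 2: widen to the full trace, so untagged child spans are included.
    trace_ids = {span["trace_id"] for span in hits}
    matched = [span for span in spans if span["trace_id"] in trace_ids]
    # Step 3: order by start time, as a waterfall view would.
    return sorted(matched, key=lambda span: span["start_ms"])


SPANS = [
    {"trace_id": "a", "name": "POST /v1/transfers", "start_ms": 0,
     "attributes": {"transfer.id": "demo-transfer-1", "transfer.status": "PENDING"}},
    {"trace_id": "a", "name": "ledger write", "start_ms": 3, "attributes": {}},
    {"trace_id": "b", "name": "GET /health/live", "start_ms": 1, "attributes": {}},
]
```

Calling `find_by_transfer_id(SPANS, "demo-transfer-1")` returns the two spans of trace `a`, including the ledger write that carries no business attribute itself.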
What SigNoz gives you on top of the data
The OTLP plumbing is the same as for Tempo, but SigNoz adds an APM layer that needs no extra wiring:
| SigNoz feature | What it gives the demo |
|---|---|
| Service Map | Auto-drawn graph of every service-to-service call, arrow thickness = traffic volume. Visually proves the approver → client cross-call is happening. |
| APM / RED metrics per endpoint | Auto-derived from spans — request Rate, Error rate, Duration (p50/p95/p99) for every API path. No code change. |
| Slowest endpoints leaderboard | Sorted list of the slowest operations across both backends. |
| Exception view | When Liberty throws, the exception + Java stack lands directly on the span — visible in one click. |
| Trace → log jump | Click any span → SigNoz pulls the matching log lines automatically via trace_id. Single pane of glass. |
| Alerts (UI-driven) | Pre-staged: p99 on /v1/transfers > 500 ms · backend down (up < 1) · heap utilization > 80 %. |
| SLO panels (UI-driven) | “99.5% of /v1/transfers < 500 ms over 30 days” — burn-rate tracker baked in. |
| Saved views | A pre-built “find a transfer by ID” view that filters across both services. |
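The RED numbers and the SLO check in the table are plain arithmetic over span durations. The Python sketch below shows one way to derive them, using nearest-rank percentiles; SigNoz's exact percentile method is not stated in this document, and the span field names are hypothetical.

```python
import math


def percentile(values: list, pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def red_metrics(spans: list, window_s: float) -> dict:
    """Rate, Error rate, and Duration percentiles for one endpoint."""
    durations = [span["duration_ms"] for span in spans]
    errors = sum(1 for span in spans if span.get("error"))
    return {
        "rate": len(spans) / window_s,          # requests per second
        "error_rate": errors / len(spans),      # fraction of failed requests
        "p50": percentile(durations, 50),
        "p95": percentile(durations, 95),
        "p99": percentile(durations, 99),
    }


def slo_met(durations: list, threshold_ms: float = 500.0,
            target: float = 0.995) -> bool:
    """True if at least `target` of the requests finished under the threshold."""
    good = sum(1 for d in durations if d < threshold_ms)
    return good / len(durations) >= target
```

With the 99.5% target, 1,000 requests leave a budget of five slow ones; the sixth slow request makes `slo_met` return `False`, which is the condition a burn-rate panel tracks over the 30-day window.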
Curated dashboard (Perses, in the OpenShift console)
A pre-built Perses dashboard ships alongside the app and renders under Observe → Dashboards → Bank Payment Overview in the console. Three collapsible grids:
| Grid | Panels |
|---|---|
| At a glance | Running pods · Heap utilization (avg %) · JVM threads (avg) · HTTP request rate (req/s) |
| JVM resources | Heap used (MB) per pod · Thread count per pod · Process CPU load (%) per pod · GC time rate |
| HTTP traffic | Request rate by route × status code · p95 latency by route |
The dashboard YAML lives in the same repo as the app; the Cluster Observability Operator’s Perses operator reconciles it into the console.
Quick-glance in-app status page
Independent of the platform tooling and SigNoz, the app also serves
a tiny status page at /obs/ directly on the customer hostname.
Two cards, one per backend, refreshed every five seconds. Convenient
when you want a one-click health check without leaving the app.
| Metric on /obs/ | What it tells you |
|---|---|
| Liveness | Is the service process responding at all? Red if it isn’t. |
| Readiness | Is the service ready to take traffic? Red during a restart. |
| JVM thread count | Snapshot of in-flight work. |
| Heap used (MB) | Memory pressure. |
| CPU (recent %) | Live CPU usage. |
This in-app page goes through the mTLS hop described above; the console-side observability and SigNoz go through their own collectors which run in their own namespaces / on the lab VM.
Web hardening at the edge
Three controls applied in front of every request, configured in one NGINX file:
| Control | What it does | How to verify in 10 seconds |
|---|---|---|
| HSTS (Strict-Transport-Security) | Browser remembers for 1 year that this site is HTTPS-only. Even typing http:// won’t downgrade. | Dev tools → Network → reload → click request → Headers → look for strict-transport-security. |
| Branded error pages | 404 / 429 / 5xx all return a bank-branded page instead of nginx’s grey default. | Fire 30 rapid requests at the health view — the 429 page renders. |
| Per-source rate limiting | Caps how fast one source can hit a path. Stops dictionary attacks on login forms and stops scrapers hammering the health view. | for i in {1..30}; do curl -sS -o /dev/null -w '%{http_code} ' …/obs/client/metrics; done — you’ll see a mix of 200s and 429s. |
Two additional headers travel alongside HSTS: X-Content-Type-Options: nosniff (browsers can’t misinterpret file types) and Referrer-Policy: strict-origin-when-cross-origin (other sites don’t see internal paths in their referrer logs).
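The mix of 200s and 429s from the verification loop is what any per-source limiter with a small burst allowance produces. The Python sketch below uses a token bucket to show the effect. It is an illustration, not the NGINX configuration: NGINX's `limit_req` is documented as a leaky-bucket method, and the rate and burst values here are hypothetical because the document does not state the real limits.

```python
class TokenBucketLimiter:
    """Per-source token bucket: `burst` requests at once, then `rate_per_s`."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s
        self.burst = burst
        self._state = {}   # source -> (tokens left, time of last request)

    def check(self, source: str, now: float) -> int:
        """Return the HTTP status the proxy would answer with: 200 or 429."""
        tokens, last = self._state.get(source, (float(self.burst), now))
        # Refill for the time elapsed since the last request, capped at burst.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._state[source] = (tokens - 1, now)
            return 200
        self._state[source] = (tokens, now)
        return 429
```

Thirty back-to-back requests from one source against a burst of 10 yield ten 200s followed by twenty 429s in this model, while a second source is unaffected because each source has its own bucket. Passing `now` in explicitly keeps the sketch deterministic.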
Where everything lives
| | URL |
|---|---|
| Customer page | https://payments.apps.spoke-dc-v6.sub.comptech-lab.com |
| Approver page | https://approver.apps.spoke-dc-v6.sub.comptech-lab.com |
| In-app health view | https://payments.apps.spoke-dc-v6.sub.comptech-lab.com/obs/ |
| Console dashboard | OpenShift console → Observe → Dashboards → Bank Payment Overview |
| SigNoz UI | https://signoz.apps.sub.comptech-lab.com — Services / Traces / Logs / Metrics / Alerts / Dashboards |
| Customer API | https://client-api.apps.spoke-dc-v6.sub.comptech-lab.com |
| Approver API | https://approver-api.apps.spoke-dc-v6.sub.comptech-lab.com |
| WSO2 IS console | https://is.apps.sub.comptech-lab.com/t/bank-payment/console |
| Git repo (app) | gitlab.apps.sub.comptech-lab.com/divisions/payment/bank-payment |
| Git repo (platform / Argo) | gitlab.apps.sub.comptech-lab.com/comptech-platform/openshift-ops/openshift-platform-gitops |
| Image registry | app-registry.apps.sub.comptech-lab.com/bank-payment/* |
Roadmap (intentionally not yet built)
- Persistent storage for accounts + transfer history. The demo uses in-memory state; a service restart wipes the ledger. JDBC + PostgreSQL or a PVC-backed JSON snapshot is the obvious next step.
- API Gateway in front of the two services — the gateway definitions are already in the repo, waiting on WSO2 APIM.
- Service mesh for fleet-wide mTLS and a richer canary story beyond a single header rule.
- Audit event stream — every approver decision emitted to Kafka for downstream reconciliation, plus the same event flowing through the OpenTelemetry pipeline for correlation with the trace data.