Kafka brokers (KRaft cluster)
Three combined broker/controller VMs running Kafka 4.2.0 in KRaft mode at 30.30.30.24/25/26, with a pinned JMX exporter exposing ~10k kafka_* metrics per broker on :9404.
Kafka is deployed as three private VMs in KRaft mode — no Zookeeper. Each VM is a combined broker + controller voter. PLAINTEXT only on the internal LAN today; TLS/SASL/ACLs are deferred gates. The brokers expose Prometheus-format metrics on :9404 via a pinned jmx_prometheus_javaagent jar; OpenShift-side scraping is tracked separately under the Kafka monitoring follow-up.
What it is
| Property | Value |
|---|---|
| Brokers | kafka-0, kafka-1, kafka-2 |
| IPs | 30.30.30.24, 30.30.30.25, 30.30.30.26 |
| Bootstrap | kafka-bootstrap.sub.comptech-lab.com:9092 (PLAINTEXT) |
| Mode | KRaft (no Zookeeper); 3-voter quorum, each node is broker + controller |
| Kafka version | 4.2.0 |
| JVM | OpenJDK 21 |
| Cluster id | fBNALO9WTje8UgaGET7XoQ |
| VM size | 4 vCPU / 8 GiB RAM / 80 G OS + 200 G data |
| Data path | /var/lib/kafka/kraft-combined-logs |
| Listener ports | 9092 (client PLAINTEXT), 9093 (controller, per-broker only) |
| JMX exporter port | :9404 open to 30.30.0.0/16 |
| Public exposure | None — private lab only |
| TLS / SASL / ACLs | No (gate; issue #17) |
| Backups / DR | No (gate; issue #12) |
DNS is served from PowerDNS at 30.30.30.53. The kafka-bootstrap name is a multi-A round-robin across all three broker IPs.
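Because the bootstrap name is a round-robin, the order of the returned A records changes between queries, so a check has to compare sets rather than lines. A minimal sketch, assuming `dig` is available on the checking host; the function name `check_bootstrap_records` is hypothetical:

```shell
# Expected broker IPs, taken from the property table above.
EXPECTED_IPS="30.30.30.24
30.30.30.25
30.30.30.26"

# Reads one A record per line on stdin (for example from `dig +short`)
# and succeeds only if the set equals EXPECTED_IPS, in any order.
check_bootstrap_records() {
  actual=$(sort -u)
  expected=$(printf '%s\n' "$EXPECTED_IPS" | sort -u)
  [ "$actual" = "$expected" ]
}

# Usage (not run here):
#   dig @30.30.30.53 kafka-bootstrap.sub.comptech-lab.com A +short \
#     | check_bootstrap_records && echo "bootstrap OK"
```

Sorting both sides makes the comparison independent of the round-robin rotation.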
Topology — KRaft, not Zookeeper
Each VM runs a single kafka.service (systemd) that is both broker and controller. KRaft replaces Zookeeper with an internal Raft quorum of the three controllers. Practical consequence: there is no separate Zookeeper ensemble to operate, no zkCli.sh, and metadata reads go through kafka-metadata-quorum.sh describe --status (3 voters expected; LeaderId must be non-empty).
The :9093 controller listener is firewalled to just the three broker IPs — no other host on the lab network can reach the controller plane. Client traffic uses :9092.
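The quorum expectation above (3 voters, non-empty LeaderId) can be turned into a pass/fail check. This is a sketch under assumptions: it expects `LeaderId:` and `CurrentVoters:` lines in the tool output, and it accepts voters printed either as bare ids (`[1,2,3]`) or as objects containing `"id": N`, since the formatting may differ between Kafka versions. The function name `check_quorum` is hypothetical.

```shell
# Reads `kafka-metadata-quorum.sh ... describe --status` output on stdin.
# Succeeds if LeaderId is non-empty and exactly $1 voters (default 3) are listed.
# ASSUMPTION: voters appear on one "CurrentVoters:" line, as bare ids or as
# objects with an "id" field; other layouts are not handled.
check_quorum() {
  awk -v want="${1:-3}" '
    $1 == "LeaderId:" { leader = $2 }
    $1 == "CurrentVoters:" {
      line = $0
      sub(/^CurrentVoters:[ \t]*/, "", line)
      # gsub returns the number of matches; "&" re-inserts each match unchanged.
      if (line ~ /"id"/) n = gsub(/"id"[ \t]*:[ \t]*[0-9]+/, "&", line)
      else               n = gsub(/[0-9]+/, "&", line)
    }
    END { exit !(leader != "" && n + 0 == want + 0) }
  '
}

# Usage (not run here):
#   ssh ze@kafka-0 'sudo /usr/local/bin/kafka-metadata-quorum.sh \
#     --bootstrap-server localhost:9092 describe --status' | check_quorum 3
```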
JMX exporter wiring
Metrics are collected by jmx_prometheus_javaagent loaded into the Kafka JVM as a -javaagent. The agent listens on :9404 and serves /metrics in Prometheus exposition format. Wiring is in three pieces:
1. Pinned jar artifact
| Property | Value |
|---|---|
| Artifact | jmx_prometheus_javaagent-1.0.1.jar |
| Path on each broker | /opt/jmx-exporter/jmx_prometheus_javaagent.jar |
| Owner / mode | root:root 0644 |
| SHA-256 | 7d61f737fd661610ccc14aea79764faa1ea94a340cbc8f0029b3d2edea3d80c1 |
| Source | Maven Central repo.maven.apache.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/1.0.1/ |
| Integrity | Maven Central publishes .sha1 + .md5 siblings only (no .sha256 sibling) — both Maven-published sums verified at download, SHA-256 recorded for our records and re-verified on every broker after copy |
The jar is pinned by version and by SHA-256. Re-verify the hash after any redistribution.
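Re-verification after a copy can be a one-line comparison against the pinned digest. A minimal sketch, assuming GNU coreutils `sha256sum` is present on the broker; the function name `verify_sha256` is hypothetical:

```shell
# Succeeds only if the SHA-256 of file $1 equals the expected hex digest $2.
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ "$actual" = "$2" ]
}

# Usage (not run here), with the pinned values from the table above:
#   verify_sha256 /opt/jmx-exporter/jmx_prometheus_javaagent.jar \
#     7d61f737fd661610ccc14aea79764faa1ea94a340cbc8f0029b3d2edea3d80c1 \
#     || echo "hash mismatch: do not start Kafka with this jar" >&2
```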
2. Systemd drop-in
The drop-in lives at /etc/systemd/system/kafka.service.d/10-jmx-exporter.conf (identical on all three brokers):
[Service]
Environment="KAFKA_OPTS=-javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=9404:/opt/jmx-exporter/kafka-jmx-config.yml"
Crucially, the drop-in adds only KAFKA_OPTS. It does not touch server.properties, log4j2.yaml, KAFKA_HEAP_OPTS, LOG_DIR, or KAFKA_LOG4J_OPTS. This was confirmed post-install with systemctl show kafka -p Environment.
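The post-install confirmation can be scripted by matching the effective environment against the expected agent string. A sketch, assuming `systemctl show` prints the variable on a single `Environment=` line; the function name `has_jmx_agent` is hypothetical:

```shell
# Reads `systemctl show kafka -p Environment` output on stdin and succeeds
# only if KAFKA_OPTS carries the pinned javaagent on port 9404.
has_jmx_agent() {
  grep -q 'KAFKA_OPTS=.*-javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent\.jar=9404:'
}

# Usage (not run here), after installing or changing the drop-in:
#   sudo systemctl daemon-reload
#   systemctl show kafka -p Environment | has_jmx_agent \
#     || echo "javaagent missing from effective environment" >&2
```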
3. Host firewall (UFW)
9404/tcp ALLOW IN 30.30.0.0/16 # kafka jmx exporter
Added with sudo ufw allow proto tcp from 30.30.0.0/16 to any port 9404 comment 'kafka jmx exporter'. UFW default is deny (incoming) / allow (outgoing). No nftables direct rules in play.
JMX config — 17 rules, specific → generic
The exporter rule file (/opt/jmx-exporter/kafka-jmx-config.yml, sha256 21168263d8…3d92ab0c) is 13 specific rules + 4 generic catch-all rules. The order matters — JMX exporter evaluates rules top-to-bottom and stops at the first match, so per-topic / per-partition rules go above generic <>Value ones.
| # | Rule | What it captures |
|---|---|---|
| 1 | Per-topic per-partition <>Value gauges | Partition-level offsets, lag, ISR membership |
| 2 | Per-topic broker metrics <>Count | BytesInPerSec, BytesOutPerSec, MessagesInPerSec per topic |
| 3 | Per-broker replica fetcher metrics | Fetcher max lag, follower fetch rate |
| 4 | KRaft raft-metrics *-total / *-rate | Raft message counts and rates |
| 5 | KRaft raft other attrs → gauges | Quorum state, current leader, voter ids |
| 6 | KRaft raft channel metrics | Inter-controller channel send/receive rates |
| 7 | Per-user / per-client quota metrics | Throttle and quota gauges |
| 8 | Network processor connection metrics | Active connections per processor |
| 9 | Request metrics counters | RequestsPerSec, ErrorsPerSec by API |
| 10 | Request metrics percentiles | Local time, total time, queue time p50/p95/p99 |
| 11 | Generic <>Count counters | Catch-all counters |
| 12 | Generic <>Percentile gauges | Catch-all percentiles |
| 13 | Generic <>Value gauges | Includes UnderReplicatedPartitions and ActiveControllerCount |
| 14 | JVM heap / non-heap memory | JVM memory pools |
| 15 | JVM GC count / time | GC collector metrics |
| 16 | JVM thread counters | Thread state counts |
| 17 | Tail catch-all kafka.* → kafka_generic_* UNTYPED | Unknowns are surfaced rather than blackholed |
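The effect of rule order can be illustrated with any first-match construct. The toy below uses a shell `case` statement, which also stops at the first matching pattern. It is only an analogy: the patterns are shell globs rather than the exporter's regular expressions, and the bean names and labels are illustrative, not taken from the real rule file.

```shell
# Toy first-match classifier: the first pattern that matches wins.
# Moving the bare kafka.* pattern to the top would shadow every rule below it.
classify_bean() {
  case $1 in
    kafka.*topic=*partition=*'<>Value') echo "per-partition gauge" ;;
    kafka.server*topic=*'<>Count')      echo "per-topic counter" ;;
    kafka.*'<>Count')                   echo "generic counter" ;;
    kafka.*)                            echo "kafka_generic (untyped)" ;;
    *)                                  echo "unmatched" ;;
  esac
}

classify_bean 'kafka.server<type=BrokerTopicMetrics, name=BytesInPerSec, topic=orders><>Count'
```

The call prints `per-topic counter`, because the per-topic pattern sits above the generic `<>Count` one.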
Cardinality note
Rule 17 (the tail catch-all) emits about 4,045 of the ~10,000 kafka_* lines on each broker, roughly 40% of the scrape volume. Functional and intentional (“don’t blackhole the unknown”), but at fleet scale it adds roughly two-thirds on top of what a config without the catch-all would send to UWM (about 10,000 lines per scrape rather than about 6,000). If scrape latency or cardinality becomes a problem, the follow-up is either:
- Drop low-value subtrees at scrape time with metricRelabelConfigs on the ServiceMonitor, keeping the broker-side config permissive; or
- Tighten rule 17 to specific subtrees and accept fewer unknowns.
Scrape latency observed on localhost:9404/metrics (3 runs on kafka-0): 0.624s / 0.631s / 0.564s; response size ~1.34 MB. Line counts per broker: kafka-0 ~10,114, kafka-1 ~10,066, kafka-2 ~10,028.
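The catch-all share quoted above can be re-measured on any broker by counting sample lines by prefix. A minimal sketch; the function name `generic_share` is hypothetical, and it counts only lines that start with `kafka_`, so `# HELP` and `# TYPE` comments are ignored:

```shell
# Reads Prometheus exposition text on stdin and prints
# "<kafka_generic_ lines> <kafka_ lines> <percent>".
generic_share() {
  awk '
    /^kafka_generic_/ { generic++ }
    /^kafka_/         { total++ }
    END {
      pct = (total > 0) ? 100 * generic / total : 0
      printf "%d %d %.1f\n", generic, total, pct
    }
  '
}

# Usage (not run here):
#   ssh ze@kafka-0 'curl -fsS localhost:9404/metrics' | generic_share
```

With the figures above, 4,045 of about 10,114 lines on kafka-0 works out to roughly 40%.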
Rolling restart — zero-URP procedure
Restart the cluster one broker at a time, and check that UnderReplicatedPartitions (URP) is 0 before each restart and after it. The procedure used during the JMX rollout was:
1. Pre-flight on every broker.

       ssh ze@kafka-0 'sudo /usr/local/bin/kafka-topics.sh \
         --bootstrap-server kafka-bootstrap.sub.comptech-lab.com:9092 \
         --describe --under-replicated-partitions'

   Expect empty output. If anything comes back, abort; do not restart with pre-existing URP.

2. Restart one broker.

       ssh ze@kafka-0 'sudo systemctl restart kafka'

3. Wait for the broker to come back up. Poll for :9092 listening and for curl -fsS localhost:9404/metrics succeeding, at a 5-second interval with a 90-second budget. The budget is generous; observed times to heal were 9–14 s per broker during the JMX rollout.

4. Verify KRaft quorum.

       ssh ze@kafka-0 'sudo /usr/local/bin/kafka-metadata-quorum.sh \
         --bootstrap-server localhost:9092 \
         describe --status'

   Expect 3 voters and a non-empty LeaderId. Leadership transitions during the rolling restart are normal and not a fault condition; they just mean one voter took over while the previous leader was down.

5. Re-verify URP is zero before moving to the next broker. Use the same kafka-topics.sh --describe --under-replicated-partitions call as step 1. Empty output is required.

6. Repeat for kafka-1, then kafka-2. Order is kafka-0 → kafka-1 → kafka-2. If URP is non-zero after a restart, stop; the rollout resumes only after URP returns to 0.
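The procedure above can be sketched as a small script. This is an illustration under assumptions, not the script used during the rollout: the function names are hypothetical, `--describe` is included because `kafka-topics.sh` requires an action flag, the 60-second URP budget after a restart is borrowed from the failure-mode guidance further down, and the quorum check (step 4) is left out for brevity.

```shell
BOOTSTRAP=kafka-bootstrap.sub.comptech-lab.com:9092

# Retry "$@" every $2 seconds until it succeeds or $1 seconds have elapsed.
wait_until() {
  budget=$1; interval=$2; shift 2
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$budget" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Succeeds only when the under-replicated listing is empty.
urp_is_zero() {
  out=$(ssh "ze@$1" "sudo /usr/local/bin/kafka-topics.sh \
    --bootstrap-server $BOOTSTRAP --describe --under-replicated-partitions") || return 1
  [ -z "$out" ]
}

# Succeeds when the JMX exporter answers locally on the broker.
broker_is_up() {
  ssh "ze@$1" 'curl -fsS -o /dev/null localhost:9404/metrics'
}

# One broker at a time, with the URP=0 gate on both sides of each restart.
rolling_restart() {
  for b in kafka-0 kafka-1 kafka-2; do
    urp_is_zero "$b" || { echo "URP non-zero before $b; aborting" >&2; return 1; }
    ssh "ze@$b" 'sudo systemctl restart kafka' || return 1
    wait_until 90 5 broker_is_up "$b" || { echo "$b not healthy within 90 s" >&2; return 1; }
    wait_until 60 5 urp_is_zero "$b" || { echo "URP non-zero after $b; stopping" >&2; return 1; }
  done
}
```

Nothing runs until `rolling_restart` is called explicitly, so the file can be sourced and the helpers exercised one at a time first.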
The discipline matters because Kafka tolerates one broker out at a time without partition unavailability. Two brokers out simultaneously, with a partition whose two replicas are on those two brokers, leaves that partition offline.
Operational guardrails
- No public IPs, no public DNS, no HAProxy wildcard for Kafka. Private LAN only until TLS/SASL/ACLs close.
- Don’t restart all three at once. Use the rolling procedure above, even for cosmetic changes; the URP=0 gate is the only thing that prevents a half-restart from going wrong.
- Don’t touch server.properties or log4j2.yaml from the JMX wiring. The drop-in adds KAFKA_OPTS and nothing else.
- Don’t widen UFW. :9404 is open to 30.30.0.0/16 only. Don’t open it wider; UWM scraping reaches it from the OpenShift workers, all on the same /16.
- Don’t break the rule order in kafka-jmx-config.yml. Specific rules go above generic ones; moving the tail catch-all up would mask everything below.
Issue #269 — Phase 1 broker side done; OCP scraping blocked
Issue #269 covers Kafka monitoring end-to-end. The broker side of Phase 1 is complete:
- Pinned JMX exporter is loaded on all three brokers and serving on :9404.
- 17-rule config landed; ~10k kafka_* lines per broker.
- Rolling restart completed with URP=0 throughout (3-voter quorum maintained).
- UFW updated to 30.30.0.0/16 only for :9404.
The OCP side of Phase 1 is blocked. The spoke’s argocd-cm resource.exclusions blocks both core/Endpoints and discovery.k8s.io/EndpointSlice from Argo sync. Without an Endpoints or EndpointSlice object that Argo will sync, UWM cannot scrape an external-IP target. Phase 2 (issue #273) either relaxes that exclusion or picks a target representation Argo will sync.

Separately, the platform-side Kafka monitoring stack ships a kafka-exporter deployment that scrapes :9092 (broker protocol), not the brokers’ :9404 JMX endpoint. Its four exporter alerts are independent of the broker-side javaagent work documented here.
Validation
# DNS
dig @30.30.30.53 kafka-0.sub.comptech-lab.com A +short
dig @30.30.30.53 kafka-bootstrap.sub.comptech-lab.com A +short
# Listener ports
ssh ze@kafka-0 'ss -lntp | grep -E "(9092|9093|9404)"'
# JMX exporter
ssh ze@kafka-0 'curl -fsS localhost:9404/metrics | wc -l'
ssh ze@kafka-0 'curl -fsS localhost:9404/metrics | grep -E "^kafka_server_replicamanager_underreplicatedpartitions|^kafka_controller_kafkacontroller_activecontrollercount"'
# KRaft quorum
ssh ze@kafka-0 'sudo /usr/local/bin/kafka-metadata-quorum.sh \
--bootstrap-server localhost:9092 describe --status'
# URP
ssh ze@kafka-0 'sudo /usr/local/bin/kafka-topics.sh \
--bootstrap-server kafka-bootstrap.sub.comptech-lab.com:9092 \
--describe --under-replicated-partitions'
A scripted validation lives at opp-full-plat/scripts/rebuild/kafka/validate-kafka-kraft.sh.
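Independently of that script, the two health gauges grepped above can be reduced to numbers for use in a gate. A sketch, assuming the exporter emits no timestamps so the value is the last field on each sample line; the function name `metric_sum` is hypothetical:

```shell
# Reads exposition text on stdin; prints the integer sum of all samples of
# metric $1 (labelled or not), or nothing if the metric is absent.
metric_sum() {
  awk -v name="$1" '
    {
      key = $1
      sub(/[{].*/, "", key)     # strip any {label="..."} suffix
      if (key == name) { sum += $NF; seen = 1 }
    }
    END { if (seen) printf "%d\n", sum }
  '
}

# Usage (not run here): URP should print 0 on every broker.
#   ssh ze@kafka-0 'curl -fsS localhost:9404/metrics' \
#     | metric_sum kafka_server_replicamanager_underreplicatedpartitions
```

Printing nothing for an absent metric keeps "metric missing" distinguishable from "metric is zero".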
Failure modes
Symptom: UnderReplicatedPartitions > 0 after a restart
Root cause. The just-restarted broker hasn’t caught up on replication yet; or a replica’s data disk filled; or a NIC dropped.
Fix. Wait. URP normally clears within seconds of the broker coming back. If it stays non-zero past a minute, check the broker logs (journalctl -u kafka -n 200) and replica fetcher metrics. Do not restart the next broker until URP returns to 0.
Prevention. Use the rolling restart procedure above; respect the URP=0 gate.
Symptom: :9404 returns nothing or connection refused
Root cause. The systemd drop-in didn’t get reloaded (systemctl daemon-reload missed), or KAFKA_OPTS was overridden by another env source, or UFW denied the source IP.
Fix. Verify systemctl show kafka -p Environment includes the -javaagent: line. Check ufw status for 9404/tcp ALLOW IN 30.30.0.0/16. If both look correct, look at journalctl -u kafka for jmx-exporter startup errors.
Prevention. systemctl daemon-reload before systemctl restart kafka when changing drop-ins.
Symptom: KRaft quorum reports only 2 voters
Root cause. One broker is down, or the controller listener on :9093 is blocked between voters.
Fix. Check kafka.service status on each broker. Verify that :9093 is allow-listed for the other two broker IPs only (not the wider lab /16). Restart the failed broker; quorum should re-form.
Prevention. Don’t change the :9093 firewall rules without re-verifying the three-IP allow list.
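Re-verifying the allow list can be scripted against `ufw status`. This sketch assumes the controller rules appear as one `9093/tcp ALLOW <source>` line per peer (or `ALLOW IN` in verbose output); the actual rule layout on the brokers is not shown in this page, and the function name `controller_allowlist_ok` is hypothetical.

```shell
# Reads `ufw status` (or `ufw status verbose`) output on stdin and succeeds
# only if the ALLOW sources for 9093/tcp are exactly the IPs given as arguments.
# ASSUMPTION: one "9093/tcp  ALLOW [IN]  <source>" line per allowed peer.
controller_allowlist_ok() {
  expected=$(printf '%s\n' "$@" | sort -u)
  actual=$(awk '
    $1 == "9093/tcp" && $2 == "ALLOW" {
      src = ($3 == "IN") ? $4 : $3
      print src
    }' | sort -u)
  [ "$actual" = "$expected" ]
}

# Usage on kafka-0 (not run here):
#   sudo ufw status | controller_allowlist_ok 30.30.30.25 30.30.30.26
```

A rule that opens 9093 to the whole /16 fails the check, because the source no longer matches the peer list.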
Symptom: a client reports LEADER_NOT_AVAILABLE after a broker restart
Root cause. Client cached the leader from before the restart; the client’s metadata refresh hasn’t fired yet.
Fix. Usually self-heals on the next metadata refresh. Verify the cluster itself is healthy (URP=0, 3 voters). If the client doesn’t recover, restart the client.
Prevention. Use a Kafka client library version that refreshes metadata on LEADER_NOT_AVAILABLE.
Symptom: ~40% of :9404 output looks like kafka_generic_* lines
Root cause. Rule 17 (tail catch-all) intentionally emits UNTYPED kafka_generic_* lines for un-mapped beans, so unknowns surface rather than disappear.
Fix. Not a fault. If scrape volume is a real problem, drop these at the ServiceMonitor with metricRelabelConfigs rather than removing rule 17.
Prevention. Don’t blackhole at the broker; filter at scrape time.
References
- opp-full-plat/connection-details/kafka.md (planned): runbook.
- opp-full-plat/plans/disconnected-rebuild/environments/dc-lab/kafka-kraft-plan.md: VM plan.
- Issue #269: Kafka monitoring Phase 1.
- Issue #273: Kafka monitoring Phase 2 (broker JMX scrape from UWM).
- Issue #17: TLS/SASL/ACL hardening gate.
- Issue #12: retention, backup, and data durability.
- HAProxy backend conventions: Kafka SNI passthrough notes if and when Kafka is fronted by HAProxy.