SigNoz Overview

The lab SigNoz observability VM — standalone Docker Compose deployment, OTLP ingestion model, ClickHouse backing store, and the production-side observability track per ADR 0010.

SigNoz is the lab’s production-side observability target — the OTLP-native traces, metrics, and logs platform deployed as a standalone Docker Compose stack on the signoz VM. Per ADR 0010 and reference_lab_infrastructure.md, SigNoz is the intended destination for application telemetry; the parallel monitoring-0 VM (LGTM components on bare systemd) is a learning sandbox for trying observability primitives, not a production telemetry destination.

This page is the SigNoz overview: what it is, what’s behind it, the OTLP ingestion model, and how it relates to monitoring-0, the other observability VM. The v0.122 auth quirk has its own page (Auth Quirk), as does the ClickHouse storage layer (ClickHouse Storage); the monitoring-0 LGTM sandbox is documented at Monitoring VM.

Architecture

[Architecture diagram. Telemetry path: OpenShift workload (OTLP exporter) and docker-runtime app (OTLP exporter) → OTLP HTTP :4318 and OTLP gRPC :4317 (signoz VM) → SigNoz v0.122.0 EE (Docker Compose). UI path: Operator browser → HAProxy signoz.apps.* (LE wildcard) → SigNoz. Backing stores: ClickHouse (traces/metrics/logs), SQLite signoz-sqlite volume (orgs, users, dashboards), Zookeeper (ClickHouse coordination).]

The path:

OpenShift workloads and docker-runtime apps export OTLP traces/metrics/logs to the SigNoz VM directly over the lab network — gRPC on :4317 or HTTP on :4318. The OTLP endpoint is not HAProxy-fronted (telemetry is private, plain HTTP/gRPC inside the lab).
Operator browsers reach the SigNoz UI via https://signoz.apps.sub.comptech-lab.com — that path is HAProxy-fronted with the LE wildcard.
SigNoz parses OTLP, stores traces, metrics, and logs in ClickHouse, stores user/org/dashboard config in an embedded SQLite database, and uses Zookeeper for ClickHouse cluster coordination.

The diagram shows the production-grade EE topology (v0.122.0); the lab runs the open-source EE build, which has the same general topology.

What it is

Property	Value
VM	`signoz`
Private FQDN	`signoz.sub.comptech-lab.com`
Public hostname	`https://signoz.apps.sub.comptech-lab.com`
HAProxy backend	host-specific → SigNoz UI `:8080`
TLS terminator	HAProxy edge VM, LE wildcard `*.apps.sub.comptech-lab.com`
OS	Ubuntu 24.04 LTS cloud-init
Runtime	Docker Engine + Docker Compose
SigNoz version	`v0.122.0` EE (open-source build)
OTLP gRPC	port `4317` on the VM (private, no TLS)
OTLP HTTP	port `4318` on the VM (private, no TLS)
Backing stores	ClickHouse (telemetry), SQLite (org/user/dashboard config), Zookeeper
SQLite host path	`/var/lib/docker/volumes/signoz-sqlite/_data/`
Default admin user	`zahid` (lab convention)
Admin password custody	`secrets/signoz-vm/` (Git-ignored, mode-restricted)

Why a standalone VM (not in-cluster)

Per ADR 0010 the user explicitly reintroduced SigNoz as a standalone Ubuntu cloud-init VM service. The decision is driven by:

Decouple from OpenShift cluster lifecycle. If a cluster goes down or is being rebuilt, telemetry should still be reachable and historical traces should still be queryable.
The official upstream self-host path is Docker Compose on Linux. Following the upstream supported path keeps upgrades predictable.
External observability for the platform itself. OpenShift workloads emit telemetry to a VM outside the cluster, which is the correct posture for cluster-wide observability.
Avoid mixing with the retired RKE2/OpenShift SigNoz manifests. Those are read-only history; the new install does not reactivate them as desired state.

The lab decision is to keep SigNoz OpenShift-facing only (per ADR 0010 framing). It is an observability support service for OpenShift core operations, not a general application catalogue entry; it must not be exposed publicly without explicit source restrictions, TLS, and auth on the OTLP listener.

Why two observability VMs?

signoz and monitoring-0 exist in parallel:

Property	`signoz`	`monitoring-0`
Role	Production telemetry destination	Learning sandbox
Components	SigNoz EE (single product)	Grafana + Prometheus + Alertmanager + Loki + Tempo + Pyroscope + Alloy + exporters
Packaging	Docker Compose	Native systemd services
OTLP endpoint	`:4317`/`:4318` (private)	`:4317`/`:4318` (owned by Alloy, fan-out)
Per	ADR 0010	ADR 0012
Memory	"intended production track"	"testing / learning sandbox; not the prod telemetry destination"

Apps shipping OTLP to the lab must deliberately choose which target they’re shipping to. The two VMs have the same port numbers but different semantics — the same telemetry hitting monitoring-0:4318 lands in Alloy and is fanned out to Tempo/Loki/Prometheus for sandbox exploration; the same telemetry hitting signoz:4318 lands in SigNoz and stays there.
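One way to make that choice explicit is to resolve the endpoint from a named target in the deploy script instead of pasting a host:port string. The sketch below is only an illustration: `otlp_endpoint` is a hypothetical helper, and the monitoring-0 FQDN is an assumed name that this page does not confirm. The signoz FQDN and both port numbers come from the tables on this page.

```shell
# Hypothetical helper: map a named telemetry target to its OTLP endpoint.
# The signoz FQDN is from this page; the monitoring-0 FQDN is an ASSUMED name.
otlp_endpoint() {
  target="$1"            # "signoz" or "monitoring-0"
  protocol="${2:-http}"  # "http" (port 4318) or "grpc" (port 4317)

  case "$target" in
    signoz)       host="signoz.sub.comptech-lab.com" ;;
    monitoring-0) host="monitoring-0.sub.comptech-lab.com" ;;  # assumption
    *) echo "unknown OTLP target: $target" >&2; return 1 ;;
  esac

  case "$protocol" in
    http) port=4318 ;;
    grpc) port=4317 ;;
    *) echo "unknown OTLP protocol: $protocol" >&2; return 1 ;;
  esac

  # Plain http:// scheme: the signoz OTLP listeners do not terminate TLS.
  printf 'http://%s:%s\n' "$host" "$port"
}
```

A caller could then set the standard OpenTelemetry SDK variables, for example `export OTEL_EXPORTER_OTLP_ENDPOINT="$(otlp_endpoint signoz http)"` together with `export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf`. An unknown target fails loudly rather than silently defaulting to one of the two VMs.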

The monitoring-0 track is covered in its own page.

OTLP ingestion

SigNoz accepts OTLP over both HTTP and gRPC:

Protocol	Port	Path (HTTP only)	Notes
OTLP HTTP	`4318`	`/v1/traces`, `/v1/metrics`, `/v1/logs`	Plain HTTP; no TLS in v0.122 install
OTLP gRPC	`4317`	(binary)	Plain gRPC; no TLS

The OTLP listener is intentionally not behind HAProxy. SigNoz fronts it directly on the VM. The path is plain HTTP/gRPC because v0.122’s standard install does not terminate TLS on the OTLP listener and HAProxy does not proxy :4318 either. NetworkPolicy in the emitting namespace must allow egress to the SigNoz VM’s OTLP port.
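The egress requirement can be sketched as a small manifest generator. This is an illustration under stated assumptions, not the lab's actual policy: `render_otlp_egress_policy` and the policy name `allow-otlp-egress-signoz` are hypothetical, and the SigNoz VM address is passed in by the caller because this page does not document it.

```shell
# Hypothetical helper: print a NetworkPolicy that allows OTLP egress
# from every pod in one namespace to the SigNoz VM on 4317 and 4318.
render_otlp_egress_policy() {
  namespace="$1"   # emitting namespace
  signoz_ip="$2"   # SigNoz VM address (caller-supplied; not documented here)

  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-otlp-egress-signoz
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: ${signoz_ip}/32
      ports:
        - protocol: TCP
          port: 4317
        - protocol: TCP
          port: 4318
EOF
}
```

For example, `render_otlp_egress_policy my-app 192.0.2.10 | oc apply -f -` would apply it, where both arguments are placeholders. If the namespace otherwise denies egress by default, a separate rule for DNS would likely be needed so that pods can still resolve the SigNoz FQDN.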

A successful OTLP ingest returns HTTP 200 with the body {"partialSuccess":{}}. The span surfaces under /api/v1/services roughly 30 seconds after ingestion (the ClickHouse buffer flush window).
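To exercise that path end to end, a probe needs a body that carries a real span. The sketch below builds a one-span OTLP/JSON payload and posts it to the HTTP listener. `build_otlp_trace` and `send_otlp_trace` are hypothetical helpers, and the scope name, span name, and trace/span IDs are arbitrary illustrative values.

```shell
# Hypothetical helper: build a one-span OTLP/JSON trace body.
# service.name is set so the span is attributed to a named service.
# The service name is interpolated as-is, so it must not contain quotes.
build_otlp_trace() {
  service_name="$1"
  now_s="$(date +%s)"
  # OTLP/JSON carries 64-bit nanosecond timestamps as strings.
  start_ns="${now_s}000000000"
  end_ns="${now_s}500000000"   # span ends 0.5 s after it starts

  cat <<EOF
{
  "resourceSpans": [{
    "resource": {
      "attributes": [
        {"key": "service.name", "value": {"stringValue": "${service_name}"}}
      ]
    },
    "scopeSpans": [{
      "scope": {"name": "manual-probe"},
      "spans": [{
        "traceId": "5b8efff798038103d269b633813fc60c",
        "spanId": "eee19b7ec3c1b174",
        "name": "probe-span",
        "kind": 1,
        "startTimeUnixNano": "${start_ns}",
        "endTimeUnixNano": "${end_ns}"
      }]
    }]
  }]
}
EOF
}

# Post the probe to the OTLP HTTP listener (run from inside the lab network).
send_otlp_trace() {
  build_otlp_trace "$1" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- http://signoz.sub.comptech-lab.com:4318/v1/traces
}
```

Running `send_otlp_trace otlp-probe` from a lab host should give the same success response as any other ingest, and the chosen service name is then the thing to look for once the flush window has passed.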

If OTLP needs to be exposed externally later (cross-cluster, off-network), ADR 0010 requires that exposure be done with explicit hostnames, source restrictions, TLS, and auth — not by widening the current plain-HTTP exposure.

Where things are stored

Traces, metrics, logs: ClickHouse — covered in its own page.
Organizations, users, dashboards, alert rules, alert channels: SQLite. The DB file lives in the Docker volume signoz-sqlite at /var/lib/docker/volumes/signoz-sqlite/_data/signoz.db. The DB uses WAL mode (the .db-wal file is part of the live state).
ClickHouse cluster coordination: Zookeeper. Part of the Compose stack.

The SQLite location matters for the auth quirk: the org-id UUID required by the v0.122 login API is only readable from this SQLite, not from any unauthenticated API. See the auth quirk page.
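Because the database runs in WAL mode, copying only the .db file can miss recent writes. A minimal sketch of a safer read path, assuming the `sqlite3` CLI is installed on the VM and the caller has permission to read the Docker volume (both assumptions); `snapshot_sqlite` is a hypothetical helper name.

```shell
# Hypothetical helper: take a consistent copy of a WAL-mode SQLite DB.
# sqlite3's .backup command reads through the WAL, so the copy includes
# changes that have not yet been checkpointed into the main .db file.
snapshot_sqlite() {
  src="$1"    # e.g. /var/lib/docker/volumes/signoz-sqlite/_data/signoz.db
  dest="$2"   # destination file for the snapshot (no single quotes in path)
  sqlite3 "$src" ".backup '$dest'"
}
```

Inspecting the snapshot rather than the live file keeps ad hoc queries away from the database that SigNoz is writing to.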

Validation

# DNS
dig @<lab-dns> signoz.sub.comptech-lab.com A +short
dig @<lab-dns> signoz.apps.sub.comptech-lab.com A +short

# UI
curl -sSI https://signoz.apps.sub.comptech-lab.com/login | head -1

# Version (unauthenticated, GET — NOT HEAD; HEAD hits SPA fallback)
curl -sS https://signoz.apps.sub.comptech-lab.com/api/v1/version

# Health
curl -sS https://signoz.apps.sub.comptech-lab.com/api/v1/health

# OTLP HTTP probe (from inside the lab)
curl -sS -X POST -H 'Content-Type: application/json' \
  -d '{}' http://signoz.sub.comptech-lab.com:4318/v1/traces

Expected:

DNS resolves.
/login returns HTTP/2 200.
/api/v1/version returns version JSON.
/api/v1/health returns health JSON.
The OTLP probe returns 200 with {"partialSuccess":{}} (an empty OTLP body is accepted).

Operational guardrails

Per ADR 0010 and the SigNoz connection-details runbook:

Keep telemetry private until explicit TLS/auth/source-restriction is decided.
Don’t store admin credentials or generated keys in Git; keep them under secrets/signoz-vm/. ClickHouse passwords, OTLP tokens (if added later), and API keys all stay out of trackers.
HAProxy scope is narrow — only the SigNoz UI hostname.
Treat first deployment as lab-bootstrap — production observability readiness still requires TLS/auth decisions on ingestion, ClickHouse backup/restore, retention policy, monitoring of SigNoz itself, restart drills, and upgrade rehearsal.

Known issues (high-level)

v0.122 auth API moved. The v0.121 → v0.122 upgrade changed the login endpoint and the response field name, and made orgID a required input that is not retrievable from any unauthenticated API. The fix path requires reading SQLite directly. See the auth quirk page.
HEAD requests hit the SPA fallback. curl -I returns the SPA index.html for any API path; always use GET.
No initial dashboards or alerts. A freshly-installed SigNoz is empty by design. Build dashboards and alert rules as the application footprint grows.
SSH host key has rotated at least once during the v0.122 install window. Clean stale known_hosts entries.
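A script can guard against the SPA fallback by checking what kind of response came back. The sketch below assumes that API responses carry an `application/json` content type and that the fallback serves `text/html`; this page confirms neither, so treat both as assumptions. `is_json_content_type` and `api_content_type` are hypothetical helpers.

```shell
# Succeed if a Content-Type value denotes JSON (parameters such as
# "; charset=utf-8" are allowed after the media type).
is_json_content_type() {
  case "$1" in
    application/json*) return 0 ;;
    *) return 1 ;;
  esac
}

# Issue a GET (never HEAD) and print only the Content-Type curl received.
api_content_type() {
  curl -sS -o /dev/null -w '%{content_type}' "$1"
}
```

A check such as `is_json_content_type "$(api_content_type https://signoz.apps.sub.comptech-lab.com/api/v1/version)"` then fails when the response is HTML instead of an API payload.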

Failure modes

Symptom: OTLP POST returns 200 but spans never appear

Root cause. Either the receiver dropped the body (malformed OTLP), the ClickHouse buffer flush hasn’t run yet, or the service.name resource attribute is missing, so the span lands in an unnamed bucket.

Fix. Wait 30+ seconds. Verify the OTLP body includes the service.name resource attribute. Check /api/v1/services for the expected service name, then retry with an explicit recent time window: /api/v1/services?start=<ms>&end=<ms>.

Prevention. OpenTelemetry instrumentation in every emitting app must set service.name and service.namespace. Pipeline templates encode this.
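The explicit time window in the fix above takes epoch milliseconds, which are easy to get wrong by a factor of 1000. A small sketch that computes a recent window; `services_url` is a hypothetical helper, and the request itself presumably needs an authenticated session (see the auth quirk page).

```shell
# Hypothetical helper: build the /api/v1/services URL for the last N minutes.
# start and end are epoch milliseconds, as the query parameters expect.
services_url() {
  minutes="${1:-15}"
  end_ms=$(( $(date +%s) * 1000 ))
  start_ms=$(( end_ms - minutes * 60 * 1000 ))
  printf 'https://signoz.apps.sub.comptech-lab.com/api/v1/services?start=%s&end=%s\n' \
    "$start_ms" "$end_ms"
}
```

Calling `services_url 15` prints a URL covering the last fifteen minutes, which is wide enough to include a span sent just before the flush window.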

Symptom: HAProxy returns 502 on the SigNoz UI

Root cause. The SigNoz container is restarting, ClickHouse ran out of memory, or a container can’t reach Zookeeper.

Fix. SSH to the signoz VM, run docker compose ps, find the unhealthy container, and check its logs.

Prevention. Monitor SigNoz container health; alert on restarts.
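A basic health check can be sketched as a filter over `docker ps` output. This is an illustration rather than the lab's monitoring setup: `filter_unhealthy` is a hypothetical helper, and it relies on Docker's human-readable status text ("Up ... (unhealthy)", "Restarting (...)", "Exited (...)").

```shell
# Hypothetical helper: read "name|status" lines on stdin and print the
# containers that are unhealthy, restarting, or exited.
filter_unhealthy() {
  awk -F '|' '
    $2 ~ /unhealthy/ || $2 ~ /^Restarting/ || $2 ~ /^Exited/ {
      print $1 ": " $2
    }'
}

# On the signoz VM, feed it every container, including stopped ones:
#   docker ps -a --format '{{.Names}}|{{.Status}}' | filter_unhealthy
```

Empty output means every container is up and not reporting unhealthy. Any printed line names a container to follow up with `docker logs`.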

Symptom: ClickHouse disk is full

Root cause. Trace/log volume exceeded the retention sizing.

Fix. Adjust retention policy in SigNoz settings (default ~15 days for traces), or grow the data disk. See ClickHouse storage page for the storage model.

Prevention. Monitor disk usage; size retention to expected volume.
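Disk monitoring can start as a threshold check run from cron or a systemd timer. A minimal sketch, with hypothetical helper names and an arbitrary 80% default limit. The path to check is left to the caller, because this page does not state where the ClickHouse data volume is mounted.

```shell
# Hypothetical helper: print the used percentage (integer) of the
# filesystem that holds the given path, using POSIX df output.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical check: return non-zero once usage reaches the limit.
check_disk() {
  path="$1"
  limit="${2:-80}"   # percent; arbitrary default
  used="$(disk_used_pct "$path")"
  if [ "$used" -ge "$limit" ]; then
    echo "WARN: $path is ${used}% full (limit ${limit}%)"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}
```

For example, `check_disk /var/lib/docker 80` would cover the Docker volumes if they live on that filesystem, which is an assumption based on the SQLite volume path above.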

References

opp-full-plat/connection-details/signoz.md — runbook for the live service.
opp-full-plat/adr/0010-signoz-standalone-vm-observability.md — VM design decision.
ClickHouse storage
Auth quirk (v0.122)
Monitoring VM (LGTM sandbox)