Metrics
Overview
The server exposes runtime and quality telemetry at:
- Relative API path: GET /metrics
- Full default URL: GET http://127.0.0.1:8080/api/v1/metrics
By default, the endpoint returns a JSON snapshot intended for operators and the embedded web UI Metrics tab.
Endpoint
Method and path
GET /metrics
Query parameters
- format (optional): json | prom
  - Default: json
  - prom returns Prometheus text exposition instead of the JSON snapshot.
- window (optional): 1h | 6h | 24h | 48h
  - Limits the trends.windows payload to one window.
- include (optional): comma-separated sections
  - Values: all, health, jobs, api, quality, insights, trends
  - Example: include=health,jobs,quality
- limit_domains (optional): integer 1..100
  - Caps insights.domains.items.
- limit_batches (optional): integer 1..100
  - Caps insights.batches.items.
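The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the server: the function name `build_metrics_url` and its keyword arguments are hypothetical, and the base URL is assumed to be the default shown in the Overview.

```python
from urllib.parse import urlencode

# Allowed values copied from the query parameter list in this document.
FORMATS = {"json", "prom"}
WINDOWS = {"1h", "6h", "24h", "48h"}
SECTIONS = {"all", "health", "jobs", "api", "quality", "insights", "trends"}

# Assumption: the default base URL from the Overview.
DEFAULT_BASE = "http://127.0.0.1:8080/api/v1"


def build_metrics_url(base=DEFAULT_BASE, fmt=None, window=None, include=None,
                      limit_domains=None, limit_batches=None):
    """Build a metrics URL, rejecting values outside the documented ranges."""
    params = {}
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"format must be one of {sorted(FORMATS)}")
        params["format"] = fmt
    if window is not None:
        if window not in WINDOWS:
            raise ValueError(f"window must be one of {sorted(WINDOWS)}")
        params["window"] = window
    if include:
        unknown = [s for s in include if s not in SECTIONS]
        if unknown:
            raise ValueError(f"unknown sections: {unknown}")
        params["include"] = ",".join(include)
    for name, value in (("limit_domains", limit_domains),
                        ("limit_batches", limit_batches)):
        if value is not None:
            if not 1 <= value <= 100:
                raise ValueError(f"{name} must be between 1 and 100")
            params[name] = str(value)
    # Keep commas literal so include=health,jobs stays readable.
    query = urlencode(params, safe=",")
    return f"{base}/metrics" + (f"?{query}" if query else "")
```

Validating locally only saves a round trip; the server's own 400 response remains the authority on what is accepted.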
Response codes
- 200: metrics snapshot JSON or Prometheus text
- 400: structured error for invalid query params
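A client can branch on these two codes as sketched below. This assumes the 400 body follows the error shape shown in this document's Examples section (an `error` object with `code` and `message`); the helper name `describe_metrics_response` is hypothetical.

```python
import json


def describe_metrics_response(status, body):
    """Summarize a metrics response from its HTTP status code and body text."""
    if status == 200:
        return "ok"
    if status == 400:
        # Assumed error shape: {"error": {"code": ..., "message": ...}}
        err = json.loads(body).get("error", {})
        return f"{err.get('code', 'unknown')}: {err.get('message', '')}"
    # Any other status is not described in this document.
    return f"unexpected status {status}"
```

Surfacing the `code` field alongside the message gives scripts a stable value to match on, while the message stays useful for humans.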
Caching behavior
The server caches rendered metrics responses for 1 second per unique query option set.
When format=prom is used, JSON-only query params (window, include, limit_domains, limit_batches) are ignored.
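To make the caching rules concrete, here is an illustrative model of a 1-second cache keyed by query options. It is not the server's implementation: `cache_key` and `TTLCache` are hypothetical names, and the key normalization is an assumption based only on the two statements above.

```python
import time

# Parameters that only affect the JSON snapshot.
JSON_ONLY = ("window", "include", "limit_domains", "limit_batches")


def cache_key(options):
    """Normalize query options into a hashable cache key."""
    fmt = options.get("format", "json")
    if fmt == "prom":
        # JSON-only params are ignored for Prometheus output, so every
        # format=prom request shares one key.
        return (("format", "prom"),)
    items = [("format", fmt)]
    items += [(k, str(options[k])) for k in JSON_ONLY if k in options]
    return tuple(items)


class TTLCache:
    """Tiny time-based cache; the clock is injectable for testing."""

    def __init__(self, ttl=1.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}

    def get_or_render(self, options, render):
        key = cache_key(options)
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: reuse the rendered response
        value = render(options)
        self.entries[key] = (now, value)
        return value
```

One practical consequence for clients: polling the same URL more often than once per second is unlikely to yield fresher data.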
Snapshot structure
Top-level fields:
- schema_version
- generated_at
- health
- jobs
- api
- quality
- insights
- trends
Commonly used fields:
- health.queue_depth
- health.in_flight_jobs
- jobs.completed_total
- quality.outcomes.success_rate
- quality.outcomes.failed_rate
- quality.outcomes.failed_total
- quality.severity.totals (NOTICE, WARNING, ERROR, CRITICAL)
- insights.domains.items[]
- insights.batches.items[]
- trends.windows["1h"|"6h"|"24h"|"48h"]
- trends.windows[...].points[].dns_cache_hit_rate (per-bucket fraction in the 0..1 range)
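As an example of reading these fields, the sketch below averages the per-bucket `dns_cache_hit_rate` over one trend window of an already-parsed snapshot. The function name is hypothetical, and it assumes `trends.windows` is an object keyed by window name whose entries each carry a `points` list, as the field paths above suggest.

```python
def mean_dns_cache_hit_rate(snapshot, window="1h"):
    """Average the per-bucket dns_cache_hit_rate over one trend window.

    Returns None when the window is missing or has no usable points.
    """
    windows = snapshot.get("trends", {}).get("windows", {})
    points = windows.get(window, {}).get("points", [])
    rates = [p["dns_cache_hit_rate"] for p in points
             if p.get("dns_cache_hit_rate") is not None]
    if not rates:
        return None
    return sum(rates) / len(rates)
```

This is an unweighted mean of bucket fractions, so buckets with few lookups count as much as busy ones; treat it as a rough indicator rather than an exact overall hit rate.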
Examples
Fetch default snapshot:
curl -s "http://127.0.0.1:8080/api/v1/metrics" | jq .
Fetch Prometheus exposition:
curl -s "http://127.0.0.1:8080/api/v1/metrics?format=prom"
Fetch health/jobs/quality only:
curl -s "http://127.0.0.1:8080/api/v1/metrics?include=health,jobs,quality" | jq .
Fetch 1h trend window with bounded insights:
curl -s "http://127.0.0.1:8080/api/v1/metrics?window=1h&limit_domains=10&limit_batches=10" | jq .
Invalid query example:
curl -s "http://127.0.0.1:8080/api/v1/metrics?limit_domains=0" | jq .
Example error response:
{
  "error": {
    "code": "invalid_limit_domains",
    "message": "limit_domains must be between 1 and 100"
  }
}
Metrics Tab (Embedded UI)
The embedded web UI (/) has a Metrics tab backed by GET /api/v1/metrics.
It provides:
- Top cards for queue/flow/quality signals (queue depth, in-flight, rates, durations, finished/failed totals, severity totals).
- Trend mini-charts for throughput, failures, DNS query rates, and cache hit rate.
- Insight tables for top domains and error-heavy batches.
- Refresh controls (Refresh metrics, auto-refresh toggle, trend window, insight limits).
- Loading/empty/error states and retry behavior.
Auto-refresh runs on a 10-second interval while the tab is active.
Prometheus Format
GET /api/v1/metrics?format=prom exposes low-cardinality Prometheus metric families under the gonemaster_ prefix.
Included families:
- Server gauges for build info, start time, worker counts, queue paused, queue depth, and in-flight jobs.
- DNS counters for external queries by family, cache lookups by result, and cache evictions.
- Job lifecycle counters and current-status gauges.
- API request counters plus per-route/per-method latency histograms.
- Job duration histogram, severity totals, and locale usage totals.
Excluded from Prometheus output:
- Domain and batch insight tables.
- Trend windows / sparkline point data.
- Precomputed ratios and percentiles that Prometheus should derive from counters and histograms.
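To see which families a given server actually exposes, the `# TYPE` lines of the Prometheus text exposition can be scanned. This is a minimal sketch: the function name is hypothetical, it reads only type lines rather than parsing the full format, and this document does not list the concrete family names.

```python
def metric_families(exposition, prefix="gonemaster_"):
    """Return {family_name: type} from '# TYPE' lines with the given prefix."""
    families = {}
    for line in exposition.splitlines():
        parts = line.split()
        # A type line has the form: "# TYPE <name> <type>"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            name, kind = parts[2], parts[3]
            if name.startswith(prefix):
                families[name] = kind
    return families
```

Passing the body of a format=prom response to this function yields a quick inventory of counters, gauges, and histograms, which helps when writing queries that derive ratios and percentiles on the Prometheus side.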
References
- API schema: docs/openapi.yaml
- Server overview: docs/server.md