
# Observability

QueryFlux exposes three observability surfaces: Prometheus metrics (real-time operational data), a Grafana dashboard (visual ops view), and QueryFlux Studio (an admin UI with query history, cluster management, and configuration).


## Prometheus metrics

QueryFlux serves a `/metrics` endpoint (default port 9000) in the standard Prometheus text exposition format.

Scrape target: `http://<host>:9000/metrics`
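A quick sanity check is to scrape the endpoint and parse the exposition text. The sketch below parses a captured sample using only the Python standard library; the label values shown are illustrative, and in practice you would fetch `http://<host>:9000/metrics` first:

```python
# Minimal parser for Prometheus text exposition format. The sample lines use
# metric names from the table below; label values are made up for illustration.
sample = """\
# HELP queryflux_queries_total Total queries by outcome and engine
# TYPE queryflux_queries_total counter
queryflux_queries_total{engine_type="trino",cluster_group="adhoc",status="success",protocol="http"} 42
queryflux_running_queries{cluster_group="adhoc",cluster_name="trino-1"} 3
"""

def parse_metrics(text):
    """Return {metric_name: value} for each sample line, ignoring comments."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # strip the {label="..."} block
        out[name] = float(value)
    return out

print(parse_metrics(sample))
```

This handles only simple sample lines; for production tooling, a proper client library parser is the safer choice.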

### Exposed metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `queryflux_queries_total` | Counter | `engine_type`, `cluster_group`, `status`, `protocol` | Total queries by outcome and engine |
| `queryflux_query_duration_seconds` | Histogram | `engine_type`, `cluster_group` | End-to-end query duration from proxy receipt to result delivery |
| `queryflux_translated_queries_total` | Counter | `src_dialect`, `tgt_dialect` | Queries where SQL dialect translation ran |
| `queryflux_running_queries` | Gauge | `cluster_group`, `cluster_name` | Currently executing queries per cluster |
| `queryflux_queued_queries` | Gauge | `cluster_group` | Queries waiting for a free cluster slot |

The metrics pipeline uses `MultiMetricsStore` to fan out to Prometheus (real-time) and, optionally, Postgres (historical). `BufferedMetricsStore` wraps the Postgres store so query execution never blocks on database I/O.
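The store names above come from the text, but their internals are not documented here; a minimal sketch of the fan-out and buffering pattern, with all method names assumed, might look like:

```python
import queue
import threading

class MultiMetricsStore:
    """Fan each metrics event out to several backing stores (sketch)."""
    def __init__(self, *stores):
        self.stores = stores

    def record(self, event):
        for store in self.stores:
            store.record(event)

class BufferedMetricsStore:
    """Wrap a slow store (e.g. Postgres) so record() never blocks on its I/O."""
    def __init__(self, inner):
        self.inner = inner
        self._queue = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def record(self, event):
        self._queue.put(event)  # O(1); the query path never waits on the DB

    def _drain(self):
        # Background thread performs the actual (potentially slow) writes.
        while True:
            self.inner.record(self._queue.get())
```

The real implementation likely also handles shutdown, batching, and error reporting; this only illustrates why the wrapper keeps database latency off the query path.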

### Prometheus config

The `prometheus/prometheus.yml` file configures a single scrape job:

```yaml
scrape_configs:
  - job_name: queryflux
    static_configs:
      - targets: ["host.docker.internal:9000"]
    metrics_path: /metrics
```

When running via `make dev`, Prometheus starts in Docker and reaches QueryFlux on the host via `host.docker.internal`. On Linux, the `docker-compose.yml` adds `extra_hosts: ["host.docker.internal:host-gateway"]` to bridge this.
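On Linux, the relevant compose fragment (the service name and image are assumptions; only the `extra_hosts` entry comes from the text above) looks roughly like:

```yaml
services:
  prometheus:
    image: prom/prometheus
    extra_hosts:
      # Map the name to the host gateway so the container can reach the host
      - "host.docker.internal:host-gateway"
```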


## Grafana dashboard

A pre-built dashboard (`grafana/dashboards/queryflux.json`) is auto-provisioned when Grafana starts via Docker Compose. It auto-refreshes every 10 seconds.

Access: `http://localhost:3000` (credentials: `admin` / `admin`)

### Panels

| Panel | Type | What it shows |
| --- | --- | --- |
| Query Rate | Stat | Queries per minute, current window |
| Error Rate | Stat | Fraction of failed queries (%) |
| p95 Latency | Stat | 95th-percentile end-to-end duration |
| Translation Rate | Stat | Fraction of queries that ran through sqlglot |
| Query Throughput by Status | Time series | Success / Failed / Cancelled over time |
| Query Throughput by Engine | Time series | Per-engine query rate over time |
| Latency Percentiles (p50 / p95 / p99) | Time series | Latency distribution trends |
| SQL Translations by Dialect Pair | Time series | Translation volume per src→tgt pair |
| Query Throughput per Cluster | Time series | Per-cluster query rate |
| Queued Queries per Cluster Group | Time series | Queue depth per group; spikes indicate saturation |
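The exact expressions live in `queryflux.json`; as an illustration of how the latency panels can be built from the histogram metric, a p95 query would look something like the following (the `_bucket` suffix and `le` label follow the standard Prometheus histogram convention, but the dashboard's actual queries may differ):

```promql
histogram_quantile(
  0.95,
  sum by (le) (rate(queryflux_query_duration_seconds_bucket[5m]))
)
```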

### Provisioning

Grafana is auto-configured via two provisioning paths:

- `grafana/provisioning/datasources/prometheus.yml` registers the Prometheus datasource pointing at `http://prometheus:9090`
- `grafana/dashboards/` contains dashboards loaded automatically at startup; no manual import needed

## QueryFlux Studio (admin UI)

QueryFlux Studio is a Next.js web UI served separately from the proxy. It talks to the Admin REST API on port 9000.

Start: `cd ui/queryflux-studio && npm run dev` (or build and serve for production)

Default URL: `http://localhost:3001`

### Pages

| Page | What it shows |
| --- | --- |
| Dashboard | Live query rate, error rate, average latency, and translation rate; cluster health grid; recent queries |
| Clusters | All cluster groups and member clusters: health, running/queued counts, enable/disable, max concurrency |
| Queries | Searchable, filterable query history: SQL, status, duration, engine, routing trace |
| Engines | Engine registry: supported engines, connection types, config fields |

Studio requires Postgres persistence to be configured: query history, cluster config, and dashboard stats are read from the database. Without Postgres, the Clusters page still works (from in-memory state), but history pages return empty results.


## Admin REST API

The admin API is served on port 9000 alongside `/metrics`. An OpenAPI spec is available at `http://localhost:9000/openapi.json`, and a Swagger UI at `http://localhost:9000/docs`.

### Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | `/health` | Health check |
| GET | `/metrics` | Prometheus metrics |
| GET | `/admin/clusters` | Live cluster state (all groups) |
| PATCH | `/admin/clusters/{group}/{cluster}` | Update a cluster (enable/disable, max concurrency) |
| GET | `/admin/queries` | Query history (paginated, filterable) |
| GET | `/admin/stats` | Aggregated stats for the last hour |
| GET | `/admin/engine-stats?hours=N` | Per-engine aggregated stats |
| GET | `/admin/group-stats?hours=N` | Per-cluster-group aggregated stats |
| GET | `/admin/engines` | Distinct engine types in the query log |
| GET | `/admin/engine-registry` | Full engine descriptor catalog |
| GET/PUT/DELETE | `/admin/config/clusters/{name}` | Cluster config CRUD (Postgres required) |
| GET | `/admin/config/clusters` | List all persisted cluster configs |
| GET/PUT/DELETE | `/admin/config/groups/{name}` | Cluster group config CRUD (Postgres required) |
| GET | `/admin/config/groups` | List all persisted group configs |
| GET | `/openapi.json` | OpenAPI spec |
| GET | `/docs` | Swagger UI |

Config CRUD endpoints (`/admin/config/*`) require Postgres persistence. The in-memory store supports reading live cluster state but not persisted config management.
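As an example of consuming the stats endpoints, the sketch below computes an error rate from an `/admin/stats`-style payload. The field names in the sample are hypothetical, not taken from the actual API schema; a real client would GET the endpoint and adapt to the documented response shape:

```python
import json

# Hypothetical /admin/stats payload -- field names are illustrative only.
sample = json.dumps({
    "total_queries": 120,
    "failed_queries": 6,
    "avg_duration_seconds": 1.8,
})

def error_rate(stats_json):
    """Failed-query fraction from an /admin/stats-style JSON payload."""
    stats = json.loads(stats_json)
    return stats["failed_queries"] / stats["total_queries"]

print(f"error rate: {error_rate(sample):.1%}")  # -> error rate: 5.0%
```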