
VectorFlow Metrics Reference

VectorFlow exposes a Prometheus-compatible metrics endpoint at GET /api/metrics.

Authentication

The endpoint requires a service account Bearer token with the metrics.read permission:

Authorization: Bearer vf_<your-service-account-key>

Generate a service account key in Settings → Service Accounts.
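For a quick scripted check, any HTTP client that sets the header above will work. A minimal Python sketch (the host and key here are placeholders, not real credentials):

```python
import urllib.request

def build_metrics_request(host: str, service_account_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for the VectorFlow metrics endpoint."""
    return urllib.request.Request(
        f"https://{host}/api/metrics",
        headers={"Authorization": f"Bearer {service_account_key}"},
        method="GET",
    )

req = build_metrics_request("vectorflow.example.com", "vf_<your-key>")
# body = urllib.request.urlopen(req).read().decode()  # Prometheus text format 0.0.4
```

The fetch itself is left commented out; the sketch only shows how the Authorization header is attached.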


Prometheus Scrape Configuration

Add this job to your prometheus.yml:

scrape_configs:
  - job_name: vectorflow
    scrape_interval: 30s
    scrape_timeout: 10s
    scheme: https                      # use http for local dev
    metrics_path: /api/metrics
    authorization:
      credentials: vf_<your-key>       # or use credentials_file
    static_configs:
      - targets:
          - your-vectorflow-host:443
        labels:
          env: production

For Docker Compose environments, replace the target with the service name and port (e.g. vectorflow:3000).
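A Compose-oriented variant of the job above might look like this, assuming the service is named vectorflow and listens on port 3000 as in the example:

```yaml
scrape_configs:
  - job_name: vectorflow
    scheme: http                       # plain HTTP inside the Compose network
    metrics_path: /api/metrics
    authorization:
      credentials: vf_<your-key>
    static_configs:
      - targets:
          - vectorflow:3000
```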


Metrics

All VectorFlow metric names are prefixed with vectorflow_. Metrics are exposed in Prometheus text format 0.0.4.

Implementation note: Throughput counters (events_in_total, events_out_total, etc.) are registered as Gauge types in prom-client but store cumulative totals sourced from the database. They are monotonically increasing across the lifetime of a pipeline run and behave correctly with rate() and increase() in PromQL.
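In practice this means PromQL treats these gauges exactly like counters: over a window with no restarts, rate() reduces to the difference between the first and last samples divided by the elapsed time. A toy illustration of that arithmetic (not VectorFlow code, and ignoring the counter-reset handling and window-edge extrapolation real Prometheus performs):

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Approximate PromQL rate() over (timestamp, value) samples of a
    monotonically increasing series: last minus first, over elapsed time."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Cumulative events_in_total sampled every 30 s:
samples = [(0.0, 1000.0), (30.0, 1600.0), (60.0, 2200.0)]
print(simple_rate(samples))  # 20.0 events/sec
```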


Node Metrics

vectorflow_node_status

Node health status.

Type:   Gauge
Labels: node_id, node_name, environment_id

Value mapping:

Value  Status       Meaning
1      HEALTHY      Node is reachable and operating normally
2      DEGRADED     Node is reachable but reporting issues
3      UNREACHABLE  Node cannot be contacted
0      UNKNOWN      Status has not been determined yet

Example queries:

# All unhealthy nodes
vectorflow_node_status != 1

# Fraction of healthy nodes
(count(vectorflow_node_status == 1) or vector(0)) / count(vectorflow_node_status)

# Alert: node unreachable (pair with "for: 2m" in an alerting rule)
vectorflow_node_status == 3
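The unreachable-node check maps naturally onto a Prometheus alerting rule; a sketch (the group, rule, and label values here are illustrative, not taken from vectorflow.rules.yml):

```yaml
groups:
  - name: vectorflow-nodes
    rules:
      - alert: VectorFlowNodeUnreachable
        expr: vectorflow_node_status == 3
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "VectorFlow node {{ $labels.node_name }} has been unreachable for more than 2 minutes"
```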

Pipeline Metrics

All pipeline metrics carry the labels node_id and pipeline_id.

vectorflow_pipeline_status

Pipeline process status.

Type:   Gauge
Labels: node_id, pipeline_id

Value mapping:

Value  Status    Meaning
1      RUNNING   Pipeline is actively processing events
2      STARTING  Pipeline process is initialising
3      STOPPED   Pipeline was stopped gracefully
4      CRASHED   Pipeline process exited unexpectedly
0      PENDING   Pipeline has not started yet

vectorflow_pipeline_events_in_total

Cumulative count of events received by the pipeline since it started.

Type:   Gauge (cumulative total)
Unit:   Events
Labels: node_id, pipeline_id

Example queries:

# Current ingest rate (events/sec)
rate(vectorflow_pipeline_events_in_total[2m])

# Total events ingested across all pipelines
sum(vectorflow_pipeline_events_in_total)

vectorflow_pipeline_events_out_total

Cumulative count of events emitted by the pipeline since it started.

Type:   Gauge (cumulative total)
Unit:   Events
Labels: node_id, pipeline_id

Example queries:

# Outbound throughput rate
rate(vectorflow_pipeline_events_out_total[2m])

# Drop rate: events consumed but not forwarded
rate(vectorflow_pipeline_events_in_total[2m])
  - rate(vectorflow_pipeline_events_out_total[2m])

vectorflow_pipeline_errors_total

Cumulative count of errors encountered by the pipeline.

Type:   Gauge (cumulative total)
Unit:   Errors
Labels: node_id, pipeline_id

Example queries:

# Error rate
rate(vectorflow_pipeline_errors_total[2m])

# Error ratio (errors per inbound event)
rate(vectorflow_pipeline_errors_total[5m])
  / (rate(vectorflow_pipeline_events_in_total[5m]) > 0)
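Ratio queries like the one above are good candidates for recording rules, so dashboards do not recompute them on every refresh. A sketch (the group and rule names are illustrative, not taken from vectorflow.rules.yml):

```yaml
groups:
  - name: vectorflow-pipelines
    rules:
      - record: vectorflow:pipeline_error_ratio:rate5m
        expr: |
          rate(vectorflow_pipeline_errors_total[5m])
            / (rate(vectorflow_pipeline_events_in_total[5m]) > 0)
```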

vectorflow_pipeline_events_discarded_total

Cumulative count of events intentionally discarded (e.g. by a filter or drop transform).

Type:   Gauge (cumulative total)
Unit:   Events
Labels: node_id, pipeline_id

vectorflow_pipeline_bytes_in_total

Cumulative byte volume received by the pipeline since it started.

Type:   Gauge (cumulative total)
Unit:   Bytes
Labels: node_id, pipeline_id

Example queries:

# Inbound throughput in bytes/sec
rate(vectorflow_pipeline_bytes_in_total[2m])

vectorflow_pipeline_bytes_out_total

Cumulative byte volume emitted by the pipeline since it started.

Type:   Gauge (cumulative total)
Unit:   Bytes
Labels: node_id, pipeline_id

vectorflow_pipeline_utilization

Fractional CPU/processing utilisation of the pipeline, as reported by the Vector process. Range: 0.0 (idle) to 1.0 (fully saturated).

Type:   Gauge
Unit:   Ratio (0–1)
Labels: node_id, pipeline_id

Example queries:

# Pipelines over 80% utilisation
vectorflow_pipeline_utilization > 0.8

# Average utilisation across non-idle pipelines
# (the > 0 filter drops pipelines reporting zero utilisation)
avg(vectorflow_pipeline_utilization > 0)

vectorflow_pipeline_latency_mean_ms

Mean end-to-end pipeline latency in milliseconds, sourced from the latest PipelineMetric snapshot stored in the database. This metric only appears when latency data has been reported.

Type:   Gauge
Unit:   Milliseconds
Labels: pipeline_id, node_id

Example queries:

# Pipelines with mean latency > 1 second
vectorflow_pipeline_latency_mean_ms > 1000

# 95th-percentile mean latency across pipelines
quantile(0.95, vectorflow_pipeline_latency_mean_ms)

Internal Metrics

vectorflow_metric_store_streams

Number of active metric streams held in the in-process MetricStore. Each stream corresponds to a live metric time series being accumulated in memory before persistence.

Type:   Gauge
Unit:   Count
Labels: none

vectorflow_metric_store_memory_bytes

Estimated memory consumed by the in-process MetricStore, in bytes.

Type:   Gauge
Unit:   Bytes
Labels: none

Example queries:

# Alert if MetricStore exceeds 100 MiB
vectorflow_metric_store_memory_bytes > 104857600

Summary Table

Metric                                       Type                 Labels                               Unit
vectorflow_node_status                       Gauge                node_id, node_name, environment_id   Enum (0–3)
vectorflow_pipeline_status                   Gauge                node_id, pipeline_id                 Enum (0–4)
vectorflow_pipeline_events_in_total          Gauge (cumulative)   node_id, pipeline_id                 Events
vectorflow_pipeline_events_out_total         Gauge (cumulative)   node_id, pipeline_id                 Events
vectorflow_pipeline_errors_total             Gauge (cumulative)   node_id, pipeline_id                 Errors
vectorflow_pipeline_events_discarded_total   Gauge (cumulative)   node_id, pipeline_id                 Events
vectorflow_pipeline_bytes_in_total           Gauge (cumulative)   node_id, pipeline_id                 Bytes
vectorflow_pipeline_bytes_out_total          Gauge (cumulative)   node_id, pipeline_id                 Bytes
vectorflow_pipeline_utilization              Gauge                node_id, pipeline_id                 Ratio (0–1)
vectorflow_pipeline_latency_mean_ms          Gauge                pipeline_id, node_id                 Milliseconds
vectorflow_metric_store_streams              Gauge                (none)                               Count
vectorflow_metric_store_memory_bytes         Gauge                (none)                               Bytes

Pre-built Dashboards and Rules

File                                          Description
monitoring/grafana/vectorflow-overview.json   Grafana 10+ dashboard — import via Dashboards → Import
monitoring/prometheus/vectorflow.rules.yml    Recording rules and alerting rules — reference from prometheus.yml

Loading the Grafana dashboard

  1. Open Grafana → Dashboards → Import.
  2. Upload monitoring/grafana/vectorflow-overview.json or paste its contents.
  3. Select your Prometheus data source when prompted.
  4. Click Import.

Loading the Prometheus rules

Add a reference in prometheus.yml:

rule_files:
  - /etc/prometheus/rules/vectorflow.rules.yml

Then copy monitoring/prometheus/vectorflow.rules.yml to that path and reload Prometheus (the /-/reload endpoint requires Prometheus to be started with --web.enable-lifecycle):

curl -X POST http://localhost:9090/-/reload

Verify rules loaded successfully:

curl http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name | startswith("vectorflow"))'
