Prometheus metric reference

All Zato metrics follow Prometheus naming conventions:

  • zato_ prefix on every metric
  • _total suffix on all counters
  • _seconds for duration measurements
  • _bytes for size measurements
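
Two of these conventions can be checked from a metric's name and type alone. The following is a minimal sketch, not part of Zato: the function `check_metric_name` is hypothetical, and it deliberately ignores the unit suffixes because a name cannot tell you whether a metric actually measures seconds or bytes.

```python
def check_metric_name(name: str, metric_type: str) -> list[str]:
    """Return the naming-convention violations for a metric (empty list if none).

    Hypothetical helper: it only checks the zato_ prefix and the _total
    suffix on counters, the two rules decidable from name and type alone.
    """
    problems = []
    if not name.startswith("zato_"):
        problems.append("missing zato_ prefix")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter without _total suffix")
    return problems


# Names and types taken from this reference, plus one made-up bad name.
print(check_metric_name("zato_rest_channel_requests_total", "counter"))
print(check_metric_name("zato_server_uptime_seconds", "gauge"))
print(check_metric_name("rest_requests", "counter"))
```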

For an overview of the endpoint, architecture, and configuration, see Prometheus metrics.


REST channel metrics

zato_rest_channel_requests_total

Type: Counter
HELP: Total REST channel requests handled by this server, by channel, HTTP status class, and error attribution

Labels:

  • channel_name (any configured channel name): the REST channel that received the request
  • status_code (2xx, 3xx, 4xx, 5xx, 0xx): HTTP response status class
  • error_source (none, gateway, upstream, auth, rate_limit): where the error originated; none for successful requests

Trigger: Incremented once per completed REST channel request, after the response is written.

Example PromQL:

sum by (channel_name) (rate(zato_rest_channel_requests_total[5m]))
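
As a rough illustration of what this query computes: rate() turns each counter series into a per-second increase over the window, and sum by (channel_name) adds together the series that share a channel, collapsing the status_code and error_source labels. The sketch below is a simplification with hypothetical channel names and values: it uses only the first and last sample of the window and treats a drop in the counter as a process restart, whereas Prometheus also extrapolates to the window boundaries.

```python
from collections import defaultdict


def simple_rate(v_start: float, v_end: float, window_seconds: float) -> float:
    """Per-second increase between two samples of one counter series.

    If the value went down, the counter restarted from zero, so the end
    value alone is the increase since the restart.
    """
    increase = v_end - v_start if v_end >= v_start else v_end
    return increase / window_seconds


def rate_by_channel(samples: dict, window_seconds: float) -> dict:
    """Sum per-series rates by channel_name, dropping the other labels.

    `samples` maps (channel_name, status_code, error_source) to a
    (start_value, end_value) pair for the window.
    """
    totals = defaultdict(float)
    for (channel_name, _status, _source), (v0, v1) in samples.items():
        totals[channel_name] += simple_rate(v0, v1, window_seconds)
    return dict(totals)


# Hypothetical counter values at the start and end of a 5-minute window.
samples = {
    ("orders", "2xx", "none"): (1000, 1600),
    ("orders", "5xx", "upstream"): (10, 40),
    ("billing", "2xx", "none"): (500, 120),  # counter reset during the window
}
print(rate_by_channel(samples, 300))
```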

zato_rest_channel_request_duration_seconds

Type: Histogram
HELP: Duration of REST channel requests in seconds, from request accept to response write complete
Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0

Labels:

  • channel_name (any configured channel name): the REST channel that received the request

Trigger: Observed once per completed REST channel request with the wall-clock duration from request accept to response write.

Example PromQL:

histogram_quantile(0.99,
  sum by (channel_name, le) (rate(zato_rest_channel_request_duration_seconds_bucket[5m]))
)
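
histogram_quantile() estimates a quantile from cumulative bucket counts: it finds the first bucket whose cumulative count reaches q times the total and interpolates linearly inside that bucket. The sketch below reproduces that core step for the bucket boundaries listed above. The cumulative counts are hypothetical, and edge cases Prometheus also handles (NaN inputs, non-monotonic buckets, non-positive bounds) are omitted.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        # The rank falls in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    lower = buckets[b - 1][0] if b > 0 else 0.0
    prev_count = buckets[b - 1][1] if b > 0 else 0
    upper, count = buckets[b]
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, math.inf]
# Hypothetical cumulative counts; summed per-second rates work the same way.
COUNTS = [0, 5, 20, 60, 90, 96, 98, 99, 100, 100, 100, 100, 100]
buckets = list(zip(BOUNDS, COUNTS))
print(histogram_quantile(0.5, buckets))
print(histogram_quantile(0.99, buckets))
```

The estimate can never be more precise than the bucket layout: any p99 that falls between 0.5 and 1.0 seconds is placed by interpolation, not measured.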

zato_rest_channel_request_size_bytes

Type: Histogram
HELP: Size of inbound REST request bodies in bytes, by channel name
Buckets: 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216

Labels:

  • channel_name (any configured channel name): the REST channel that received the request

Trigger: Observed once per completed REST channel request with the Content-Length of the inbound request body.

Example PromQL:

histogram_quantile(0.95,
  sum by (channel_name, le) (rate(zato_rest_channel_request_size_bytes_bucket[5m]))
)
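
Histogram buckets are cumulative: one observation increments every bucket whose upper bound (le) is greater than or equal to the observed value, including the implicit +Inf bucket. The sketch below shows this for the byte buckets listed above, using hypothetical request sizes; it models the _bucket series only, not _sum or _count.

```python
import math

SIZE_BUCKETS = [64, 256, 1024, 4096, 16384, 65536, 262144,
                1048576, 4194304, 16777216, math.inf]


def buckets_incremented(size_bytes: int, bounds=SIZE_BUCKETS) -> list:
    """The le bounds whose cumulative count a single observation increments."""
    return [le for le in bounds if size_bytes <= le]


def cumulative_counts(observations, bounds=SIZE_BUCKETS) -> list:
    """(le, count) pairs as they would appear in the _bucket series."""
    return [(le, sum(1 for v in observations if v <= le)) for le in bounds]


# A hypothetical 5,000-byte body lands in 16384 and every larger bucket.
print(buckets_incremented(5000))
print(cumulative_counts([100, 5000, 2_000_000]))
```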

zato_rest_channel_response_size_bytes

Type: Histogram
HELP: Size of outbound REST response bodies in bytes, by channel name
Buckets: 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216

Labels:

  • channel_name (any configured channel name): the REST channel that handled the request

Trigger: Observed once per completed REST channel request with the size of the outbound response body.

Example PromQL:

histogram_quantile(0.95,
  sum by (channel_name, le) (rate(zato_rest_channel_response_size_bytes_bucket[5m]))
)

REST outgoing metrics

zato_rest_outgoing_requests_total

Type: Counter
HELP: Total outgoing REST requests sent to external systems, by connection, HTTP status class, and error attribution

Labels:

  • connection_name (any configured outgoing REST connection name): the outgoing connection used
  • status_code (2xx, 3xx, 4xx, 5xx, 0xx): HTTP response status class
  • error_source (none, gateway, upstream, auth, rate_limit): where the error originated

Trigger: Incremented once per completed outgoing REST request, after the response is received or an error occurs.

Example PromQL:

sum by (connection_name, error_source) (rate(zato_rest_outgoing_requests_total[5m]))
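
The query returns one per-second rate for each (connection_name, error_source) pair. To see what fraction of a connection's traffic each source accounts for, divide each rate by the sum for its connection. In PromQL that requires a second aggregation and many-to-one matching. The arithmetic is sketched below with hypothetical connection names and rates.

```python
import math
from collections import defaultdict


def error_source_shares(rates: dict) -> dict:
    """Turn {(connection_name, error_source): rate} into each pair's share of
    its connection's total rate. A connection with no traffic gives NaN,
    which is also what PromQL returns for 0/0."""
    per_connection = defaultdict(float)
    for (connection_name, _source), value in rates.items():
        per_connection[connection_name] += value
    return {
        (connection_name, source): (value / per_connection[connection_name]
                                    if per_connection[connection_name] else math.nan)
        for (connection_name, source), value in rates.items()
    }


# Hypothetical per-second rates as the query above might return them.
rates = {
    ("crm", "none"): 9.0,
    ("crm", "upstream"): 1.0,
    ("erp", "none"): 0.0,
}
print(error_source_shares(rates))
```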

zato_rest_outgoing_request_duration_seconds

Type: Histogram
HELP: Duration of outgoing REST requests in seconds, from send to response received
Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0

Labels:

  • connection_name (any configured outgoing REST connection name): the outgoing connection used

Trigger: Observed once per completed outgoing REST request with the wall-clock duration from send to response received.

Example PromQL:

histogram_quantile(0.99,
  sum by (connection_name, le) (rate(zato_rest_outgoing_request_duration_seconds_bucket[5m]))
)

zato_rest_outgoing_request_size_bytes

Type: Histogram
HELP: Size of outgoing REST request bodies in bytes, by connection name
Buckets: 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216

Labels:

  • connection_name (any configured outgoing REST connection name): the outgoing connection used

Trigger: Observed once per outgoing REST request with the size of the request body sent.

Example PromQL:

histogram_quantile(0.95,
  sum by (connection_name, le) (rate(zato_rest_outgoing_request_size_bytes_bucket[5m]))
)

zato_rest_outgoing_response_size_bytes

Type: Histogram
HELP: Size of inbound REST response bodies from external systems in bytes, by connection name
Buckets: 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216

Labels:

  • connection_name (any configured outgoing REST connection name): the outgoing connection used

Trigger: Observed once per completed outgoing REST request with the size of the response body received.

Example PromQL:

histogram_quantile(0.95,
  sum by (connection_name, le) (rate(zato_rest_outgoing_response_size_bytes_bucket[5m]))
)

Service metrics

zato_service_invocations_total

Type: Counter
HELP: Total invocations of Zato services, by service name and outcome

Labels:

  • service_name (any deployed service name): the service that was invoked
  • outcome (ok, error): whether the invocation succeeded or raised an exception

Trigger: Incremented once per service invocation after handle() returns or raises.

Example PromQL:

sum by (service_name) (rate(zato_service_invocations_total{outcome="error"}[5m]))
  /
sum by (service_name) (rate(zato_service_invocations_total[5m]))
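
PromQL's / between two vectors only returns results for label sets present on both sides. Most Prometheus client libraries create a labelled series only on its first increment. If that holds here, a service that has never failed has no outcome="error" series and drops out of the result instead of showing 0. A service with zero traffic in the window yields NaN. The sketch below mimics both behaviours with hypothetical service names and rates; it is an illustration of PromQL matching, not Zato code.

```python
import math


def divide_vectors(numerator: dict, denominator: dict) -> dict:
    """Mimic PromQL one-to-one `a / b` on a single grouping label.

    Only keys present on both sides appear in the result. Rates are never
    negative and the denominator includes the numerator's traffic, so a zero
    denominator means 0/0, which PromQL evaluates to NaN.
    """
    result = {}
    for key in numerator.keys() & denominator.keys():
        den = denominator[key]
        result[key] = numerator[key] / den if den else math.nan
    return result


# Hypothetical per-second rates after the two sum by (service_name) steps.
error_rates = {"crm.get-customer": 0.5, "idle.service": 0.0}
total_rates = {"crm.get-customer": 10.0, "billing.charge": 4.0, "idle.service": 0.0}
print(divide_vectors(error_rates, total_rates))
```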

zato_service_duration_seconds

Type: Histogram
HELP: Duration of service handle() execution in seconds, by service name
Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0

Labels:

  • service_name (any deployed service name): the service that was invoked

Trigger: Observed once per service invocation with the wall-clock duration of the handle() method.

Example PromQL:

histogram_quantile(0.95,
  sum by (service_name, le) (rate(zato_service_duration_seconds_bucket[5m]))
)

Pub/sub metrics

zato_pubsub_messages_published_total

Type: Counter
HELP: Total messages published to pub/sub topics

Labels:

  • topic_name (any configured topic name): the topic the message was published to

Trigger: Incremented once per message successfully published to a topic via the Redis Streams backend.

Example PromQL:

sum by (topic_name) (rate(zato_pubsub_messages_published_total[5m]))

zato_pubsub_messages_delivered_total

Type: Counter
HELP: Total messages delivered from pub/sub topics to subscribers

Labels:

  • topic_name (any configured topic name): the topic the message was delivered from

Trigger: Incremented once per message successfully delivered to a subscriber (non-expired, acknowledged).

Example PromQL:

sum by (topic_name) (rate(zato_pubsub_messages_delivered_total[5m]))

Server operational metrics

zato_server_info

Type: Info
HELP: Static information about this Zato server instance

Labels (set once at startup):

  • server_name: name of this server instance as configured in the cluster
  • version: Zato version, e.g. 4.1

Trigger: Set once when the server process starts. Does not change during the lifetime of the process.

Example PromQL:

zato_server_info
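
An info-type metric carries its data in labels. The sample value is conventionally the constant 1. The sketch below pulls those labels out of a single line of the Prometheus text exposition format, for example when inspecting /metrics output by hand. The sample line, server name, and function name are hypothetical. The simple regular expressions do not handle escaped quotes or braces inside label values.

```python
import re

# metric_name{labels} value [timestamp]
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}'
    r'\s+(?P<value>\S+)(?:\s+(?P<timestamp>-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str):
    """Return (metric_name, labels_dict, value) for one labelled sample line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


# Hypothetical scrape output for this metric.
name, labels, value = parse_sample('zato_server_info{server_name="server1",version="4.1"} 1.0')
print(name, labels["server_name"], labels["version"], value)
```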

zato_server_uptime_seconds

Type: Gauge
HELP: Time in seconds since the server process started

Labels: None.

Trigger: Updated to the current elapsed time on every /metrics scrape.

Example PromQL:

zato_server_uptime_seconds
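
Because the value is refreshed on every scrape, this gauge behaves like a function evaluated at read time rather than a number stored between scrapes. A minimal sketch of that pattern follows. The UptimeGauge class and its injectable clock are hypothetical, not Zato's implementation. It reads a monotonic clock by default, a design choice that keeps NTP corrections to the wall clock from changing the reported uptime.

```python
import time


class UptimeGauge:
    """Gauge whose value is computed from a start time whenever it is read."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()

    def collect(self) -> float:
        # Called once per scrape: seconds elapsed since the gauge was created.
        return self._clock() - self._started


gauge = UptimeGauge()
print(round(gauge.collect(), 3))
```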

zato_server_requests_in_flight

Type: Gauge
HELP: Number of HTTP requests currently being processed

Labels: None.

Trigger: Incremented when a request enters the HTTP handler, decremented when the response is written. Reflects the instantaneous concurrency at scrape time.

Example PromQL:

zato_server_requests_in_flight
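
The increment-on-entry, decrement-on-exit pattern is easy to get wrong if a handler raises before the decrement runs. A minimal sketch follows, assuming a hypothetical InFlightGauge class guarded by a lock. The decrement sits in a finally block, so an exception cannot leave the gauge permanently raised. It illustrates the pattern only and is not Zato's code.

```python
import threading
from contextlib import contextmanager


class InFlightGauge:
    """Counts requests currently inside a tracked block."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    @contextmanager
    def track(self):
        with self._lock:
            self._value += 1
        try:
            yield
        finally:
            # Runs on normal exit and when the handler raises.
            with self._lock:
                self._value -= 1

    def value(self) -> int:
        with self._lock:
            return self._value


in_flight = InFlightGauge()
with in_flight.track():
    print(in_flight.value())  # one request in flight inside the block
print(in_flight.value())
```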

zato_server_config_reloads_total

Type: Counter
HELP: Total configuration reload operations, by result

Labels:

  • result (success, failure): whether the hot-deploy reload succeeded

Trigger: Incremented once per hot-deploy configuration event after the reload completes or fails.

Example PromQL:

rate(zato_server_config_reloads_total{result="failure"}[5m])

zato_server_config_last_reload_timestamp_seconds

Type: Gauge
HELP: Unix timestamp of the last successful configuration reload

Labels: None.

Trigger: Set to the current Unix timestamp after each successful hot-deploy configuration reload.

Example PromQL:

time() - zato_server_config_last_reload_timestamp_seconds

This gives you the number of seconds since the last successful reload.


zato_outgoing_health

Type: Gauge
HELP: Health status of outgoing connections, 1 = healthy, 0 = unhealthy

Labels:

  • connection_name (any configured outgoing connection name): the connection being monitored
  • address: the target address of the connection, e.g. https://api.example.com

Trigger: Updated periodically by the outgoing health-check subsystem.

Example PromQL:

zato_outgoing_health == 0

Returns all unhealthy outgoing connections.


zato_tls_certificate_expiry_timestamp_seconds

Type: Gauge
HELP: Unix timestamp when a TLS certificate expires

Labels:

  • cert_name (name of the certificate as configured): the certificate being monitored
  • listener: the listener or endpoint the certificate is bound to, e.g. https://0.0.0.0:17010

Trigger: Updated when the certificate store is loaded or refreshed.

Example PromQL:

(zato_tls_certificate_expiry_timestamp_seconds - time()) / 86400

Returns the number of days until each certificate expires. Alert when this drops below 30.
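
The same arithmetic outside Prometheus: subtract the current Unix time from the expiry timestamp and divide by 86,400 seconds per day. The sketch below applies the 30-day threshold from the text to a hypothetical set of certificates. The names and timestamps are made up, and a fixed "now" keeps the example deterministic.

```python
SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_ts: float, now: float) -> float:
    """Days remaining until expiry; negative once the certificate has expired."""
    return (expiry_ts - now) / SECONDS_PER_DAY


def expiring_soon(certs: dict, now: float, threshold_days: float = 30) -> list:
    """Sorted names of certificates with fewer than threshold_days left."""
    return sorted(name for name, ts in certs.items()
                  if days_until_expiry(ts, now) < threshold_days)


NOW = 1_700_000_000  # fixed Unix time for the example
certs = {
    "api-cert": NOW + 90 * SECONDS_PER_DAY,
    "admin-cert": NOW + 10 * SECONDS_PER_DAY,
    "old-cert": NOW - 1 * SECONDS_PER_DAY,  # already expired
}
print(expiring_soon(certs, NOW))
```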


Scheduler metrics

All scheduler metrics are produced by the Rust scheduler process and merged into the server's /metrics response automatically.

zato_scheduler_jobs_total

Type: Gauge (integer)
HELP: Total number of jobs known to the scheduler

Labels: None.

Trigger: Updated on every scheduler tick to reflect the current total number of jobs (active and paused).

Example PromQL:

zato_scheduler_jobs_total

zato_scheduler_jobs_active

Type: Gauge (integer)
HELP: Number of active (enabled) scheduler jobs

Labels: None.

Trigger: Updated on every scheduler tick to reflect the current number of enabled jobs.

Example PromQL:

zato_scheduler_jobs_active

zato_scheduler_jobs_in_flight

Type: Gauge (integer)
HELP: Number of jobs currently in flight

Labels: None.

Trigger: Updated on every scheduler tick to reflect jobs that have been dispatched but have not yet completed.

Example PromQL:

zato_scheduler_jobs_in_flight

zato_scheduler_ticks_total

Type: Counter (integer)
HELP: Total iterations of the scheduler main loop

Labels: None.

Trigger: Incremented once per iteration of the scheduler's main loop.

Example PromQL:

rate(zato_scheduler_ticks_total[5m])

zato_scheduler_clock_jumps_total

Type: Counter (integer)
HELP: Total wall-clock jump events detected

Labels: None.

Trigger: Incremented when the scheduler detects that the wall clock has jumped forward or backward by more than the expected tick interval. This can happen after VM suspension, NTP corrections, or container migration.

Example PromQL:

increase(zato_scheduler_clock_jumps_total[1h])
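
One general way to detect such a jump is to compare how far the wall clock moved during a tick with how far a monotonic clock moved. A monotonic clock is not stepped by NTP or by manual changes, so a large difference between the two points at the wall clock. The sketch below assumes this technique and is not a description of how the Rust scheduler implements detection. The JumpCounter class is hypothetical, and its tolerance plays the role of the expected tick interval.

```python
def is_clock_jump(wall_before, wall_after, mono_before, mono_after, tolerance):
    """True if the wall clock moved more than `tolerance` seconds away from
    the monotonic clock over the same interval (forward or backward)."""
    drift = (wall_after - wall_before) - (mono_after - mono_before)
    return abs(drift) > tolerance


class JumpCounter:
    """Counts wall-clock jumps across successive ticks."""

    def __init__(self, tolerance: float):
        self.tolerance = tolerance
        self.jumps_total = 0
        self._last = None

    def tick(self, wall: float, mono: float) -> None:
        # In a real loop: counter.tick(time.time(), time.monotonic())
        if self._last is not None:
            wall_prev, mono_prev = self._last
            if is_clock_jump(wall_prev, wall, mono_prev, mono, self.tolerance):
                self.jumps_total += 1
        self._last = (wall, mono)


counter = JumpCounter(tolerance=1.0)
for wall, mono in [(1000.0, 50.0), (1001.0, 51.0), (1100.0, 52.0), (1050.0, 53.0)]:
    counter.tick(wall, mono)
print(counter.jumps_total)
```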

zato_scheduler_executions_total

Type: Counter (integer, vector)
HELP: Total scheduler job executions, by job name and outcome

Labels:

  • job_name (any configured scheduler job name): the job that was executed
  • outcome (ok, error, fired, timeout, skipped_already_in_flight, skipped_holiday): the execution outcome

Trigger: Incremented once per job execution when the job fires, completes, times out, or is skipped.

Example PromQL:

sum by (job_name) (rate(zato_scheduler_executions_total{outcome="ok"}[5m]))
  /
sum by (job_name) (rate(zato_scheduler_executions_total[5m]))

zato_scheduler_execution_duration_seconds

Type: Histogram
HELP: Duration of scheduler job executions in seconds
Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0

Labels:

  • job_name (any configured scheduler job name): the job that was executed

Trigger: Observed when a job execution completes (success or timeout) with the wall-clock duration from fire to completion.

Example PromQL:

histogram_quantile(0.99,
  sum by (job_name, le) (rate(zato_scheduler_execution_duration_seconds_bucket[5m]))
)

zato_scheduler_uptime_seconds

Type: Gauge
HELP: Time in seconds since the scheduler process started

Labels: None.

Trigger: Updated periodically by the scheduler's main loop.

Example PromQL:

zato_scheduler_uptime_seconds