Prometheus metric reference

All Zato metrics follow Prometheus naming conventions:

zato_ prefix on every metric
_total suffix on all counters
_seconds for durations and Unix timestamps
_bytes for size measurements
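These rules are mechanical enough to lint, which helps when you review custom recording rules or dashboards that reference Zato series. Below is a minimal sketch in Python. `naming_problems` is a hypothetical helper, not part of Zato. It checks only the rules above plus the general Prometheus metric-name syntax.

```python
import re

# Prometheus data-model rule for valid metric names.
METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def naming_problems(name, metric_type, unit=None):
    """Return the naming conventions that `name` breaks (empty list if none).

    metric_type: "counter", "gauge", "histogram" or "info".
    unit: expected base unit such as "seconds" or "bytes", if any.
    """
    problems = []
    if not METRIC_NAME.match(name):
        problems.append("invalid metric name")
    if not name.startswith("zato_"):
        problems.append("missing zato_ prefix")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter without _total suffix")
    # For counters the unit comes before _total, e.g. ..._seconds_total.
    base = name[: -len("_total")] if name.endswith("_total") else name
    if unit is not None and not base.endswith("_" + unit):
        problems.append(f"missing _{unit} unit suffix")
    return problems
```

For example, `naming_problems("rest_requests", "counter")` reports both the missing prefix and the missing `_total` suffix.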

For an overview of the endpoint, architecture, and configuration, see Prometheus metrics.

REST channel metrics

zato_rest_channel_requests_total


Type	Counter
HELP	Total REST channel requests handled by this server, by channel, HTTP status class, and error attribution

Labels:

Label	Possible values	Description
`channel_name`	Any configured channel name	The REST channel that received the request
`status_code`	`2xx`, `3xx`, `4xx`, `5xx`, `0xx`	HTTP response status class
`error_source`	`none`, `gateway`, `upstream`, `auth`, `rate_limit`	Where the error originated, `none` for successes

Trigger: Incremented once per completed REST channel request, after the response is written.
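The `status_code` label records the status class rather than the exact code, which keeps the number of series per channel small. The sketch below shows that bucketing. `status_class` is a hypothetical helper. Mapping a missing status to `0xx` is an assumption, because this reference lists `0xx` without defining it.

```python
def status_class(status):
    """Collapse an HTTP status code into a `status_code` label value.

    Assumption: a request that ended without any HTTP status (None or 0)
    is reported as "0xx"; the reference lists the value but does not define it.
    """
    if not status:
        return "0xx"
    if not 200 <= status <= 599:
        raise ValueError(f"unexpected final HTTP status: {status}")
    return f"{status // 100}xx"
```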

Example PromQL:

sum by (channel_name) (rate(zato_rest_channel_requests_total[5m]))
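`rate()` turns a counter that only goes up into a per-second value. It also tolerates resets, for instance after a server restart. The following simplified Python version of that arithmetic is for intuition only. Unlike Prometheus, it does not extrapolate to the edges of the range window, and `simple_rate` is an illustrative name.

```python
def simple_rate(samples):
    """Per-second increase of a counter from (timestamp_seconds, value) samples.

    Counter resets are handled as in PromQL's rate(): a drop in value is
    treated as a restart from zero. Unlike Prometheus, there is no
    extrapolation to the edges of the range window.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += current - previous if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

With samples `(0, 100), (60, 160), (120, 30)`, the drop to 30 counts as a reset. The total increase is therefore 60 + 30 over 120 seconds, or 0.75 requests per second.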

zato_rest_channel_request_duration_seconds


Type	Histogram
HELP	Duration of REST channel requests in seconds, from request accept to response write complete
Buckets	`0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0`

Labels:

Label	Possible values	Description
`channel_name`	Any configured channel name	The REST channel that received the request

Trigger: Observed once per completed REST channel request with the wall-clock duration from request accept to response write.
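A histogram is exposed as cumulative `_bucket` series, one for each upper bound `le` plus `+Inf`, along with `_sum` and `_count`. Each observation counts toward every bucket whose bound is greater than or equal to the observed value. The sketch below illustrates this with the duration buckets listed above. `SketchHistogram` is an illustrative class, not the implementation used by Zato or prometheus_client.

```python
import bisect

# Upper bounds from the Buckets row above; +Inf is always appended.
DURATION_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0]


class SketchHistogram:
    def __init__(self, bounds):
        self.bounds = list(bounds) + [float("inf")]
        self.counts = [0] * len(self.bounds)  # per bucket, not yet cumulative
        self.sum = 0.0

    def observe(self, value):
        # Index of the smallest bound >= value ("le" means less than or equal).
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def exposed_buckets(self):
        """Return (le, cumulative count) pairs as they appear in _bucket series."""
        result, running = [], 0
        for bound, count in zip(self.bounds, self.counts):
            running += count
            result.append((bound, running))
        return result

    @property
    def count(self):
        return sum(self.counts)
```

Because the buckets are cumulative, the `+Inf` bucket always equals `_count`. Any request that takes longer than 30 seconds shows up only in that bucket.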

Example PromQL:

histogram_quantile(0.99,
  sum by (channel_name, le) (rate(zato_rest_channel_request_duration_seconds_bucket[5m]))
)
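`histogram_quantile()` never sees individual observations. It locates the bucket that contains the requested rank and interpolates linearly inside it, so the result is an estimate whose precision depends on the bucket layout. Below is a Python re-implementation of that interpolation for a single series. It is simplified and does not handle non-monotonic buckets. The `histogram_quantile` function here is illustrative, not a library API.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (le, count) pairs.

    `buckets` must be sorted by le and end with (float("inf"), total).
    Follows Prometheus's approach: locate the bucket holding the rank,
    then interpolate linearly between its lower and upper bound.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    for i, (upper, count) in enumerate(buckets[:-1]):
        if count >= rank:
            break
    else:
        # The rank lies in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    if i == 0 and upper <= 0:
        return upper
    # The first bucket is assumed to start at 0.
    lower, below = (0.0, 0) if i == 0 else buckets[i - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)
```

Any p99 that falls between 0.25 and 0.5 seconds is interpolated within that range, no matter how the real latencies are distributed inside the bucket.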

zato_rest_channel_request_size_bytes


Type	Histogram
HELP	Size of inbound REST request bodies in bytes, by channel name
Buckets	`64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216`

Labels:

Label	Possible values	Description
`channel_name`	Any configured channel name	The REST channel that received the request

Trigger: Observed once per completed REST channel request with the Content-Length of the inbound request body.

Example PromQL:

histogram_quantile(0.95,
  sum by (channel_name, le) (rate(zato_rest_channel_request_size_bytes_bucket[5m]))
)

zato_rest_channel_response_size_bytes


Type	Histogram
HELP	Size of outbound REST response bodies in bytes, by channel name
Buckets	`64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216`

Labels:

Label	Possible values	Description
`channel_name`	Any configured channel name	The REST channel that handled the request

Trigger: Observed once per completed REST channel request with the size of the outbound response body.

Example PromQL:

histogram_quantile(0.95,
  sum by (channel_name, le) (rate(zato_rest_channel_response_size_bytes_bucket[5m]))
)

REST outgoing metrics

zato_rest_outgoing_requests_total


Type	Counter
HELP	Total outgoing REST requests sent to external systems, by connection, HTTP status class, and error attribution

Labels:

Label	Possible values	Description
`connection_name`	Any configured outgoing REST connection name	The outgoing connection used
`status_code`	`2xx`, `3xx`, `4xx`, `5xx`, `0xx`	HTTP response status class
`error_source`	`none`, `gateway`, `upstream`, `auth`, `rate_limit`	Where the error originated

Trigger: Incremented once per completed outgoing REST request, after the response is received or an error occurs.

Example PromQL:

sum by (connection_name, error_source) (rate(zato_rest_outgoing_requests_total[5m]))

zato_rest_outgoing_request_duration_seconds


Type	Histogram
HELP	Duration of outgoing REST requests in seconds, from send to response received
Buckets	`0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0`

Labels:

Label	Possible values	Description
`connection_name`	Any configured outgoing REST connection name	The outgoing connection used

Trigger: Observed once per completed outgoing REST request with the wall-clock duration from send to response received.

Example PromQL:

histogram_quantile(0.99,
  sum by (connection_name, le) (rate(zato_rest_outgoing_request_duration_seconds_bucket[5m]))
)

zato_rest_outgoing_request_size_bytes


Type	Histogram
HELP	Size of outgoing REST request bodies in bytes, by connection name
Buckets	`64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216`

Labels:

Label	Possible values	Description
`connection_name`	Any configured outgoing REST connection name	The outgoing connection used

Trigger: Observed once per outgoing REST request with the size of the request body sent.

Example PromQL:

histogram_quantile(0.95,
  sum by (connection_name, le) (rate(zato_rest_outgoing_request_size_bytes_bucket[5m]))
)

zato_rest_outgoing_response_size_bytes


Type	Histogram
HELP	Size of inbound REST response bodies from external systems in bytes, by connection name
Buckets	`64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216`

Labels:

Label	Possible values	Description
`connection_name`	Any configured outgoing REST connection name	The outgoing connection used

Trigger: Observed once per completed outgoing REST request with the size of the response body received.

Example PromQL:

histogram_quantile(0.95,
  sum by (connection_name, le) (rate(zato_rest_outgoing_response_size_bytes_bucket[5m]))
)

Service metrics

zato_service_invocations_total


Type	Counter
HELP	Total invocations of Zato services, by service name and outcome

Labels:

Label	Possible values	Description
`service_name`	Any deployed service name	The service that was invoked
`outcome`	`ok`, `error`	Whether the invocation succeeded or raised an exception

Trigger: Incremented once per service invocation after handle() returns or raises.

Example PromQL:

sum by (service_name) (rate(zato_service_invocations_total{outcome="error"}[5m]))
  /
sum by (service_name) (rate(zato_service_invocations_total[5m]))

zato_service_duration_seconds


Type	Histogram
HELP	Duration of service handle() execution in seconds, by service name
Buckets	`0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0`

Labels:

Label	Possible values	Description
`service_name`	Any deployed service name	The service that was invoked

Trigger: Observed once per service invocation with the wall-clock duration of the handle() method.

Example PromQL:

histogram_quantile(0.95,
  sum by (service_name, le) (rate(zato_service_duration_seconds_bucket[5m]))
)
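The two triggers above map naturally onto `try`/`finally`: count the outcome and time `handle()` once it returns or raises, then let any exception propagate. Below is a minimal sketch. The `invoke` function and the dictionaries that stand in for the counter and histogram are hypothetical, not Zato's internal instrumentation.

```python
import time

# Hypothetical stand-ins for the two service metrics.
invocations_total = {}  # (service_name, outcome) -> count
duration_seconds = {}   # service_name -> list of observed durations


def invoke(service_name, handle):
    """Call handle(), record outcome and duration, and re-raise any error."""
    outcome = "error"
    start = time.perf_counter()
    try:
        result = handle()
        outcome = "ok"
        return result
    finally:
        # Runs whether handle() returned or raised.
        elapsed = time.perf_counter() - start
        key = (service_name, outcome)
        invocations_total[key] = invocations_total.get(key, 0) + 1
        duration_seconds.setdefault(service_name, []).append(elapsed)
```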

Pub/sub metrics

These metrics track message flow in Zato's publish/subscribe system.

zato_pubsub_messages_published_total


Type	Counter
HELP	Total messages published to pub/sub topics

Labels:

Label	Possible values	Description
`topic_name`	Any configured topic name	The topic the message was published to

Trigger: Incremented once per message successfully published to a topic via the Redis Streams backend.

Example PromQL:

sum by (topic_name) (rate(zato_pubsub_messages_published_total[5m]))

zato_pubsub_messages_delivered_total


Type	Counter
HELP	Total messages delivered from pub/sub topics to subscribers

Labels:

Label	Possible values	Description
`topic_name`	Any configured topic name	The topic the message was delivered from

Trigger: Incremented once per message successfully delivered to a subscriber (non-expired, acknowledged).

Example PromQL:

sum by (topic_name) (rate(zato_pubsub_messages_delivered_total[5m]))

Server operational metrics

zato_server_info


Type	Info
HELP	Static information about this Zato server instance

Labels (set once at startup):

Label	Description
`server_name`	Name of this server instance as configured in the cluster
`version`	Zato version, e.g. `4.1`

Trigger: Set once when the server process starts. Does not change during the lifetime of the process.

Example PromQL:

zato_server_info
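Info metrics carry their payload in labels. The Python prometheus_client library, for example, exposes an Info metric in the text format as a sample whose value is always 1, so the label set is the meaningful part. The sketch below shows how such a line is rendered, including the label-value escaping that the text format requires. `info_sample` is a hypothetical helper.

```python
def escape_label_value(value):
    # The text exposition format escapes backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def info_sample(name, labels):
    """Render one info-style sample: the labels carry the data, the value is 1."""
    pairs = ",".join(f'{key}="{escape_label_value(value)}"' for key, value in sorted(labels.items()))
    return f"{name}{{{pairs}}} 1.0"
```

Because the value is always 1, you can multiply the series into other queries to attach its labels, for example `zato_server_uptime_seconds * on (instance) group_left (server_name, version) zato_server_info`. This assumes both series share the `instance` label that Prometheus adds at scrape time.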

zato_server_uptime_seconds


Type	Gauge
HELP	Time in seconds since the server process started

Labels: None.

Trigger: Updated to the current elapsed time on every /metrics scrape.

Example PromQL:

zato_server_uptime_seconds

zato_server_requests_in_flight


Type	Gauge
HELP	Number of HTTP requests currently being processed

Labels: None.

Trigger: Incremented when a request enters the HTTP handler, decremented when the response is written. Reflects the instantaneous concurrency at scrape time.

Example PromQL:

zato_server_requests_in_flight
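This gauge stays accurate only if every increment is paired with a decrement, including when a handler raises. Below is a minimal thread-safe sketch of that pattern. `InFlightGauge` is hypothetical. prometheus_client's `Gauge.track_inprogress()` provides the same behavior as a ready-made context manager.

```python
import threading
from contextlib import contextmanager


class InFlightGauge:
    """Up/down counter of requests currently being processed."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    @contextmanager
    def track(self):
        with self._lock:
            self._value += 1
        try:
            yield
        finally:
            # Decrement even if the handler raises, or the gauge drifts upward.
            with self._lock:
                self._value -= 1
```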

zato_server_config_reloads_total


Type	Counter
HELP	Total configuration reload operations, by result

Labels:

Label	Possible values	Description
`result`	`success`, `failure`	Whether the hot-deploy reload succeeded

Trigger: Incremented once per hot-deploy configuration event after the reload completes or fails.

Example PromQL:

rate(zato_server_config_reloads_total{result="failure"}[5m])

zato_server_config_last_reload_timestamp_seconds


Type	Gauge
HELP	Unix timestamp of the last successful configuration reload

Labels: None.

Trigger: Set to the current Unix timestamp after each successful hot-deploy configuration reload.

Example PromQL:

time() - zato_server_config_last_reload_timestamp_seconds

This gives you the number of seconds since the last successful reload.

zato_outgoing_health


Type	Gauge
HELP	Health status of outgoing connections, 1 = healthy, 0 = unhealthy

Labels:

Label	Possible values	Description
`connection_name`	Any configured outgoing connection name	The connection being monitored
`address`	Any target address, e.g. `https://api.example.com`	The target address of the connection

Trigger: Updated periodically by the outgoing health-check subsystem.

Example PromQL:

zato_outgoing_health == 0

Returns all unhealthy outgoing connections.

zato_tls_certificate_expiry_timestamp_seconds


Type	Gauge
HELP	Unix timestamp when a TLS certificate expires

Labels:

Label	Possible values	Description
`cert_name`	Name of the certificate as configured	The certificate being monitored
`listener`	Any listener address, e.g. `https://0.0.0.0:17010`	The listener or endpoint the certificate is bound to

Trigger: Updated when the certificate store is loaded or refreshed.

Example PromQL:

(zato_tls_certificate_expiry_timestamp_seconds - time()) / 86400

Returns the number of days until each certificate expires. Alert when this drops below 30.
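You can run the same arithmetic in Python to check a certificate by hand. `ssl.cert_time_to_seconds()` is the standard-library parser for the `notAfter` format that `getpeercert()` returns. `days_until_expiry` is a hypothetical helper.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter date (negative once expired).

    not_after uses the format of getpeercert()["notAfter"],
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expiry - now) / SECONDS_PER_DAY
```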

Scheduler metrics

All scheduler metrics are produced by the Rust scheduler process and merged into the server's /metrics response automatically.

zato_scheduler_jobs_total


Type	Gauge (integer)
HELP	Total number of jobs known to the scheduler

Labels: None.

Trigger: Updated on every scheduler tick to reflect the current total number of jobs (active and paused).

Example PromQL:

zato_scheduler_jobs_total

zato_scheduler_jobs_active


Type	Gauge (integer)
HELP	Number of active (enabled) scheduler jobs

Labels: None.

Trigger: Updated on every scheduler tick to reflect the current number of enabled jobs.

Example PromQL:

zato_scheduler_jobs_active

zato_scheduler_jobs_in_flight


Type	Gauge (integer)
HELP	Number of jobs currently in flight

Labels: None.

Trigger: Updated on every scheduler tick to reflect jobs that have been dispatched but have not yet completed.

Example PromQL:

zato_scheduler_jobs_in_flight

zato_scheduler_ticks_total


Type	Counter (integer)
HELP	Total iterations of the scheduler main loop

Labels: None.

Trigger: Incremented once per iteration of the scheduler's main loop.

Example PromQL:

rate(zato_scheduler_ticks_total[5m])

zato_scheduler_clock_jumps_total


Type	Counter (integer)
HELP	Total wall-clock jump events detected

Labels: None.

Trigger: Incremented when the scheduler detects that the wall clock has jumped forward or backward by more than the expected tick interval. This can happen after VM suspension, NTP corrections, or container migration.

Example PromQL:

increase(zato_scheduler_clock_jumps_total[1h])
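One common way to detect such jumps is to compare how far the wall clock moved with how far a monotonic clock moved over the same interval, since clock steps do not affect the monotonic clock. The Python sketch below illustrates that technique, which is not necessarily how the Rust scheduler implements it. `ClockJumpDetector` is hypothetical, and its tolerance plays the role of the expected tick interval.

```python
import time


class ClockJumpDetector:
    """Compare wall-clock progress with monotonic progress between ticks.

    The monotonic clock is unaffected by NTP steps or manual clock changes,
    so a large disagreement between the two deltas signals a wall-clock jump.
    """

    def __init__(self, tolerance_seconds, wall=time.time, mono=time.monotonic):
        self.tolerance = tolerance_seconds
        self._wall, self._mono = wall, mono
        self._last_wall, self._last_mono = wall(), mono()
        self.jumps_total = 0

    def tick(self):
        """Return the drift since the previous tick; count it if too large."""
        wall_now, mono_now = self._wall(), self._mono()
        drift = (wall_now - self._last_wall) - (mono_now - self._last_mono)
        if abs(drift) > self.tolerance:
            self.jumps_total += 1
        self._last_wall, self._last_mono = wall_now, mono_now
        return drift
```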

zato_scheduler_executions_total


Type	Counter (integer, vector)
HELP	Total scheduler job executions, by job name and outcome

Labels:

Label	Possible values	Description
`job_name`	Any configured scheduler job name	The job that was executed
`outcome`	`ok`, `error`, `fired`, `timeout`, `skipped_already_in_flight`, `skipped_holiday`	The execution outcome

Trigger: Incremented when a job fires, completes, times out, or is skipped, with `outcome` set to the matching value.

Example PromQL:

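sum by (job_name) (rate(zato_scheduler_executions_total{outcome="ok"}[5m]))
  /
sum by (job_name) (rate(zato_scheduler_executions_total{outcome=~"ok|error|timeout"}[5m]))

Returns the success ratio of finished executions per job. The `fired` and `skipped_*` outcomes are excluded from the denominator so they do not dilute the ratio.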

zato_scheduler_execution_duration_seconds


Type	Histogram
HELP	Duration of scheduler job executions in seconds
Buckets	`0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0`

Labels:

Label	Possible values	Description
`job_name`	Any configured scheduler job name	The job that was executed

Trigger: Observed when a job execution completes (success or timeout) with the wall-clock duration from fire to completion.

Example PromQL:

histogram_quantile(0.99,
  sum by (job_name, le) (rate(zato_scheduler_execution_duration_seconds_bucket[5m]))
)

zato_scheduler_uptime_seconds


Type	Gauge
HELP	Time in seconds since the scheduler process started

Labels: None.

Trigger: Updated periodically by the scheduler's main loop.

Example PromQL:

zato_scheduler_uptime_seconds