All Zato metrics follow Prometheus naming conventions:

- `zato_` prefix on every metric
- `_total` suffix on all counters
- `_seconds` for duration measurements
- `_bytes` for size measurements

For an overview of the endpoint, architecture, and configuration, see Prometheus metrics.
| Type | Counter |
| HELP | Total REST channel requests handled by this server, by channel, HTTP status class, and error attribution |
Labels:
| Label | Possible values | Description |
|---|---|---|
| channel_name | Any configured channel name | The REST channel that received the request |
| status_code | 2xx, 3xx, 4xx, 5xx, 0xx | HTTP response status class |
| error_source | none, gateway, upstream, auth, rate_limit | Where the error originated; none for successes |
Trigger: Incremented once per completed REST channel request, after the response is written.
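The `status_code` label carries a status class rather than the exact code, which keeps label cardinality low. A minimal Python sketch of such a mapping follows. The function name `status_class` is hypothetical, and mapping every code outside 200-599 to `0xx` is an assumption, since this reference does not define when `0xx` is used.

```python
def status_class(code: int) -> str:
    """Collapse an HTTP status code into a class label such as '2xx'."""
    # Assumption: codes outside 200-599 (including "no response") map to '0xx'.
    if 200 <= code <= 599:
        return f"{code // 100}xx"
    return "0xx"


print(status_class(204))  # 2xx
print(status_class(503))  # 5xx
```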
Example PromQL:
| Type | Histogram |
| HELP | Duration of REST channel requests in seconds, from request accept to response write complete |
| Buckets | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0 |
Labels:
| Label | Possible values | Description |
|---|---|---|
| channel_name | Any configured channel name | The REST channel that received the request |
Trigger: Observed once per completed REST channel request with the wall-clock duration from request accept to response write.
Example PromQL:
histogram_quantile(0.99,
  sum by (channel_name, le) (rate(zato_rest_channel_request_duration_seconds_bucket[5m]))
)
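To make clear what such a query returns, here is a simplified Python sketch of the linear interpolation that Prometheus applies to classic histogram buckets. The function name `estimate_quantile` and the sample counts are hypothetical; real queries operate on per-bucket rates rather than raw counts, but the interpolation is the same in spirit.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            if count == lower_count:
                return upper_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical cumulative counts for four of the duration buckets.
sample = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, sample)  # roughly 0.75 seconds
```

The estimate is only as precise as the bucket that contains the requested rank, and any observation above the largest finite bucket (30.0 seconds here) is reported as that bound.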
| Type | Histogram |
| HELP | Size of inbound REST request bodies in bytes, by channel name |
| Buckets | 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216 |
Labels:
| Label | Possible values | Description |
|---|---|---|
| channel_name | Any configured channel name | The REST channel that received the request |
Trigger: Observed once per completed REST channel request with the Content-Length of the inbound request body.
Example PromQL:
histogram_quantile(0.95,
  sum by (channel_name, le) (rate(zato_rest_channel_request_size_bytes_bucket[5m]))
)
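The size buckets used throughout this reference start at 64 bytes and grow by a factor of four up to 16 MiB. The Python sketch below reproduces that progression and shows which bucket bound a given body size first falls under; the helper name `smallest_bucket` is hypothetical.

```python
import bisect
import math

# Size bucket bounds listed in this reference: 64 bytes, then 4x per step.
SIZE_BUCKETS = [64 * 4**k for k in range(10)]


def smallest_bucket(size_bytes):
    """Return the smallest upper bound (le) that covers size_bytes."""
    index = bisect.bisect_left(SIZE_BUCKETS, size_bytes)
    return SIZE_BUCKETS[index] if index < len(SIZE_BUCKETS) else math.inf


print(smallest_bucket(5000))  # 16384
```

Bodies larger than 16777216 bytes are counted only in the implicit `+Inf` bucket, so quantile estimates for such payloads are capped at that bound.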
| Type | Histogram |
| HELP | Size of outbound REST response bodies in bytes, by channel name |
| Buckets | 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216 |
Labels:
| Label | Possible values | Description |
|---|---|---|
| channel_name | Any configured channel name | The REST channel that handled the request |
Trigger: Observed once per completed REST channel request with the size of the outbound response body.
Example PromQL:
histogram_quantile(0.95,
  sum by (channel_name, le) (rate(zato_rest_channel_response_size_bytes_bucket[5m]))
)
| Type | Counter |
| HELP | Total outgoing REST requests sent to external systems, by connection, HTTP status class, and error attribution |
Labels:
| Label | Possible values | Description |
|---|---|---|
| connection_name | Any configured outgoing REST connection name | The outgoing connection used |
| status_code | 2xx, 3xx, 4xx, 5xx, 0xx | HTTP response status class |
| error_source | none, gateway, upstream, auth, rate_limit | Where the error originated |
Trigger: Incremented once per completed outgoing REST request, after the response is received or an error occurs.
Example PromQL:
| Type | Histogram |
| HELP | Duration of outgoing REST requests in seconds, from send to response received |
| Buckets | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0 |
Labels:
| Label | Possible values | Description |
|---|---|---|
| connection_name | Any configured outgoing REST connection name | The outgoing connection used |
Trigger: Observed once per completed outgoing REST request with the wall-clock duration from send to response received.
Example PromQL:
histogram_quantile(0.99,
  sum by (connection_name, le) (rate(zato_rest_outgoing_request_duration_seconds_bucket[5m]))
)
| Type | Histogram |
| HELP | Size of outgoing REST request bodies in bytes, by connection name |
| Buckets | 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216 |
Labels:
| Label | Possible values | Description |
|---|---|---|
| connection_name | Any configured outgoing REST connection name | The outgoing connection used |
Trigger: Observed once per outgoing REST request with the size of the request body sent.
Example PromQL:
histogram_quantile(0.95,
  sum by (connection_name, le) (rate(zato_rest_outgoing_request_size_bytes_bucket[5m]))
)
| Type | Histogram |
| HELP | Size of inbound REST response bodies from external systems in bytes, by connection name |
| Buckets | 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216 |
Labels:
| Label | Possible values | Description |
|---|---|---|
| connection_name | Any configured outgoing REST connection name | The outgoing connection used |
Trigger: Observed once per completed outgoing REST request with the size of the response body received.
Example PromQL:
histogram_quantile(0.95,
  sum by (connection_name, le) (rate(zato_rest_outgoing_response_size_bytes_bucket[5m]))
)
| Type | Counter |
| HELP | Total invocations of Zato services, by service name and outcome |
Labels:
| Label | Possible values | Description |
|---|---|---|
| service_name | Any deployed service name | The service that was invoked |
| outcome | ok, error | Whether the invocation succeeded or raised an exception |
Trigger: Incremented once per service invocation after handle() returns or raises.
Example PromQL:
sum by (service_name) (rate(zato_service_invocations_total{outcome="error"}[5m]))
/
sum by (service_name) (rate(zato_service_invocations_total[5m]))
| Type | Histogram |
| HELP | Duration of service handle() execution in seconds, by service name |
| Buckets | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0 |
Labels:
| Label | Possible values | Description |
|---|---|---|
| service_name | Any deployed service name | The service that was invoked |
Trigger: Observed once per service invocation with the wall-clock duration of the handle() method.
Example PromQL:
histogram_quantile(0.95,
  sum by (service_name, le) (rate(zato_service_duration_seconds_bucket[5m]))
)
| Type | Counter |
| HELP | Total messages published to pub/sub topics |
Labels:
| Label | Possible values | Description |
|---|---|---|
| topic_name | Any configured topic name | The topic the message was published to |
Trigger: Incremented once per message successfully published to a topic via the Redis Streams backend.
Example PromQL:
| Type | Counter |
| HELP | Total messages delivered from pub/sub topics to subscribers |
Labels:
| Label | Possible values | Description |
|---|---|---|
| topic_name | Any configured topic name | The topic the message was delivered from |
Trigger: Incremented once per message successfully delivered to a subscriber (non-expired, acknowledged).
Example PromQL:
| Type | Info |
| HELP | Static information about this Zato server instance |
Labels (set once at startup):
| Label | Description |
|---|---|
| server_name | Name of this server instance as configured in the cluster |
| version | Zato version, e.g. 4.1 |
Trigger: Set once when the server process starts. Does not change during the lifetime of the process.
Example PromQL:
| Type | Gauge |
| HELP | Time in seconds since the server process started |
Labels: None.
Trigger: Updated to the current elapsed time on every /metrics scrape.
Example PromQL:
| Type | Gauge |
| HELP | Number of HTTP requests currently being processed |
Labels: None.
Trigger: Incremented when a request enters the HTTP handler, decremented when the response is written. Reflects the instantaneous concurrency at scrape time.
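The increment/decrement pairing described above can be sketched in Python as a context manager. `InFlightGauge` and `track` are hypothetical names used only for illustration, not Zato's actual classes, and the sketch is not thread-safe.

```python
from contextlib import contextmanager


class InFlightGauge:
    """Hypothetical stand-in for an in-flight gauge; not Zato's actual class."""

    def __init__(self):
        self.value = 0

    @contextmanager
    def track(self):
        # Entering the handler: one more request in flight.
        self.value += 1
        try:
            yield
        finally:
            # Runs on success and on error, so the gauge cannot drift upward.
            self.value -= 1


gauge = InFlightGauge()
with gauge.track():
    in_handler = gauge.value  # 1 while the request is being processed
after_handler = gauge.value   # back to 0 once the response is written
```

Pairing the decrement with a `finally` clause is the important part of the design: a handler that raises would otherwise leave the gauge permanently inflated.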
Example PromQL:
| Type | Counter |
| HELP | Total configuration reload operations, by result |
Labels:
| Label | Possible values | Description |
|---|---|---|
| result | success, failure | Whether the hot-deploy reload succeeded |
Trigger: Incremented once per hot-deploy configuration event after the reload completes or fails.
Example PromQL:
| Type | Gauge |
| HELP | Unix timestamp of the last successful configuration reload |
Labels: None.
Trigger: Set to the current Unix timestamp after each successful hot-deploy configuration reload.
Example PromQL:
This gives you the number of seconds since the last successful reload.
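The underlying arithmetic is a subtraction of the stored timestamp from the current time, which in PromQL is typically done with the `time()` function. A minimal Python sketch, with the hypothetical helper name `seconds_since_reload`:

```python
import time


def seconds_since_reload(last_reload_ts, now=None):
    """Seconds elapsed since the Unix timestamp stored in the gauge."""
    if now is None:
        now = time.time()
    return now - last_reload_ts


# A reload recorded at t=1000 and checked at t=1090 is 90 seconds old.
age = seconds_since_reload(1000.0, now=1090.0)
```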
| Type | Gauge |
| HELP | Health status of outgoing connections, 1 = healthy, 0 = unhealthy |
Labels:
| Label | Possible values | Description |
|---|---|---|
| connection_name | Any configured outgoing connection name | The connection being monitored |
| address | The target address of the connection | e.g. https://api.example.com |
Trigger: Updated periodically by the outgoing health-check subsystem.
Example PromQL:
Returns all unhealthy outgoing connections.
| Type | Gauge |
| HELP | Unix timestamp when a TLS certificate expires |
Labels:
| Label | Possible values | Description |
|---|---|---|
| cert_name | Name of the certificate as configured | The certificate being monitored |
| listener | The listener or endpoint the certificate is bound to | e.g. https://0.0.0.0:17010 |
Trigger: Updated when the certificate store is loaded or refreshed.
Example PromQL:
Returns the number of days until each certificate expires. Alert when this drops below 30.
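The conversion from an expiry timestamp to remaining days, and the 30-day alert condition, can be sketched in Python as follows. The function names and the `threshold_days` parameter are hypothetical.

```python
SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_ts, now):
    """Days remaining until the certificate's expiry timestamp."""
    return (expiry_ts - now) / SECONDS_PER_DAY


def should_alert(expiry_ts, now, threshold_days=30):
    """True when fewer than threshold_days remain (or the cert has expired)."""
    return days_until_expiry(expiry_ts, now) < threshold_days


# With "now" fixed at 0, a certificate expiring in 10 days triggers the alert.
alerting = should_alert(10 * SECONDS_PER_DAY, now=0)
```

A negative result means the certificate has already expired, which the same threshold comparison also catches.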
All scheduler metrics are produced by the Rust scheduler process and merged into the server's /metrics response automatically.
| Type | Gauge (integer) |
| HELP | Total number of jobs known to the scheduler |
Labels: None.
Trigger: Updated on every scheduler tick to reflect the current total number of jobs (active and paused).
Example PromQL:
| Type | Gauge (integer) |
| HELP | Number of active (enabled) scheduler jobs |
Labels: None.
Trigger: Updated on every scheduler tick to reflect the current number of enabled jobs.
Example PromQL:
| Type | Gauge (integer) |
| HELP | Number of jobs currently in flight |
Labels: None.
Trigger: Updated on every scheduler tick to reflect jobs that have been dispatched but have not yet completed.
Example PromQL:
| Type | Counter (integer) |
| HELP | Total iterations of the scheduler main loop |
Labels: None.
Trigger: Incremented once per iteration of the scheduler's main loop.
Example PromQL:
| Type | Counter (integer) |
| HELP | Total wall-clock jump events detected |
Labels: None.
Trigger: Incremented when the scheduler detects that the wall clock has jumped forward or backward by more than the expected tick interval. This can happen after VM suspension, NTP corrections, or container migration.
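This reference does not describe how the Rust scheduler detects a jump. One common approach, shown here purely as an assumption, is to compare the elapsed wall-clock time between two ticks with the elapsed monotonic time and flag any difference larger than the tick interval. The function name `is_clock_jump` and its parameters are hypothetical.

```python
def is_clock_jump(wall_elapsed, monotonic_elapsed, tick_interval):
    """Flag a wall-clock jump between two scheduler ticks.

    wall_elapsed:      difference between two wall-clock readings, in seconds
    monotonic_elapsed: difference between two monotonic readings, in seconds
    tick_interval:     expected tick length, in seconds
    """
    # Assumption: a jump is any disagreement larger than one tick interval.
    drift = wall_elapsed - monotonic_elapsed
    return abs(drift) > tick_interval


# One second really passed, but the wall clock moved an hour forward.
forward_jump = is_clock_jump(3601.0, 1.0, tick_interval=1.0)
# One second really passed, but the wall clock moved 30 seconds backward.
backward_jump = is_clock_jump(-29.0, 1.0, tick_interval=1.0)
```

Taking the absolute value covers both directions named above, so a backward NTP correction counts the same as a forward jump after VM suspension.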
Example PromQL:
| Type | Counter (integer, vector) |
| HELP | Total scheduler job executions, by job name and outcome |
Labels:
| Label | Possible values | Description |
|---|---|---|
| job_name | Any configured scheduler job name | The job that was executed |
| outcome | ok, error, fired, timeout, skipped_already_in_flight, skipped_holiday | The execution outcome |
Trigger: Incremented once per job execution when the job fires, completes, times out, or is skipped.
Example PromQL:
sum by (job_name) (rate(zato_scheduler_executions_total{outcome="ok"}[5m]))
/
sum by (job_name) (rate(zato_scheduler_executions_total[5m]))
| Type | Histogram |
| HELP | Duration of scheduler job executions in seconds |
| Buckets | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0 |
Labels:
| Label | Possible values | Description |
|---|---|---|
| job_name | Any configured scheduler job name | The job that was executed |
Trigger: Observed when a job execution completes (success or timeout) with the wall-clock duration from fire to completion.
Example PromQL:
histogram_quantile(0.99,
  sum by (job_name, le) (rate(zato_scheduler_execution_duration_seconds_bucket[5m]))
)
| Type | Gauge |
| HELP | Time in seconds since the scheduler process started |
Labels: None.
Trigger: Updated periodically by the scheduler's main loop.
Example PromQL: