Understanding WebSocket API timeouts

2021-04-12, by Dariusz Suchojad

Zato WebSocket channels let you accept long-running API connections and, as such, they have a few settings to fine-tune their usage of timeouts. Let's discover what they are and how to use them.

WebSocket channels

The four timeout settings are listed below. All of the WebSocket clients using a particular channel share the same timeout configuration - this means that a different channel is needed if particular clients require different settings.

New token wait time
Token TTL
Ping interval
Threshold

Tokens

New token wait time - when a new WebSocket connection is established to Zato, it has that many seconds to open a session and to send its credentials. If that is not done, Zato immediately closes the connection.
Token TTL - once a session is established and a session token is returned to the client, the token's time-to-live (TTL) will be that many seconds. If there is no message from the client within TTL seconds, Zato considers the token expired and it cannot be used any longer. Note, however, that it is not guaranteed that the connection will be closed immediately after the token expires.

In this context, a message that can extend TTL means one of:

A request sent by the client
A response to a request previously sent by Zato
A response to a ping message sent by Zato
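The token rules above can be illustrated with a small model. This is only a sketch of one simplified reading of the rule, namely that a token expires once TTL seconds pass without any of the three kinds of messages listed above. The class, the message names and the use of plain numbers for time are all hypothetical and are not part of Zato's API.

```python
from dataclasses import dataclass

# Hypothetical labels for the three kinds of messages that extend the TTL
MESSAGES_EXTENDING_TTL = {'client_request', 'client_response', 'pong'}

@dataclass
class SessionToken:
    ttl: float           # Token TTL, in seconds
    last_message: float  # Time of the last qualifying message, in seconds

    def on_message(self, kind: str, now: float) -> None:
        # Only the kinds of messages listed above keep the token alive
        if kind in MESSAGES_EXTENDING_TTL:
            self.last_message = now

    def is_expired(self, now: float) -> bool:
        # Assumption: expired once more than TTL seconds pass in silence
        return now - self.last_message > self.ttl

token = SessionToken(ttl=3600, last_message=0)
token.on_message('client_request', now=1800)

print(token.is_expired(now=3700))  # False, the request at 1800 kept it alive
print(token.is_expired(now=5401))  # True, over 3600 seconds without a message
```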

Ping messages

Ping interval - Zato sends a WebSocket ping message once every that many seconds. Each time a response to a ping request is received from the client, the session token's TTL is extended by the same number of seconds.

For instance, suppose a new session token was issued to a client at 15:00:00 with a TTL of 3600 seconds (valid until 16:00:00) and the ping interval is 30 seconds.

First, at 15:00:30 Zato will send a ping message.

If the client responds successfully, the token's TTL will be extended by a further ping interval (here, 30 seconds), counted from the time the response arrived. For example, if the response arrives at 15:00:30,789 (789 milliseconds after the ping was sent), the token will be valid until 16:00:30,789, since the TTL and ping interval seconds are added relative to the moment the server received the response.
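The timeline above can be reproduced with a few lines of date arithmetic. This is only an illustration of the numbers in the example, assuming that the new expiry equals a full TTL counted from the moment the response arrived. The variable names are made up for this sketch and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta

token_ttl = timedelta(seconds=3600)
ping_interval = timedelta(seconds=30)

# The token is issued at 15:00:00 (the date itself does not matter)
issued_at = datetime(2021, 4, 12, 15, 0, 0)
initial_expiry = issued_at + token_ttl

# The first ping goes out one ping interval later, the reply takes 789 ms
ping_sent_at = issued_at + ping_interval
response_at = ping_sent_at + timedelta(milliseconds=789)

# Assumption: the expiry is a full TTL from when the response was received
new_expiry = response_at + token_ttl

print(initial_expiry.time())        # 16:00:00
print(new_expiry.time())            # 16:00:30.789000
print(new_expiry - initial_expiry)  # 0:00:30.789000
```

The difference between the two expiry times is the ping interval plus the time it took the response to arrive.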

Threshold - the number of ping messages that may be missed in a row before Zato closes the connection. For instance, if the threshold is 5 and the ping interval is 10, Zato will ping the client once every 10 seconds and, if 5 pings in a row go unanswered (a total of 50 seconds in this case), the connection will be closed immediately.

Note that only pings missed consecutively are counted towards the threshold. For instance, if a client misses 2 out of 5 pings but then replies on the 3rd attempt, its counter of missed pings is reset and it starts from 0 once more, as though the client had never missed a single ping.
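The counting rule can be sketched as below. The class and method names are hypothetical and this is not how Zato necessarily implements it internally. It also assumes, in line with the example of 5 pings and 50 seconds, that the connection is closed as soon as the number of pings missed in a row reaches the threshold.

```python
class MissedPingCounter:
    """Hypothetical helper that counts pings missed in a row."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.missed_in_a_row = 0

    def on_ping_result(self, responded: bool) -> bool:
        """Record one ping and return True if the connection should be closed."""
        if responded:
            # Any successful response resets the counter to 0
            self.missed_in_a_row = 0
        else:
            self.missed_in_a_row += 1
        return self.missed_in_a_row >= self.threshold

counter = MissedPingCounter(threshold=5)

# Two misses followed by a reply, the counter goes back to 0
after_reply = [counter.on_ping_result(r) for r in (False, False, True)]
print(after_reply, counter.missed_in_a_row)  # [False, False, False] 0

# Five misses in a row, the fifth one triggers the close
five_misses = [counter.on_ping_result(False) for _ in range(5)]
print(five_misses)  # [False, False, False, False, True]
```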

A note about firewalls

A great advantage of using WebSocket connections is that they are bidirectional and let one easily send messages to and from clients using the same TCP connection over a longer time.

However, particularly in relation to ping messages, it needs to be remembered that stateful firewalls in data centers may have their own requirements as to how often peers should communicate. This is especially true if the communication is over the Internet rather than within the same data center.

On one hand, this means that the ping interval should be set to a value small enough to ensure that firewalls will not break connections in the belief that Zato does not have anything more to send. Yet, it should not be too small lest, with a huge number of connections, the overhead of pings becomes too burdensome. For instance, pinging each client once a second is almost certainly too much and usually 20-40 seconds is a much better choice.
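To get a feel for the overhead, the number of pings sent per second can be estimated as the number of connections divided by the ping interval. The snippet below is a back-of-the-envelope sketch only. The function name and the figure of 100,000 connections are made up for illustration and it ignores everything except the count of ping messages.

```python
def pings_per_second(connections: int, ping_interval: float) -> float:
    # One ping per connection per interval, averaged over time
    return connections / ping_interval

connections = 100_000  # Hypothetical number of connected clients

for interval in (1, 20, 30, 40):
    print(interval, round(pings_per_second(connections, interval)))

# Prints:
# 1 100000
# 20 5000
# 30 3333
# 40 2500
```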

On the other hand, firewalls may also require the side which initiated the TCP connection (i.e. the WebSocket client) to periodically send some data to keep the connection active, otherwise they will drop the connection. This means that clients may also need to be configured to send ping messages, and how often they should do it may depend on what the applicable firewalls expect - with only Zato pinging the client, it may not be enough for firewalls to understand that a connection is still active.

Python code

Finally, it is worth keeping in mind that all the timeouts, TTLs and pings are managed by the platform automatically and there is no programming needed for them to work.

For instance, the service below, once assigned to a WebSocket channel, will focus on the business functionality rather than on low-level management of timeouts - in other words, there is no additional code required.

```python
# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

class MyService(Service):
    def handle(self):
        # Business logic only - timeouts, TTLs and pings are handled by the platform
        self.logger.info('My request is %s', self.request.input)
```