Understanding API rate-limiting techniques

2025-01-13, by Dariusz Suchojad

Enabling rate-limiting in Zato means that access to Zato APIs can be throttled per endpoint, user or service - including options to make limits apply to specific IP addresses only - and if limits are exceeded within a selected period of time, the invocation will fail. Let's check how to use it all.

API rate limiting works on several levels and the configuration is always checked in the order below, which follows from the narrowest, most specific parts of the system (endpoints), through users which may apply to multiple endpoints, up to services which in turn may be used by both multiple endpoints and users.

First, per-endpoint limits
Then, per-user limits
Finally, per-service limits

When a request arrives through an endpoint, that endpoint's rate limiting configuration is checked. If the limit is already reached for the IP address or network of the calling application, the request is rejected.

Next, if there is any user associated with the endpoint, that account's rate limits are checked in the same manner and, similarly, if they are reached, the request is rejected.

Finally, if the endpoint's underlying service is configured to do so, it also checks if its invocation limits are not exceeded, rejecting the message accordingly if they are.

Note that the three levels are distinct yet they overlap in what they allow one to achieve.

For instance, it is possible to have the same user credentials be used in multiple endpoints and express ideas such as "Allow this and that user to invoke my APIs 1,000 requests/day but limit each endpoint to at most 5 requests/minute no matter which user".

Moreover, because limits can be set on services, it is possible to make it even more flexible, e.g. "Let this service be invoked at most 10,000 requests/hour, no matter which user it is, with particular users being able to invoke at most 500 requests/minute, no matter which service, topping it off with per separate limits for REST vs. SOAP vs. JSON-RPC endpoint, depending on what application is invoke the endpoints". That lets one conveniently express advanced scenarios that often occur in practical situations.

Also, observe that API rate limiting applies to REST, SOAP and JSON-RPC endpoints only, it is not used with other API endpoints, such as AMQP, IBM MQ, SAP, task scheduler or any other technologies. However, per-service limits work no matter which endpoint the service is invoked with and they will work with endpoints such as WebSockets, ZeroMQ or any other.

Lastly, limits pertain to with incoming requests only - any outgoing ones, from Zato to external resources - are not covered by it.

Per-IP restrictions

The architecture is made even more versatile thanks to the fact that for each object - endpoint, user or service - different limits can be configured depending on the caller's IP address.

This adds yet another dimension and allows to express ideas commonly witnessed in API-based projects, such as:

External applications, depending on their IP addresses, can have their own limits
Internal users, e.g. employees of the company using VPN, may have hire limits if their addresses are in the 172.x.x.x range
For performance testing purposes, access to Zato from a few selected hosts may have no limits at all

IP-based limits work hand in hand are an integral part of the mechanism - they do not rule out per-endpoit, user or service limits. In fact, for each such object, multiple IP-using limits can be set independently, thus allowing for highest degree of flexibility.

Exact or approximate

Rate limits come in two types:

Exact
Approximate

Exact rate limits are just that, exact - they en that a limit is not exceeded at all, not even by a single request.

Approximate limits may let a very small number of requests to exceed the limit with the benefit being that approximate limits are faster to check than exact ones.

When to use which type depends on a particular project:

In some projects, it does not really matter if callers have a limit of 1,000 requests/minute or 1,005 requests/minute because the difference is too tiny to make a business impact. Approximate limits work best in this case.
In other projects, there may be requirements that the limit never be exceeded no matter the circumstances. Use exact limits here.

Python code and web-admin

Alright, let's check how to define the limits in the Zato Dashboard. We will use the sample service below:

# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

class Sample(Service):
    name = 'api.sample'

    def handle(self):

        # Return a simple string on response
        self.response.payload = 'Hello there!\n'

Now, in web-admin, we will configure limits - separately for the service, a new and a new REST API channel (endpoint).

Points of interest:

Configuration for each type of object is independent - within the same invocation some limits may be exact, some may be approximate
There can be multiple configuration entries for each object
A unit of time is "m", "h" or "d", depending on whether the limit is per minute, hour or day, respectively
All limits within the same configuration are checked in the order of their definition which is why the most generic ones should be listed first

Testing it out

Now, all is left is to invoke the service from curl.

As long as limits are not reached, a business response is returned:

$ curl http://my.user:password@localhost:11223/api/sample
Hello there!
$

But if a limit is reached, the caller receives an error message with the 429 HTTP status.

$ curl -v http://my.user:password@localhost:11223/api/sample
*   Trying 127.0.0.1...

...

< HTTP/1.1 429 Too Many Requests
< Server: Zato
< X-Zato-CID: b8053d68612d626d338b02

...

{"zato_env":{"result":"ZATO_ERROR","cid":"b8053d68612d626d338b02eb",
 "details":"Error 429 Too Many Requests"}}
$

Note that the caller never knows what the limit was - that information is saved in Zato server logs along with other details so that API authors can correlate what callers get with the very rate limiting definition that prevented them from accessing the service.

zato.common.rate_limiting.common.RateLimitReached: Max. rate limit of 100/m reached;
from:`10.74.199.53`, network:`*`; last_from:`127.0.0.1;
last_request_time_utc:`2025-01-12T15:30:41.943794;
last_cid:`5f4f1ef65490a23e5c37eda1`; (cid:b8053d68612d626d338b02)

And this is it - we have created a new API rate limiting definition in Zato and tested it out successfully!

More resources

➤ Python API integration tutorials
➤ What is an integration platform?
➤ Python Integration platform as a Service (iPaaS)
➤ What is an Enterprise Service Bus (ESB)? What is SOA?
➤ Open-source iPaaS in Python