Understanding Zato servers

This chapter looks at Zato servers from the platform’s architectural perspective: how they work, how they scale and how they can be accessed by developers and administrators.

Active-active

../_images/servers-active.png
  • There are no limits as to how many servers there can be in a single cluster
  • By default, all servers in a cluster are always active and the load balancer will direct traffic to all of them
  • It is possible to take a server offline, e.g. to apply updates, and the load balancer will redirect the traffic to the rest
  • As long as a server process is running, it synchronises its state with the other members of the cluster, even if the load balancer considers it offline. For instance, code deployed to any server is automatically distributed to all the other servers, including those that the load balancer currently treats as offline.
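The behaviour above can be sketched as a toy model: the load balancer routes traffic only to online servers, while deployments reach every server in the cluster. This is an illustration only, not Zato’s actual implementation, and all class and service names below are made up:

```python
class Server:
    """One member of a cluster."""
    def __init__(self, name):
        self.name = name
        self.online = True     # Online from the load balancer's perspective
        self.services = set()  # Code deployed to this server

class Cluster:
    """A load balancer in front of a set of active-active servers."""
    def __init__(self, names):
        self.servers = [Server(name) for name in names]
        self._next = 0

    def route(self):
        # The load balancer directs traffic only to online servers ..
        online = [server for server in self.servers if server.online]
        server = online[self._next % len(online)]
        self._next += 1
        return server

    def deploy(self, service):
        # .. but deployments are synchronised to every running server,
        # including ones the load balancer currently treats as offline.
        for server in self.servers:
            server.services.add(service)

cluster = Cluster(['server1', 'server2', 'server3'])
cluster.servers[1].online = False  # Taken offline, e.g. to apply updates
cluster.deploy('api.my-service')

print([cluster.route().name for _ in range(4)])
# ['server1', 'server3', 'server1', 'server3'] - server2 gets no traffic ..
print(all('api.my-service' in server.services for server in cluster.servers))
# True - .. yet the newly deployed code reached server2 as well
```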

Containers for services

../_images/servers-container.png
  • Servers are containers onto which API services are deployed
  • There are no limits as to how many services there can be in a single server
  • Each idle service consumes up to 1 MB of RAM. Thus, 1 GB of RAM can mean 1,000 business API or AI services.
  • A service takes less than 1 ms to deploy. It takes less than 1 second to deploy 1,000 business API or AI services.
  • All servers in the same cluster are always mirror images of each other in terms of the code and services they execute
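In code, a deployed service is simply a Python class, a subclass of Service with a handle method. The sketch below follows that general shape, but the service name and payload are invented for illustration, and a stub base class is provided so the snippet also runs outside an actual Zato server:

```python
try:
    from zato.server.service import Service  # Available inside a Zato server
except ImportError:
    # Minimal stand-in so this sketch runs outside Zato as well
    class Service:
        def __init__(self):
            self.response = type('Response', (), {'payload': None})()

class GetUser(Service):
    """A sample API service - one of potentially thousands per server."""
    name = 'demo.get-user'  # Illustrative name

    def handle(self):
        self.response.payload = {'user_id': 123, 'user_name': 'my.user'}

service = GetUser()
service.handle()
print(service.response.payload)
# {'user_id': 123, 'user_name': 'my.user'}
```

Inside a real cluster, such a class is hot-deployed once and automatically distributed to all the servers, which is why each server remains a mirror image of the others.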

Scaling an environment - APIs & AI

../_images/scale-environment.png
  • The most important factor in deciding whether to add more servers to a cluster, or more clusters with their own servers, is the distinction between services that are network-bound and services that are CPU-bound.
  • Services are network-bound if they spend most of their time waiting on the network. For instance, picture a REST or AMQP service that takes 100 ms to complete. Of that, 98 ms are spent waiting for a remote endpoint to respond and only 2 ms on actually processing the data received. The service thus spends 98% of its time not processing anything; it is bound by the network. Hence the name, network-bound.
  • Services are CPU-bound if they spend most of their time waiting for the CPU to compute an expected result. For instance, imagine a service that needs 200 ms to obtain some data, after which its AI algorithms require two minutes to complete. In this case, the service spends most of its time waiting for the CPU. Hence the name, CPU-bound. Another example is the parsing and processing of large files, e.g. multi-GB files may require considerable CPU time to parse.
  • Because each Zato server in the same cluster executes the same set of services, network-bound services should not be mixed with CPU-bound ones. If both types are deployed to the same cluster, CPU-bound services may monopolise the CPUs, leaving no room for the network-bound ones. For instance, if many AI services are consuming CPU time and a REST (TCP) request arrives, the CPUs may be completely busy with AI computations, leaving little or no processing time for network events.
  • It is perfectly fine and expected to have more than one cluster. If the workload is uniform, e.g. only network-bound or only CPU-bound, a single cluster may suffice; with mixed workloads, containing services of both types, it is recommended to run more than one cluster.
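The difference between the two kinds of services can be observed directly by comparing wall-clock time with CPU time. In this sketch (plain Python, not a Zato service), a network call is modelled with sleep and an AI-style computation with an arithmetic loop:

```python
import time

def network_bound():
    # Stands in for a REST or AMQP call - the service only waits on I/O
    time.sleep(0.1)

def cpu_bound():
    # Stands in for an AI computation - the CPU is busy the whole time
    return sum(i * i for i in range(2_000_000))

for func in (network_bound, cpu_bound):
    wall0, cpu0 = time.perf_counter(), time.process_time()
    func()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    print(f'{func.__name__}: wall={wall * 1000:.0f} ms, cpu={cpu * 1000:.0f} ms')
```

For the network-bound function, almost none of the elapsed time is CPU time; for the CPU-bound one, nearly all of it is. When both kinds run in one cluster, it is precisely that idle waiting time that CPU-heavy work consumes, starving the network-bound services.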

Scaling a cluster

../_images/scale-servers.png
  • An individual cluster can be scaled either by adding more servers, each with a smaller number of CPUs, or by adding more CPUs to the existing servers
  • Usually, it is preferable to add more, smaller servers rather than more CPUs per server. The reason is that, in the true spirit of cloud computing, there is no limit to how many servers can be added, whereas, broadly speaking, the practical limit is between 6 and 8 CPUs per server, depending on the particular CPU make and model, and adding CPUs beyond that limit does not significantly improve performance.
  • Servers running services that use publish/subscribe are a special case in that they always require exactly 1 CPU per server. In this scenario, clusters are scaled by adding more servers, each with 1 CPU.