This architecture chapter describes Zato servers from the platform's architectural perspective: how they work, how they scale, and how developers and administrators can access them.
The most important factor in deciding whether to add more servers or more clusters, each with its own servers, is understanding the distinction between services that are network-bound and services that are CPU-bound.
Services are network-bound if they primarily wait for data to arrive over TCP networks. For instance, picture a sample REST or AMQP service that takes 100 ms to complete. Of that, 98 ms are spent waiting for a remote endpoint or server to respond, while only 2 ms are spent on processing the data received. This means that the service spends 98% of its time not actively processing anything; it is bound to the network. Hence the name, network-bound.
Services are CPU-bound if they spend most of their time waiting for CPUs to compute an expected result. For instance, imagine a service that requires 200 ms to obtain some data, after which its AI algorithms require two minutes to complete. In this case, the service spends most of its time waiting for the CPU. Hence the name, CPU-bound. Another example is the parsing and processing of large files, e.g. multi-GB files may require significant CPU time to parse.
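The two examples above can be expressed as simple arithmetic. The sketch below is illustrative only: the function name classify_service and the 50% threshold are assumptions made for this example and are not part of Zato.

```python
def classify_service(wait_ms, cpu_ms):
    """Classify a service by where it spends most of its time.

    wait_ms: time spent waiting for the network
    cpu_ms:  time spent computing on the CPU
    The 50% threshold is an assumption chosen for illustration.
    """
    total_ms = wait_ms + cpu_ms
    wait_share = wait_ms / total_ms
    kind = "network-bound" if wait_share > 0.5 else "CPU-bound"
    return kind, wait_share


# The REST/AMQP example: 98 ms waiting, 2 ms processing
print(classify_service(98, 2))

# The AI example: 200 ms obtaining data, two minutes (120,000 ms) computing
print(classify_service(200, 120_000))
```

In practice, the two inputs would come from measurements of a real service, e.g. its response-time logs or a profiler.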
Because each Zato server in the same cluster executes the same set of services, network-bound services should not be mixed with CPU-bound ones. If both types are deployed to the same cluster, CPU-bound services may take over the CPUs completely, leaving no room for network-bound services. For instance, if many AI services require CPU time and a REST (TCP) request arrives, the CPUs may be completely busy with AI calculations, leaving very little or no processing time for network events.
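To see why mixing matters, consider a toy model of the situation just described. This is a deliberately simplified simulation, not a description of how Zato schedules work: it assumes jobs are assigned first-come, first-served to whichever CPU frees up earliest, and that the REST request arrives after all the CPU-bound jobs. The function name and parameters are hypothetical.

```python
import heapq


def rest_request_latency_ms(cpu_job_durations_ms, cpu_count, rest_cpu_ms=2):
    """Return when a REST request finishes if it queues behind CPU-bound jobs.

    Toy model (assumption): every job runs to completion on one CPU and
    the next job goes to the CPU that becomes free first.
    """
    # Each heap entry is the time (in ms) at which one CPU becomes free
    free_at = [0] * cpu_count
    heapq.heapify(free_at)

    # Schedule the CPU-bound jobs first
    for duration_ms in cpu_job_durations_ms:
        start_ms = heapq.heappop(free_at)
        heapq.heappush(free_at, start_ms + duration_ms)

    # The REST request has to wait for the earliest free CPU
    start_ms = heapq.heappop(free_at)
    return start_ms + rest_cpu_ms


two_minutes_ms = 120_000

# Mixed cluster: four AI jobs occupy all four CPUs when the request arrives
mixed = rest_request_latency_ms([two_minutes_ms] * 4, cpu_count=4)

# Dedicated network-bound cluster: no CPU-bound jobs compete for the CPUs
dedicated = rest_request_latency_ms([], cpu_count=4)

print(mixed, dedicated)
```

Under these assumptions, the request that needs only 2 ms of CPU time waits two full minutes in the mixed cluster, whereas in the dedicated cluster it completes immediately.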
It is perfectly fine and expected to have more than one cluster. Whether this is needed depends on whether the workload is uniform, e.g. only network-bound or only CPU-bound, or mixed, containing services of both types. With mixed workloads, it is recommended to have more than one cluster.
An individual cluster can be scaled by adding more servers with a smaller number of CPUs each, or by adding more CPUs to each server.
Usually, it is preferable to add more smaller servers rather than more CPUs per server. The reason is that, in the true spirit of cloud computing, there is no limit on how many servers can be added whereas, broadly, the useful limit of CPUs per server is between 6 and 8, depending on the particular CPU make and model, and adding CPUs beyond that limit does not significantly improve performance.
Servers with services using publish/subscribe are a special case in that they always require exactly 1 CPU per server. In this scenario, clusters are scaled by adding more servers, each with 1 CPU.
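The scaling rules above reduce to a small calculation. The following sketch is not a Zato API; the function name, the total_cpus capacity estimate, and the choice of 8 as the per-server ceiling (the upper end of the 6 to 8 range mentioned earlier) are assumptions made for illustration.

```python
import math

# Assumption: the upper end of the 6-8 range of useful CPUs per server
MAX_USEFUL_CPUS_PER_SERVER = 8


def servers_needed(total_cpus, cpus_per_server):
    """Return how many servers provide at least total_cpus CPUs."""
    if not 1 <= cpus_per_server <= MAX_USEFUL_CPUS_PER_SERVER:
        raise ValueError("cpus_per_server must be between 1 and 8")
    return math.ceil(total_cpus / cpus_per_server)


# A cluster estimated to need 32 CPUs in total (hypothetical figure)
print(servers_needed(32, 8))  # fewer, larger servers
print(servers_needed(32, 4))  # more, smaller servers

# Publish/subscribe: exactly 1 CPU per server, so scale by server count only
print(servers_needed(32, 1))
```

For the same total capacity, choosing fewer CPUs per server simply means more servers, which is the preferred direction of growth; for publish/subscribe, the number of servers always equals the number of CPUs required.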