Understanding statistics - Trends
The series will include:
- Trends (this part)
- Settings and maintenance
- Statistics API
What are trends?
Trends answer the one important question asked by developers and operations alike: what is currently going on, and what tendencies can be seen in my environments?
From that follows another: is what is happening right now typical, or should I expect difficulties soon?
Note that trends are not an early-warning system and, as such, they do not replace monitoring solutions. Instead, they are meant to be an aid with intimate knowledge of Zato.
Trends are concerned with relatively short spans of time: the last hour, the last 20 minutes or the last 10 minutes. Essentially, this is a tool that gives you insight into how your services behave right after you receive a signal that something unwanted might be happening right now.
When trends are displayed, the screen is usually divided into 4 parts:
- Left side - Slowest services and most commonly used ones
- Right side - the same two tables for a different time period, the one the left side is compared against
Hovering over any row highlights the matching rows in all the tables. This makes differences easy to spot: for instance, the most commonly used service is not necessarily the slowest one, and the service that was slowest yesterday at the same time of day need not be as slow today.
Understanding the difference in usage patterns is literally a hover away.
By default, the top 10 services in each category are shown, but the number can be set to any value required.
3 quick links let one answer the most common questions:
- How does it all compare to the last hour?
- To the same hour yesterday?
- To the same day and hour last week?
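One way to picture the three quick links is as shifting the period currently displayed back in time. The sketch below assumes that each comparison period has the same length as the current one and is offset by one hour, one day or one week; the function name `comparison_windows` and the offsets are illustrative assumptions, not Zato's code.

```python
from datetime import datetime, timedelta

# Assumed offsets for the three quick links (not taken from Zato itself).
QUICK_LINK_OFFSETS = {
    "last hour": timedelta(hours=1),
    "yesterday, same hour": timedelta(days=1),
    "last week, same day and hour": timedelta(weeks=1),
}


def comparison_windows(start, end):
    """Shift the current [start, end) window back by each quick-link offset."""
    return {
        label: (start - offset, end - offset)
        for label, offset in QUICK_LINK_OFFSETS.items()
    }


# Example: the last 20 minutes before noon on a sample day.
end = datetime(2024, 1, 10, 12, 0)
start = end - timedelta(minutes=20)
for label, (s, e) in comparison_windows(start, end).items():
    print(f"{label}: {s:%Y-%m-%d %H:%M} to {e:%Y-%m-%d %H:%M}")
```

Because every comparison window has the same length as the current one, the two sides of the screen cover equally long periods and their figures can be compared directly.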
It’s also possible to pick arbitrary start and stop dates to compare against, but note that trends are always generated on the fly from Redis. The process is CPU-intensive, so obtaining information for more than a couple of hours can result in visible CPU spikes.
Slowest
The 6 columns displayed for each of the slowest services are:
- M - Mean response time of that service (in ms)
- AM - Average mean response time of all services, including that one (in ms)
- U% - The service’s usage share - of all invocations of all services, what percentage fell on that one
- T% - The service’s time share - of the whole time spent in all services, what percentage that service accounts for
- TU - How many times all the services, including this one, have been invoked
- Trend - A sparkline chart displaying how the service’s mean response time fluctuated in the selected period
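To make the column definitions concrete, here is a minimal sketch that derives them from a list of raw invocations. It assumes the input is a list of hypothetical `(service_name, duration_ms)` pairs for the displayed period and that AM is the unweighted average of per-service means; Zato's actual computation over Redis data may differ.

```python
from collections import defaultdict


def slowest_services(invocations, top_n=10):
    """Compute the M, AM, U%, T% and TU columns, slowest services first.

    `invocations` is a list of (service_name, duration_ms) pairs collected
    over the period being displayed (a hypothetical input format).
    """
    if not invocations:
        return []

    # Group response times by service.
    durations = defaultdict(list)
    for name, duration_ms in invocations:
        durations[name].append(duration_ms)

    total_usage = len(invocations)  # TU
    total_time = sum(duration for _, duration in invocations)
    means = {name: sum(d) / len(d) for name, d in durations.items()}  # M
    # AM: assumed here to be the unweighted average of per-service means.
    average_mean = sum(means.values()) / len(means)

    rows = []
    for name, d in durations.items():
        rows.append({
            "service": name,
            "M": means[name],
            "AM": average_mean,
            "U%": 100.0 * len(d) / total_usage,
            "T%": 100.0 * sum(d) / total_time if total_time else 0.0,
            "TU": total_usage,
        })

    # Slowest first, then keep only the top N (10 by default, as on screen).
    rows.sort(key=lambda row: row["M"], reverse=True)
    return rows[:top_n]
```

For example, `slowest_services([("a", 10.0), ("a", 30.0), ("b", 60.0)])` puts `b` first with M = 60 ms, while AM is 40 ms in both rows.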
Thus, the row in the screenshot describes a service whose mean response time was 12.72 ms, which is almost 10 times faster than the average mean response time of all services taken together (122 ms).
Out of the total 40 invocations of all services (TU), this one accounted for 2.5%, but in terms of time spent it took 18.9% of the total time those 40 invocations took.
From the sparkline, one can also learn that the service was not used at all except for a sudden spike a few minutes ago.
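The figures quoted for this row can be cross-checked against the column definitions. Writing U_s for the number of invocations of service s and T_s for the total time spent in it:

```latex
\mathrm{U\%}_s = 100 \cdot \frac{U_s}{TU}, \qquad
\mathrm{T\%}_s = 100 \cdot \frac{T_s}{\sum_k T_k}

\frac{\mathrm{AM}}{\mathrm{M}} = \frac{122\ \mathrm{ms}}{12.72\ \mathrm{ms}} \approx 9.6,
\qquad
U_s = \frac{\mathrm{U\%}_s}{100} \cdot TU = 0.025 \cdot 40 = 1
```

A usage share of 2.5% out of 40 invocations therefore means the service was invoked exactly once, which fits the single spike in its sparkline.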
Everything can also be exported to CSV.
Most commonly used
The 6 columns displayed for each of the most commonly used services are:
- U - How many times the service has been invoked
- Rq/s - How many requests a second the service processed; note that 0.1 req/s is the smallest value shown
- U% - The service’s usage share - of all invocations of all services, what percentage fell on that one (same as for the slowest services)
- T% - The service’s time share - of the whole time spent in all services, what percentage that service accounts for (same as for the slowest services)
- TU - How many times all the services, including this one, have been invoked (same as for the slowest services)
- Trend - A sparkline chart displaying the service’s average request rate per second
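The most commonly used table can be sketched the same way. As before, the `(service_name, duration_ms)` input format is a hypothetical assumption, and so is rounding Rq/s to the nearest 0.1 rather than, say, down.

```python
from collections import defaultdict


def most_used_services(invocations, period_seconds, top_n=10):
    """Compute the U, Rq/s, U%, T% and TU columns, most used services first.

    `invocations` is a list of (service_name, duration_ms) pairs collected
    over a period lasting `period_seconds` seconds (a hypothetical format).
    """
    if not invocations:
        return []

    # Count invocations and accumulate time spent, per service.
    usage = defaultdict(int)
    time_ms = defaultdict(float)
    for name, duration_ms in invocations:
        usage[name] += 1
        time_ms[name] += duration_ms

    total_usage = len(invocations)  # TU
    total_time = sum(time_ms.values())

    rows = []
    for name, count in usage.items():
        rows.append({
            "service": name,
            "U": count,
            # One decimal place, matching the 0.1 req/s granularity on screen;
            # rounding to nearest (not down) is an assumption.
            "Rq/s": round(count / period_seconds, 1),
            "U%": 100.0 * count / total_usage,
            "T%": 100.0 * time_ms[name] / total_time if total_time else 0.0,
            "TU": total_usage,
        })

    # Most invoked first, then keep only the top N.
    rows.sort(key=lambda row: row["U"], reverse=True)
    return rows[:top_n]
```

Here, 3 invocations in a 10-second period give 0.3 req/s; comparing U% with T% then shows whether a frequently used service is also expensive.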
Thus, the screenshot shows a service that was invoked 14 times (U) out of all 40 service invocations (TU), which is exactly 35% of them.
However, despite constituting 35% of all invocations, it accounted for only 9.4% of the time spent (T%), meaning it was pretty fast.
The trend tells us that the service is mostly idle except for a brief period a couple of minutes ago.
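The arithmetic behind this reading is short. The ratio of the time share to the usage share equals the service's mean response time divided by the mean time of an invocation across all services:

```latex
\mathrm{U\%} = 100 \cdot \frac{U}{TU} = 100 \cdot \frac{14}{40} = 35\%

\frac{\mathrm{T\%}}{\mathrm{U\%}} = \frac{9.4}{35} \approx 0.27
```

In other words, an invocation of this service took on average roughly a quarter of the time of a typical invocation in that period.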
Each service whose usage share exceeds what is expected is marked with a red dot to the left of its name.
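How Zato decides which usage share is expected is not described here. Purely as a hypothetical illustration, one plausible rule compares each service's U% in the current period with its U% in the comparison period; the function name `flag_unexpected_usage` and the `tolerance` factor are assumptions, not Zato's API.

```python
def flag_unexpected_usage(current, baseline, tolerance=1.5):
    """Return the names of services whose usage share looks unexpected.

    Hypothetical rule: `current` and `baseline` map service names to U%.
    A service is flagged when its current share exceeds `tolerance` times
    its baseline share, or when it was absent from the baseline entirely.
    """
    flagged = []
    for name, share in current.items():
        expected = baseline.get(name, 0.0)
        if share > expected * tolerance:
            flagged.append(name)
    return sorted(flagged)
```

For instance, a service whose share jumped from 20% to 35% would be flagged, as would one that did not appear in the comparison period at all.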
And again, statistics can be exported to CSV.
Trends are a tool for finding out, or confirming, whether what an environment is currently doing conforms to its standard usage patterns.
The next instalment will cover summaries, which let one quickly compare statistics across days, months or years.