
Zabbix Server Health

Description

Now that we have a few hosts configured in different template configurations and on different networks, we can experiment with Zabbix server health.

Values processed per second

Values processed per second (VPS) indicates how busy your Zabbix server is. There is no single correct number; treat it as a baseline that helps you anticipate when other issues may start to occur. If the number is higher than usual but none of the other graphs indicate a problem, you can consider that OK. You can manage this value by enabling or disabling items, triggers and discovery rules for your hosts.
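As a rough illustration of why enabling or disabling items moves this number, the sketch below estimates VPS from the update intervals of the enabled items. It assumes every item delivers exactly one value per interval, which ignores preprocessing, discarded values and unreachable hosts. The function name and the sample intervals are hypothetical.

```python
def estimate_vps(update_intervals_seconds):
    """Approximate new values per second from enabled items' update intervals.

    Assumption: each item yields one value per interval.
    """
    return sum(1.0 / s for s in update_intervals_seconds if s > 0)


# Hypothetical host: 60 items polled every 60 s and 30 items polled every 30 s.
intervals = [60] * 60 + [30] * 30
print(round(estimate_vps(intervals), 2))
```

In this example the host contributes roughly two values per second. Disabling the thirty 30-second items would halve that.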

Utilization of data collectors

Depending on the types of items you have set up for your hosts, different pollers (data collectors) will be used to perform the task of requesting or receiving the item data.

Passive checks are managed by the poller data collector, ping checks by the ICMP data collector, web scenarios by the HTTP data collector, and incoming data from active hosts by the trapper data collector. There are many other collectors handling different protocols.

When you make changes to a host, you can review this graph to see what impact they had.
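Collector utilization can also be read through Zabbix internal items of the form zabbix[process,<type>,<mode>,<state>]. The sketch below only builds those key strings for the collectors mentioned above. The mapping is an illustrative subset, and the helper name busy_key is hypothetical.

```python
# Process type names as used by Zabbix internal items (an illustrative subset).
COLLECTORS = {
    "passive checks": "poller",
    "ping checks": "icmp pinger",
    "web scenarios": "http poller",
    "active checks": "trapper",
}


def busy_key(process_type, mode="avg"):
    """Build the internal item key for the busy percentage of a process type."""
    return f"zabbix[process,{process_type},{mode},busy]"


for check, process_type in COLLECTORS.items():
    print(f"{check}: {busy_key(process_type)}")
```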

Utilization of internal processes

Zabbix runs many internally scheduled tasks, such as housekeeping the SQL database, managing low-level discovery (LLD), alerting, preprocessing and writing logs. Monitor this graph as well to understand the impact of any changes you make.

Cache usage

The value cache is used to speed up calculations of trigger expressions, calculated items, dependent items and other things within Zabbix where it is faster to pull historical data straight from memory than to re-query the database tables every time a value is needed.

The graph summarizes several caches used within Zabbix.

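If any of the cache usages goes above 80%, consider increasing the corresponding cache size setting in the zabbix_server.conf file. For the configuration cache this is CacheSize; the other caches have their own parameters, for example ValueCacheSize for the value cache.

The default for CacheSize is 8M in the version used here, and it can be set from 128K to 64G; the comments in zabbix_server.conf list the default and allowed range for your version. You will need to increase it as you manage more hosts, especially if they have many triggers, calculated items, dependent items and other host-related configuration stored in the cache.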
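The 80% rule of thumb can be sketched as a small check. The usage percentages below are made up and would normally be read off the graph. The mapping from cache to zabbix_server.conf parameter is an illustrative subset, and the function name is hypothetical.

```python
# Cache name -> zabbix_server.conf parameter that sizes it (illustrative subset).
CACHE_PARAMETERS = {
    "configuration cache": "CacheSize",
    "history write cache": "HistoryCacheSize",
    "history index cache": "HistoryIndexCacheSize",
    "trend write cache": "TrendCacheSize",
    "value cache": "ValueCacheSize",
}


def caches_to_resize(usage_percent, threshold=80.0):
    """Return (cache, parameter) pairs whose usage exceeds the threshold."""
    return [
        (name, CACHE_PARAMETERS[name])
        for name, used in usage_percent.items()
        if used > threshold and name in CACHE_PARAMETERS
    ]


# Made-up readings, in percent used.
sample = {"configuration cache": 86.5, "value cache": 42.0, "trend write cache": 12.3}
print(caches_to_resize(sample))
```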

Value cache effectiveness

The two important values shown in this graph are hits and misses. A hit is when a value is retrieved from memory. A miss happens when the data is not currently in memory and must first be retrieved from the database. Aim to have as few misses as possible by increasing the value cache size (the ValueCacheSize setting) if necessary, or by reducing the number of items and triggers you are processing for a host.
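One way to put a single number on effectiveness is the hit ratio: hits divided by hits plus misses. The sketch below uses made-up counts, and the function name is hypothetical.

```python
def hit_ratio(hits, misses):
    """Fraction of value cache requests served from memory (0.0 to 1.0)."""
    total = hits + misses
    return hits / total if total else 0.0


# Made-up counts for one sampling period.
print(f"{hit_ratio(hits=950, misses=50):.1%}")
```

A ratio that drops after you add hosts or triggers suggests the cache is now too small for the working set.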

Queue size

Checks are placed into a queue, and each request/response is handled as soon as possible. Some requests to hosts don't resolve quickly, for many reasons: the host may be switched off, may be restarting, or may be experiencing resource issues such as high CPU usage, low memory or low network bandwidth. The result can be a backlog of unanswered requests waiting to be resolved.

In the course we can see that one of the hosts has many unresolved requests in the queue. This can be caused by changing templates often or by other configuration adjustments you make to a host. In this example, the issue is caused by many checks not being resolved because my hosts are switched off at times.

To see a list of items in the queue, and which host they relate to, visit the page Administration ⇾ Queue ⇾ Queue details.
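If you copy the queue details out of that page, a small tally shows which hosts account for most of the backlog. The sketch below assumes each row is a (host, item key, delay in seconds) tuple. The row layout, host names and function name are hypothetical.

```python
from collections import Counter


def queued_per_host(queue_rows):
    """Count delayed items per host, busiest host first."""
    return Counter(host for host, _item, _delay in queue_rows).most_common()


# Hypothetical rows: (host, item key, seconds delayed).
rows = [
    ("web-01", "agent.ping", 120),
    ("web-01", "system.cpu.load", 95),
    ("db-01", "agent.ping", 30),
]
print(queued_per_host(rows))
```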

Summary

When adding hosts or making other changes to Zabbix, recheck the Zabbix server health dashboard regularly to get a good feel for what your change has done. Also note that the supplied templates have many items, triggers, discovery rules and more enabled by default that you may not actually need. When Zabbix health starts to indicate problems, disable everything that isn't critical for your use case to save resources.
