Galera Cluster node count
Job details
Name: | Galera Cluster node count |
Platform: | MariaDB |
Category: | Cluster and Replication |
Description: | Checks the number of nodes currently in the cluster. |
Long description: | Checks the number of nodes currently in the cluster. |
Version: | 1.1 |
Default schedule: | 5,15,25,35,45,55 * * * |
Requires engine install: | Yes |
Compatibility tag: | .[type='instance' & is_mariadb_branch='1']/.[hasengine='YES' & use_global_variables_information_schema = '1' & wsrep_cluster != '0'] |
Parameters
Name | Default value | Description |
---|---|---|
cluster node count | 0 | The expected number of nodes in the Galera Cluster. The job compares the current node count against this value. |
return status | 2 | Status value to return (ALARM – 2, WARNING – 1, or OK – 0) when the Galera Cluster node count is less than the “cluster node count” parameter value. |
enable warnings and alarms | YES | If set to “NO”, the job only collects statistics and does not return a warning or alarm status. The value “YES” (default) activates the alert. |
history threshold | 7 | The maximum number of days to keep statistics in the history tables. |
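The “history threshold” parameter can be read as a retention window measured in days. The following is a minimal sketch of that idea only; the row shape `(timestamp, node_count)` and the function name `prune_history` are assumptions for illustration, not the job's actual table layout or procedure.

```python
from datetime import datetime, timedelta


def prune_history(rows, history_threshold_days=7, now=None):
    """Keep only rows that fall inside the retention window.

    rows: list of (timestamp, node_count) tuples (hypothetical row shape).
    history_threshold_days: mirrors the "history threshold" parameter.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=history_threshold_days)
    # Rows older than the cutoff are dropped; rows at or after it are kept.
    return [(ts, count) for ts, count in rows if ts >= cutoff]
```

With the default of 7, a reading taken eight days ago would be discarded on the next run, while a reading from yesterday would be kept.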
Job Summary
- Purpose: This job monitors the number of nodes in a MariaDB Galera cluster so that a drop below the expected cluster size is noticed.
- Why: A Galera cluster that falls below its expected number of nodes can lose performance and reliability. With fewer nodes than expected, the cluster may handle database requests less efficiently and may be exposed to data consistency problems.
- Manual checking: To manually check the current number of nodes in the Galera cluster, you can use the following SQL command:
SELECT VARIABLE_VALUE FROM information_schema.global_status WHERE variable_name = 'wsrep_cluster_size';
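The manual check above can also be scripted. The sketch below assumes a Python DB-API 2.0 cursor that is already connected to the MariaDB instance; the function name `fetch_cluster_size` is hypothetical, and connection handling is deliberately left out.

```python
def fetch_cluster_size(cursor):
    """Run the manual-check query and return the node count as an int.

    cursor: any DB-API 2.0 cursor connected to the MariaDB instance (assumed).
    Returns None when the query yields no row.
    """
    cursor.execute(
        "SELECT VARIABLE_VALUE FROM information_schema.global_status "
        "WHERE variable_name = 'wsrep_cluster_size'"
    )
    row = cursor.fetchone()
    if row is None:
        # The status variable was not found on this server.
        return None
    # VARIABLE_VALUE is returned as text, so convert it explicitly.
    return int(row[0])
```

Returning `None` rather than 0 keeps "variable not found" distinguishable from a reported cluster size of zero.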
Implementation Details
The monitoring task performs several actions to check and record the number of nodes:
- It creates and maintains data about the size of the cluster in two tables: one holding the most recent reading and one holding the history.
- The job compares the current size of the cluster to a predefined threshold (parameter ‘cluster node count’) to decide if the status is OK, WARNING, or ALARM.
- If enabled, the job can trigger alerts when the number of nodes falls below the required count.
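The comparison described above can be sketched as a small function. This is an illustration of the documented parameters, not the job's stored procedure; the name `evaluate_node_count` and its argument names are assumptions.

```python
OK, WARNING, ALARM = 0, 1, 2


def evaluate_node_count(current, expected, return_status=ALARM, alerts_enabled=True):
    """Map the current node count to a status value.

    current: value of wsrep_cluster_size.
    expected: the "cluster node count" parameter (default 0 in the job).
    return_status: the "return status" parameter (default 2, ALARM).
    alerts_enabled: the "enable warnings and alarms" parameter.
    """
    if not alerts_enabled:
        # Statistics only: never raise a warning or alarm.
        return OK
    if current < expected:
        return return_status
    return OK
```

Note that with the default “cluster node count” of 0, the condition `current < expected` can never be true, so the parameter has to be set to the real cluster size before the job can raise a status.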
Dependency and Cleanup
The job depends on stored procedures and tables it creates:
- dbw_galera_cluster_node_count: Main procedure that calculates and updates records.
- dbw_galera_cluster_node_count_histr and dbw_galera_cluster_node_count_last: Tables for storing historical data and the most recent data respectively.
If the installation or update fails, the job will execute cleanup operations:
- It drops the tables and the main procedure listed above so that no remnants of a failed installation can interfere with database operations.
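As a rough illustration of the cleanup step, the sketch below builds the kind of statements such a cleanup might issue for the objects named above. The job's actual cleanup SQL is not shown in this document, so the statement shape (including `IF EXISTS`) and the function name `cleanup_statements` are assumptions.

```python
# Object names taken from the dependency list above.
TABLES = (
    "dbw_galera_cluster_node_count_histr",
    "dbw_galera_cluster_node_count_last",
)
PROCEDURE = "dbw_galera_cluster_node_count"


def cleanup_statements():
    """Return DROP statements for the job's tables and main procedure."""
    # IF EXISTS keeps the cleanup safe to run when installation failed part-way.
    statements = [f"DROP TABLE IF EXISTS {table}" for table in TABLES]
    statements.append(f"DROP PROCEDURE IF EXISTS {PROCEDURE}")
    return statements
```

Using `IF EXISTS` means the same list can be run regardless of how far a failed installation got.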
Report and Visualization
After collecting data, the job produces a report that charts the node count history over time:
- The report contains a category chart derived from the historical table which plots the number of nodes in the Galera Cluster against historical dates.
- This allows for quick visual assessment of node count trends and immediate spotting of discrepancies.
Automation and Scheduling
- The job runs at 5, 15, 25, 35, 45, and 55 minutes past every hour (every 10 minutes), so a change in cluster size is picked up within about 10 minutes.
- Automating the check provides continuous monitoring without manual intervention and helps confirm that the cluster keeps delivering the high availability and fault tolerance it is meant to provide.
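To make the schedule concrete, the following sketch computes the next run time from the minute list in the default schedule. It models only the minutes field described above; the function name `next_run` is hypothetical and this is not how the scheduler itself is implemented.

```python
from datetime import datetime, timedelta

# Minutes field of the default schedule.
SCHEDULE_MINUTES = (5, 15, 25, 35, 45, 55)


def next_run(now, minutes=SCHEDULE_MINUTES):
    """Return the first scheduled time strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    for minute in sorted(minutes):
        candidate = base.replace(minute=minute)
        if candidate > now:
            return candidate
    # Past the last slot of this hour: use the first slot of the next hour.
    next_hour = base.replace(minute=0) + timedelta(hours=1)
    return next_hour.replace(minute=min(minutes))
```

For example, a node lost at 12:07 would first be seen by the run at 12:15, and one lost at 12:56 by the run at 13:05.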