Galera Cluster network statistics
Job details
Name: | Galera Cluster network statistics |
Platform: | MariaDB |
Category: | Cluster and Replication |
Description: | This job collects the number of bytes received from and sent to other Galera Cluster nodes. |
Long description: | This job collects the number of bytes received from and sent to other Galera Cluster nodes. |
Version: | 1.1 |
Default schedule: | 4,9,14,19,24,29,34,39,44,49,54,59 * * * |
Requires engine install: | Yes |
Compatibility tag: | .[type='instance' & is_mariadb_branch='1']/.[hasengine='YES' & use_global_variables_information_schema = '1' & wsrep_cluster != '0'] |
Parameters
Name | Default value | Description |
history threshold | 7 | The maximum number of days to keep statistics in the history tables. |
threshold (time) | 15 | The period of time (in minutes) over which the network statistics are calculated (default: 15 minutes). |
Job Summary
- Purpose: The purpose of this job is to collect and monitor network statistics related to bytes received and sent within a Galera Cluster environment.
- Why: This job is important to ensure efficient network performance and detect anomalies in data exchange between nodes in the Galera Cluster. Monitoring these metrics helps maintain the integrity and performance of the cluster. If thresholds are reached, it might indicate network issues or abnormal activities that could affect overall cluster performance.
- Manual checking: You can check this manually in the database by issuing this SQL statement:

```sql
SELECT ROUND(((bytes_received_diff) / (period / 60)) / 1024, 1) AS Received,
       ROUND(((bytes_sent_diff) / (period / 60)) / 1024, 1)     AS Sent,
       histr_date                                               AS `History date`
FROM dbw_galera_cluster_network_stat_histr
WHERE histr_date > DATE_ADD(NOW(), INTERVAL -48 HOUR)
ORDER BY `History date` ASC;
```

Note that the alias is quoted with backticks: unless the ANSI_QUOTES SQL mode is enabled, MariaDB treats a double-quoted "History date" in the ORDER BY clause as a string literal, so the rows would not actually be sorted by date.
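For a quick look at the raw counters such rates are derived from, Galera exposes cumulative byte counts as status variables (`wsrep_received_bytes` and `wsrep_replicated_bytes`). Whether the job samples exactly these variables is an assumption, but they are the standard Galera network counters:

```sql
-- Cumulative network counters maintained by Galera since node start:
-- wsrep_received_bytes   = total bytes received from other cluster nodes
-- wsrep_replicated_bytes = total bytes sent (replicated) to the cluster
SHOW GLOBAL STATUS LIKE 'wsrep%bytes';
```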
Details and Logic
- The job involves creating and updating tables to store bytes received and sent.
- It utilizes a procedure to measure and record the bytes transferred per minute and evaluate average traffic over the last 24 hours.
- Dependencies include a main procedure and two tables which are essential for storing and processing the network statistics.
- Implementation code is structured to handle exceptions, perform calculations on the data collected, and update the task values for monitoring purposes.
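The core of the per-interval logic can be sketched as turning two cumulative counter snapshots into a KB-per-minute rate. This is a hypothetical illustration with made-up numbers; the job's actual procedure and internal column names may differ:

```sql
-- Hypothetical example: 1048576 bytes accumulated between two snapshots
-- taken 300 seconds (5 minutes) apart.
-- Rate = bytes_diff / (period_seconds / 60) / 1024  ->  KB per minute.
SELECT ROUND((1048576 - 0) / (300 / 60) / 1024, 1) AS kb_per_minute;
-- -> 204.8
```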
Execution and Reporting
- The job runs periodically according to the default schedule defined above.
- Reports generated by this job provide insights into network traffic trends over time, highlighting potential bottlenecks or spikes in data transfer.
- Visual representations (charts) of the network statistics can be viewed in the dbWatch report template, aiding in a better understanding and quicker decision-making.
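The trend data behind such a chart can be pulled directly from the history table. A sketch of a 24-hour average, using the same columns as the manual check query:

```sql
-- Average received/sent rates (KB/min) over the last 24 hours.
SELECT ROUND(AVG(bytes_received_diff / (period / 60) / 1024), 1) AS avg_received_kb_min,
       ROUND(AVG(bytes_sent_diff / (period / 60) / 1024), 1)     AS avg_sent_kb_min
FROM dbw_galera_cluster_network_stat_histr
WHERE histr_date > DATE_ADD(NOW(), INTERVAL -24 HOUR);
```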
Cleanup and Safety Measures
- The job has mechanisms to clean up historical data older than a specified number of days to prevent excessive data accumulation and potential performance degradation.
- In case of failure during execution, specific cleanup commands are issued to maintain data integrity and system stability.
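The cleanup step can be sketched as a delete keyed on the "history threshold" parameter (7 days by default); the exact statement the job issues internally is an assumption:

```sql
-- Purge history rows older than the "history threshold" parameter (default 7 days).
DELETE FROM dbw_galera_cluster_network_stat_histr
WHERE histr_date < DATE_ADD(NOW(), INTERVAL -7 DAY);
```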
Installation and Compatibility
- The installation of this job is set to forcefully replace existing installations when necessary, ensuring the latest version is always in use.
- Compatibility queries ensure that this job is only installed on instances that meet required conditions such as having the appropriate engines and configurations.
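Parts of the compatibility tag can be verified by hand. For example, the `wsrep_cluster != '0'` condition corresponds to the node actually being part of a cluster, which the standard Galera status variables reveal:

```sql
-- A node that is not part of a cluster reports wsrep_cluster_size = 0.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
-- Confirm the instance is a MariaDB branch.
SELECT VERSION();
```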
Metadata and Categorization
- The job is categorized under “Cluster and Replication,” reflecting its role in managing and monitoring database clusters and replication processes.