Replica delay
Job details
| Field | Value |
| --- | --- |
| Name | Replica delay |
| Platform | MariaDB |
| Category | Cluster and Replication |
| Description | This job checks how "late" the replica is by measuring the time difference in seconds between the replication SQL (applier) thread and the replication I/O (receiver) thread. |
| Long description | |
| Version | 1.5 |
| Default schedule | 30s |
| Requires engine install | No |
| Compatibility tag | `.[type='instance' & databasetype='mariadb']/instance[is_mariadb_branch='1']` |
Parameters
| Name | Default value | Description |
| --- | --- | --- |
| warning_threshold | 120 | Maximum number of seconds between the replication SQL (applier) thread and the replication I/O (receiver) thread before a warning is triggered. |
| alarm_threshold | 600 | Maximum number of seconds between the replication SQL (applier) thread and the replication I/O (receiver) thread before an alarm is triggered. |
| return_status_when_Replica_IO_not_running | 2 | Status value returned (2 = ALARM, 1 = WARNING, 0 = OK) when the replication I/O (receiver) thread is not started and/or has not connected successfully to the source. |
| return_status_when_Replica_SQL_not_running | 1 | Status value returned (2 = ALARM, 1 = WARNING, 0 = OK) when the replication SQL (applier) thread is not started. |
Job Summary
- Purpose: Track the replication delay between a master and a replica in a MariaDB environment.
- Why: Replication delay determines how stale the data on a replica is compared with the master. Monitoring it helps identify replication problems early, before they affect database performance and availability.
- Manual checking: You can check this manually in the database by issuing the following SQL statement:
SHOW SLAVE STATUS;
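The values relevant to this check can be pulled out of one row of that result. The sketch below is illustrative only: it assumes the row has already been fetched as a dictionary by some database client, and that the delay is read from the `Seconds_Behind_Master` column alongside `Slave_IO_Running` and `Slave_SQL_Running`. The helper `summarize_replica_row` is a hypothetical name, not part of dbWatch.

```python
from typing import Optional, TypedDict


class ReplicaSummary(TypedDict):
    io_running: bool
    sql_running: bool
    delay_seconds: Optional[int]


def summarize_replica_row(row: dict) -> ReplicaSummary:
    """Reduce one SHOW SLAVE STATUS row to the three values of interest.

    Assumption: the row maps column names to values as returned by the
    server, so Seconds_Behind_Master may be None (SQL NULL) or a number,
    possibly delivered as a string depending on the client.
    """
    raw_delay = row.get("Seconds_Behind_Master")
    return {
        # Only the literal "Yes" counts as running; other values such as
        # "No" or "Connecting" are treated as not running.
        "io_running": row.get("Slave_IO_Running") == "Yes",
        "sql_running": row.get("Slave_SQL_Running") == "Yes",
        "delay_seconds": None if raw_delay is None else int(raw_delay),
    }


# Hypothetical sample row, trimmed to the columns used above.
sample_row = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": "42",
}
print(summarize_replica_row(sample_row))
```

Treating anything other than `Yes` as "not running" is a deliberately conservative choice for this sketch; the job itself may distinguish more states.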
Job Details
- Name: Replica delay
- Version: 1.5
- Provider: dbwatch.no
- Group: com.dbwatch.job
- Artifact ID: mariadb_replica_delay
- Category: Cluster and Replication
- Compatibility: This job is compatible with MariaDB instances that employ replication features.
Monitoring Details
- Description: This job checks how “late” the replica is by measuring the time difference in seconds between the replication SQL (applier) thread and the replication I/O (receiver) thread.
- Default Schedule: Every 30 seconds
Status Calculation
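- The status is evaluated from the delay, in seconds, between the replication SQL (applier) thread and the replication I/O (receiver) thread.
- A warning is issued if the delay exceeds warning_threshold (default 120 seconds) but is below alarm_threshold (default 600 seconds).
- An alarm is triggered if the delay exceeds alarm_threshold (default 600 seconds).
- If the replication I/O thread is not running or has not connected to the source, the job returns the value of return_status_when_Replica_IO_not_running (default 2, ALARM). If the replication SQL thread is not running, it returns the value of return_status_when_Replica_SQL_not_running (default 1, WARNING).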
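The rules above can be sketched as a small function. This is not the job's actual implementation; it is a minimal illustration using the documented parameter defaults. Three points are assumptions of the sketch: thread state is checked before the delay, the more severe status wins when both threads are down, and a delay exactly equal to a threshold does not exceed it.

```python
OK, WARNING, ALARM = 0, 1, 2


def evaluate_status(
    delay_seconds: int,
    io_running: bool,
    sql_running: bool,
    warning_threshold: int = 120,
    alarm_threshold: int = 600,
    status_io_not_running: int = ALARM,
    status_sql_not_running: int = WARNING,
) -> int:
    """Return 0 (OK), 1 (WARNING) or 2 (ALARM) for one observation."""
    # Assumption: a stopped thread takes priority over the measured delay,
    # which is only consulted when both threads are running.
    if not io_running and not sql_running:
        # Assumption: report the more severe of the two configured values.
        return max(status_io_not_running, status_sql_not_running)
    if not io_running:
        return status_io_not_running
    if not sql_running:
        return status_sql_not_running

    # Both threads are running, so the delay decides the status.
    if delay_seconds > alarm_threshold:
        return ALARM
    if delay_seconds > warning_threshold:
        return WARNING
    return OK


print(evaluate_status(30, True, True))
print(evaluate_status(300, True, True))
print(evaluate_status(900, True, True))
print(evaluate_status(0, False, True))
```

Passing the thresholds and return values as arguments mirrors how the job exposes them as parameters, so different settings can be tried without editing the function.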
Output and Reporting
| Field | Description |
| --- | --- |
| Status | Indicates the overall status based on the configured thresholds and the running state of the replication threads (OK, WARNING, ALARM). |
| Details | Provides specifics about the replication delay, including how many seconds the replica is behind the master and the operational state of the replication threads. |
Alerting Logic
- The alerting logic combines several conditions: the operational state of the replication I/O thread, the operational state of the replication SQL thread, and whether the I/O thread has connected to the source.
- Messages and statuses are constructed to reflect faults such as non-running threads or problems connecting to the replication source.
- The monitoring job uses a custom JavaScript engine to process and evaluate the replication status dynamically, based on real-time data from the database.
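To make the message construction concrete, here is a hedged sketch of how a details string might be assembled from the thread states. The message wording and the function name `build_details` are invented for illustration, and the sketch is written in Python rather than the JavaScript the job uses; the real output may differ.

```python
def build_details(delay_seconds: int, io_running: bool, sql_running: bool) -> str:
    """Compose a human-readable details line for one observation."""
    problems = []
    if not io_running:
        problems.append(
            "replication I/O (receiver) thread is not running "
            "or not connected to the source"
        )
    if not sql_running:
        problems.append("replication SQL (applier) thread is not running")

    # Faults are reported first; the delay is only meaningful when both
    # threads are running (an assumption of this sketch).
    if problems:
        return "; ".join(problems)
    return f"Replica is {delay_seconds} seconds behind the source"


print(build_details(15, True, True))
print(build_details(0, False, False))
```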
This monitoring job is crucial for database administrators to keep a close eye on the health and performance of their MariaDB replicas, ensuring data consistency and timely troubleshooting of replication issues.