Replica delay
Job details
Name: | Replica delay |
Platform: | MySQL |
Category: | Cluster and Replication |
Description: | Monitors the replication delay of a MySQL replica |
Long description: | |
Version: | 1.5 |
Default schedule: | 30s |
Requires engine install: | No |
Compatibility tag: | .[type='instance' & databasetype='mysql']/instance[is_mysql_branch='1'] |
Parameters
Name | Default value | Description |
---|---|---|
warning_threshold | 120 | Delay, in seconds, between the replication SQL (applier) thread and the replication I/O (receiver) thread above which a warning is triggered. |
alarm_threshold | 600 | Delay, in seconds, between the replication SQL (applier) thread and the replication I/O (receiver) thread above which an alarm is triggered. |
return_status_when_Replica_IO_not_running | 2 | Return status value (ALARM – 2, WARNING – 1, or OK – 0) when the replication I/O (receiver) thread is not started and/or has not connected successfully to the source. |
return_status_when_Replica_SQL_not_running | 1 | Return status value (ALARM – 2, WARNING – 1, or OK – 0) when the replication SQL (applier) thread is not started. |
Job Summary
- Purpose: The job “Replica delay” is designed to monitor the delay in MySQL replicas by measuring the time difference in seconds between the replication SQL (applier) thread and the replication I/O (receiver) thread.
- Why: Monitoring replica delay is critical for data consistency and for the high availability and reliability of replicated database systems. A growing delay means that reads served by the replica return stale data and that a failover to the replica takes longer, because the backlog must be applied first. A delay beyond the configured thresholds can therefore indicate problems affecting data integrity or performance.
- Manual Checking: To inspect replica delay manually, run the following statement on the replica server. On MySQL 8.0.22 and later, SHOW REPLICA STATUS is the preferred form and SHOW SLAVE STATUS is deprecated.
SHOW SLAVE STATUS;
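Running the statement above with the `\G` terminator in the `mysql` client prints one `Field: value` pair per line. The Python sketch below parses that vertical output and extracts the three fields this job relies on. It accepts both the older `Slave_*`/`Seconds_Behind_Master` column names and the `Replica_*`/`Seconds_Behind_Source` names used by `SHOW REPLICA STATUS`. The helper names `parse_vertical_status`, `summarize` and `ReplicaStatus` and the sample text are illustrative assumptions; they are not part of dbWatch.

```python
from typing import Dict, NamedTuple, Optional


class ReplicaStatus(NamedTuple):
    seconds_behind: Optional[int]  # None when the server reports NULL
    io_running: Optional[str]      # typically "Yes", "No" or "Connecting"
    sql_running: Optional[str]     # typically "Yes" or "No"


def parse_vertical_status(text: str) -> Dict[str, str]:
    """Turn `SHOW SLAVE STATUS\\G` / `SHOW REPLICA STATUS\\G` output into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" banner.
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only; values such as error messages may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields: Dict[str, str]) -> ReplicaStatus:
    """Extract delay and thread state, accepting both old and new column names."""
    def pick(*names: str) -> Optional[str]:
        for name in names:
            if name in fields:
                return fields[name]
        return None

    raw_delay = pick("Seconds_Behind_Source", "Seconds_Behind_Master")
    delay = None if raw_delay in (None, "NULL") else int(raw_delay)
    return ReplicaStatus(
        seconds_behind=delay,
        io_running=pick("Replica_IO_Running", "Slave_IO_Running"),
        sql_running=pick("Replica_SQL_Running", "Slave_SQL_Running"),
    )


# Hypothetical sample of vertical output (abridged).
sample = """
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.5
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 180
"""
print(summarize(parse_vertical_status(sample)))
# prints: ReplicaStatus(seconds_behind=180, io_running='Yes', sql_running='Yes')
```

Splitting on the first colon only keeps values such as `Last_IO_Error` messages intact, even when they contain `host:port` strings.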
Job Details and Logic
- This job filters for MySQL instances that are configured as replicas.
- It periodically checks the replication delay using the “Seconds_Behind_Master” value from the server’s replication status (reported as Seconds_Behind_Source by SHOW REPLICA STATUS).
- Warnings or alarms are triggered when the delay crosses configurable thresholds:
  - A warning is issued if the delay exceeds warning_threshold (default 120 seconds).
  - An alarm is triggered if the delay exceeds alarm_threshold (default 600 seconds).
- Additionally, the job checks whether the replication I/O (receiver) and SQL (applier) threads are running:
  - If a thread is not running, the job can override the delay-based status with the value set in return_status_when_Replica_IO_not_running or return_status_when_Replica_SQL_not_running.
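The threshold and thread-state rules above can be sketched as a pure Python function. This is a minimal sketch under stated assumptions, not dbWatch's implementation. It assumes the following:

- Thresholds are compared with a strict "greater than".
- A receiver thread reporting anything other than `Yes` (for example `Connecting`) counts as not running.
- A stopped thread replaces the delay-based status with the configured return status.
- If both threads are stopped, the more severe configured value wins.

The function name `replica_delay_status` is hypothetical; its keyword defaults mirror the parameter table.

```python
from typing import List, Optional

OK, WARNING, ALARM = 0, 1, 2


def replica_delay_status(
    seconds_behind: Optional[int],
    io_running: str,
    sql_running: str,
    warning_threshold: int = 120,
    alarm_threshold: int = 600,
    status_when_io_not_running: int = ALARM,   # return_status_when_Replica_IO_not_running
    status_when_sql_not_running: int = WARNING,  # return_status_when_Replica_SQL_not_running
) -> int:
    """Return 0 (OK), 1 (WARNING) or 2 (ALARM) for one replica (assumed semantics)."""
    # Thread checks override the delay-based status (assumption).
    # "Connecting" means the receiver has not yet connected to the source.
    thread_statuses: List[int] = []
    if io_running != "Yes":
        thread_statuses.append(status_when_io_not_running)
    if sql_running != "Yes":
        thread_statuses.append(status_when_sql_not_running)
    if thread_statuses:
        # Assumption: with both threads down, report the more severe setting.
        return max(thread_statuses)

    # Both threads running: classify by delay.
    # A NULL delay should not normally occur here; treating it as OK is an assumption.
    if seconds_behind is None:
        return OK
    if seconds_behind > alarm_threshold:
        return ALARM
    if seconds_behind > warning_threshold:
        return WARNING
    return OK


print(replica_delay_status(180, "Yes", "Yes"))        # 1 (WARNING)
print(replica_delay_status(30, "Connecting", "Yes"))  # 2 (ALARM)
```

Keeping the evaluation free of any database access makes the threshold boundaries easy to check. For example, a delay of exactly 600 seconds is still a WARNING under the strict comparison assumed here.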
Output and Reporting
- The primary output of this job is the status of the replica delay, categorized into OK, WARNING, or ALARM based on the delay and thread activity.
- Besides the delay in seconds, the detailed status includes a message describing the running state of the replication threads.
Status Code | Description |
---|---|
0 | OK – Replication delay is within the warning threshold and both replication threads are running. |
1 | WARNING – Replication delay exceeds the warning threshold, or (with default settings) the replication SQL (applier) thread is not running. |
2 | ALARM – Replication delay exceeds the alarm threshold, or (with default settings) the replication I/O (receiver) thread is not started or has not connected successfully to the source. |
- Each status is accompanied by a detailed message elaborating on the current state:
  - Example for a warning: “180 seconds behind the source. Connected.”
  - Example for an alarm: “The replication I/O (receiver) thread is not running.”
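The two example messages can be reproduced by a small formatter. This is a hypothetical sketch: the function name `detail_message` and the wording for the SQL (applier) thread and unknown-delay cases are assumptions. Only the two example strings come from this page.

```python
from typing import Optional


def detail_message(seconds_behind: Optional[int], io_running: str, sql_running: str) -> str:
    """Build a detail line for one replica (hypothetical format)."""
    if io_running != "Yes":
        # Matches the documented alarm example; "Connecting" is treated the same way here.
        return "The replication I/O (receiver) thread is not running."
    if sql_running != "Yes":
        # Assumed wording, mirroring the I/O-thread message.
        return "The replication SQL (applier) thread is not running."
    if seconds_behind is None:
        # Assumed wording for a NULL delay while both threads report running.
        return "Replication delay is unknown. Connected."
    # Matches the documented warning example.
    return f"{seconds_behind} seconds behind the source. Connected."


print(detail_message(180, "Yes", "Yes"))
print(detail_message(None, "No", "Yes"))
```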
Automation and Execution
- By default, the job runs every 30 seconds, which keeps the replica delay status current.
- The compatibility tag restricts the job to MySQL instances belonging to the MySQL branch (is_mysql_branch='1').
Visualization
- The job includes a dbWatch report template that presents the replica delay details and status as a table in the dbWatch Control Center interface.