Replication delay alert
Job details
Name: Replication delay alert
Platform: Postgres
Category: Cluster
Description: Checks for BDR replication lag
Long description: (none)
Version: 1.01
Default schedule: 15m
Requires engine install: No
Compatibility tag: .[type='instance' & databasetype='postgres']/.[newer_than_ninetwo = '1']
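The compatibility tag limits the job to Postgres instances whose version is 9.2 or newer. The minimal JavaScript sketch below shows how such a tag could be evaluated against an instance description. It is a hypothetical illustration, not dbWatch's implementation. The `isCompatible` and `newerThanNineTwo` helpers are invented for this example, and deriving the flag from PostgreSQL's `server_version_num` setting (for example 90200 for 9.2.0) is an assumption.

```javascript
// Hypothetical sketch: evaluate the job's compatibility tag for one instance.
// Deriving newer_than_ninetwo from server_version_num is an assumption.

// PostgreSQL reports server_version_num as an integer, e.g. 90200 for 9.2.0
// and 150004 for 15.4, so a plain numeric comparison works across versions.
function newerThanNineTwo(serverVersionNum) {
  return serverVersionNum >= 90200 ? "1" : "0";
}

// Mirrors: .[type='instance' & databasetype='postgres']/.[newer_than_ninetwo = '1']
function isCompatible(instance) {
  return (
    instance.type === "instance" &&
    instance.databasetype === "postgres" &&
    newerThanNineTwo(instance.serverVersionNum) === "1"
  );
}

console.log(isCompatible({ type: "instance", databasetype: "postgres", serverVersionNum: 150004 }));
console.log(isCompatible({ type: "instance", databasetype: "postgres", serverVersionNum: 90106 }));
```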
Parameters
Name | Default value | Description
warning_threshold | 300 | Replication delay, in seconds, above which a warning is generated
alarm_threshold | 600 | Replication delay, in seconds, above which an alarm is generated
Job Summary
- Purpose: This job monitors replication delay on Postgres servers running version 9.2 or newer.
- Why: Monitoring replication delay is essential for the health of replicated database clusters. A standby (slave) instance that lags behind the primary (master) instance serves stale data. If it has to take over after a failure, it may also be missing recent changes. Alerts are generated when the delay exceeds the configured thresholds, so administrators can intervene in time.
- Manual checking: You can check the delay manually by running the following SQL command on the standby instance:
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::INT AS replication_delay_seconds;
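The manual query subtracts the timestamp of the last replayed transaction from the current time and converts the interval to whole seconds. The sketch below reproduces that arithmetic in JavaScript. `replicationDelaySeconds` is a hypothetical helper. The value `null` stands in for SQL NULL, which `pg_last_xact_replay_timestamp()` returns on a server started without recovery, such as a normally started primary. `Math.round` only approximates the `::INT` cast.

```javascript
// Sketch of the manual check's arithmetic; replicationDelaySeconds is hypothetical.
// lastReplay stands in for pg_last_xact_replay_timestamp(); null models SQL NULL.
function replicationDelaySeconds(lastReplay, now = new Date()) {
  if (lastReplay === null) {
    // NULL propagates through the SQL expression, so no delay can be computed.
    return null;
  }
  // Date subtraction yields milliseconds; EXTRACT(EPOCH FROM interval) yields seconds.
  // Math.round approximates the ::INT cast, which rounds to the nearest integer.
  return Math.round((now.getTime() - lastReplay.getTime()) / 1000);
}

const now = new Date("2024-01-01T12:10:00Z");
console.log(replicationDelaySeconds(new Date("2024-01-01T12:00:00Z"), now));
console.log(replicationDelaySeconds(null, now));
```

The replay timestamp only advances when transactions are replayed. On an idle primary, the computed delay therefore keeps growing even when the standby is fully caught up.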
Details and Configuration
- Name: Postgres job Replication delay
- Version: 1.01
- Company: dbwatch.com
- Group: com.dbwatch.job
- Artifact ID: postgres_replication_delay
- Category: Cluster
Threshold Configuration
Threshold type | Seconds
Warning | 300
Alarm | 600
Monitoring Logic
- This job runs a SQL query to determine the number of seconds since the last transaction was replayed on the slave instance.
- Based on the output:
  - If the delay exceeds "warning_threshold" (300 seconds), a warning status is issued.
  - If the delay exceeds "alarm_threshold" (600 seconds), an alarm status is issued instead.
  - If no replication is detected, the status is set to "no replication".
- The job uses JavaScript to evaluate these conditions and update the monitoring status.
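The threshold logic above can be sketched as a small JavaScript function. This is a hypothetical illustration, not the job's actual code. The status strings and the `classifyReplicationDelay` name are assumptions. So is the treatment of a delay exactly equal to a threshold, which is not counted as exceeding it.

```javascript
// Hypothetical sketch of the job's threshold logic; not dbWatch's actual code.
// Defaults match the documented warning_threshold (300) and alarm_threshold (600).
function classifyReplicationDelay(delaySeconds, warningThreshold = 300, alarmThreshold = 600) {
  if (delaySeconds === null || delaySeconds === undefined) {
    return "no replication";
  }
  // Check the higher threshold first so an alarm takes precedence over a warning.
  if (delaySeconds > alarmThreshold) {
    return "alarm";
  }
  if (delaySeconds > warningThreshold) {
    return "warning";
  }
  return "ok";
}

for (const d of [null, 120, 301, 601]) {
  console.log(d, classifyReplicationDelay(d));
}
```

Checking the alarm threshold before the warning threshold means a single comparison chain yields exactly one status per run.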
Schedule
- This job is scheduled to run every 15 minutes.
Compatibility
- The job applies only to Postgres database instances running version 9.2 or newer.
Reporting
- Report Title: Replication info
- Description: Provides detailed replication information including the username, application name, client addresses, start times, and the state of replication.
- Data Presentation:
  - Columns include Username, Application Name, Client Address, Client Hostname, Backend Start, State, and Sync State.
  - Data is presented as a borderless table, with selected columns used for plotting.
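The report columns appear to correspond to the `pg_stat_replication` view on the primary, whose columns include `usename`, `application_name`, `client_addr`, `client_hostname`, `backend_start`, `state`, and `sync_state`. Assuming that source, the sketch below renders such rows as a borderless, space-padded text table. `renderReport`, the column mapping, and the sample row are hypothetical.

```javascript
// Hypothetical sketch: map pg_stat_replication columns to the report headers
// and render rows as a borderless, space-padded text table.
const COLUMNS = [
  ["usename", "Username"],
  ["application_name", "Application Name"],
  ["client_addr", "Client Address"],
  ["client_hostname", "Client Hostname"],
  ["backend_start", "Backend Start"],
  ["state", "State"],
  ["sync_state", "Sync State"],
];

function renderReport(rows) {
  const header = COLUMNS.map(([, label]) => label);
  // Missing or NULL values are shown as empty cells.
  const body = rows.map((row) => COLUMNS.map(([key]) => (row[key] == null ? "" : String(row[key]))));
  const all = [header, ...body];
  // Each column is as wide as its longest cell.
  const widths = header.map((_, i) => Math.max(...all.map((r) => r[i].length)));
  return all
    .map((r) => r.map((cell, i) => cell.padEnd(widths[i])).join("  ").trimEnd())
    .join("\n");
}

// Made-up sample row for illustration only.
const sample = [
  {
    usename: "replicator",
    application_name: "standby1",
    client_addr: "10.0.0.12",
    client_hostname: null,
    backend_start: "2024-01-01 08:00:00+00",
    state: "streaming",
    sync_state: "async",
  },
];
console.log(renderReport(sample));
```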
Conclusion
This dbWatch Control Center job lets administrators monitor replication delay in Postgres databases automatically. Running the check on a schedule helps keep standby instances current. It also alerts administrators to growing lag before it escalates into a critical problem.