Replication slave client alert
Job details
Name: | Replication slave client alert |
Platform: | Postgres |
Category: | Cluster |
Description: | Checks for BDR replication lag |
Long description: | |
Version: | 1 |
Default schedule: | 15m |
Requires engine install: | No |
Compatibility tag: | .[type='instance' & databasetype='postgres']/.[newer_than_ninetwo = '1' & maj_version = '9'] |
Parameters
Name | Default value | Description |
---|---|---|
warning_threshold | 1000 | Replication lag, in bytes, above which a warning is generated |
alarm_threshold | 5000 | Replication lag, in bytes, above which an alarm is generated |
Job Summary
- Purpose: The purpose of this job is to monitor the byte lag between the master and slave clients in a PostgreSQL replication setup.
- Why: Keeping replication lag within acceptable limits helps prevent data inconsistency between master and slave and reduces the risk of downtime. Monitoring replication lag is therefore important for maintaining database availability and performance in distributed database environments.
- Manual checking: You can manually check the replication lag in PostgreSQL by using the following SQL command:
SELECT client_hostname, client_addr, sent_offset - (replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag FROM (SELECT client_hostname, client_addr,('x' || lpad(split_part(sent_lsn::text, '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,('x' || lpad(split_part(replay_lsn::text, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,('x' || lpad(split_part(sent_lsn::text, '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,('x' || lpad(split_part(replay_lsn::text, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset FROM pg_stat_replication) AS s;
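To make the arithmetic in the query concrete, the following sketch reproduces the same byte_lag expression in JavaScript for a single pair of LSN strings. The function names `parseLsn` and `lsnByteLag` and the sample LSN values are hypothetical and chosen only for illustration; the factor 255 * 16^6 is copied from the query as-is.

```javascript
// Illustrative sketch only: mirrors the byte_lag expression from the SQL query.
// parseLsn, lsnByteLag and the sample LSNs are hypothetical, not part of the job.

// Split an LSN such as "0/3000148" into its two hexadecimal halves.
function parseLsn(lsn) {
  const [hi, lo] = lsn.split('/');
  return { xlog: BigInt('0x' + hi), offset: BigInt('0x' + lo) };
}

// Bytes per logical xlog file as used in the query: 255 * 16^6.
const BYTES_PER_XLOG = 255n * 16n ** 6n;

function lsnByteLag(sentLsn, replayLsn) {
  const sent = parseLsn(sentLsn);
  const replay = parseLsn(replayLsn);
  // sent_offset - (replay_offset - (sent_xlog - replay_xlog) * 255 * 16^6)
  return sent.offset - (replay.offset - (sent.xlog - replay.xlog) * BYTES_PER_XLOG);
}

console.log(lsnByteLag('0/3000148', '0/3000060')); // 232n
```

BigInt is used here so that large differences in the high half of the LSN do not lose precision.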
Job Details
- default-schedule: Every 15 minutes
- valid-for: 15 minutes after execution
- version: 1
- company: dbwatch.com
- artifactid: postgres_replication_slave_client
Replication Lag Monitoring Logic
The job uses SQL to retrieve data from the “pg_stat_replication” view in PostgreSQL, which provides details such as client hostname, client address, and the log sequence numbers (LSNs) from which replication lag is calculated. It then uses JavaScript to process these results. The JavaScript logic performs the following checks:
- Retrieves client hostname, address, and byte lag for each replication connection.
- Compares the byte lag against the defined thresholds to set a warning or alarm status:
  - If byte lag exceeds the “warning_threshold”, set status to 1 (Warning).
  - If byte lag exceeds the “alarm_threshold”, set status to Alarm.
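The threshold comparison can be sketched in JavaScript roughly as follows. This is not the job's actual source: the function names, the sample rows, the numeric codes for OK and Alarm, and the choice to report the worst status across all clients are assumptions made for illustration. Only status 1 (Warning) and the default thresholds come from this page.

```javascript
// Illustrative sketch only, not the job's actual source.
const STATUS_OK = 0;      // assumed code for "no problem"
const STATUS_WARNING = 1; // matches "status 1 (Warning)" in the text
const STATUS_ALARM = 2;   // assumed code; the text does not state the alarm value

// Classify one client's byte lag against the two thresholds.
function classifyLag(byteLag, warningThreshold = 1000, alarmThreshold = 5000) {
  if (byteLag > alarmThreshold) return STATUS_ALARM;
  if (byteLag > warningThreshold) return STATUS_WARNING;
  return STATUS_OK;
}

// Assumption: the job reports the worst status found across all clients.
function overallStatus(rows, warningThreshold = 1000, alarmThreshold = 5000) {
  return rows.reduce(
    (worst, row) => Math.max(worst, classifyLag(row.byteLag, warningThreshold, alarmThreshold)),
    STATUS_OK
  );
}

// Hypothetical result rows: hostname, address and byte lag per replication client.
const rows = [
  { clientHostname: 'replica-a', clientAddr: '10.0.0.11', byteLag: 120 },
  { clientHostname: 'replica-b', clientAddr: '10.0.0.12', byteLag: 2400 },
];
console.log(overallStatus(rows)); // 1
```

The alarm check runs first so that a lag above both thresholds is reported as an alarm rather than a warning.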
Replication Monitoring Report Generation
Under the dbwatch-report-template, the job defines a report named “Replication info”. For each replication client it lists the username, application name, client address, client hostname, backend start time, current state, and synchronization state. The report is laid out as a table:
Username | Application name | Client address | Client hostname | Backend start | State | Sync state |
---|---|---|---|---|---|---|
data | data | data | data | data | data | data |
This report is generated hourly, as specified by the ‘default-schedule’ of the report template, so that the replication monitoring information stays up to date.