Replication slave client alert


Job details

Name: Replication slave client alert
Platform: Postgres
Category: Cluster
Description: Checks for BDR replication lag
Long description:
Version: 1
Default schedule: 15m
Requires engine install: No
Compatibility tag: .[type='instance' & databasetype='postgres']/.[newer_than_ninetwo = '1' & maj_version = '9']

Parameters

Name | Default value | Description
warning_threshold | 1000 | Replication lag, in bytes, that triggers a warning
alarm_threshold | 5000 | Replication lag, in bytes, that triggers an alarm

Job Summary

SELECT client_hostname,
       client_addr,
       sent_offset - (replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6) AS byte_lag
FROM (
    SELECT client_hostname,
           client_addr,
           ('x' || lpad(split_part(sent_lsn::text, '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
           ('x' || lpad(split_part(replay_lsn::text, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
           ('x' || lpad(split_part(sent_lsn::text, '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
           ('x' || lpad(split_part(replay_lsn::text, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
    FROM pg_stat_replication
) AS s;
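The arithmetic in this query can be easier to follow outside SQL. The sketch below mirrors it in JavaScript under a few assumptions: an LSN is the text form "high/low" with both halves in hexadecimal, and the names parseLsn, byteLag, and BYTES_PER_XLOG are hypothetical helpers that are not part of the job. Because ^ binds tighter than * in PostgreSQL, the query's factor 255 * 16 ^ 6 evaluates to 255 * 16^6 = 4278190080 (0xFF000000), which is the value used here.

```javascript
// Hypothetical helpers that mirror the byte_lag expression of the query above.
// BigInt is used because the intermediate products can exceed 2^53.

// Bytes counted per increment of the high LSN half, as in the query: 255 * 16^6.
const BYTES_PER_XLOG = 255n * 16n ** 6n;

// Split an LSN such as "16/B374D848" into its high (xlog) and low (offset) halves.
function parseLsn(lsn) {
  const [high, low] = lsn.split('/');
  return { xlog: BigInt('0x' + high), offset: BigInt('0x' + low) };
}

// Same formula as the SQL:
// sent_offset - (replay_offset - (sent_xlog - replay_xlog) * 255 * 16^6)
function byteLag(sentLsn, replayLsn) {
  const sent = parseLsn(sentLsn);
  const replay = parseLsn(replayLsn);
  return sent.offset - (replay.offset - (sent.xlog - replay.xlog) * BYTES_PER_XLOG);
}
```

For example, byteLag('0/3000060', '0/3000000') gives 96n: both positions share the same high half, so the lag is simply the difference between the two offsets.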

Job Details

Replication Lag Monitoring Logic

The job uses SQL to retrieve data from the “pg_stat_replication” view in PostgreSQL. For each replication client, the query returns the client hostname, the client address, and a replication lag in bytes derived from the sent and replay log sequence numbers (LSNs). The job then uses JavaScript to process these results, checking each client's lag against the warning_threshold and alarm_threshold parameters.
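The individual checks are not listed here, so the following is only a minimal sketch of how such threshold logic might look, not the job's actual script. It assumes that a lag strictly above alarm_threshold raises an alarm, that a lag strictly above warning_threshold raises a warning, and that the worst client determines the overall result. The function names, the status labels, and the row shape (client_hostname, client_addr, byte_lag) are illustrative assumptions.

```javascript
// Hypothetical status labels; the job's real status values are not shown in the source.
const RANK = { OK: 0, WARNING: 1, ALARM: 2 };

// Classify one lag value. Defaults match the documented parameter defaults.
// Assumption: thresholds are exclusive (lag must exceed them).
function classifyLag(byteLag, warningThreshold = 1000, alarmThreshold = 5000) {
  if (byteLag > alarmThreshold) return 'ALARM';
  if (byteLag > warningThreshold) return 'WARNING';
  return 'OK';
}

// Evaluate every row returned by the query and report the worst status found.
function evaluateClients(rows, warningThreshold = 1000, alarmThreshold = 5000) {
  let worst = 'OK';
  const details = rows.map((row) => {
    const status = classifyLag(row.byte_lag, warningThreshold, alarmThreshold);
    if (RANK[status] > RANK[worst]) worst = status;
    // Fall back to the address when no hostname is reported for the client.
    const client = row.client_hostname || row.client_addr;
    return { client, byte_lag: row.byte_lag, status };
  });
  return { status: worst, details };
}
```

Checking the alarm threshold first keeps the result unambiguous when a lag exceeds both limits.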

Replication Monitoring Report Generation

Under the dbwatch-report-template, a report named “Replication info” is defined. For each replication client it lists the username, application name, client address, client hostname, backend start time, current state, and synchronization state. The report is structured as the following table:

Username | Application name | Client address | Client hostname | Backend start | State | Sync state
data | data | data | data | data | data | data

This report is generated hourly as specified in the ‘default-schedule’ of the report template, ensuring up-to-date replication monitoring information is available.