
Watermarks

Watermark Tracking

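LakeXpress stores a watermark for each incremental table in the LakeXpress DB: the highest value of the incremental column exported so far. Each sync exports only rows beyond the stored watermark (less any safety lag) and then advances it: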

First Sync (Full Load)

Table: orders
Column: o_orderdate
Watermark: NULL -> 2025-12-31 (highest date in table)
Records exported: 1,500,000

Second Sync (Incremental)

Query: SELECT * FROM orders WHERE o_orderdate > 2025-12-31 - 1 hour (previous watermark minus a 1-hour safety lag)
Expected records: 50,000 (new orders since the first sync)
Watermark updated: 2025-12-31 -> 2026-01-05
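The full-load-then-incremental cycle above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not LakeXpress code: the `sync` helper and the in-memory `orders` table are hypothetical, dates are stored as ISO text (which compares correctly as strings), and the sketch assumes a NULL watermark means a full load and that the new watermark is the highest `o_orderdate` among the exported rows. It ignores the safety lag, which is covered in the next section.

```python
import sqlite3

def sync(conn, watermark):
    """Hypothetical sync step: export rows past the watermark.

    Returns (rows, new_watermark). A watermark of None means a full load.
    """
    if watermark is None:
        rows = conn.execute("SELECT o_orderkey, o_orderdate FROM orders").fetchall()
    else:
        rows = conn.execute(
            "SELECT o_orderkey, o_orderdate FROM orders WHERE o_orderdate > ?",
            (watermark,),
        ).fetchall()
    # Advance to the highest exported value; keep the old watermark if nothing was exported.
    new_watermark = max((r[1] for r in rows), default=watermark)
    return rows, new_watermark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (o_orderkey INTEGER, o_orderdate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "2025-12-30"), (2, "2025-12-31")])

# First sync: full load, watermark goes from None to the highest date.
rows, wm = sync(conn, None)
print(len(rows), wm)  # 2 2025-12-31

# New orders arrive; the second sync exports only those.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(3, "2026-01-02"), (4, "2026-01-05")])
rows, wm = sync(conn, wm)
print(len(rows), wm)  # 2 2026-01-05
```

Running `sync` again with no new rows returns an empty list and leaves the watermark unchanged, which is why the sketch falls back to the old value in `max(..., default=...)`.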

Safety Lag

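The --incremental_safety_lag parameter handles late-arriving data by moving the lower bound of each incremental query back from the watermark by a fixed number of seconds, so rows that were committed late, with column values slightly below the watermark, are still picked up: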

./LakeXpress config create \
... \
--incremental_table "events.raw_events:event_timestamp:datetime" \
--incremental_safety_lag 3600 \
...
  • --incremental_safety_lag INT - Lag in seconds (default: 0)
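The --incremental_table value in the command above appears to follow a schema.table:column:type layout. The sketch below splits such a value into its parts; that layout is inferred from this single example and is an assumption, and `parse_incremental_table` and `IncrementalSpec` are hypothetical names, not part of LakeXpress.

```python
from typing import NamedTuple

class IncrementalSpec(NamedTuple):
    schema: str
    table: str
    column: str
    column_type: str

def parse_incremental_table(spec: str) -> IncrementalSpec:
    """Split an assumed 'schema.table:column:type' value into its parts.

    Raises ValueError if the value does not have exactly three ':'-separated
    fields or the first field has no '.' separating schema and table.
    """
    qualified, column, column_type = spec.split(":")
    schema, table = qualified.split(".", 1)
    return IncrementalSpec(schema, table, column, column_type)

spec = parse_incremental_table("events.raw_events:event_timestamp:datetime")
print(spec)
# IncrementalSpec(schema='events', table='raw_events', column='event_timestamp', column_type='datetime')
```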

Example with 1-hour lag:

Current time: 2025-01-08 14:00:00
Watermark: 2025-01-08 10:00:00
Query includes: WHERE event_timestamp > 2025-01-08 09:00:00
(1 hour before watermark)
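The arithmetic in this example can be checked with a short sketch. The lower bound depends only on the stored watermark and the lag, so the current time (14:00) does not enter the calculation. `lagged_lower_bound` is a hypothetical helper written for illustration, not a LakeXpress function.

```python
from datetime import datetime, timedelta

def lagged_lower_bound(watermark: datetime, safety_lag_seconds: int = 0) -> datetime:
    """Lower bound for the next incremental query: watermark minus the safety lag."""
    if safety_lag_seconds < 0:
        raise ValueError("safety lag must be non-negative")
    return watermark - timedelta(seconds=safety_lag_seconds)

watermark = datetime(2025, 1, 8, 10, 0, 0)
bound = lagged_lower_bound(watermark, 3600)  # --incremental_safety_lag 3600
print(f"WHERE event_timestamp > '{bound:%Y-%m-%d %H:%M:%S}'")
# WHERE event_timestamp > '2025-01-08 09:00:00'
```

Because the bound sits below the watermark, rows timestamped between 09:00 and 10:00 that were already exported are read again on the next run. That overlap is what catches late arrivals, but unless the rows are deduplicated somewhere downstream, consumers should expect to see some of them twice.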

When to use:

  • Asynchronous systems with delayed writes
  • Multi-region databases with replication lag
  • Event streams with out-of-order processing
  • Financial transactions with settlement delays

Querying Watermarks

Inspect tracked watermarks by querying the LakeXpress DB:

-- View all incremental configurations
SELECT
    sync_id,
    config_name,
    source_table,
    incremental_column,
    last_watermark,
    updated_at
FROM sync_configurations
WHERE is_incremental = true
ORDER BY updated_at DESC;

-- View recent watermark updates
SELECT
    run_id,
    sync_id,
    source_table,
    previous_watermark,
    new_watermark,
    rows_exported,
    started_at,
    completed_at
FROM incremental_watermarks
ORDER BY completed_at DESC
LIMIT 10;
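The incremental_watermarks history can also be used to spot runs whose starting watermark does not match where the previous run ended. The sketch below is a hypothetical check run against an in-memory SQLite stand-in that keeps only a few of the columns queried above; it assumes that, for one sync_id, each run's previous_watermark should equal the prior run's new_watermark. That invariant is an assumption, not documented LakeXpress behavior. A run after a watermark reset would legitimately show a NULL previous_watermark and be flagged.

```python
import sqlite3

def find_watermark_gaps(conn, sync_id):
    """Return (run_id, expected_previous, actual_previous) for each run whose
    previous_watermark differs from the new_watermark of the run before it.

    Assumption: runs of one sync_id, ordered by completed_at, form a chain.
    """
    runs = conn.execute(
        "SELECT run_id, previous_watermark, new_watermark "
        "FROM incremental_watermarks WHERE sync_id = ? ORDER BY completed_at",
        (sync_id,),
    ).fetchall()
    gaps = []
    # Compare each run with its predecessor.
    for (_, _, prior_new), (run_id, prev, _) in zip(runs, runs[1:]):
        if prev != prior_new:
            gaps.append((run_id, prior_new, prev))
    return gaps

# Hypothetical stand-in for the LakeXpress DB table, reduced to the columns used here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incremental_watermarks ("
    "run_id TEXT, sync_id TEXT, previous_watermark TEXT, "
    "new_watermark TEXT, completed_at TEXT)"
)
conn.executemany(
    "INSERT INTO incremental_watermarks VALUES (?, ?, ?, ?, ?)",
    [
        ("run-1", "sync-a", None, "2025-12-31", "2026-01-01 00:10:00"),
        ("run-2", "sync-a", "2025-12-31", "2026-01-05", "2026-01-06 00:10:00"),
        ("run-3", "sync-a", "2026-01-04", "2026-01-08", "2026-01-09 00:10:00"),
    ],
)
print(find_watermark_gaps(conn, "sync-a"))
# [('run-3', '2026-01-05', '2026-01-04')]
```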

Resetting Watermarks

To force a full reload, reset the watermark in one of two ways:

# Option 1: Delete and recreate the configuration
./LakeXpress config delete \
-a credentials.json \
--lxdb_auth_id lxdb_postgres \
--sync_id 20251208-xxxxx

# Then create a new one
./LakeXpress config create ...

# Option 2: Override watermark on next sync
./LakeXpress sync --reset-watermarks


Copyright © 2026 Architecture & Performance.