Troubleshooting
Missing Recent Data
Symptom: Rows added to the source appear in the target only 2-3 syncs later.
Solution: Reduce --incremental_safety_lag:
# Current: 1-hour lag
--incremental_safety_lag 3600
# Reduced: 10-minute lag
--incremental_safety_lag 600
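To see why a shorter lag helps, the sketch below assumes that a sync running at time T only exports rows whose timestamp is at or before T minus the safety lag. That rule is an assumption used for illustration, not a statement of LakeXpress's implementation, and the 20-minute sync schedule and the function name are hypothetical.

```python
def first_sync_exporting(row_ts, sync_times, safety_lag_s):
    """Return the 1-based index of the first sync that would export the row.

    Assumption: a sync at time t exports rows with timestamp <= t - safety_lag_s.
    """
    for n, t in enumerate(sync_times, start=1):
        if row_ts <= t - safety_lag_s:
            return n
    return None  # not exported within the listed syncs


# Hypothetical schedule: a sync every 20 minutes, times in seconds.
sync_times = [1200, 2400, 3600, 4800]
row_ts = 0  # the row is written at t = 0

with_hour_lag = first_sync_exporting(row_ts, sync_times, 3600)
with_ten_min_lag = first_sync_exporting(row_ts, sync_times, 600)
print(with_hour_lag, with_ten_min_lag)
```

Under these assumptions, the row waits until the third sync with a 1-hour lag and is picked up by the first sync with a 10-minute lag.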
Duplicate Rows After Resync
Symptom: Running the same sync twice produces duplicates.
Solutions:
- Use upsert strategy (recommended): Configure with !upsert to use MERGE:
--incremental_table "sales.orders:created_date:date!upsert"
- Deduplicate in queries: If using append mode, use ROW_NUMBER() or QUALIFY at query time.
- Use external tables: External tables over Parquet files replace files on each sync rather than inserting rows.
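As a sketch of query-time deduplication with ROW_NUMBER(), the example below uses an in-memory SQLite database as a stand-in for the target warehouse. The orders table, its columns, and the choice of order_id as the key are illustrative assumptions. It needs a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical target table that received the same batch twice in append mode.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, created_date TEXT, amount INTEGER)"
)
batch = [(1, "2024-01-01", 100), (2, "2024-01-02", 250)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", batch)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", batch)  # repeated sync

# Number the rows within each order_id and keep only the first one.
DEDUP_SQL = """
SELECT order_id, created_date, amount
FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (
               PARTITION BY order_id
               ORDER BY created_date DESC
           ) AS rn
    FROM orders AS o
) AS ranked
WHERE rn = 1
ORDER BY order_id
"""
deduped = conn.execute(DEDUP_SQL).fetchall()
print(deduped)
```

On engines that support QUALIFY (Snowflake and BigQuery, for example), the outer query can usually be dropped in favor of a QUALIFY clause that filters on ROW_NUMBER() = 1 directly.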
Watermark Not Advancing
Symptom: Watermark stays at the same value across syncs.
Solution: Verify the incremental column has new values:
SELECT MAX(created_at) FROM orders;
-- If the result is unchanged since the last sync, the source has no new rows and the watermark is correct
Performance Considerations
Column Selection
Choose the incremental column carefully:
# GOOD: Indexed column whose values never change
--incremental_table "sales.orders:order_id:integer"
# GOOD: Set once on insert
--incremental_table "sales.orders:created_date:date"
# CAUTION: Changes when a row is updated; may need a resync
--incremental_table "sales.orders:updated_at:datetime"
# AVOID: Value is not monotonic, so new and updated rows may be missed
--incremental_table "sales.orders:order_amount:integer"
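The difference between a good and a poor watermark column can be sketched in a few lines. The example assumes the common watermark pattern in which each sync pulls rows whose column value is greater than the last stored watermark. This is an illustrative model, not LakeXpress's actual query logic, and incremental_pull is a hypothetical helper.

```python
def incremental_pull(rows, column, watermark):
    """Return rows whose `column` exceeds the watermark, plus the new watermark."""
    new_rows = [r for r in rows if r[column] > watermark]
    new_watermark = max((r[column] for r in new_rows), default=watermark)
    return new_rows, new_watermark


source = [
    {"order_id": 1, "order_amount": 500},
    {"order_id": 2, "order_amount": 120},
]

# First sync: both columns start from a watermark of 0.
_, wm_id = incremental_pull(source, "order_id", 0)
_, wm_amount = incremental_pull(source, "order_amount", 0)

# A new order arrives with an amount below the stored order_amount watermark.
source.append({"order_id": 3, "order_amount": 80})

by_id, _ = incremental_pull(source, "order_id", wm_id)
by_amount, _ = incremental_pull(source, "order_amount", wm_amount)

print([r["order_id"] for r in by_id])      # the new order is picked up
print([r["order_id"] for r in by_amount])  # the new order is skipped
```

Because order_id only grows, the new row always lands above the watermark. A column such as order_amount has no such guarantee, so rows can be skipped without any error being raised.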
Indexing
Create an index on the watermark column:
CREATE INDEX idx_orders_created_at ON orders(created_at);
Combining Incremental and Full Exports
./LakeXpress config create \
... \
--incremental_table "fact.transactions:txn_date:date" \
--incremental_safety_lag 3600 \
--n_jobs 4 \
--fastbcp_p 8
- Fact tables configured with --incremental_table sync incrementally
- All other tables in the schema are fully exported (useful for dimension tables)
Multiple Watermark Columns
Track changes in multiple columns with separate configurations:
# Configuration 1: Track by create date
lakexpress config create \
... \
--incremental_table "transactions:created_at:datetime"
# Configuration 2: Track by update date
lakexpress config create \
... \
--incremental_table "transactions:updated_at:datetime"
Run both syncs to capture creates and updates separately.
See Also
- Incremental Sync Overview - Configuration syntax and supported column types
- Loading Strategies - Append vs upsert strategies
- Watermarks - Watermark tracking, safety lag, and resetting