Incremental Sync
Sync only changed data using watermark-based incremental exports.
What is Incremental Sync?
LakeXpress tracks a "high watermark" (the highest value in a timestamp or numeric column) and only exports rows above that watermark on subsequent syncs, instead of exporting entire tables every time.
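The watermark idea can be sketched in a few lines of Python. This is a minimal illustration only, not LakeXpress's actual implementation: the `orders` table, its columns, and the `export_increment` helper are hypothetical, and an in-memory SQLite database stands in for the source system.

```python
import sqlite3

def export_increment(conn, watermark):
    """Fetch rows above the watermark; return them with the new high watermark."""
    if watermark is None:
        # First sync: no watermark recorded yet, so every row is exported.
        cur = conn.execute(
            "SELECT id, created_date FROM orders ORDER BY created_date"
        )
    else:
        # Later syncs: only rows strictly above the stored watermark.
        cur = conn.execute(
            "SELECT id, created_date FROM orders "
            "WHERE created_date > ? ORDER BY created_date",
            (watermark,),
        )
    rows = cur.fetchall()
    # Keep the previous watermark when nothing new arrived.
    new_watermark = rows[-1][1] if rows else watermark
    return rows, new_watermark

# Hypothetical source table used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2025-01-06"), (2, "2025-01-07")],
)

first_rows, wm = export_increment(conn, None)      # full export on first sync
conn.execute("INSERT INTO orders VALUES (3, '2025-01-08')")
second_rows, wm2 = export_increment(conn, wm)      # only the new row
```

Note that a strict "greater than" comparison would miss a row inserted later with a value equal to the current watermark, which is one reason a tracked column should only ever increase.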
When to Use Incremental Sync
Use incremental sync when:
- Regularly updated tables: Source tables receive frequent inserts or updates
- Frequent syncs: Exports run multiple times per day or on a schedule
- Cost efficiency: You want to minimize network and compute usage
- Time-series data: Tables have timestamp columns tracking record creation or modification
Examples:
- Daily sales order updates to a data lake
- Event log aggregation from production systems
- Time-series metrics collection
- Transaction processing pipelines
Supported Column Types
| Type | Format | Example | Use Case |
|---|---|---|---|
| date | YYYY-MM-DD | 2025-01-08 | Daily transactions, order dates |
| datetime | YYYY-MM-DD HH:MM:SS | 2025-01-08 14:30:25 | Precise time tracking |
| timestamp | Database timestamp type | 2025-01-08T14:30:25Z | Creation/modification times |
| integer | Numeric sequence | 1000001 | Monotonic ID columns, batch IDs |
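To make the formats in the table concrete, the sketch below converts each example value into a comparable Python object. It assumes the formats exactly as listed above; `parse_watermark` is a hypothetical helper for illustration, not part of LakeXpress.

```python
from datetime import date, datetime

def parse_watermark(value, column_type):
    """Convert a watermark string into a comparable Python value."""
    if column_type == "date":
        return date.fromisoformat(value)                      # YYYY-MM-DD
    if column_type == "datetime":
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")  # YYYY-MM-DD HH:MM:SS
    if column_type == "timestamp":
        # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    if column_type == "integer":
        return int(value)
    raise ValueError(f"unsupported column type: {column_type}")

# The example values from the table above.
d = parse_watermark("2025-01-08", "date")
dt = parse_watermark("2025-01-08 14:30:25", "datetime")
ts = parse_watermark("2025-01-08T14:30:25Z", "timestamp")
n = parse_watermark("1000001", "integer")
```

Once parsed, values of the same type can be compared with the usual operators, which is all a watermark check needs.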
Configuration Syntax
Define incremental tables with --incremental_table:
./LakeXpress config create \
... \
--incremental_table "schema.table:column:type"
Basic Syntax
schema.table:column:type
Parameters:
- schema.table - Fully qualified table name
- column - Column to track for watermark (should be indexed)
- type - Column type: date, datetime, timestamp, or integer
Example:
--incremental_table "sales.orders:created_date:date"
Multiple Tables
Repeat the --incremental_table flag for each table:
./LakeXpress config create \
... \
--incremental_table "sales.orders:created_date:date" \
--incremental_table "sales.returns:return_date:date"
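When the table list is maintained in a script, the repeated flags can be generated rather than typed by hand. The sketch below uses a hypothetical helper, `incremental_table_args`, and relies only on the `--incremental_table` flag shown above.

```python
def incremental_table_args(specs):
    """Build one --incremental_table flag per table specification."""
    args = []
    for spec in specs:
        args += ["--incremental_table", spec]
    return args

specs = [
    "sales.orders:created_date:date",
    "sales.returns:return_date:date",
]
args = incremental_table_args(specs)
```

The resulting list can be appended to the rest of a `./LakeXpress config create` argument list, for example when launching the command with `subprocess.run`. Passing arguments as a list avoids shell quoting issues with the specification strings.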
Advanced Syntax (Optional)
schema.table:column:type[:direction][@start_value][!strategy]
Extended Parameters:
- :direction - Include (:i, default) or exclude (:e)
- @start_value - Override the initial watermark value
- !strategy - Loading strategy: append (default) or upsert
Examples:
# Include direction (explicit)
--incremental_table "sales.orders:created_date:date:i"
# Exclude from incremental sync
--incremental_table "sales.returns:return_date:date:e"
# Set initial watermark
--incremental_table "sales.orders:created_date:date@2025-01-01"
# Use UPSERT/MERGE strategy (updates existing rows)
--incremental_table "sales.orders:created_date:date!upsert"
# Combined: direction + start value + upsert
--incremental_table "sales.orders:created_date:date:i@2025-01-01!upsert"
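To show how the pieces of the advanced syntax fit together, here is a small parser sketch for checking specification strings before passing them to the CLI. It is an assumption-based illustration of the grammar above, not LakeXpress's own parser; `IncrementalSpec` and `parse_spec` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

TYPES = {"date", "datetime", "timestamp", "integer"}

@dataclass
class IncrementalSpec:
    table: str
    column: str
    type: str
    direction: str = "i"           # include by default
    start_value: Optional[str] = None
    strategy: str = "append"       # append by default

def parse_spec(spec):
    """Parse schema.table:column:type[:direction][@start_value][!strategy]."""
    strategy = "append"
    if "!" in spec:
        spec, strategy = spec.rsplit("!", 1)
    start_value = None
    if "@" in spec:
        # Split on the first "@" so a start value may itself contain colons.
        spec, start_value = spec.split("@", 1)
    parts = spec.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected schema.table:column:type, got {spec!r}")
    table, column, col_type = parts[:3]
    direction = parts[3] if len(parts) == 4 else "i"
    if col_type not in TYPES:
        raise ValueError(f"unsupported type: {col_type}")
    if direction not in ("i", "e"):
        raise ValueError(f"unsupported direction: {direction}")
    if strategy not in ("append", "upsert"):
        raise ValueError(f"unsupported strategy: {strategy}")
    return IncrementalSpec(table, column, col_type, direction, start_value, strategy)

combined = parse_spec("sales.orders:created_date:date:i@2025-01-01!upsert")
basic = parse_spec("sales.orders:created_date:date")
```

The suffixes are stripped from the right (`!strategy`, then `@start_value`) before the remaining text is split on colons, so the optional parts never interfere with the required three fields.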
Non-incremental Tables
Tables not configured with --incremental_table are fully exported on each sync. This is useful for small dimension or reference tables.
./LakeXpress config create \
... \
--incremental_table "fact.sales:sale_date:date" \
--n_jobs 4
- fact.sales - Exports only new records since last watermark
- All other tables in the schema - Fully exported on each sync
See Also
- Loading Strategies - Append vs upsert strategies
- Watermarks - Watermark tracking, safety lag, querying, and resetting
- Examples - Complete step-by-step examples and real-world scenarios
- Troubleshooting - Common issues, performance tips, and advanced topics
- Quick Start Guide - First export walkthrough
- CLI Reference - All available options
- Examples & Recipes - Real-world usage examples
- Intermediate Storage - Cloud storage configuration