Incremental Sync

Sync only changed data using watermark-based incremental exports.

What is Incremental Sync?

LakeXpress tracks a "high watermark" (the highest value seen in a timestamp or numeric column). On subsequent syncs it exports only the rows above that watermark instead of exporting entire tables every time.
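
To make the watermark idea concrete, here is a minimal Python sketch against an in-memory SQLite table. It is not LakeXpress's implementation: the `orders` table, the `export_incremental` helper, and the strict `>` comparison are assumptions chosen for illustration only.

```python
import sqlite3


def export_incremental(conn, watermark=None):
    """Return (rows, new_watermark) for a hypothetical orders table."""
    query = "SELECT id, created_date FROM orders"
    params = ()
    if watermark is not None:
        # Later syncs: only rows strictly above the stored watermark.
        query += " WHERE created_date > ?"
        params = (watermark,)
    rows = conn.execute(query + " ORDER BY created_date, id", params).fetchall()
    # The new watermark is the highest value exported; keep the old one if
    # nothing new arrived.
    new_watermark = max((row[1] for row in rows), default=watermark)
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2025-01-06"), (2, "2025-01-07")],
)

# First sync: no watermark yet, so every row is exported.
first_rows, wm = export_incremental(conn)

# A new row arrives; the second sync exports only that row.
conn.execute("INSERT INTO orders VALUES (3, '2025-01-08')")
second_rows, wm2 = export_incremental(conn, wm)

# A third sync with no new data exports nothing and keeps the watermark.
third_rows, wm3 = export_incremental(conn, wm2)
```

ISO-formatted date strings sort in chronological order, which is why a plain string comparison is enough in this sketch.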

When to Use Incremental Sync

Use incremental sync in these situations:

  • Regularly updated tables: Source tables receive frequent inserts or updates
  • Frequent syncs: Exports run multiple times per day or on a schedule
  • Cost efficiency: You want to minimize network and compute usage
  • Time-series data: Tables have timestamp columns tracking record creation or modification

Examples:

  • Daily sales order updates to a data lake
  • Event log aggregation from production systems
  • Time-series metrics collection
  • Transaction processing pipelines

Supported Column Types

Type       | Format                  | Example              | Use Case
date       | YYYY-MM-DD              | 2025-01-08           | Daily transactions, order dates
datetime   | YYYY-MM-DD HH:MM:SS     | 2025-01-08 14:30:25  | Precise time tracking
timestamp  | Database timestamp type | 2025-01-08T14:30:25Z | Creation/modification times
integer    | Numeric sequence        | 1000001              | Monotonic ID columns, batch IDs
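
A watermark only works if values of the chosen type can be ordered consistently. The following Python sketch shows one way the example values in the table could be turned into comparable objects. The `parse_watermark` helper is hypothetical; how LakeXpress parses and compares these values internally is not described here.

```python
from datetime import date, datetime


def parse_watermark(value: str, column_type: str):
    """Convert a watermark string into a comparable Python value."""
    if column_type == "date":
        return date.fromisoformat(value)
    if column_type == "datetime":
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    if column_type == "timestamp":
        # Assumes an ISO 8601 string; a trailing "Z" is rewritten as an
        # explicit UTC offset so older Python versions can parse it.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    if column_type == "integer":
        return int(value)
    raise ValueError(f"unsupported column type: {column_type!r}")


# Values of the same type compare in the expected order.
older = parse_watermark("2025-01-08", "date")
newer = parse_watermark("2025-01-09", "date")
is_newer = newer > older
```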

Configuration Syntax

Define incremental tables with --incremental_table:

./LakeXpress config create \
... \
--incremental_table "schema.table:column:type"

Basic Syntax

schema.table:column:type

Parameters:

  • schema.table - Fully qualified table name
  • column - Column to track for watermark (should be indexed)
  • type - Column type: date, datetime, timestamp, or integer

Example:

--incremental_table "sales.orders:created_date:date"
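
If you generate these values from a script, it can help to validate them before passing them to the CLI. The Python sketch below splits the basic form into its three parts. The `IncrementalSpec` type and `parse_basic_spec` function are hypothetical names, and the sketch assumes a two-part `schema.table` name.

```python
from typing import NamedTuple

ALLOWED_TYPES = {"date", "datetime", "timestamp", "integer"}


class IncrementalSpec(NamedTuple):
    schema: str
    table: str
    column: str
    column_type: str


def parse_basic_spec(spec: str) -> IncrementalSpec:
    """Split 'schema.table:column:type' and check the type name."""
    try:
        qualified, column, column_type = spec.split(":")
        schema, table = qualified.split(".", 1)
    except ValueError:
        raise ValueError(f"expected schema.table:column:type, got {spec!r}") from None
    if column_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported column type: {column_type!r}")
    return IncrementalSpec(schema, table, column, column_type)


spec = parse_basic_spec("sales.orders:created_date:date")
```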

Multiple Tables

Repeat the --incremental_table flag for each table:

./LakeXpress config create \
... \
--incremental_table "sales.orders:created_date:date" \
--incremental_table "sales.returns:return_date:date"
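
When the table list lives in a script or config file, the repeated flags can be generated rather than typed by hand. This Python sketch builds only the `--incremental_table` portion of the command; the `incremental_flags` helper is a hypothetical name, and the remaining `config create` arguments are left out because they are not shown here.

```python
import shlex


def incremental_flags(specs):
    """Return one --incremental_table flag per table definition."""
    args = []
    for spec in specs:
        args += ["--incremental_table", spec]
    return args


flags = incremental_flags(
    [
        "sales.orders:created_date:date",
        "sales.returns:return_date:date",
    ]
)

# Shell-quoted form, ready to append to the rest of the command line.
print(shlex.join(flags))
```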

Advanced Syntax (Optional)

schema.table:column:type[:direction][@start_value][!strategy]

Extended Parameters:

  • :direction - Include (:i, default) or exclude (:e)
  • @start_value - Override the initial watermark value
  • !strategy - Loading strategy: append (default) or upsert

Examples:

# Include direction (explicit)
--incremental_table "sales.orders:created_date:date:i"

# Exclude from incremental sync
--incremental_table "sales.returns:return_date:date:e"

# Set initial watermark
--incremental_table "sales.orders:created_date:date@2025-01-01"

# Use UPSERT/MERGE strategy (updates existing rows)
--incremental_table "sales.orders:created_date:date!upsert"

# Combined: direction + start value + upsert
--incremental_table "sales.orders:created_date:date:i@2025-01-01!upsert"
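
The optional parts always appear in the same order: direction, then start value, then strategy. One way to read the grammar is as a regular expression, sketched here in Python. The pattern and the `parse_spec` helper are an interpretation of the syntax line above, not LakeXpress's own parser; the defaults (`i` and `append`) follow the parameter list.

```python
import re

SPEC_RE = re.compile(
    r"^(?P<schema>[^.:@!]+)\.(?P<table>[^:@!]+)"
    r":(?P<column>[^:@!]+)"
    r":(?P<type>date|datetime|timestamp|integer)"
    r"(?::(?P<direction>[ie]))?"          # optional :i or :e
    r"(?:@(?P<start>[^!]+))?"             # optional @start_value
    r"(?:!(?P<strategy>append|upsert))?$" # optional !strategy
)


def parse_spec(spec: str) -> dict:
    """Parse the extended form and fill in the documented defaults."""
    match = SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid incremental table definition: {spec!r}")
    fields = match.groupdict()
    fields["direction"] = fields["direction"] or "i"
    fields["strategy"] = fields["strategy"] or "append"
    return fields


combined = parse_spec("sales.orders:created_date:date:i@2025-01-01!upsert")
basic = parse_spec("sales.orders:created_date:date")
```

Because the start value is matched up to the `!` separator, a datetime start value that itself contains colons still parses under this pattern.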

Non-incremental Tables

Tables not configured with --incremental_table are fully exported on each sync. This is useful for small dimension or reference tables, where a full export is cheap.

./LakeXpress config create \
... \
--incremental_table "fact.sales:sale_date:date" \
--n_jobs 4

  • fact.sales - Exports only new records since the last watermark
  • All other tables in the schema - Fully exported on each sync
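
The resulting split between incremental and full exports can be pictured with a short Python sketch. The `plan_sync` helper and the table names other than `fact.sales` are hypothetical, and the sketch ignores the optional direction, start value, and strategy parts.

```python
def plan_sync(all_tables, incremental_specs):
    """Map each table to 'incremental' or 'full' based on the definitions."""
    # The table name is everything before the first colon of a definition.
    incremental = {spec.split(":", 1)[0] for spec in incremental_specs}
    return {
        table: "incremental" if table in incremental else "full"
        for table in all_tables
    }


plan = plan_sync(
    ["fact.sales", "fact.stores", "fact.products"],  # hypothetical schema contents
    ["fact.sales:sale_date:date"],
)
```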

Copyright © 2026 Architecture & Performance.