Skip to main content
Version: 0.3

Export Performance

LakeXpress auto-detects the best export method per table. Override when needed using CLI flags or the partition_columns table in the LakeXpress DB.

Export Methods

Without Key Column

MethodDatabaseDescription
NoneAllSequential export (no parallelism)
CtidPostgreSQLTuple ID-based parallel export
RowidOracleRow ID-based parallel export
PhyslocSQL ServerPhysical location-based parallel export

With Key Column

MethodDescriptionKey Column
RandomDistributes using column valuesInteger or bigint
DataDrivenSplits on distinct valuesColumn with distinct values (e.g., year, region)
RangeIdRange partitioning via min/maxNumeric or date column
NtileEven distributionNumeric column
TimepartitionPartitions by date/time column into Hive-style pathsDate/datetime column: (col,year,month)

Timepartition

Splits data into time-based chunks where each parallel thread handles a specific time period. Works with any source database -- ideal for time-series data and large historical tables.

Key column format -- a parenthesized list of a date column followed by granularity levels:

FormatOutput paths
(col, year)year=YYYY/
(col, year, month)year=YYYY/month=MM/
(col, year, month, day)year=YYYY/month=MM/day=DD/
(col, year, week)year=YYYY/week=WW/

Exported files follow Hive-style partitioned paths, e.g. year=2008/month=01/CARRIER_CLAIMS.parquet.

Tuning Parameters

ParameterDefaultDescription
--n_jobs4Tables exported simultaneously
--fastbcp_p-4Parallel threads per large table
--large_table_threshold100000Row count below which tables export sequentially

Parallel degree guidelines (--fastbcp_p):

  • < 100K rows: -4 (default auto-scaled)
  • 100K-10M rows: 2-4
  • 10M-100M rows: 4-8
  • > 100M rows: 8-16

Per-Table CLI Override

--fastbcp_table_config "[schema.]table:method:key_column:parallel_degree"

The schema prefix is optional. When omitted, the config applies to all schemas containing that table (a warning is logged). Multiple tables separated by semicolons. Empty fields use defaults.

# Schema-qualified (targets specific schema)
--fastbcp_table_config "dbo.lineitem:DataDriven:YEAR(l_shipdate):8;dbo.orders:Ctid::4"

# Without schema (applies to all schemas, with warning)
--fastbcp_table_config "lineitem:DataDriven:YEAR(l_shipdate):8;orders:Ctid::4"

# Timepartition -- key column uses parenthesized format
--fastbcp_table_config "CARRIER_CLAIMS:Timepartition:(CLM_FROM_DT,year,month):4"

Per-Table Database Override

Insert into the partition_columns table for persistent configuration:

INSERT INTO partition_columns (
source_db_type, source_database, source_schema, source_table,
fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
'postgres', 'analytics', 'public', 'events',
'Ctid', 4, NULL
);

Database configuration takes priority over CLI configuration, which takes priority over auto-detection.

Monitoring

SELECT source_table, status, finished_at - started_at AS duration, row_count
FROM jobs
WHERE run_id = 'your-run-id'
ORDER BY duration DESC;
Copyright © 2026 Architecture & Performance.