# LakeXpress
## What is LakeXpress?
LakeXpress is a CLI tool that exports database tables to partitioned Parquet files and publishes them to cloud data platforms. It uses FastBCP to stream data in parallel without exhausting memory.
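To illustrate the idea of streaming a table in bounded batches and bucketing rows into partitioned output files, here is a minimal stdlib-only sketch. It deliberately writes plain text files instead of Parquet and knows nothing about FastBCP's internals; the function and file-naming scheme are illustrative assumptions, not LakeXpress's actual behavior.

```python
import os
import sqlite3
import tempfile

def export_partitioned(conn, table, partition_col, batch_size=2):
    """Stream rows in fixed-size batches and bucket them by a partition key.

    Illustrative only: the real tool writes Parquet via FastBCP; plain text
    files are used here just to show the streaming/partitioning pattern.
    """
    out_dir = tempfile.mkdtemp()
    cur = conn.execute(f"SELECT rowid, {partition_col} FROM {table}")
    while True:
        batch = cur.fetchmany(batch_size)  # bounded memory: one batch at a time
        if not batch:
            break
        for rowid, key in batch:
            # One output file per distinct partition value, Hive-style naming.
            path = os.path.join(out_dir, f"{partition_col}={key}.part")
            with open(path, "a") as f:
                f.write(f"{rowid}\n")
    return sorted(os.listdir(out_dir))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?)", [("eu",), ("us",), ("eu",)])
print(export_partitioned(conn, "orders", "region"))
# → ['region=eu.part', 'region=us.part']
```

Because each batch is written and released before the next is fetched, memory use stays flat regardless of table size, which is the property the FastBCP-based export relies on.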
## Architecture
LakeXpress connects four components in a data pipeline:
### Source Database
The database you want to export data from. LakeXpress reads tables and views from the source and converts them to Parquet files.
Supported sources: Oracle, PostgreSQL, SQL Server, MySQL, MariaDB, SAP HANA, Teradata.
### Intermediate Storage
Parquet files are written to a storage backend before being registered in a target platform. This can be local disk or cloud object storage.
Supported backends: local filesystem, AWS S3, S3-compatible (MinIO, etc.), Google Cloud Storage, Azure Blob Storage.
### Target Platform
Once Parquet files are in storage, LakeXpress registers them as external tables in a cloud data platform or catalog.
Supported platforms: Snowflake, Databricks, Microsoft Fabric, Amazon Redshift, BigQuery, MotherDuck, AWS Glue, DuckLake.
### LakeXpress DB
A dedicated database where LakeXpress logs run history, job metadata, and exported file details. This enables tracking, auditing, and resuming failed exports.
Supported databases: PostgreSQL, SQL Server, MySQL, SQLite, DuckDB.
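To make the LakeXpress DB's role concrete, here is a hypothetical sketch of the kind of run-history table it might hold, using SQLite. The table and column names are assumptions for illustration, not the tool's real schema.

```python
import sqlite3

# Hypothetical run-history table; names are illustrative, not the real schema.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE export_runs (
        run_id        INTEGER PRIMARY KEY,
        table_name    TEXT NOT NULL,
        status        TEXT NOT NULL,      -- 'success' or 'failed'
        files_written INTEGER
    )
""")
db.executemany(
    "INSERT INTO export_runs (table_name, status, files_written) VALUES (?, ?, ?)",
    [("sales.orders", "success", 8), ("sales.items", "failed", 3)],
)

# Resuming a failed export means re-selecting only the tables that failed.
failed = [r[0] for r in db.execute(
    "SELECT table_name FROM export_runs WHERE status = 'failed'")]
print(failed)  # → ['sales.items']
```

Logging per-table status like this is what makes auditing and resume-on-failure possible: a rerun can query for unfinished work instead of re-exporting everything.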
## What you'll need
| Component | What to do | Details |
|---|---|---|
| LakeXpress binary | Download and unzip | Installation guide |
| Source database user | Create a read-only user with SELECT on target schemas | Database setup |
| LakeXpress DB | A new or existing database with a user that has full privileges | Database setup |
| Storage destination | A local directory or cloud storage credentials | Storage config |
| Publishing target (optional) | Credentials for Snowflake, Databricks, etc. | Snowflake · Databricks · more... |
## Key features
- Cross-platform: Native binaries for Windows and Linux
- Parallel exports: Multiple tables at once, with per-table partitioning
- Incremental sync: Watermark-based delta exports
- Schema filtering: Include/exclude schemas and tables via SQL patterns
- Resume on failure: Pick up where a failed export left off
- CDM metadata: Generate Common Data Model files
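Watermark-based delta export, the technique behind incremental sync, can be sketched in a few lines: each run remembers the highest modification timestamp it has seen and reads only newer rows. This is a generic illustration in SQLite; the table and column names are assumptions, not LakeXpress's actual queries.

```python
import sqlite3

# Source table with a modification timestamp (names are illustrative).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER, updated_at INTEGER)")
src.executemany("INSERT INTO events VALUES (?, ?)",
                [(1, 100), (2, 200), (3, 300)])

def sync_since(conn, watermark):
    """Return rows newer than the watermark, plus the advanced watermark."""
    rows = conn.execute(
        "SELECT id, updated_at FROM events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][1] if rows else watermark
    return rows, new_watermark

rows, wm = sync_since(src, 0)   # first run: exports all three rows, wm == 300
rows, wm = sync_since(src, wm)  # second run: nothing new, rows == []
```

Persisting the watermark between runs (in LakeXpress's case, in the LakeXpress DB) is what turns full exports into cheap deltas.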
## How it works

- Configure -- `LakeXpress config create` defines your source database, storage target, and optional publishing
- Sync -- `LakeXpress sync` exports tables to Parquet and publishes to your catalog
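A configuration produced by the configure step might resemble the following sketch. Every key name below is an assumption for illustration only, not LakeXpress's actual configuration schema; the real format is covered in the Quick Start Guide.

```yaml
# Illustrative sketch only -- key names are assumptions, not the real schema.
source:
  type: postgresql
  connection: "Host=db.internal;Database=sales;User=lx_reader"
storage:
  type: s3
  bucket: my-export-bucket
target:
  type: snowflake
```

The three top-level sections mirror the architecture above: a source to read from, intermediate storage for the Parquet files, and an optional publishing target.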
See the Quick Start Guide for a full walkthrough with real commands.