# Intermediate Storage
After extracting data from a source database, LakeXpress stages the results as Parquet files in intermediate storage before publishing to a target platform (such as Snowflake or Databricks). This staging step decouples extraction from publishing: you can extract once and publish to multiple targets, re-publish without re-extracting, or inspect the raw files before they reach the destination.
LakeXpress supports two categories of intermediate storage:
| Option | Flag | Description |
|---|---|---|
| Local filesystem | `--output_dir` | Stage files in a local directory |
| Cloud storage | `--target_storage_id` | Stage files in S3, GCS, Azure Blob Storage, or OneLake |

`--output_dir` and `--target_storage_id` are mutually exclusive; choose one per pipeline run.
Note that cloud storage (S3, GCS, Azure, OneLake) is a staging location, not a publishing target. Publishing targets such as Snowflake, Databricks, and BigQuery are configured separately with `--publish_target`. See the Snowflake Publishing Guide for an end-to-end example of staging followed by publishing.
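As a sketch of how the two flags combine in one run (the storage ID `s3_01` and publish target name `snowflake_01` are placeholders, not names LakeXpress defines):

```shell
# Stage extracted Parquet in S3, then publish to Snowflake.
# Storage ID and publish target name are placeholder identifiers.
./LakeXpress --target_storage_id s3_01 --publish_target snowflake_01 ...
```

Because staging and publishing are decoupled, a later run could reuse the same staged files with a different `--publish_target` without re-extracting.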
## Sub-Path Option

All backends support `--sub_path` to insert an intermediate directory level:
```shell
./LakeXpress --target_storage_id s3_01 --sub_path staging/daily/2025-01-15 ...
```

Produces:

```
s3://bucket/base_path/staging/daily/2025-01-15/schema_name/table_name/*.parquet
```
Common use cases:

- Date partitioning: `staging/2025/01/15`
- Environment separation: `dev/exports` vs `prod/exports`
- Project organization: `project-a/datasets`
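For daily date partitioning, the sub-path can be computed at run time rather than hard-coded. A minimal sketch (the `staging/daily/` prefix is illustrative, not required by LakeXpress):

```shell
# Build a date-based sub-path for daily partitioning.
# The staging/daily/ prefix is illustrative, not required by LakeXpress.
SUB_PATH="staging/daily/$(date +%F)"    # e.g. staging/daily/2025-01-15
echo "$SUB_PATH"

# Then pass it through --sub_path (flags as documented above):
# ./LakeXpress --target_storage_id s3_01 --sub_path "$SUB_PATH" ...
```

Using `date +%F` (ISO 8601, `YYYY-MM-DD`) keeps staged directories lexicographically sortable by date.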