
# Intermediate Storage

After extracting data from a source database, LakeXpress stages the results as Parquet files in intermediate storage before publishing to a target platform (such as Snowflake or Databricks). This staging step decouples extraction from publishing: you can extract once and publish to multiple targets, re-publish without re-extracting, or inspect the raw files before they reach the destination.

LakeXpress supports two categories of intermediate storage:

| Option | Flag | Description |
| --- | --- | --- |
| Local filesystem | `--output_dir` | Stage files in a local directory |
| Cloud storage | `--target_storage_id` | Stage files in S3, GCS, Azure Blob Storage, or OneLake |

`--output_dir` and `--target_storage_id` are mutually exclusive; choose exactly one per pipeline run.
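If you drive LakeXpress from a wrapper script, the mutual-exclusion rule can be checked before launching a run. A minimal sketch; `check_staging_flags` is a hypothetical helper for illustration, not part of LakeXpress:

```shell
# Fail fast if both staging flags appear in the argument list.
# check_staging_flags is a hypothetical wrapper helper, not a LakeXpress feature.
check_staging_flags() {
  local has_local=0 has_cloud=0 arg
  for arg in "$@"; do
    case "$arg" in
      --output_dir)        has_local=1 ;;
      --target_storage_id) has_cloud=1 ;;
    esac
  done
  # Succeed only when at most one of the two staging flags is present.
  [ $((has_local + has_cloud)) -le 1 ]
}
```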

> **Note:** Cloud storage (S3, GCS, Azure, OneLake) is a staging location, not a publishing target. Publishing targets such as Snowflake, Databricks, and BigQuery are configured separately with `--publish_target`. See the Snowflake Publishing Guide for an end-to-end example of how publishing follows staging.
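Assuming the flag syntax shown elsewhere on this page, a staging location and a publishing target would be combined in a single run; `snowflake_01` is a hypothetical target ID used purely for illustration:

```shell
# s3_01 stages Parquet files in cloud storage; snowflake_01 is a hypothetical,
# separately configured publishing target (illustrative, not a real config).
./LakeXpress --target_storage_id s3_01 --publish_target snowflake_01 ...
```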

## Sub-Path Option

All backends support `--sub_path` to insert an intermediate directory level between the base path and the schema/table directories:

```shell
./LakeXpress --target_storage_id s3_01 --sub_path staging/daily/2025-01-15 ...
```

Produces:

```
s3://bucket/base_path/staging/daily/2025-01-15/schema_name/table_name/*.parquet
```

Typical use cases:

- Date partitioning: `staging/2025/01/15`
- Environment separation: `dev/exports` vs. `prod/exports`
- Project organization: `project-a/datasets`
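The resulting layout can be reproduced with plain string composition. A sketch using the illustrative values from the example above; real values come from the storage configuration and the extracted schema and table:

```shell
# Illustrative values matching the example above.
BUCKET="bucket"
BASE_PATH="base_path"
SUB_PATH="staging/daily/2025-01-15"
SCHEMA="schema_name"
TABLE="table_name"

# --sub_path slots in between the base path and the schema/table directories.
STAGED_PREFIX="s3://${BUCKET}/${BASE_PATH}/${SUB_PATH}/${SCHEMA}/${TABLE}"
echo "${STAGED_PREFIX}/*.parquet"
```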

## See Also

Copyright © 2026 Architecture & Performance.