MotherDuck Publishing
LakeXpress creates MotherDuck tables or views from exported Parquet files in cloud storage (S3, GCS, Azure).
Table of Contents
- Prerequisites
- Authentication
- Configuration Options
- Table Types
- Dynamic Naming Patterns
- Usage Examples
Prerequisites
1. MotherDuck Account
Sign up at motherduck.com and generate an access token:
- Log in to MotherDuck
- Go to Settings → Access Tokens
- Create a new token
- Copy the token for your credentials file
2. Cloud Storage Setup
Configure credentials for cloud storage and MotherDuck:
{
"s3_datalake": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://my-datalake-bucket/lakexpress/",
"profile": "my-aws-profile"
}
},
"motherduck_prod": {
"ds_type": "motherduck",
"auth_mode": "token",
"info": {
"database": "my_analytics",
"token": "your-motherduck-access-token"
}
}
}
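Before running LakeXpress, it can help to sanity-check the credentials file. The sketch below is a hypothetical helper, not part of LakeXpress; the required fields follow the credential layout shown above.

```python
import json

# Fields each MotherDuck entry's "info" block must carry, per auth mode
# (hypothetical check, based on the credential layout shown above).
REQUIRED = {"token": {"database", "token"}, "env": {"database"}}

def check_motherduck_entries(creds):
    """Return a list of problems found in MotherDuck credential entries."""
    problems = []
    for name, entry in creds.items():
        if entry.get("ds_type") != "motherduck":
            continue
        mode = entry.get("auth_mode")
        if mode not in REQUIRED:
            problems.append(f"{name}: unknown auth_mode {mode!r}")
            continue
        missing = REQUIRED[mode] - entry.get("info", {}).keys()
        if missing:
            problems.append(f"{name}: missing info fields {sorted(missing)}")
    return problems

creds = json.loads("""{
  "motherduck_prod": {
    "ds_type": "motherduck",
    "auth_mode": "token",
    "info": {"database": "my_analytics", "token": "secret"}
  }
}""")
print(check_motherduck_entries(creds))  # → []
```

An entry that omits the token (with `auth_mode: "token"`) would be reported as missing the `token` field.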
3. Storage Access from MotherDuck (Required for External Views)
Important: When using external views (--publish_method external), MotherDuck needs credentials to read your Parquet files from cloud storage. Without this, you'll see errors like:
IO Error: No files found that match the pattern "s3://..."
HTTP Error: Permission error: Missing or invalid credentials
Configure secrets in MotherDuck before querying external views:
For AWS S3 (most common):
CREATE SECRET aws_s3_secret (
TYPE S3,
KEY_ID 'your_aws_access_key_id',
SECRET 'your_aws_secret_access_key',
REGION 'us-east-1' -- your bucket's region
);
To find your AWS credentials:
aws configure get aws_access_key_id
aws configure get aws_secret_access_key
aws configure get region
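The three values can then be stitched into the `CREATE SECRET` statement programmatically. A minimal sketch (the helper name is made up, not a LakeXpress or MotherDuck API):

```python
def s3_secret_sql(key_id, secret, region):
    """Render the CREATE SECRET statement shown above (illustrative helper)."""
    return (
        "CREATE SECRET aws_s3_secret (\n"
        f"    TYPE S3,\n"
        f"    KEY_ID '{key_id}',\n"
        f"    SECRET '{secret}',\n"
        f"    REGION '{region}'\n"
        ");"
    )

print(s3_secret_sql("AKIAEXAMPLE", "wJalrEXAMPLE", "us-east-1"))
```

Run the rendered statement once in the MotherDuck UI or a connected DuckDB session; the secret persists across sessions.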
For GCS:
CREATE SECRET gcs_secret (
TYPE GCS,
KEY_ID 'your_hmac_access_key',
SECRET 'your_hmac_secret_key'
);
For Azure:
CREATE SECRET azure_secret (
TYPE AZURE,
ACCOUNT_NAME 'your_storage_account',
ACCOUNT_KEY 'your_account_key'
);
Verify secrets are configured:
SELECT * FROM duckdb_secrets();
Authentication
Token Authentication (Recommended)
Store your token in the credentials file:
{
"motherduck_prod": {
"ds_type": "motherduck",
"auth_mode": "token",
"info": {
"database": "my_analytics",
"token": "your-motherduck-access-token"
}
}
}
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "motherduck" |
| auth_mode | Yes | Set to "token" |
| database | Yes | Target MotherDuck database name |
| token | Yes | MotherDuck access token |
Environment Variable Authentication
Read the token from an environment variable:
{
"motherduck_env": {
"ds_type": "motherduck",
"auth_mode": "env",
"info": {
"database": "my_analytics"
}
}
}
export motherduck_token="your-motherduck-access-token"
lakexpress ...
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "motherduck" |
| auth_mode | Yes | Set to "env" |
| database | Yes | Target MotherDuck database name |
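With `auth_mode: "env"` the token is resolved at run time rather than stored on disk. A minimal sketch of that lookup (the function name is hypothetical; LakeXpress's own implementation may differ):

```python
import os

def resolve_motherduck_token(entry):
    """Return the access token for a MotherDuck credential entry (sketch)."""
    if entry["auth_mode"] == "token":
        return entry["info"]["token"]          # token stored in the credentials file
    if entry["auth_mode"] == "env":
        token = os.environ.get("motherduck_token")  # read from the environment
        if not token:
            raise RuntimeError("motherduck_token environment variable is not set")
        return token
    raise ValueError(f"unsupported auth_mode: {entry['auth_mode']}")

os.environ["motherduck_token"] = "demo-token"
entry = {"ds_type": "motherduck", "auth_mode": "env", "info": {"database": "my_analytics"}}
print(resolve_motherduck_token(entry))  # → demo-token
```

Failing fast with a clear error when the variable is unset is preferable to passing an empty token to MotherDuck and debugging an opaque authentication failure later.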
Configuration Options
| Option | Description | Default |
|---|---|---|
| --publish_target ID | Credential ID for MotherDuck target (required) | - |
| --publish_schema_pattern PATTERN | Schema naming pattern | {schema} |
| --publish_table_pattern PATTERN | Table naming pattern | {table} |
| --publish_method TYPE | Table type: external or internal | external |
| --n_jobs N | Parallel workers for table creation | 1 |
Table Types
External Views (Default)
Queries data directly from cloud storage via read_parquet().
lakexpress ... --publish_target motherduck_prod --publish_method external
Generated SQL:
CREATE OR REPLACE VIEW "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')
Pros:
- No data copying, instant publishing
- No MotherDuck storage costs
- Always reflects latest cloud storage data
- Best for large datasets, infrequent queries, exploration
Trade-offs:
- Query speed depends on cloud storage latency
- Requires cloud storage credentials in MotherDuck
Internal Tables
Loads data into MotherDuck native columnar storage.
lakexpress ... --publish_target motherduck_prod --publish_method internal
Generated SQL:
CREATE OR REPLACE TABLE "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')
Pros:
- Faster queries (optimized columnar format)
- No storage credentials needed after load
- Better caching for repeated queries
- Best for dashboards and production analytics
Trade-offs:
- Slower publishing (data must load)
- MotherDuck storage costs apply
- Data is a point-in-time snapshot
Dynamic Naming Patterns
Schema and table names support token-based patterns.
Supported Tokens
| Token | Description | Example Output |
|---|---|---|
| {schema} | Source schema name | tpch_1 |
| {table} | Source table name | customer |
| {database} | Source database name | tpch |
| {date} | Current date (YYYYMMDD) | 20251210 |
| {timestamp} | Current timestamp (YYYYMMDD_HHMMSS) | 20251210_143022 |
| {uuid} | UUID4 identifier (consistent per run) | a1b2c3d4 |
| {subpath} | CLI --sub_path parameter | staging |
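Token substitution behaves like Python string formatting. A sketch of the expansion (illustrative only; the truncated 8-character {uuid} matches the example in the table):

```python
import uuid
from datetime import datetime

def expand_pattern(pattern, schema, table, database="", subpath="",
                   now=None, run_uuid=None):
    """Expand naming tokens in a schema/table pattern (illustrative sketch)."""
    now = now or datetime.now()
    run_uuid = run_uuid or uuid.uuid4().hex[:8]  # generated once, reused for the run
    return pattern.format(
        schema=schema, table=table, database=database,
        date=now.strftime("%Y%m%d"),
        timestamp=now.strftime("%Y%m%d_%H%M%S"),
        uuid=run_uuid, subpath=subpath,
    )

fixed = datetime(2025, 12, 10, 14, 30, 22)
print(expand_pattern("lx_{schema}", "tpch_1", "customer"))               # → lx_tpch_1
print(expand_pattern("{schema}_{date}", "sales", "orders", now=fixed))   # → sales_20251210
print(expand_pattern("{schema}_{table}", "schema1", "customer"))         # → schema1_customer
```

Because {uuid} is generated once per run, every table published in that run shares the same identifier, which keeps a run's objects grouped under one name.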
Common Patterns
Prefixed Schemas
lakexpress \
--publish_schema_pattern "lx_{schema}" \
--publish_table_pattern "{table}" \
--publish_target motherduck_prod \
...
# Results:
# Schema: lx_tpch_1
# Tables: customer, orders, lineitem
Date-Partitioned Schemas
lakexpress \
--publish_schema_pattern "{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target motherduck_prod \
...
# Results:
# Schema: tpch_1_20251210
# Tables: customer, orders, lineitem
Consolidated Multi-Schema
lakexpress \
--source_schema_name schema1,schema2 \
--publish_schema_pattern "consolidated" \
--publish_table_pattern "{schema}_{table}" \
--publish_target motherduck_prod \
...
# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer
Usage Examples
Example 1: Basic Export with External Views
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod
Result: Database my_analytics, schema public, external views over S3 Parquet files.
Example 2: Internal Tables with Parallel Execution
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod \
--publish_method internal \
--publish_schema_pattern "lx_{schema}" \
--n_jobs 4
Result: Database my_analytics, schema lx_tpch_1, native tables loaded from S3.
Example 3: Daily Snapshots
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name sales \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod \
--publish_schema_pattern "sales_{date}" \
--sub_path "daily/$(date +%Y%m%d)"
Result: Schema sales_20251210, data at s3://bucket/lakexpress/daily/20251210/.
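In this example the schema name and the storage sub-path both derive from the run date, so they stay in lockstep. A small sketch of the pair the command produces (function name hypothetical):

```python
from datetime import date

def daily_names(run_date, schema="sales"):
    """Return the (schema, sub_path) pair the daily-snapshot example produces."""
    stamp = run_date.strftime("%Y%m%d")
    return f"{schema}_{stamp}", f"daily/{stamp}"

print(daily_names(date(2025, 12, 10)))  # → ('sales_20251210', 'daily/20251210')
```

Note that `$(date +%Y%m%d)` expands in your shell's local timezone; if LakeXpress resolves {date} on a machine in a different timezone, the schema name and sub-path could disagree around midnight.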
Example 4: Environment Variable Token
export motherduck_token="your-token-here"
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_env
Example 5: GCS Storage Backend
{
"gcs_datalake": {
"ds_type": "gcs",
"auth_mode": "profile",
"info": {
"directory": "gs://my-datalake-bucket/lakexpress/",
"profile": "/path/to/service-account.json"
}
},
"motherduck_prod": {
"ds_type": "motherduck",
"auth_mode": "token",
"info": {
"database": "my_analytics",
"token": "your-motherduck-token"
}
}
}
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod
See Also
- MotherDuck Reference - Data type mappings, CLI arguments, querying, troubleshooting
- CLI Reference - All command-line options
- Examples & Recipes - Working command examples