MotherDuck Publishing

LakeXpress creates MotherDuck tables from exported Parquet files in cloud storage (S3, GCS, Azure).

Prerequisites

1. MotherDuck Account

Sign up at motherduck.com and generate an access token:

  1. Log in to MotherDuck
  2. Go to Settings → Access Tokens
  3. Create a new token
  4. Copy the token for your credentials file

2. Cloud Storage Setup

Configure credentials for cloud storage and MotherDuck:

{
  "s3_datalake": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://my-datalake-bucket/lakexpress/",
      "profile": "my-aws-profile"
    }
  },
  "motherduck_prod": {
    "ds_type": "motherduck",
    "auth_mode": "token",
    "info": {
      "database": "my_analytics",
      "token": "your-motherduck-access-token"
    }
  }
}
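
Before running anything, it can help to sanity-check the credentials file against the required fields documented below. This is a minimal illustrative sketch, not part of LakeXpress itself; the `check_credentials` helper and the `REQUIRED` field map are assumptions modeled on the credential examples and field tables in this document.

```python
import json

# Required "info" fields per (ds_type, auth_mode), following the field
# tables in this document. Illustrative only; extend for other ds_types.
REQUIRED = {
    ("s3", "profile"): {"directory", "profile"},
    ("motherduck", "token"): {"database", "token"},
    ("motherduck", "env"): {"database"},
}

def check_credentials(path):
    """Return a list of problems found in a credentials file."""
    with open(path) as f:
        creds = json.load(f)
    problems = []
    for cred_id, entry in creds.items():
        key = (entry.get("ds_type"), entry.get("auth_mode"))
        missing = REQUIRED.get(key, set()) - set(entry.get("info", {}))
        if missing:
            problems.append(f"{cred_id}: missing {sorted(missing)}")
    return problems
```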

3. Storage Access from MotherDuck (Required for External Views)

Important: When using external views (--publish_method external), MotherDuck needs its own credentials to read your Parquet files from cloud storage. Without them, queries against the views fail with errors like:

IO Error: No files found that match the pattern "s3://..."
HTTP Error: Permission error: Missing or invalid credentials

Configure secrets in MotherDuck before querying external views:

For AWS S3 (most common):

CREATE SECRET aws_s3_secret (
    TYPE S3,
    KEY_ID 'your_aws_access_key_id',
    SECRET 'your_aws_secret_access_key',
    REGION 'us-east-1' -- your bucket's region
);

To find your AWS credentials:

aws configure get aws_access_key_id
aws configure get aws_secret_access_key
aws configure get region

For GCS:

CREATE SECRET gcs_secret (
    TYPE GCS,
    KEY_ID 'your_hmac_access_key',
    SECRET 'your_hmac_secret_key'
);

For Azure:

CREATE SECRET azure_secret (
    TYPE AZURE,
    ACCOUNT_NAME 'your_storage_account',
    ACCOUNT_KEY 'your_account_key'
);

Verify secrets are configured:

SELECT * FROM duckdb_secrets();

Authentication

Store your token in the credentials file:

{
  "motherduck_prod": {
    "ds_type": "motherduck",
    "auth_mode": "token",
    "info": {
      "database": "my_analytics",
      "token": "your-motherduck-access-token"
    }
  }
}

Field      Required  Description
ds_type    Yes       Must be "motherduck"
auth_mode  Yes       Set to "token"
database   Yes       Target MotherDuck database name
token      Yes       MotherDuck access token

Environment Variable Authentication

Read the token from an environment variable:

{
  "motherduck_env": {
    "ds_type": "motherduck",
    "auth_mode": "env",
    "info": {
      "database": "my_analytics"
    }
  }
}
export motherduck_token="your-motherduck-access-token"
lakexpress ...

Field      Required  Description
ds_type    Yes       Must be "motherduck"
auth_mode  Yes       Set to "env"
database   Yes       Target MotherDuck database name

Configuration Options

Option                            Description                                     Default
--publish_target ID               Credential ID for MotherDuck target (required)  -
--publish_schema_pattern PATTERN  Schema naming pattern                           {schema}
--publish_table_pattern PATTERN   Table naming pattern                            {table}
--publish_method TYPE             Table type: external or internal                external
--n_jobs N                        Parallel workers for table creation             1

Table Types

External Views (Default)

Queries data directly from cloud storage via read_parquet().

lakexpress ... --publish_target motherduck_prod --publish_method external

Generated SQL:

CREATE OR REPLACE VIEW "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')

Pros:

  • No data copying, instant publishing
  • No MotherDuck storage costs
  • Always reflects latest cloud storage data
  • Best for large datasets, infrequent queries, exploration

Trade-offs:

  • Query speed depends on cloud storage latency
  • Requires cloud storage credentials in MotherDuck

Internal Tables

Loads data into MotherDuck native columnar storage.

lakexpress ... --publish_target motherduck_prod --publish_method internal

Generated SQL:

CREATE OR REPLACE TABLE "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')

Pros:

  • Faster queries (optimized columnar format)
  • No storage credentials needed after load
  • Better caching for repeated queries
  • Best for dashboards and production analytics

Trade-offs:

  • Slower publishing (data must load)
  • MotherDuck storage costs apply
  • Data is a point-in-time snapshot

Dynamic Naming Patterns

Schema and table names support token-based patterns.

Supported Tokens

Token        Description                            Example Output
{schema}     Source schema name                     tpch_1
{table}      Source table name                      customer
{database}   Source database name                   tpch
{date}       Current date (YYYYMMDD)                20251210
{timestamp}  Current timestamp (YYYYMMDD_HHMMSS)    20251210_143022
{uuid}       UUID4 identifier (consistent per run)  a1b2c3d4
{subpath}    CLI --sub_path parameter               staging
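
The token substitution behaves like Python's str.format. The following is an illustrative sketch of the expansion, not LakeXpress's actual implementation; the `expand_pattern` helper and its parameter names are assumptions based on the token table above.

```python
from datetime import datetime
import uuid

def expand_pattern(pattern, schema, table, database="", subpath="",
                   now=None, run_uuid=None):
    """Substitute naming tokens into a schema/table pattern."""
    now = now or datetime.now()
    # {uuid} is documented as consistent per run, so generate it once
    # and pass the same value to every expand_pattern call in a run.
    run_uuid = run_uuid or uuid.uuid4().hex[:8]
    return pattern.format(
        schema=schema,
        table=table,
        database=database,
        date=now.strftime("%Y%m%d"),
        timestamp=now.strftime("%Y%m%d_%H%M%S"),
        uuid=run_uuid,
        subpath=subpath,
    )
```

For example, `expand_pattern("lx_{schema}", "tpch_1", "orders")` yields `lx_tpch_1`, matching the "Prefixed Schemas" pattern below.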

Common Patterns

Prefixed Schemas

lakexpress \
--publish_schema_pattern "lx_{schema}" \
--publish_table_pattern "{table}" \
--publish_target motherduck_prod \
...

# Results:
# Schema: lx_tpch_1
# Tables: customer, orders, lineitem

Date-Partitioned Schemas

lakexpress \
--publish_schema_pattern "{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target motherduck_prod \
...

# Results:
# Schema: tpch_1_20251210
# Tables: customer, orders, lineitem

Consolidated Multi-Schema

lakexpress \
--source_schema_name schema1,schema2 \
--publish_schema_pattern "consolidated" \
--publish_table_pattern "{schema}_{table}" \
--publish_target motherduck_prod \
...

# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer

Usage Examples

Example 1: Basic Export with External Views

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod

Result: Database my_analytics, schema public, external views over S3 Parquet files.

Example 2: Internal Tables with Parallel Execution

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod \
--publish_method internal \
--publish_schema_pattern "lx_{schema}" \
--n_jobs 4

Result: Database my_analytics, schema lx_tpch_1, native tables loaded from S3.

Example 3: Daily Snapshots

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name sales \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod \
--publish_schema_pattern "sales_{date}" \
--sub_path "daily/$(date +%Y%m%d)"

Result: Schema sales_20251210, data at s3://bucket/lakexpress/daily/20251210/.
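
When the daily run is driven from a script rather than a shell, the `$(date +%Y%m%d)` substitution above can be reproduced in Python. A small sketch; the `daily_publish_args` helper is hypothetical, and the flag names are taken from the example command:

```python
from datetime import date

def daily_publish_args(schema="sales", today=None):
    """Build the date-dependent CLI arguments for a daily snapshot run,
    mirroring the $(date +%Y%m%d) shell substitution."""
    stamp = (today or date.today()).strftime("%Y%m%d")
    return [
        "--publish_schema_pattern", f"{schema}_{{date}}",  # literal {date} token
        "--sub_path", f"daily/{stamp}",
    ]
```

Note that `{date}` is passed through literally so LakeXpress expands it, while the --sub_path value is stamped by the caller.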

Example 4: Environment Variable Token

export motherduck_token="your-token-here"

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_env

Example 5: GCS Storage Backend

{
  "gcs_datalake": {
    "ds_type": "gcs",
    "auth_mode": "profile",
    "info": {
      "directory": "gs://my-datalake-bucket/lakexpress/",
      "profile": "/path/to/service-account.json"
    }
  },
  "motherduck_prod": {
    "ds_type": "motherduck",
    "auth_mode": "token",
    "info": {
      "database": "my_analytics",
      "token": "your-motherduck-token"
    }
  }
}
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod

Copyright © 2026 Architecture & Performance.