MotherDuck Publishing
LakeXpress creates MotherDuck tables or views from exported Parquet files in cloud storage (S3, GCS, Azure).
Table of Contents
- Prerequisites
- Authentication
- Configuration Options
- Table Types
- Dynamic Naming Patterns
- Usage Examples
Prerequisites
1. MotherDuck Account
Sign up at motherduck.com and generate an access token:
- Log in to MotherDuck
- Go to Settings → Access Tokens
- Create a new token
- Copy the token for your credentials file
2. Cloud Storage Setup
Configure credentials for cloud storage and MotherDuck:
{
"s3_datalake": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://my-datalake-bucket/lakexpress/",
"profile": "my-aws-profile"
}
},
"motherduck_prod": {
"ds_type": "motherduck",
"auth_mode": "token",
"info": {
"database": "my_analytics",
"token": "your-motherduck-access-token"
}
}
}
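Before running LakeXpress, it can help to sanity-check the credentials file. The sketch below is a hypothetical helper, not part of LakeXpress; the required fields follow the credential layout shown above.

```python
import json

# Fields each MotherDuck entry's "info" block must carry, per auth mode
# (hypothetical check, based on the credential layout shown above).
REQUIRED = {"token": {"database", "token"}, "env": {"database"}}

def check_motherduck_entries(creds):
    """Return a list of problems found in MotherDuck credential entries."""
    problems = []
    for name, entry in creds.items():
        if entry.get("ds_type") != "motherduck":
            continue
        mode = entry.get("auth_mode")
        if mode not in REQUIRED:
            problems.append(f"{name}: unknown auth_mode {mode!r}")
            continue
        missing = REQUIRED[mode] - entry.get("info", {}).keys()
        if missing:
            problems.append(f"{name}: missing info fields {sorted(missing)}")
    return problems

creds = json.loads("""{
  "motherduck_prod": {
    "ds_type": "motherduck",
    "auth_mode": "token",
    "info": {"database": "my_analytics", "token": "secret"}
  }
}""")
print(check_motherduck_entries(creds))  # → []
```

An entry that omits the token (with `auth_mode: "token"`) would be reported as missing the `token` field.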
3. Storage Access from MotherDuck (Required for External Views)
Important: When using external views (--publish_method external), MotherDuck needs credentials to read your Parquet files from cloud storage. Without this, you'll see errors like:
IO Error: No files found that match the pattern "s3://..."
HTTP Error: Permission error: Missing or invalid credentials
Configure secrets in MotherDuck before querying external views:
For AWS S3 (most common):
CREATE SECRET aws_s3_secret (
TYPE S3,
KEY_ID 'your_aws_access_key_id',
SECRET 'your_aws_secret_access_key',
REGION 'us-east-1' -- your bucket's region
);
To find your AWS credentials:
aws configure get aws_access_key_id
aws configure get aws_secret_access_key
aws configure get region
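The three values can then be stitched into the `CREATE SECRET` statement programmatically. A minimal sketch (the helper name is made up, not a LakeXpress or MotherDuck API):

```python
def s3_secret_sql(key_id, secret, region):
    """Render the CREATE SECRET statement shown above (illustrative helper)."""
    return (
        "CREATE SECRET aws_s3_secret (\n"
        f"    TYPE S3,\n"
        f"    KEY_ID '{key_id}',\n"
        f"    SECRET '{secret}',\n"
        f"    REGION '{region}'\n"
        ");"
    )

print(s3_secret_sql("AKIAEXAMPLE", "wJalrEXAMPLE", "us-east-1"))
```

Run the rendered statement once in the MotherDuck UI or a connected DuckDB session; the secret persists across sessions.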
For GCS:
CREATE SECRET gcs_secret (
TYPE GCS,
KEY_ID 'your_hmac_access_key',
SECRET 'your_hmac_secret_key'
);
For Azure:
CREATE SECRET azure_secret (
TYPE AZURE,
ACCOUNT_NAME 'your_storage_account',
ACCOUNT_KEY 'your_account_key'
);
Verify secrets are configured:
SELECT * FROM duckdb_secrets();
Authentication
Token Authentication (Recommended)
Store your token in the credentials file:
{
"motherduck_prod": {
"ds_type": "motherduck",
"auth_mode": "token",
"info": {
"database": "my_analytics",
"token": "your-motherduck-access-token"
}
}
}
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "motherduck" |
| auth_mode | Yes | Set to "token" |
| database | Yes | Target MotherDuck database name |
| token | Yes | MotherDuck access token |
Environment Variable Authentication
Read the token from an environment variable:
{
"motherduck_env": {
"ds_type": "motherduck",
"auth_mode": "env",
"info": {
"database": "my_analytics"
}
}
}
export motherduck_token="your-motherduck-access-token"
lakexpress ...
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "motherduck" |
| auth_mode | Yes | Set to "env" |
| database | Yes | Target MotherDuck database name |
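With `auth_mode: "env"` the token is resolved at run time rather than stored on disk. A minimal sketch of that lookup (the function name is hypothetical; LakeXpress's own implementation may differ):

```python
import os

def resolve_motherduck_token(entry):
    """Return the access token for a MotherDuck credential entry (sketch)."""
    if entry["auth_mode"] == "token":
        return entry["info"]["token"]          # token stored in the credentials file
    if entry["auth_mode"] == "env":
        token = os.environ.get("motherduck_token")  # read from the environment
        if not token:
            raise RuntimeError("motherduck_token environment variable is not set")
        return token
    raise ValueError(f"unsupported auth_mode: {entry['auth_mode']}")

os.environ["motherduck_token"] = "demo-token"
entry = {"ds_type": "motherduck", "auth_mode": "env", "info": {"database": "my_analytics"}}
print(resolve_motherduck_token(entry))  # → demo-token
```

Failing fast with a clear error when the variable is unset is preferable to passing an empty token to MotherDuck and debugging an opaque authentication failure later.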
Configuration Options
| Option | Description | Default |
|---|---|---|
| --publish_target ID | Credential ID for MotherDuck target (required) | - |
| --publish_schema_pattern PATTERN | Schema naming pattern | {schema} |
| --publish_table_pattern PATTERN | Table naming pattern | {table} |
| --publish_method TYPE | Table type: external or internal | external |
| --n_jobs N | Parallel workers for table creation | 1 |
Table Types
External Views (Default)
Queries data directly from cloud storage via read_parquet().
lakexpress ... --publish_target motherduck_prod --publish_method external
Generated SQL:
CREATE OR REPLACE VIEW "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')
Pros:
- No data copying, instant publishing
- No MotherDuck storage costs
- Always reflects latest cloud storage data
- Best for large datasets, infrequent queries, exploration
Trade-offs:
- Query speed depends on cloud storage latency
- Requires cloud storage credentials in MotherDuck
Internal Tables
Loads data into MotherDuck native columnar storage.
lakexpress ... --publish_target motherduck_prod --publish_method internal
Generated SQL:
CREATE OR REPLACE TABLE "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')
Pros:
- Faster queries (optimized columnar format)
- No storage credentials needed after load
- Better caching for repeated queries
- Best for dashboards and production analytics
Trade-offs:
- Slower publishing (data must load)
- MotherDuck storage costs apply
- Data is a point-in-time snapshot
Dynamic Naming Patterns
Schema and table names support token-based patterns.
Supported Tokens
| Token | Description | Example Output |
|---|---|---|
| {schema} | Source schema name | tpch_1 |
| {table} | Source table name | customer |
| {database} | Source database name | tpch |
| {date} | Current date (YYYYMMDD) | 20251210 |
| {timestamp} | Current timestamp (YYYYMMDD_HHMMSS) | 20251210_143022 |
| {uuid} | UUID4 identifier (consistent per run) | a1b2c3d4 |
| {subpath} | CLI --sub_path parameter | staging |
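Token substitution behaves like Python string formatting. A sketch of the expansion (illustrative only; the truncated 8-character {uuid} matches the example in the table):

```python
import uuid
from datetime import datetime

def expand_pattern(pattern, schema, table, database="", subpath="",
                   now=None, run_uuid=None):
    """Expand naming tokens in a schema/table pattern (illustrative sketch)."""
    now = now or datetime.now()
    run_uuid = run_uuid or uuid.uuid4().hex[:8]  # generated once, reused for the run
    return pattern.format(
        schema=schema, table=table, database=database,
        date=now.strftime("%Y%m%d"),
        timestamp=now.strftime("%Y%m%d_%H%M%S"),
        uuid=run_uuid, subpath=subpath,
    )

fixed = datetime(2025, 12, 10, 14, 30, 22)
print(expand_pattern("lx_{schema}", "tpch_1", "customer"))               # → lx_tpch_1
print(expand_pattern("{schema}_{date}", "sales", "orders", now=fixed))   # → sales_20251210
print(expand_pattern("{schema}_{table}", "schema1", "customer"))         # → schema1_customer
```

Because {uuid} is generated once per run, every table published in that run shares the same identifier, which keeps a run's objects grouped under one name.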
Common Patterns
Prefixed Schemas
lakexpress \
--publish_schema_pattern "lx_{schema}" \
--publish_table_pattern "{table}" \
--publish_target motherduck_prod \
...
# Results:
# Schema: lx_tpch_1
# Tables: customer, orders, lineitem
Date-Partitioned Schemas
lakexpress \
--publish_schema_pattern "{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target motherduck_prod \
...
# Results:
# Schema: tpch_1_20251210
# Tables: customer, orders, lineitem
Consolidated Multi-Schema
lakexpress \
--source_schema_name schema1,schema2 \
--publish_schema_pattern "consolidated" \
--publish_table_pattern "{schema}_{table}" \
--publish_target motherduck_prod \
...
# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer
Usage Examples
Example 1: Basic Export with External Views
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod
Result: Database my_analytics, schema public, external views over S3 Parquet files.
Example 2: Internal Tables with Parallel Execution
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod \
--publish_method internal \
--publish_schema_pattern "lx_{schema}" \
--n_jobs 4
Result: Database my_analytics, schema lx_tpch_1, native tables loaded from S3.
Example 3: Daily Snapshots
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name sales \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod \
--publish_schema_pattern "sales_{date}" \
--sub_path "daily/$(date +%Y%m%d)"
Result: Schema sales_20251210, data at s3://bucket/lakexpress/daily/20251210/.
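In this example the schema name and the storage sub-path both derive from the run date, so they stay in lockstep. A small sketch of the pair the command produces (function name hypothetical):

```python
from datetime import date

def daily_names(run_date, schema="sales"):
    """Return the (schema, sub_path) pair the daily-snapshot example produces."""
    stamp = run_date.strftime("%Y%m%d")
    return f"{schema}_{stamp}", f"daily/{stamp}"

print(daily_names(date(2025, 12, 10)))  # → ('sales_20251210', 'daily/20251210')
```

Note that `$(date +%Y%m%d)` expands in your shell's local timezone; if LakeXpress resolves {date} on a machine in a different timezone, the schema name and sub-path could disagree around midnight.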
Example 4: Environment Variable Token
export motherduck_token="your-token-here"
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_env
Example 5: GCS Storage Backend
{
"gcs_datalake": {
"ds_type": "gcs",
"auth_mode": "profile",
"info": {
"directory": "gs://my-datalake-bucket/lakexpress/",
"profile": "/path/to/service-account.json"
}
},
"motherduck_prod": {
"ds_type": "motherduck",
"auth_mode": "token",
"info": {
"database": "my_analytics",
"token": "your-motherduck-token"
}
}
}
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target motherduck_prod
See Also
- MotherDuck Reference - Data type mappings, CLI arguments, querying, troubleshooting
- CLI Reference - All command-line options
- Examples & Recipes - Working command examples