Snowflake Publishing
LakeXpress creates Snowflake tables from exported Parquet files, enabling queries on data stored in S3, Azure, or GCS.
Table of Contents
- Prerequisites
- Authentication Modes
- Table Types
- Dynamic Naming Patterns
- Usage Examples
- Views and Customization
- Primary Key Constraints
Prerequisites
1. Snowflake Setup
Required objects:
- Database (e.g.,
LAKEXPRESS_DB) - Warehouse (e.g.,
SNOWFLAKE_LEARNING_WH) - Stage pointing to S3 bucket (e.g.,
AWSS3_AWDW_STAGE)
Required privileges:
USE ROLE ACCOUNTADMIN;
GRANT CREATE SCHEMA ON DATABASE LAKEXPRESS_DB TO ROLE SYSADMIN;
GRANT USAGE ON WAREHOUSE SNOWFLAKE_LEARNING_WH TO ROLE SYSADMIN;
GRANT USAGE ON STAGE LAKEXPRESS_DB.PUBLIC.AWSS3_AWDW_STAGE TO ROLE SYSADMIN;
2. Cloud Storage Setup
Both S3 and Snowflake credentials must be in your credentials.json:
{
"aws_s3_01": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://aetplakexpress/lakexpress/",
"profile": "lakexpress"
}
},
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "password",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"password": "$env{LX_SNOWFLAKE_PASSWORD}",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}
Critical: The S3 directory path must exactly match the Snowflake stage location.
Verify the match:
DESC STAGE LAKEXPRESS_DB.PUBLIC.AWSS3_AWDW_STAGE;
-- URL column shows: s3://aetplakexpress/lakexpress/
-- This MUST match your S3 credentials directory
-- "directory": "s3://aetplakexpress/lakexpress/" ✓ Correct
-- "directory": "s3://aetplakexpress/" ✗ Wrong - files won't be found
Authentication Modes
Password Authentication
{
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "password",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"password": "$env{LX_SNOWFLAKE_PASSWORD}",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}
Programmatic Access Token (PAT) Authentication
Recommended for automation:
{
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "pat",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"token": "your-personal-access-token",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}
PATs require a network policy configured for the user.
PAT benefits:
- Easily rotated, configurable lifetime
- No MFA prompts -- suited for CI/CD pipelines
Table Types
External Tables
Data stays in cloud storage; Snowflake stores only metadata. Views are auto-created for cleaner querying.
- No data loading time or Snowflake storage costs
- Immediate availability after export
- Suited for: data lake integration, infrequent access, exploration, ETL staging
./LakeXpress \
-a data/ds_credentials.json \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--publish_target snowflake_prod \
--snowflake_table_type external # Default
Internal Tables
Data is loaded into Snowflake storage with full feature support (clustering, time travel, etc.). Can include PRIMARY KEY constraints from source.
- Faster queries with full Snowflake features
- Maintains PRIMARY KEY constraints
- Suited for: frequently queried data, production warehouses, complex joins
./LakeXpress \
-a data/ds_credentials.json \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--publish_target snowflake_prod \
--snowflake_table_type internal \
--snowflake_pk_constraints # Optional: include PK constraints
Dynamic Naming Patterns
Schema and table names support token-based patterns for time-partitioned exports, schema consolidation, or unique identifiers.
Supported Tokens
| Token | Description | Example Output |
|---|---|---|
{schema} | Source schema name | tpch_1 |
{table} | Source table name | customer |
{database} | Source database name | tpch |
{date} | Current date (YYYYMMDD) | 20251120 |
{timestamp} | Current timestamp (YYYYMMDD_HHMMSS) | 20251120_112511 |
{uuid} | UUID4 identifier (consistent per run) | a1b2c3d4-... |
{subpath} | CLI --sub_path parameter | toto |
Note: All timestamps are in local time. The {table} token is mandatory for --publish_table_pattern.
CLI Arguments
--publish_schema_pattern PATTERN # Schema name pattern (default varies by table type)
--publish_table_pattern PATTERN # Table name pattern (must include {table} token)
Default Patterns
| Table Type | Default Schema Pattern | Default Table Pattern |
|---|---|---|
| External | EXT_{schema} | {table} |
| Internal | {schema} | {table} |
Common Patterns
1. Time-Based Partitioning
./LakeXpress \
--publish_schema_pattern "EXT_{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target snowflake_prod \
...
# Results:
# Schema: EXT_tpch_1_20251120
# Tables: customer, orders, lineitem
2. Multi-Schema Consolidation
./LakeXpress \
--source_schema_name schema1,schema2,schema3 \
--publish_schema_pattern "{subpath}" \
--publish_table_pattern "{schema}_{table}" \
--sub_path consolidated \
--publish_target snowflake_prod \
...
# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer, schema3_customer
3. Unique Run Identifiers
./LakeXpress \
--publish_schema_pattern "EXT_{schema}_{uuid}" \
--publish_table_pattern "{table}" \
--publish_target snowflake_prod \
...
# Results:
# Schema: EXT_tpch_1_a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Tables: customer, orders, lineitem
Usage Examples
Example 1: External Tables with Custom Patterns
./LakeXpress -a data/ds_credentials.json \
--lxdb_auth_id lxdb_ms \
--source_db_auth_id ds_04_pg \
--source_db_name tpch \
--source_schema_name tpch_1 \
--fastbcp_dir_path /home/francois/Workspace/FastBCP_exe/FastBCP_0_28_3_full/ \
--fastbcp_p 2 \
--n_jobs 4 \
--target_storage_id aws_s3_01 \
--generate_metadata \
--sub_path toto \
--publish_target snowflake_pass \
--publish_schema_pattern "EXT_{subpath}_{date}" \
--publish_table_pattern "{schema}_{table}" \
--snowflake_table_type external
Result:
- Schema:
EXT_toto_20251124 - Tables:
tpch_1_customer,tpch_1_orders, etc. - Views:
V_tpch_1_customer,V_tpch_1_orders, etc.
Example 2: Internal Tables with Primary Keys
./LakeXpress -a data/ds_credentials.json \
--lxdb_auth_id lxdb_ms \
--source_db_auth_id ds_04_pg \
--source_db_name tpch \
--source_schema_name tpch_1 \
--fastbcp_dir_path /home/francois/Workspace/FastBCP_exe/FastBCP_0_28_3_full/ \
--fastbcp_p 2 \
--n_jobs 4 \
--target_storage_id aws_s3_01 \
--generate_metadata \
--sub_path toto \
--snowflake_table_type internal \
--publish_target snowflake_pat \
--snowflake_pk_constraints
Result:
- Schema:
tpch_1 - Tables:
customer,orders, etc. (with data loaded) - PRIMARY KEY constraints applied where available
Example 3: Single Table
./LakeXpress \
-a data/ds_credentials.json \
--lxdb_auth_id lxdb_ms \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--source_schema_name tpch_1 \
-i nation \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target snowflake_prod
Example 4: Full Schema
./LakeXpress \
-a data/ds_credentials.json \
--lxdb_auth_id lxdb_ms \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--source_schema_name tpch_1 \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target snowflake_prod
Views and Customization
Automatic Views (External Tables Only)
For external tables, LakeXpress auto-creates views that expose typed columns instead of the raw Parquet VALUE column.
-- External table (raw)
SELECT * FROM EXT_TPCH_1.CUSTOMER;
-- Shows: VALUE column (complex structure)
-- View (typed columns)
SELECT * FROM EXT_TPCH_1.V_CUSTOMER;
-- Shows: C_CUSTKEY, C_NAME, C_ADDRESS, etc.
View Options
| Option | Description | Default |
|---|---|---|
--snowflake_no_views | Skip view creation | Views created |
--snowflake_view_prefix PREFIX | Custom view name prefix | V_ |
Custom prefix:
./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type external \
--snowflake_view_prefix "VIEW_" \
...
# Tables: customer, orders
# Views: VIEW_customer, VIEW_orders
No views:
./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type external \
--snowflake_no_views \
...
# Tables: customer, orders
# Views: None
Refreshing External Tables
External tables must be refreshed after data upload:
ALTER EXTERNAL TABLE EXT_TPCH_1.CUSTOMER REFRESH;
ALTER EXTERNAL TABLE EXT_TPCH_1.LINEITEM REFRESH;
Or via Snowflake CLI:
snow sql -q "ALTER EXTERNAL TABLE EXT_TPCH_1.LINEITEM REFRESH;" -c lakexpress
Primary Key Constraints
Applies to internal tables only. LakeXpress propagates PRIMARY KEY constraints from the source database to Snowflake.
Usage
./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type internal \
--snowflake_pk_constraints \
...
How It Works
- Reads PRIMARY KEY metadata from the source database
- Stores PK information in the LakeXpress DB
- Adds PRIMARY KEY constraints to the Snowflake DDL
- Snowflake enforces constraints (query planner benefits)
Requirements
--snowflake_table_type internalonly- Source database must have PRIMARY KEY constraints defined
- LakeXpress DB must have
is_primary_keycolumn (auto-created in v0.1.16+)
Benefits
- Query planner optimizations
- Better execution plans
- Documents data integrity
See Also
- Snowflake Reference - Data type mappings, CLI arguments, deprecated options
- CLI Reference - All command-line options
- Examples & Recipes - Working command examples