Skip to main content

Snowflake Publishing

LakeXpress creates Snowflake tables from exported Parquet files, enabling queries on data stored in S3, Azure, or GCS.

Table of Contents

Prerequisites

1. Snowflake Setup

Required objects:

  • Database (e.g., LAKEXPRESS_DB)
  • Warehouse (e.g., SNOWFLAKE_LEARNING_WH)
  • Stage pointing to S3 bucket (e.g., AWSS3_AWDW_STAGE)

Required privileges:

USE ROLE ACCOUNTADMIN;
GRANT CREATE SCHEMA ON DATABASE LAKEXPRESS_DB TO ROLE SYSADMIN;
GRANT USAGE ON WAREHOUSE SNOWFLAKE_LEARNING_WH TO ROLE SYSADMIN;
GRANT USAGE ON STAGE LAKEXPRESS_DB.PUBLIC.AWSS3_AWDW_STAGE TO ROLE SYSADMIN;

2. Cloud Storage Setup

Both S3 and Snowflake credentials must be in your credentials.json:

{
"aws_s3_01": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://aetplakexpress/lakexpress/",
"profile": "lakexpress"
}
},
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "password",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"password": "$env{LX_SNOWFLAKE_PASSWORD}",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}

Critical: The S3 directory path must exactly match the Snowflake stage location.

Verify the match:

DESC STAGE LAKEXPRESS_DB.PUBLIC.AWSS3_AWDW_STAGE;
-- URL column shows: s3://aetplakexpress/lakexpress/

-- This MUST match your S3 credentials directory
-- "directory": "s3://aetplakexpress/lakexpress/" ✓ Correct
-- "directory": "s3://aetplakexpress/" ✗ Wrong - files won't be found

Authentication Modes

Password Authentication

{
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "password",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"password": "$env{LX_SNOWFLAKE_PASSWORD}",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}

Programmatic Access Token (PAT) Authentication

Recommended for automation:

{
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "pat",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"token": "your-personal-access-token",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}

PATs require a network policy configured for the user.

PAT benefits:

  • Easily rotated, configurable lifetime
  • No MFA prompts -- suited for CI/CD pipelines

Table Types

External Tables

Data stays in cloud storage; Snowflake stores only metadata. Views are auto-created for cleaner querying.

  • No data loading time or Snowflake storage costs
  • Immediate availability after export
  • Suited for: data lake integration, infrequent access, exploration, ETL staging
./LakeXpress \
-a data/ds_credentials.json \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--publish_target snowflake_prod \
--snowflake_table_type external # Default

Internal Tables

Data is loaded into Snowflake storage with full feature support (clustering, time travel, etc.). Can include PRIMARY KEY constraints from source.

  • Faster queries with full Snowflake features
  • Maintains PRIMARY KEY constraints
  • Suited for: frequently queried data, production warehouses, complex joins
./LakeXpress \
-a data/ds_credentials.json \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--publish_target snowflake_prod \
--snowflake_table_type internal \
--snowflake_pk_constraints # Optional: include PK constraints

Dynamic Naming Patterns

Schema and table names support token-based patterns for time-partitioned exports, schema consolidation, or unique identifiers.

Supported Tokens

TokenDescriptionExample Output
{schema}Source schema nametpch_1
{table}Source table namecustomer
{database}Source database nametpch
{date}Current date (YYYYMMDD)20251120
{timestamp}Current timestamp (YYYYMMDD_HHMMSS)20251120_112511
{uuid}UUID4 identifier (consistent per run)a1b2c3d4-...
{subpath}CLI --sub_path parametertoto

Note: All timestamps are in local time. The {table} token is mandatory for --publish_table_pattern.

CLI Arguments

--publish_schema_pattern PATTERN    # Schema name pattern (default varies by table type)
--publish_table_pattern PATTERN # Table name pattern (must include {table} token)

Default Patterns

Table TypeDefault Schema PatternDefault Table Pattern
ExternalEXT_{schema}{table}
Internal{schema}{table}

Common Patterns

1. Time-Based Partitioning

./LakeXpress \
--publish_schema_pattern "EXT_{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target snowflake_prod \
...

# Results:
# Schema: EXT_tpch_1_20251120
# Tables: customer, orders, lineitem

2. Multi-Schema Consolidation

./LakeXpress \
--source_schema_name schema1,schema2,schema3 \
--publish_schema_pattern "{subpath}" \
--publish_table_pattern "{schema}_{table}" \
--sub_path consolidated \
--publish_target snowflake_prod \
...

# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer, schema3_customer

3. Unique Run Identifiers

./LakeXpress \
--publish_schema_pattern "EXT_{schema}_{uuid}" \
--publish_table_pattern "{table}" \
--publish_target snowflake_prod \
...

# Results:
# Schema: EXT_tpch_1_a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Tables: customer, orders, lineitem

Usage Examples

Example 1: External Tables with Custom Patterns

./LakeXpress -a data/ds_credentials.json \
--lxdb_auth_id lxdb_ms \
--source_db_auth_id ds_04_pg \
--source_db_name tpch \
--source_schema_name tpch_1 \
--fastbcp_dir_path /home/francois/Workspace/FastBCP_exe/FastBCP_0_28_3_full/ \
--fastbcp_p 2 \
--n_jobs 4 \
--target_storage_id aws_s3_01 \
--generate_metadata \
--sub_path toto \
--publish_target snowflake_pass \
--publish_schema_pattern "EXT_{subpath}_{date}" \
--publish_table_pattern "{schema}_{table}" \
--snowflake_table_type external

Result:

  • Schema: EXT_toto_20251124
  • Tables: tpch_1_customer, tpch_1_orders, etc.
  • Views: V_tpch_1_customer, V_tpch_1_orders, etc.

Example 2: Internal Tables with Primary Keys

./LakeXpress -a data/ds_credentials.json \
--lxdb_auth_id lxdb_ms \
--source_db_auth_id ds_04_pg \
--source_db_name tpch \
--source_schema_name tpch_1 \
--fastbcp_dir_path /home/francois/Workspace/FastBCP_exe/FastBCP_0_28_3_full/ \
--fastbcp_p 2 \
--n_jobs 4 \
--target_storage_id aws_s3_01 \
--generate_metadata \
--sub_path toto \
--snowflake_table_type internal \
--publish_target snowflake_pat \
--snowflake_pk_constraints

Result:

  • Schema: tpch_1
  • Tables: customer, orders, etc. (with data loaded)
  • PRIMARY KEY constraints applied where available

Example 3: Single Table

./LakeXpress \
-a data/ds_credentials.json \
--lxdb_auth_id lxdb_ms \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--source_schema_name tpch_1 \
-i nation \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target snowflake_prod

Example 4: Full Schema

./LakeXpress \
-a data/ds_credentials.json \
--lxdb_auth_id lxdb_ms \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--source_schema_name tpch_1 \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target snowflake_prod

Views and Customization

Automatic Views (External Tables Only)

For external tables, LakeXpress auto-creates views that expose typed columns instead of the raw Parquet VALUE column.

-- External table (raw)
SELECT * FROM EXT_TPCH_1.CUSTOMER;
-- Shows: VALUE column (complex structure)

-- View (typed columns)
SELECT * FROM EXT_TPCH_1.V_CUSTOMER;
-- Shows: C_CUSTKEY, C_NAME, C_ADDRESS, etc.

View Options

OptionDescriptionDefault
--snowflake_no_viewsSkip view creationViews created
--snowflake_view_prefix PREFIXCustom view name prefixV_

Custom prefix:

./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type external \
--snowflake_view_prefix "VIEW_" \
...

# Tables: customer, orders
# Views: VIEW_customer, VIEW_orders

No views:

./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type external \
--snowflake_no_views \
...

# Tables: customer, orders
# Views: None

Refreshing External Tables

External tables must be refreshed after data upload:

ALTER EXTERNAL TABLE EXT_TPCH_1.CUSTOMER REFRESH;
ALTER EXTERNAL TABLE EXT_TPCH_1.LINEITEM REFRESH;

Or via Snowflake CLI:

snow sql -q "ALTER EXTERNAL TABLE EXT_TPCH_1.LINEITEM REFRESH;" -c lakexpress

Primary Key Constraints

Applies to internal tables only. LakeXpress propagates PRIMARY KEY constraints from the source database to Snowflake.

Usage

./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type internal \
--snowflake_pk_constraints \
...

How It Works

  1. Reads PRIMARY KEY metadata from the source database
  2. Stores PK information in the LakeXpress DB
  3. Adds PRIMARY KEY constraints to the Snowflake DDL
  4. Snowflake enforces constraints (query planner benefits)

Requirements

  • --snowflake_table_type internal only
  • Source database must have PRIMARY KEY constraints defined
  • LakeXpress DB must have is_primary_key column (auto-created in v0.1.16+)

Benefits

  • Query planner optimizations
  • Better execution plans
  • Documents data integrity

See Also

Copyright © 2026 Architecture & Performance.