Databricks Publishing
LakeXpress creates Databricks Unity Catalog tables from exported Parquet files. It supports external tables (the data stays in S3) and managed Delta tables (the data is loaded into Databricks-managed storage).
Table of Contents
- Prerequisites
- Authentication
- Table Types
- Configuration Options
- Dynamic Naming Patterns
- Usage Examples
Prerequisites
1. Databricks Setup
Required Resources:
- Databricks Workspace with Unity Catalog enabled
- SQL Warehouse (for DDL execution)
- Catalog with CREATE SCHEMA and CREATE TABLE permissions
- Storage credential for S3 access (external tables only)
Required Permissions:
- CREATE SCHEMA on the target catalog
- CREATE TABLE on the target schema
- USE CATALOG on the target catalog
- For external tables: access to the external location
2. Storage Credential Setup
For external tables, configure a storage credential for S3 access:
-- In Databricks SQL
CREATE STORAGE CREDENTIAL my_s3_credential
WITH (
AWS_IAM_ROLE = 'arn:aws:iam::123456789012:role/DatabricksS3AccessRole'
);
CREATE EXTERNAL LOCATION my_s3_location
URL 's3://my-datalake-bucket/lakexpress/'
WITH (STORAGE CREDENTIAL my_s3_credential);
3. Credentials Configuration
Both S3 and Databricks credentials are needed in credentials.json:
{
"aws_s3": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://your-bucket-name/path/to/exports",
"profile": "your-aws-profile"
}
},
"databricks": {
"ds_type": "databricks",
"auth_mode": "token",
"info": {
"host": "your-workspace.cloud.databricks.com",
"http_path": "/sql/1.0/warehouses/your-warehouse-id",
"access_token": "your-access-token",
"catalog": "your-catalog"
}
}
}
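As a quick sanity check before running a publish, the Databricks entry can be validated for completeness. This is an illustrative helper, not part of LakeXpress; the required field names are taken from the example above.

```python
import json

# Required fields for the "databricks" entry, per the credentials example above.
REQUIRED_DATABRICKS_FIELDS = {"host", "http_path", "access_token", "catalog"}

def check_databricks_credentials(path):
    """Load credentials.json and verify the Databricks entry is complete."""
    with open(path) as f:
        creds = json.load(f)
    info = creds["databricks"]["info"]
    missing = REQUIRED_DATABRICKS_FIELDS - info.keys()
    if missing:
        raise ValueError(f"databricks credentials missing fields: {sorted(missing)}")
    return info
```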
Authentication
Databricks authenticates with a Personal Access Token (PAT):
{
"databricks": {
"ds_type": "databricks",
"auth_mode": "token",
"info": {
"host": "your-workspace.cloud.databricks.com",
"http_path": "/sql/1.0/warehouses/your-warehouse-id",
"access_token": "your-access-token",
"catalog": "your-catalog"
}
}
}
Configuration Fields:
| Field | Description | Required |
|---|---|---|
| host | Databricks workspace hostname | Yes |
| http_path | SQL Warehouse HTTP path | Yes |
| access_token | Personal Access Token (PAT) | Yes |
| catalog | Target Unity Catalog | Yes |
| schema | Default schema | No |
To get the HTTP path:
- Go to Databricks SQL > SQL Warehouses
- Select your warehouse > Connection details
- Copy the HTTP path
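A copied HTTP path can be sanity-checked before it goes into credentials.json. The path shape below comes from the configuration example above; the regex is an illustrative assumption, not an official Databricks format rule.

```python
import re

# SQL Warehouse HTTP paths follow the shape shown in the config example:
# /sql/1.0/warehouses/<warehouse-id>. This pattern is an assumption for
# illustration, not an official validation rule.
_HTTP_PATH_RE = re.compile(r"^/sql/1\.0/warehouses/[0-9a-f]+$")

def looks_like_warehouse_path(http_path: str) -> bool:
    """Rough check that a string resembles a SQL Warehouse HTTP path."""
    return bool(_HTTP_PATH_RE.match(http_path))
```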
To create a PAT:
- Go to User Settings > Developer > Access tokens
- Generate a new token
Table Types
External Tables
External tables reference Parquet data in place in S3; Databricks stores only metadata, and queries read directly from S3.
- No data loading time or Databricks storage costs
- Data stays in place
- Available right after export
lakexpress -a credentials.json \
--source_db_auth_id postgres_prod \
--target_storage_id aws_s3_datalake \
--publish_target databricks_prod \
--databricks_table_type external # Default
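The exact DDL that LakeXpress issues is internal to the tool, but the general shape of an external Unity Catalog table is easy to sketch: a CREATE TABLE statement that registers metadata and points at the S3 prefix, with no data movement. A simplified, hypothetical builder:

```python
def external_table_ddl(catalog: str, schema: str, table: str, s3_prefix: str) -> str:
    """Build a simplified external-table DDL statement.

    Illustrative only: the statement LakeXpress actually generates may differ.
    The point is the shape: metadata registered in Unity Catalog, data left in S3.
    """
    location = f"{s3_prefix.rstrip('/')}/{schema}/{table}/"
    return (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table}\n"
        f"USING PARQUET\n"
        f"LOCATION '{location}'"
    )
```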
Managed Tables (Delta)
Managed tables are Delta tables loaded into Databricks-managed storage via COPY INTO. They enable ACID transactions, time travel, and Z-ordering.
- Faster queries with optimized storage and caching
- Full Delta Lake features
- Better for frequently accessed data
lakexpress -a credentials.json \
--source_db_auth_id postgres_prod \
--target_storage_id aws_s3_datalake \
--publish_target databricks_prod \
--databricks_table_type managed
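For managed tables, the load step uses COPY INTO. Again, the exact statement LakeXpress runs is internal; a hypothetical sketch of the general shape, loading exported Parquet files into a Delta table:

```python
def copy_into_sql(catalog: str, schema: str, table: str, s3_prefix: str) -> str:
    """Sketch of a COPY INTO statement loading exported Parquet into a
    managed Delta table. Illustrative only; LakeXpress's actual statement
    and source-path layout may differ."""
    source = f"{s3_prefix.rstrip('/')}/{schema}/{table}/"
    return (
        f"COPY INTO {catalog}.{schema}.{table}\n"
        f"FROM '{source}'\n"
        f"FILEFORMAT = PARQUET"
    )
```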
Configuration Options
| Option | Description | Default |
|---|---|---|
| --publish_target ID | Credential ID for Databricks target (required) | - |
| --publish_schema_pattern PATTERN | Schema naming pattern | {schema} |
| --publish_table_pattern PATTERN | Table naming pattern | {table} |
| --databricks_table_type TYPE | external or managed | external |
| --databricks_catalog CATALOG | Override catalog from credentials | (from auth) |
| --n_jobs N | Parallel workers for table creation | 1 |
Dynamic Naming Patterns
Schema and table names support token-based patterns.
Supported Tokens
| Token | Description | Example Output |
|---|---|---|
| {schema} | Source schema name | tpch_1 |
| {table} | Source table name | customer |
| {database} | Source database name | tpch |
| {date} | Current date (YYYYMMDD) | 20251210 |
| {timestamp} | Current timestamp (YYYYMMDD_HHMMSS) | 20251210_143022 |
| {uuid} | UUID4, consistent per run | a1b2c3d4-... |
| {subpath} | CLI --sub_path value | staging |
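Token expansion behaves like ordinary placeholder substitution. A hypothetical helper mirroring the table above (the tool's own implementation may differ; note that LakeXpress keeps {uuid} consistent within a run, which the optional `run_uuid` argument imitates):

```python
import uuid
from datetime import datetime

def resolve_pattern(pattern, schema, table, database, subpath="",
                    now=None, run_uuid=None):
    """Expand the naming tokens listed in the table above.

    `now` and `run_uuid` default to the current time and a fresh UUID4;
    passing them in keeps resolved names reproducible across calls,
    mimicking the per-run consistency of {uuid}.
    """
    now = now or datetime.now()
    run_uuid = run_uuid or str(uuid.uuid4())
    tokens = {
        "schema": schema,
        "table": table,
        "database": database,
        "date": now.strftime("%Y%m%d"),
        "timestamp": now.strftime("%Y%m%d_%H%M%S"),
        "uuid": run_uuid,
        "subpath": subpath,
    }
    return pattern.format(**tokens)
```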
Common Patterns
Prefixed Schemas
lakexpress \
--publish_schema_pattern "bronze_{schema}" \
--publish_table_pattern "{table}" \
--publish_target databricks_prod \
...
# Result:
# lakexpress_catalog.bronze_tpch_1.customer
# lakexpress_catalog.bronze_tpch_1.orders
Date-Based Schemas
lakexpress \
--publish_schema_pattern "{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target databricks_prod \
...
# Result:
# lakexpress_catalog.tpch_1_20251210.customer
Medallion Architecture
# Bronze layer (raw data)
lakexpress \
--publish_schema_pattern "bronze_{schema}" \
--databricks_table_type external \
--publish_target databricks_prod \
...
# Silver layer (curated data)
lakexpress \
--publish_schema_pattern "silver_{schema}" \
--databricks_table_type managed \
--publish_target databricks_prod \
...
Usage Examples
Example 1: External Tables (Data in S3)
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id aws_s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target databricks_prod \
--databricks_table_type external
Result: External Parquet tables in lakexpress_catalog.public reading from S3.
Example 2: Managed Delta Tables
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id aws_s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target databricks_prod \
--databricks_table_type managed \
--n_jobs 4
Result: Delta tables in lakexpress_catalog.tpch_1 loaded via COPY INTO.
Example 3: Custom Catalog and Schema Pattern
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name sales \
--target_storage_id aws_s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target databricks_prod \
--databricks_catalog analytics_catalog \
--publish_schema_pattern "raw_{schema}" \
--databricks_table_type external
Result: Tables in analytics_catalog.raw_sales (customer, orders, etc.).
Example 4: Parallel Table Creation
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id aws_s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target databricks_prod \
--n_jobs 8
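Since each table's DDL is independent, --n_jobs parallelizes cleanly. A minimal sketch of the idea (not LakeXpress's actual implementation), where `execute` stands in for whatever function submits SQL to the warehouse:

```python
from concurrent.futures import ThreadPoolExecutor

def create_tables_parallel(ddl_statements, execute, n_jobs=4):
    """Run independent CREATE TABLE statements with a worker pool.

    Illustrative sketch of --n_jobs: table creations do not depend on
    each other, so they can be submitted concurrently. Results come
    back in input order.
    """
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(execute, ddl_statements))
```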
See Also
- Databricks Reference - Data type mappings, CLI arguments, querying, Delta features
- CLI Reference - All command-line options
- Examples & Recipes - Working command examples