Databricks Publishing

LakeXpress creates Databricks Unity Catalog tables from exported Parquet files. It supports external tables (the data stays in S3) and managed Delta tables (the data is loaded into Databricks storage).

Prerequisites

1. Databricks Setup

Required Resources:

  • Databricks Workspace with Unity Catalog enabled
  • SQL Warehouse (for DDL execution)
  • Catalog with CREATE SCHEMA and CREATE TABLE permissions
  • Storage credential for S3 access (external tables only)

Required Permissions:

  • CREATE SCHEMA on the target catalog
  • CREATE TABLE on the target schema
  • USE CATALOG on the target catalog
  • For external tables: access to the external location

2. Storage Credential Setup

For external tables, configure a storage credential for S3 access:

-- In Databricks SQL
CREATE STORAGE CREDENTIAL my_s3_credential
WITH (
  AWS_IAM_ROLE = 'arn:aws:iam::123456789012:role/DatabricksS3AccessRole'
);

CREATE EXTERNAL LOCATION my_s3_location
URL 's3://my-datalake-bucket/lakexpress/'
WITH (STORAGE CREDENTIAL my_s3_credential);

3. Credentials Configuration

Both S3 and Databricks credentials are needed in credentials.json:

{
  "aws_s3": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://your-bucket-name/path/to/exports",
      "profile": "your-aws-profile"
    }
  },
  "databricks": {
    "ds_type": "databricks",
    "auth_mode": "token",
    "info": {
      "host": "your-workspace.cloud.databricks.com",
      "http_path": "/sql/1.0/warehouses/your-warehouse-id",
      "access_token": "your-access-token",
      "catalog": "your-catalog"
    }
  }
}
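Before a run, the required keys can be sanity-checked with a few lines of standard-library Python. This is a minimal sketch based on the field names in the example above; LakeXpress itself may validate the file differently:

```python
import json

# Fields the "databricks" entry must carry, per the example above.
REQUIRED_DATABRICKS_FIELDS = {"host", "http_path", "access_token", "catalog"}

def check_databricks_entry(creds: dict) -> list[str]:
    """Return the sorted list of required fields missing from the 'databricks' entry."""
    info = creds.get("databricks", {}).get("info", {})
    return sorted(REQUIRED_DATABRICKS_FIELDS - info.keys())

creds = json.loads("""
{
  "databricks": {
    "ds_type": "databricks",
    "auth_mode": "token",
    "info": {
      "host": "your-workspace.cloud.databricks.com",
      "http_path": "/sql/1.0/warehouses/your-warehouse-id",
      "catalog": "your-catalog"
    }
  }
}
""")
print(check_databricks_entry(creds))  # the access_token field is missing here
```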

Authentication

Databricks uses Personal Access Tokens (PAT):

{
  "databricks": {
    "ds_type": "databricks",
    "auth_mode": "token",
    "info": {
      "host": "your-workspace.cloud.databricks.com",
      "http_path": "/sql/1.0/warehouses/your-warehouse-id",
      "access_token": "your-access-token",
      "catalog": "your-catalog"
    }
  }
}

Configuration Fields:

Field          Description                      Required
host           Databricks workspace hostname    Yes
http_path      SQL Warehouse HTTP path          Yes
access_token   Personal Access Token (PAT)      Yes
catalog        Target Unity Catalog             Yes
schema         Default schema                   No

To get the HTTP path:

  1. Go to Databricks SQL > SQL Warehouses
  2. Select your warehouse > Connection details
  3. Copy the HTTP path

To create a PAT:

  1. Go to User Settings > Developer > Access tokens
  2. Generate a new token

Table Types

External Tables

Reference Parquet data in S3. Databricks stores only metadata; queries read directly from S3.

  • No data loading time or Databricks storage costs
  • Data stays in place
  • Available right after export

lakexpress -a credentials.json \
--source_db_auth_id postgres_prod \
--target_storage_id aws_s3_datalake \
--publish_target databricks_prod \
--databricks_table_type external  # Default
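The DDL behind an external table is essentially a `CREATE TABLE ... USING PARQUET LOCATION` statement over the exported files. A sketch of how such a statement could be assembled; the exact DDL LakeXpress generates may differ, and the catalog, schema, and bucket names here are illustrative:

```python
def external_table_ddl(catalog: str, schema: str, table: str, s3_prefix: str) -> str:
    """Build a Unity Catalog external-table DDL over Parquet files in S3."""
    location = f"{s3_prefix.rstrip('/')}/{schema}/{table}"
    return (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table}\n"
        f"USING PARQUET\n"
        f"LOCATION '{location}'"
    )

ddl = external_table_ddl("lakexpress_catalog", "public", "customer",
                         "s3://my-datalake-bucket/lakexpress/")
print(ddl)
```

Because only metadata is written, the table is queryable as soon as the statement runs; dropping it later does not touch the Parquet files in S3.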

Managed Tables (Delta)

Delta tables loaded into Databricks managed storage via COPY INTO. Enables ACID transactions, time travel, and Z-ordering.

  • Faster queries with optimized storage and caching
  • Full Delta Lake features
  • Better for frequently accessed data

lakexpress -a credentials.json \
--source_db_auth_id postgres_prod \
--target_storage_id aws_s3_datalake \
--publish_target databricks_prod \
--databricks_table_type managed
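As noted above, managed tables are loaded with COPY INTO. A sketch of the statement pair such a load might use; this is illustrative only, and the real DDL and copy options LakeXpress emits may differ:

```python
def managed_load_sql(catalog: str, schema: str, table: str, s3_prefix: str) -> list[str]:
    """Create a managed Delta table, then load the exported Parquet via COPY INTO."""
    fqn = f"{catalog}.{schema}.{table}"
    source = f"{s3_prefix.rstrip('/')}/{schema}/{table}"
    return [
        # Managed Delta table: no LOCATION clause, storage is Databricks-managed.
        f"CREATE TABLE IF NOT EXISTS {fqn}",
        # COPY INTO skips files it has already loaded, so re-runs are idempotent.
        f"COPY INTO {fqn}\n"
        f"FROM '{source}'\n"
        f"FILEFORMAT = PARQUET\n"
        f"COPY_OPTIONS ('mergeSchema' = 'true')",
    ]

stmts = managed_load_sql("lakexpress_catalog", "tpch_1", "customer",
                         "s3://my-datalake-bucket/lakexpress/")
print(stmts[1])
```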

Configuration Options

Option                              Description                                    Default
--publish_target ID                 Credential ID for Databricks target (required) -
--publish_schema_pattern PATTERN    Schema naming pattern                          {schema}
--publish_table_pattern PATTERN     Table naming pattern                           {table}
--databricks_table_type TYPE        external or managed                            external
--databricks_catalog CATALOG        Override catalog from credentials              (from auth)
--n_jobs N                          Parallel workers for table creation            1

Dynamic Naming Patterns

Schema and table names support token-based patterns.

Supported Tokens

Token        Description                            Example Output
{schema}     Source schema name                     tpch_1
{table}      Source table name                      customer
{database}   Source database name                   tpch
{date}       Current date (YYYYMMDD)                20251210
{timestamp}  Current timestamp (YYYYMMDD_HHMMSS)    20251210_143022
{uuid}       UUID4, consistent per run              a1b2c3d4-...
{subpath}    CLI --sub_path value                   staging

Common Patterns

Prefixed Schemas

lakexpress \
--publish_schema_pattern "bronze_{schema}" \
--publish_table_pattern "{table}" \
--publish_target databricks_prod \
...

# Result:
# lakexpress_catalog.bronze_tpch_1.customer
# lakexpress_catalog.bronze_tpch_1.orders

Date-Based Schemas

lakexpress \
--publish_schema_pattern "{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target databricks_prod \
...

# Result:
# lakexpress_catalog.tpch_1_20251210.customer

Medallion Architecture

# Bronze layer (raw data)
lakexpress \
--publish_schema_pattern "bronze_{schema}" \
--databricks_table_type external \
--publish_target databricks_prod \
...

# Silver layer (curated data)
lakexpress \
--publish_schema_pattern "silver_{schema}" \
--databricks_table_type managed \
--publish_target databricks_prod \
...

Usage Examples

Example 1: External Tables (Data in S3)

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id aws_s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target databricks_prod \
--databricks_table_type external

Result: External Parquet tables in lakexpress_catalog.public reading from S3.

Example 2: Managed Delta Tables

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id aws_s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target databricks_prod \
--databricks_table_type managed \
--n_jobs 4

Result: Delta tables in lakexpress_catalog.tpch_1 loaded via COPY INTO.

Example 3: Custom Catalog and Schema Pattern

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name sales \
--target_storage_id aws_s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target databricks_prod \
--databricks_catalog analytics_catalog \
--publish_schema_pattern "raw_{schema}" \
--databricks_table_type external

Result: Tables in analytics_catalog.raw_sales (customer, orders, etc.).

Example 4: Parallel Table Creation

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id aws_s3_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target databricks_prod \
--n_jobs 8
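With --n_jobs, per-table work is fanned out across parallel workers. A sketch of that pattern, assuming one DDL statement per table; the dummy `run_ddl` here stands in for a real call against the SQL warehouse:

```python
from concurrent.futures import ThreadPoolExecutor

def run_ddl(table: str) -> str:
    """Stand-in for executing one CREATE TABLE against the SQL warehouse."""
    return f"created {table}"

tables = ["customer", "orders", "lineitem", "part"]
n_jobs = 8  # mirrors --n_jobs 8 above

# Threads suit this workload: each worker mostly waits on the warehouse.
with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    results = list(pool.map(run_ddl, tables))

print(results)
```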

Copyright © 2026 Architecture & Performance.