Skip to main content

Amazon Redshift Publishing

LakeXpress creates Amazon Redshift tables from exported Parquet files stored in S3, supporting both native internal tables and Spectrum external tables.

Table of Contents

Prerequisites

1. Redshift Cluster

A Redshift Provisioned cluster or Redshift Serverless workgroup with:

  • A database (default: dev)
  • A user with CREATE SCHEMA and CREATE TABLE privileges
  • Security group allowing inbound TCP on port 5439

2. IAM Role

An IAM role that Redshift can assume, with:

  • S3 read access on the bucket containing exported Parquet files (required for both internal and external tables)
  • AWS Glue catalog access (required only for external/Spectrum tables)

Example trust policy:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {"Service": "redshift.amazonaws.com"},
"Action": "sts:AssumeRole"
}
]
}

3. S3 Storage

Both the Redshift cluster and the S3 bucket must be in the same AWS region to avoid cross-region transfer costs.

The S3 credential and the Redshift credential must both be in your credentials.json:

{
"aws_s3": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://my-bucket/lakexpress/",
"profile_name": "lakexpress"
}
},
"redshift_classic": {
"ds_type": "redshift",
"auth_mode": "classic",
"info": {
"host": "my-cluster.eu-west-1.redshift.amazonaws.com",
"port": 5439,
"database": "dev",
"username": "$env{LX_REDSHIFT_USER}",
"password": "$env{LX_REDSHIFT_PASSWORD}",
"iam_role": "arn:aws:iam::123456789012:role/MyRedshiftRole",
"region": "eu-west-1"
}
}
}

Authentication Modes

Classic (Username/Password)

{
"redshift_classic": {
"ds_type": "redshift",
"auth_mode": "classic",
"info": {
"host": "my-cluster.eu-west-1.redshift.amazonaws.com",
"port": 5439,
"database": "dev",
"username": "$env{LX_REDSHIFT_USER}",
"password": "$env{LX_REDSHIFT_PASSWORD}",
"iam_role": "arn:aws:iam::123456789012:role/MyRedshiftRole",
"region": "eu-west-1"
}
}
}
FieldRequiredDescription
hostYesCluster endpoint
portNoDefault: 5439
databaseNoDefault: dev
usernameYesDatabase user
passwordYesDatabase password
iam_roleYesIAM role ARN for S3/Glue access
regionNoAWS region (default: us-east-1)

IAM Authentication

{
"redshift_iam": {
"ds_type": "redshift",
"auth_mode": "iam",
"info": {
"host": "my-cluster.eu-west-1.redshift.amazonaws.com",
"port": 5439,
"database": "dev",
"cluster_identifier": "my-cluster",
"iam_role": "arn:aws:iam::123456789012:role/MyRedshiftRole",
"region": "eu-west-1"
}
}
}
FieldRequiredDescription
cluster_identifierYesRedshift cluster identifier
iam_roleYesIAM role ARN
regionNoAWS region (default: us-east-1)

Table Types

Internal Tables (--publish_method internal)

Creates native Redshift tables and loads data using COPY FROM S3 FORMAT AS PARQUET.

  • Data is copied into Redshift storage -- queries are fast but storage costs apply
  • The COPY command reads all Parquet files in the S3 directory (including partitioned exports)
  • Decimal columns are mapped to DECIMAL(38,18) to match the Parquet physical layout
  • Supports --pk_constraints for informational primary key constraints

External Tables (--publish_method external)

Creates Redshift Spectrum tables backed by the AWS Glue Data Catalog.

  • Data stays in S3 -- no data movement, no Redshift storage costs
  • Requires an IAM role with Glue catalog permissions
  • The Glue database is created automatically (CREATE EXTERNAL DATABASE IF NOT EXISTS)
  • Decimal columns preserve source precision (e.g., DECIMAL(12,2))
  • PK constraints are not supported for external tables

Dynamic Naming Patterns

TokenDescription
{schema}Source schema name
{table}Source table name
{subpath}Value of --sub_path
{date}Current date (YYYYMMDD)

Set via --publish_schema_pattern and --publish_table_pattern.

Usage Examples

Internal Tables

LakeXpress config create \
-a credentials.json \
--lxdb_auth_id log_db \
--source_db_auth_id source_db \
--source_db_name MY_DB \
--source_schema_name MY_SCHEMA \
--target_storage_id aws_s3 \
--sub_path mydata \
--publish_target redshift_classic \
--publish_method internal \
--publish_schema_pattern "INT_{subpath}_{date}" \
--publish_table_pattern "{schema}_{table}"

External Tables (Spectrum)

LakeXpress config create \
-a credentials.json \
--lxdb_auth_id log_db \
--source_db_auth_id source_db \
--source_db_name MY_DB \
--source_schema_name MY_SCHEMA \
--target_storage_id aws_s3 \
--sub_path mydata \
--publish_target redshift_classic \
--publish_method external \
--publish_schema_pattern "EXT_{subpath}_{date}" \
--publish_table_pattern "{schema}_{table}"

With Primary Key Constraints

Add --pk_constraints to include primary key definitions in internal table DDL:

LakeXpress config create \
... \
--publish_method internal \
--publish_target redshift_classic \
--pk_constraints

Primary Key Constraints

When --pk_constraints is set, LakeXpress adds PRIMARY KEY constraints to internal table DDL based on the source table's primary key columns.

These constraints are informational only in Redshift -- they are not enforced but are used by the query optimizer for better execution plans.

PK constraints are not supported for external (Spectrum) tables.

Notes and Limitations

  • COPY parallelism: The COPY command automatically distributes file reads across Redshift compute slices. Multi-file exports (partitioned tables) benefit from multi-node clusters.
  • Decimal mapping: Internal tables use DECIMAL(38,18) because Redshift COPY requires the column byte size to exactly match the Parquet physical type. External tables preserve source precision.
  • Region: The Redshift cluster and S3 bucket should be in the same AWS region to avoid cross-region data transfer costs.
  • Glue database: For external tables, the CREATE EXTERNAL SCHEMA DDL automatically creates the backing Glue database if it does not exist.
  • Redshift Serverless: Both Provisioned and Serverless clusters are supported. The connection uses the same host/port/user/password parameters.
  • Driver: Requires redshift_connector. Install with pip install redshift_connector.

See Also

Copyright © 2026 Architecture & Performance.