Amazon Redshift Publishing
LakeXpress creates Amazon Redshift tables from exported Parquet files stored in S3, supporting both native internal tables and Spectrum external tables.
Table of Contents
- Prerequisites
- Authentication Modes
- Table Types
- Dynamic Naming Patterns
- Usage Examples
- Primary Key Constraints
- Notes and Limitations
Prerequisites
1. Redshift Cluster
A Redshift Provisioned cluster or Redshift Serverless workgroup with:
- A database (default: `dev`)
- A user with `CREATE SCHEMA` and `CREATE TABLE` privileges
- A security group allowing inbound TCP on port 5439
2. IAM Role
An IAM role that Redshift can assume, with:
- S3 read access on the bucket containing exported Parquet files (required for both internal and external tables)
- AWS Glue catalog access (required only for external/Spectrum tables)
Example trust policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "redshift.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
```
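Alongside the trust policy, the role needs a permissions policy. The sketch below is illustrative, not exhaustive: the bucket name is a placeholder, and the exact set of Glue actions your setup needs may vary (the Glue statement is only required for external/Spectrum tables).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase", "glue:CreateDatabase",
        "glue:GetTable", "glue:CreateTable", "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
```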
3. S3 Storage
Both the Redshift cluster and the S3 bucket must be in the same AWS region to avoid cross-region transfer costs.
Both the S3 credential and the Redshift credential must be present in your `credentials.json`:
```json
{
  "aws_s3": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://my-bucket/lakexpress/",
      "profile_name": "lakexpress"
    }
  },
  "redshift_classic": {
    "ds_type": "redshift",
    "auth_mode": "classic",
    "info": {
      "host": "my-cluster.eu-west-1.redshift.amazonaws.com",
      "port": 5439,
      "database": "dev",
      "username": "$env{LX_REDSHIFT_USER}",
      "password": "$env{LX_REDSHIFT_PASSWORD}",
      "iam_role": "arn:aws:iam::123456789012:role/MyRedshiftRole",
      "region": "eu-west-1"
    }
  }
}
```
Authentication Modes
Classic (Username/Password)
```json
{
  "redshift_classic": {
    "ds_type": "redshift",
    "auth_mode": "classic",
    "info": {
      "host": "my-cluster.eu-west-1.redshift.amazonaws.com",
      "port": 5439,
      "database": "dev",
      "username": "$env{LX_REDSHIFT_USER}",
      "password": "$env{LX_REDSHIFT_PASSWORD}",
      "iam_role": "arn:aws:iam::123456789012:role/MyRedshiftRole",
      "region": "eu-west-1"
    }
  }
}
```
| Field | Required | Description |
|---|---|---|
| `host` | Yes | Cluster endpoint |
| `port` | No | Default: `5439` |
| `database` | No | Default: `dev` |
| `username` | Yes | Database user |
| `password` | Yes | Database password |
| `iam_role` | Yes | IAM role ARN for S3/Glue access |
| `region` | No | AWS region (default: `us-east-1`) |
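As a sketch of how the classic fields map onto a driver connection: the `redshift_connector` package takes a `user` keyword where `credentials.json` says `username`. The helper below is illustrative only, not LakeXpress's internal code.

```python
# Map a "classic" credential block onto redshift_connector.connect() keywords.
# Illustrative only -- field handling inside LakeXpress may differ.

def classic_connect_kwargs(info: dict) -> dict:
    return {
        "host": info["host"],
        "port": info.get("port", 5439),           # documented default port
        "database": info.get("database", "dev"),  # documented default database
        "user": info["username"],                 # note: driver keyword is "user"
        "password": info["password"],
    }

kwargs = classic_connect_kwargs({
    "host": "my-cluster.eu-west-1.redshift.amazonaws.com",
    "username": "admin",
    "password": "secret",
})
# conn = redshift_connector.connect(**kwargs)  # needs `pip install redshift_connector`
```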
IAM Authentication
```json
{
  "redshift_iam": {
    "ds_type": "redshift",
    "auth_mode": "iam",
    "info": {
      "host": "my-cluster.eu-west-1.redshift.amazonaws.com",
      "port": 5439,
      "database": "dev",
      "cluster_identifier": "my-cluster",
      "iam_role": "arn:aws:iam::123456789012:role/MyRedshiftRole",
      "region": "eu-west-1"
    }
  }
}
```
| Field | Required | Description |
|---|---|---|
| `cluster_identifier` | Yes | Redshift cluster identifier |
| `iam_role` | Yes | IAM role ARN |
| `region` | No | AWS region (default: `us-east-1`) |
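In IAM mode, `redshift_connector` can fetch temporary cluster credentials via the AWS API instead of using a static password (`iam=True` plus `cluster_identifier`). A sketch of the keyword mapping, with the `db_user` value a placeholder; how LakeXpress wires this up internally may differ:

```python
# Sketch of an IAM-mode connection keyword mapping for redshift_connector.
# With iam=True the driver obtains temporary credentials from AWS rather
# than authenticating with a password. Illustrative only.

def iam_connect_kwargs(info: dict, db_user: str) -> dict:
    return {
        "iam": True,
        "host": info["host"],
        "port": info.get("port", 5439),
        "database": info.get("database", "dev"),
        "cluster_identifier": info["cluster_identifier"],
        "region": info.get("region", "us-east-1"),  # documented default region
        "db_user": db_user,  # database user to fetch temporary credentials for
    }

kwargs = iam_connect_kwargs(
    {
        "host": "my-cluster.eu-west-1.redshift.amazonaws.com",
        "cluster_identifier": "my-cluster",
        "region": "eu-west-1",
    },
    db_user="awsuser",  # placeholder
)
# conn = redshift_connector.connect(**kwargs)
```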
Table Types
Internal Tables (`--publish_method internal`)
Creates native Redshift tables and loads data with a `COPY` from S3 using `FORMAT AS PARQUET`.
- Data is copied into Redshift storage -- queries are fast but storage costs apply
- The `COPY` command reads all Parquet files in the S3 directory (including partitioned exports)
- Decimal columns are mapped to `DECIMAL(38,18)` to match the Parquet physical layout
- Supports `--pk_constraints` for informational primary key constraints
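The load step boils down to one `COPY` statement per table. A minimal sketch of how such a statement could be assembled; the helper, names, and quoting are illustrative, not LakeXpress's actual code generation:

```python
def copy_statement(schema: str, table: str, s3_dir: str, iam_role: str) -> str:
    """Render a Redshift COPY that ingests all Parquet files under s3_dir."""
    return (
        f'COPY "{schema}"."{table}"\n'
        f"FROM '{s3_dir}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET;"
    )

sql = copy_statement(
    "int_mydata_20240101", "my_schema_my_table",           # placeholder names
    "s3://my-bucket/lakexpress/mydata/",
    "arn:aws:iam::123456789012:role/MyRedshiftRole",
)
```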
External Tables (`--publish_method external`)
Creates Redshift Spectrum tables backed by the AWS Glue Data Catalog.
- Data stays in S3 -- no data movement, no Redshift storage costs
- Requires an IAM role with Glue catalog permissions
- The Glue database is created automatically (`CREATE EXTERNAL DATABASE IF NOT EXISTS`)
- Decimal columns preserve source precision (e.g., `DECIMAL(12,2)`)
- PK constraints are not supported for external tables
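The Spectrum setup corresponds to a `CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG` statement whose trailing clause creates the Glue database on first use. An illustrative rendering (the helper and all names are placeholders, not LakeXpress internals):

```python
def external_schema_ddl(schema: str, glue_db: str, iam_role: str, region: str) -> str:
    """Render CREATE EXTERNAL SCHEMA DDL that also creates the Glue database."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{glue_db}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"REGION '{region}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )

ddl = external_schema_ddl(
    "ext_mydata_20240101", "lakexpress_glue_db",  # placeholder names
    "arn:aws:iam::123456789012:role/MyRedshiftRole", "eu-west-1",
)
```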
Dynamic Naming Patterns
| Token | Description |
|---|---|
| `{schema}` | Source schema name |
| `{table}` | Source table name |
| `{subpath}` | Value of `--sub_path` |
| `{date}` | Current date (`YYYYMMDD`) |
Set via `--publish_schema_pattern` and `--publish_table_pattern`.
Usage Examples
Internal Tables
```shell
LakeXpress config create \
  -a credentials.json \
  --lxdb_auth_id log_db \
  --source_db_auth_id source_db \
  --source_db_name MY_DB \
  --source_schema_name MY_SCHEMA \
  --target_storage_id aws_s3 \
  --sub_path mydata \
  --publish_target redshift_classic \
  --publish_method internal \
  --publish_schema_pattern "INT_{subpath}_{date}" \
  --publish_table_pattern "{schema}_{table}"
```
External Tables (Spectrum)
```shell
LakeXpress config create \
  -a credentials.json \
  --lxdb_auth_id log_db \
  --source_db_auth_id source_db \
  --source_db_name MY_DB \
  --source_schema_name MY_SCHEMA \
  --target_storage_id aws_s3 \
  --sub_path mydata \
  --publish_target redshift_classic \
  --publish_method external \
  --publish_schema_pattern "EXT_{subpath}_{date}" \
  --publish_table_pattern "{schema}_{table}"
```
With Primary Key Constraints
Add `--pk_constraints` to include primary key definitions in internal table DDL:
```shell
LakeXpress config create \
  ... \
  --publish_method internal \
  --publish_target redshift_classic \
  --pk_constraints
```
Primary Key Constraints
When `--pk_constraints` is set, LakeXpress adds `PRIMARY KEY` constraints to internal table DDL based on the source table's primary key columns.
These constraints are informational only in Redshift -- they are not enforced, but the query optimizer uses them to produce better execution plans.
PK constraints are not supported for external (Spectrum) tables.
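Concretely, the generated internal-table DDL would carry the informational constraint along these lines. The helper, table, and column names below are illustrative, not LakeXpress's actual code generation:

```python
def create_table_ddl(schema: str, table: str, columns: list[tuple[str, str]],
                     pk_cols: list[str]) -> str:
    """Render internal-table DDL with an (informational) PRIMARY KEY clause."""
    col_defs = [f'"{name}" {dtype}' for name, dtype in columns]
    if pk_cols:  # only when --pk_constraints is set; skipped for external tables
        quoted = ", ".join(f'"{c}"' for c in pk_cols)
        col_defs.append(f"PRIMARY KEY ({quoted})")
    body = ",\n  ".join(col_defs)
    return f'CREATE TABLE "{schema}"."{table}" (\n  {body}\n);'

ddl = create_table_ddl(
    "int_mydata_20240101", "my_schema_orders",            # placeholder names
    [("order_id", "BIGINT"), ("amount", "DECIMAL(38,18)")],
    ["order_id"],
)
```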
Notes and Limitations
- COPY parallelism: The `COPY` command automatically distributes file reads across Redshift compute slices. Multi-file exports (partitioned tables) benefit from multi-node clusters.
- Decimal mapping: Internal tables use `DECIMAL(38,18)` because Redshift `COPY` requires the column byte size to exactly match the Parquet physical type. External tables preserve source precision.
- Region: The Redshift cluster and S3 bucket should be in the same AWS region to avoid cross-region data transfer costs.
- Glue database: For external tables, the `CREATE EXTERNAL SCHEMA` DDL automatically creates the backing Glue database if it does not exist.
- Redshift Serverless: Both Provisioned and Serverless clusters are supported. The connection uses the same host/port/user/password parameters.
- Driver: Requires `redshift_connector`. Install with `pip install redshift_connector`.
See Also
- Compatibility - Full list of supported platforms
- Storage Configuration: S3 - S3 credential setup