
BigQuery Publishing

LakeXpress can automatically create BigQuery tables from Parquet files stored in GCS.


Prerequisites

1. Google Cloud Permissions

Required IAM roles:

  • BigQuery Data Editor (roles/bigquery.dataEditor) - Create datasets and tables
  • BigQuery Job User (roles/bigquery.jobUser) - Run load jobs (native tables)
  • Storage Object Viewer (roles/storage.objectViewer) - Read GCS files (external tables)

Or a custom role with these permissions:

bigquery.datasets.create
bigquery.datasets.get
bigquery.tables.create
bigquery.tables.get
bigquery.tables.delete
bigquery.tables.updateData
bigquery.jobs.create
storage.objects.get
storage.objects.list

2. GCS Storage Setup

Your credentials.json needs both GCS and BigQuery entries:

{
  "gcs_datalake": {
    "ds_type": "gcs",
    "auth_mode": "profile",
    "info": {
      "directory": "gs://my-datalake-bucket/lakexpress/",
      "profile": "/path/to/service-account.json"
    }
  },
  "bigquery_prod": {
    "ds_type": "bigquery",
    "auth_mode": "profile",
    "info": {
      "project_id": "my-gcp-project",
      "location": "US",
      "profile": "/path/to/service-account.json"
    }
  }
}

The same service account JSON file can be used for both GCS and BigQuery.
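Misconfigured credentials are easiest to catch before a run. The sketch below is illustrative only (LakeXpress performs its own validation); it checks a credentials.json payload against the fields shown in the examples above.

```python
import json

# Fields each entry type must carry, per the credential examples above.
REQUIRED_INFO = {
    "gcs": ["directory", "profile"],
    "bigquery": ["project_id", "profile"],  # "location" is optional
}

def check_credentials(text: str) -> list[str]:
    """Return a list of problems found in a credentials.json payload."""
    problems = []
    creds = json.loads(text)
    for name, entry in creds.items():
        ds_type = entry.get("ds_type")
        if ds_type not in REQUIRED_INFO:
            problems.append(f"{name}: unknown ds_type {ds_type!r}")
            continue
        if entry.get("auth_mode") != "profile":
            problems.append(f"{name}: auth_mode must be 'profile'")
        info = entry.get("info", {})
        for field in REQUIRED_INFO[ds_type]:
            if field not in info:
                problems.append(f"{name}: missing info.{field}")
    return problems
```

An empty result means every entry carries the fields this page documents; anything else names the entry and the missing or invalid field.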

Authentication

LakeXpress uses service account authentication for BigQuery.

{
  "bigquery_prod": {
    "ds_type": "bigquery",
    "auth_mode": "profile",
    "info": {
      "project_id": "my-gcp-project",
      "location": "US",
      "profile": "/path/to/service-account.json"
    }
  }
}

| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "bigquery" |
| auth_mode | Yes | Must be "profile" |
| project_id | Yes | GCP project ID where datasets will be created |
| location | No | BigQuery dataset location (default: "US") |
| profile | Yes | Path to service account JSON key file |

Common locations: US, EU, us-central1, europe-west1.

Creating a Service Account

  1. Go to Google Cloud Console > IAM & Admin > Service Accounts
  2. Create a service account with BigQuery and Storage permissions
  3. Create and download a JSON key
  4. Reference the key file path in your credentials

Configuration Options

| Option | Description | Default |
|---|---|---|
| --publish_target ID | Credential ID for BigQuery target (required) | - |
| --publish_schema_pattern PATTERN | BigQuery dataset naming pattern | {schema} |
| --publish_table_pattern PATTERN | Table naming pattern | {table} |
| --publish_method TYPE | Table type: external or internal | external |
| --n_jobs N | Parallel workers for table creation | 1 |

Table Types

External Tables (Default)

Queries data directly from GCS Parquet files without loading into BigQuery storage.

lakexpress ... --publish_target bigquery_prod --publish_method external

Advantages:

  • Fast publishing, no data copying
  • No BigQuery storage costs
  • Always reflects the latest GCS data
  • Best for large datasets with infrequent queries

Trade-offs:

  • Query speed depends on GCS read performance
  • No clustering or partitioning support

Native Tables

Loads data from GCS into BigQuery's columnar storage.

lakexpress ... --publish_target bigquery_prod --publish_method internal

Advantages:

  • Faster queries via optimized columnar storage
  • Full BigQuery features (clustering, partitioning, materialized views)
  • Best for frequently queried data

Trade-offs:

  • Slower publishing due to data loading
  • Incurs BigQuery storage costs
  • Data is a point-in-time snapshot

Dynamic Naming Patterns

Dataset and table names support token-based patterns.

Supported Tokens

| Token | Description | Example Output |
|---|---|---|
| {schema} | Source schema name | tpch_1 |
| {table} | Source table name | customer |
| {database} | Source database name | tpch |
| {date} | Current date (YYYYMMDD) | 20251210 |
| {timestamp} | Current timestamp (YYYYMMDD_HHMMSS) | 20251210_143022 |
| {uuid} | UUID4 identifier (consistent per run) | a1b2c3d4 |
| {subpath} | CLI --sub_path parameter | staging |

Common Patterns

Prefixed Datasets

lakexpress \
--publish_schema_pattern "lx_{schema}" \
--publish_table_pattern "{table}" \
--publish_target bigquery_prod \
...

# Results:
# Dataset: lx_tpch_1
# Tables: customer, orders, lineitem

Date-Partitioned Datasets

lakexpress \
--publish_schema_pattern "{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target bigquery_prod \
...

# Results:
# Dataset: tpch_1_20251210
# Tables: customer, orders, lineitem

Consolidated Multi-Schema

lakexpress \
--source_schema_name schema1,schema2 \
--publish_schema_pattern "consolidated" \
--publish_table_pattern "{schema}_{table}" \
--publish_target bigquery_prod \
...

# Results:
# Dataset: consolidated
# Tables: schema1_customer, schema2_customer
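When several schemas land in one dataset, the table pattern must keep published names unique: a pattern without {schema} would collide for same-named tables such as customer. A small illustrative check (an assumed helper, not part of LakeXpress):

```python
from collections import Counter

def find_collisions(table_pattern: str,
                    tables: list[tuple[str, str]]) -> list[str]:
    """Return published names that more than one (schema, table) maps to."""
    names = Counter(
        table_pattern.format(schema=schema, table=table)
        for schema, table in tables
    )
    return sorted(name for name, count in names.items() if count > 1)
```

With the {schema}_{table} pattern above, schema1.customer and schema2.customer publish as distinct tables; with a bare {table} pattern they would collide.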

Usage Examples

Example 1: External Tables (Default)

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target bigquery_prod

Creates dataset public with external tables pointing to GCS Parquet files.

Example 2: Native Tables with Parallel Execution

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target bigquery_prod \
--publish_method internal \
--publish_schema_pattern "lx_{schema}" \
--n_jobs 4

Creates dataset lx_tpch_1 with native tables loaded from GCS.

Example 3: Daily Snapshots

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name sales \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target bigquery_prod \
--publish_schema_pattern "sales_{date}" \
--sub_path "daily/$(date +%Y%m%d)"

Creates dataset sales_20251210, with data stored in gs://bucket/lakexpress/daily/20251210/.

Example 4: EU Data Residency

{
  "bigquery_eu": {
    "ds_type": "bigquery",
    "auth_mode": "profile",
    "info": {
      "project_id": "my-eu-project",
      "location": "EU",
      "profile": "/path/to/service-account.json"
    }
  }
}

lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name gdpr_data \
--target_storage_id gcs_eu \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target bigquery_eu


Copyright © 2026 Architecture & Performance.