BigQuery Publishing
LakeXpress can automatically create BigQuery tables from Parquet files stored in GCS.
Table of Contents
- Prerequisites
- Authentication
- Configuration Options
- Table Types
- Dynamic Naming Patterns
- Usage Examples
Prerequisites
1. Google Cloud Permissions
Required IAM roles:
- BigQuery Data Editor (roles/bigquery.dataEditor) - Create datasets and tables
- BigQuery Job User (roles/bigquery.jobUser) - Run load jobs (native tables)
- Storage Object Viewer (roles/storage.objectViewer) - Read GCS files (external tables)
Or a custom role with these permissions:
- bigquery.datasets.create
- bigquery.datasets.get
- bigquery.tables.create
- bigquery.tables.get
- bigquery.tables.delete
- bigquery.tables.updateData
- bigquery.jobs.create
- storage.objects.get
- storage.objects.list
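If you prefer to manage these grants from the command line, the predefined roles can be attached with gcloud. This is a sketch: the project ID and service account email below are placeholders, so substitute your own values.
# Grant the predefined roles to the service account (placeholder project and account names).
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:lakexpress-sa@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:lakexpress-sa@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
# Storage Object Viewer can also be granted on the bucket alone for tighter scoping.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:lakexpress-sa@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"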
2. GCS Storage Setup
Your credentials.json needs both GCS and BigQuery entries:
{
"gcs_datalake": {
"ds_type": "gcs",
"auth_mode": "profile",
"info": {
"directory": "gs://my-datalake-bucket/lakexpress/",
"profile": "/path/to/service-account.json"
}
},
"bigquery_prod": {
"ds_type": "bigquery",
"auth_mode": "profile",
"info": {
"project_id": "my-gcp-project",
"location": "US",
"profile": "/path/to/service-account.json"
}
}
}
The same service account JSON file can be used for both GCS and BigQuery.
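As a quick sanity check before running a full pipeline, you can confirm that the key grants access to both services. The bucket and project names below match the sample credentials above and are illustrative only.
# Authenticate as the service account and probe both services (illustrative names and paths).
gcloud auth activate-service-account --key-file=/path/to/service-account.json
gsutil ls gs://my-datalake-bucket/lakexpress/   # GCS read access
bq --project_id=my-gcp-project ls               # BigQuery dataset listing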
Authentication
LakeXpress uses service account authentication for BigQuery.
{
"bigquery_prod": {
"ds_type": "bigquery",
"auth_mode": "profile",
"info": {
"project_id": "my-gcp-project",
"location": "US",
"profile": "/path/to/service-account.json"
}
}
}
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "bigquery" |
| auth_mode | Yes | Must be "profile" |
| project_id | Yes | GCP project ID where datasets will be created |
| location | No | BigQuery dataset location (default: "US") |
| profile | Yes | Path to service account JSON key file |
Common locations: US, EU, us-central1, europe-west1.
Creating a Service Account
- Go to Google Cloud Console > IAM & Admin > Service Accounts
- Create a service account with BigQuery and Storage permissions
- Create and download a JSON key
- Reference the key file path in your credentials
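The same steps can be scripted with gcloud. This is a sketch; the service account name lakexpress-sa and the key path are placeholders.
# Create the service account and download a JSON key (placeholder names and paths).
gcloud iam service-accounts create lakexpress-sa \
  --project=my-gcp-project \
  --display-name="LakeXpress BigQuery publisher"
gcloud iam service-accounts keys create /path/to/service-account.json \
  --iam-account=lakexpress-sa@my-gcp-project.iam.gserviceaccount.com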
Configuration Options
| Option | Description | Default |
|---|---|---|
| --publish_target ID | Credential ID for BigQuery target (required) | - |
| --publish_schema_pattern PATTERN | BigQuery dataset naming pattern | {schema} |
| --publish_table_pattern PATTERN | Table naming pattern | {table} |
| --publish_method TYPE | Table type: external or internal | external |
| --n_jobs N | Parallel workers for table creation | 1 |
Table Types
External Tables (Default)
Queries data directly from GCS Parquet files without loading into BigQuery storage.
lakexpress ... --publish_target bigquery_prod --publish_method external
- Fast publishing, no data copying
- No BigQuery storage costs
- Always reflects latest GCS data
- Best for large datasets with infrequent queries
- Query speed depends on GCS read performance
- No clustering or partitioning support
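For reference, an external table over GCS Parquet files is conceptually equivalent to the DDL below. This is a hand-written sketch, not the exact statement LakeXpress issues; the dataset, table, and GCS path are assumptions based on the earlier examples.
# Roughly equivalent DDL for an external table over Parquet files (illustrative names and paths).
bq --project_id=my-gcp-project query --use_legacy_sql=false '
CREATE EXTERNAL TABLE `tpch_1.customer`
OPTIONS (
  format = "PARQUET",
  uris = ["gs://my-datalake-bucket/lakexpress/tpch_1/customer/*.parquet"]
)'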
Native Tables
Loads data from GCS into BigQuery's columnar storage.
lakexpress ... --publish_target bigquery_prod --publish_method internal
- Faster queries via optimized columnar storage
- Full BigQuery features (clustering, partitioning, materialized views)
- Best for frequently queried data
- Slower publishing due to data loading
- Incurs BigQuery storage costs
- Data is a point-in-time snapshot
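Publishing with --publish_method internal is comparable to running a BigQuery load job from GCS yourself. The sketch below shows the equivalent manual bq commands; the dataset, table, and GCS path are illustrative.
# Comparable manual steps: create the dataset, then run a Parquet load job (illustrative names).
bq --location=US mk --dataset my-gcp-project:lx_tpch_1
bq load --source_format=PARQUET \
  my-gcp-project:lx_tpch_1.customer \
  "gs://my-datalake-bucket/lakexpress/tpch_1/customer/*.parquet"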
Dynamic Naming Patterns
Dataset and table names support token-based patterns.
Supported Tokens
| Token | Description | Example Output |
|---|---|---|
| {schema} | Source schema name | tpch_1 |
| {table} | Source table name | customer |
| {database} | Source database name | tpch |
| {date} | Current date (YYYYMMDD) | 20251210 |
| {timestamp} | Current timestamp (YYYYMMDD_HHMMSS) | 20251210_143022 |
| {uuid} | UUID4 identifier (consistent per run) | a1b2c3d4 |
| {subpath} | CLI --sub_path parameter | staging |
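Tokens can be combined freely in both patterns. The invocation below is illustrative and assumes a run on 2025-12-10 with the run UUID shown in the table; because {uuid} is consistent per run, every table from the same run shares the suffix.
lakexpress \
  --publish_schema_pattern "staging_{date}" \
  --publish_table_pattern "{table}_{uuid}" \
  --publish_target bigquery_prod \
  ...
# Illustrative results (run of 2025-12-10, run UUID a1b2c3d4):
# Dataset: staging_20251210
# Tables: customer_a1b2c3d4, orders_a1b2c3d4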
Common Patterns
Prefixed Datasets
lakexpress \
--publish_schema_pattern "lx_{schema}" \
--publish_table_pattern "{table}" \
--publish_target bigquery_prod \
...
# Results:
# Dataset: lx_tpch_1
# Tables: customer, orders, lineitem
Date-Partitioned Datasets
lakexpress \
--publish_schema_pattern "{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target bigquery_prod \
...
# Results:
# Dataset: tpch_1_20251210
# Tables: customer, orders, lineitem
Consolidated Multi-Schema
lakexpress \
--source_schema_name schema1,schema2 \
--publish_schema_pattern "consolidated" \
--publish_table_pattern "{schema}_{table}" \
--publish_target bigquery_prod \
...
# Results:
# Dataset: consolidated
# Tables: schema1_customer, schema2_customer
Usage Examples
Example 1: External Tables (Default)
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name public \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target bigquery_prod
Creates dataset public with external tables pointing to GCS Parquet files.
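A quick way to confirm the external tables resolve to the GCS files is to run a count against one of them. The table name customer is an assumption about the source schema.
# Count rows through the external table (table name is illustrative).
bq --project_id=my-gcp-project query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM `public.customer`'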
Example 2: Native Tables with Parallel Execution
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name tpch_1 \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target bigquery_prod \
--publish_method internal \
--publish_schema_pattern "lx_{schema}" \
--n_jobs 4
Creates dataset lx_tpch_1 with native tables loaded from GCS.
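You can list the published tables and inspect their metadata with bq; the table name customer is illustrative.
# List the published tables and inspect one of them (illustrative names).
bq ls my-gcp-project:lx_tpch_1
bq show --format=prettyjson my-gcp-project:lx_tpch_1.customer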
Example 3: Daily Snapshots
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name sales \
--target_storage_id gcs_datalake \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target bigquery_prod \
--publish_schema_pattern "sales_{date}" \
--sub_path "daily/$(date +%Y%m%d)"
Creates dataset sales_20251210, with data stored in gs://bucket/lakexpress/daily/20251210/.
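To run the snapshot on a schedule, one option is to wrap the command in a small script and call it from cron. The script path, credentials path, and log path below are placeholders; the wrapper also avoids having to escape the % characters that a raw crontab line would treat specially.
#!/usr/bin/env bash
# /usr/local/bin/lakexpress_daily.sh -- hypothetical wrapper for the daily snapshot run
set -euo pipefail
lakexpress -a /etc/lakexpress/credentials.json \
  --lxdb_auth_id lxdb \
  --source_db_auth_id postgres_prod \
  --source_schema_name sales \
  --target_storage_id gcs_datalake \
  --fastbcp_dir_path /path/to/FastBCP \
  --publish_target bigquery_prod \
  --publish_schema_pattern "sales_{date}" \
  --sub_path "daily/$(date +%Y%m%d)"
# Crontab entry: run every day at 02:30 and append output to a log file.
# 30 2 * * * /usr/local/bin/lakexpress_daily.sh >> /var/log/lakexpress_daily.log 2>&1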
Example 4: EU Data Residency
{
"bigquery_eu": {
"ds_type": "bigquery",
"auth_mode": "profile",
"info": {
"project_id": "my-eu-project",
"location": "EU",
"profile": "/path/to/service-account.json"
}
}
}
lakexpress -a credentials.json \
--lxdb_auth_id lxdb \
--source_db_auth_id postgres_prod \
--source_schema_name gdpr_data \
--target_storage_id gcs_eu \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target bigquery_eu
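After the run, you can confirm the dataset was created in the EU multi-region; the dataset name gdpr_data follows the default {schema} pattern.
# Check the dataset's location (should report "EU").
bq show --format=prettyjson my-eu-project:gdpr_data | grep '"location"'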
See Also
- BigQuery Reference - Data type mappings, CLI arguments, querying, troubleshooting
- CLI Reference - All command-line options
- Examples & Recipes - Working command examples