# BigQuery Reference

Reference material for BigQuery Publishing: data type mappings, CLI arguments, querying, and troubleshooting.
## Data Type Mapping

Source database types are mapped to BigQuery types automatically.

### PostgreSQL to BigQuery
| PostgreSQL Type | BigQuery Type |
|---|---|
| INTEGER, INT4 | INT64 |
| BIGINT, INT8 | INT64 |
| SMALLINT, INT2 | INT64 |
| NUMERIC(p,s) | NUMERIC |
| REAL, FLOAT4 | FLOAT64 |
| DOUBLE PRECISION | FLOAT64 |
| VARCHAR(n), TEXT | STRING |
| DATE | DATE |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMPTZ | TIMESTAMP |
| TIME | TIME |
| BOOLEAN | BOOL |
| BYTEA | BYTES |
| JSON, JSONB | JSON |
| UUID | STRING |
### SQL Server to BigQuery
| SQL Server Type | BigQuery Type |
|---|---|
| INT | INT64 |
| BIGINT | INT64 |
| SMALLINT | INT64 |
| TINYINT | INT64 |
| DECIMAL(p,s) | NUMERIC |
| MONEY | NUMERIC |
| FLOAT | FLOAT64 |
| REAL | FLOAT64 |
| VARCHAR(n), NVARCHAR(n) | STRING |
| DATE | DATE |
| DATETIME, DATETIME2 | DATETIME |
| DATETIMEOFFSET | TIMESTAMP |
| TIME | TIME |
| BIT | BOOL |
| VARBINARY | BYTES |
| UNIQUEIDENTIFIER | STRING |
### Oracle to BigQuery
| Oracle Type | BigQuery Type |
|---|---|
| NUMBER | NUMERIC |
| NUMBER(p,s) | NUMERIC |
| FLOAT | FLOAT64 |
| VARCHAR2, NVARCHAR2 | STRING |
| CLOB, NCLOB | STRING |
| DATE | TIMESTAMP |
| TIMESTAMP | TIMESTAMP |
| RAW, BLOB | BYTES |
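
These mappings are applied automatically at publish time, but encoding them yourself can be useful when validating a published schema. Below is a minimal sketch using the PostgreSQL table above as an example; the dictionary and the `to_bigquery_type` helper are illustrative only, not part of the tool:

```python
# Hypothetical lookup table mirroring the PostgreSQL-to-BigQuery mapping above.
PG_TO_BQ = {
    "integer": "INT64", "int4": "INT64",
    "bigint": "INT64", "int8": "INT64",
    "smallint": "INT64", "int2": "INT64",
    "numeric": "NUMERIC",
    "real": "FLOAT64", "float4": "FLOAT64",
    "double precision": "FLOAT64",
    "varchar": "STRING", "text": "STRING",
    "date": "DATE",
    "timestamp": "TIMESTAMP", "timestamptz": "TIMESTAMP",
    "time": "TIME",
    "boolean": "BOOL",
    "bytea": "BYTES",
    "json": "JSON", "jsonb": "JSON",
    "uuid": "STRING",
}

def to_bigquery_type(pg_type: str) -> str:
    """Reduce a parameterized type such as VARCHAR(255) to its base name, then look it up."""
    base = pg_type.lower().split("(")[0].strip()
    return PG_TO_BQ[base]

print(to_bigquery_type("NUMERIC(10,2)"))  # -> NUMERIC
```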
## CLI Reference

### BigQuery Publishing Arguments
| Option | Type | Description |
|---|---|---|
| `--publish_target ID` | String | Credential ID for BigQuery publishing (required) |
| `--publish_schema_pattern PATTERN` | String | Dynamic dataset naming pattern (default: `{schema}`) |
| `--publish_table_pattern PATTERN` | String | Dynamic table naming pattern (default: `{table}`) |
| `--publish_method TYPE` | String | Table type: `external` (default) or `internal` |
| `--n_jobs N` | Integer | Parallel workers for table creation (default: 1) |
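
The `{schema}` and `{table}` placeholders in the naming patterns are substituted per source object. A sketch of how a pattern such as `lx_{schema}` (which would produce the `lx_tpch_1` dataset used in the query examples below) resolves; plain `str.format` semantics are an assumption about the tool's expansion, not documented behavior:

```python
# Example pattern values; substitution shown with plain str.format,
# which is an assumption about how the placeholders expand.
dataset_pattern = "lx_{schema}"  # e.g. --publish_schema_pattern 'lx_{schema}'
table_pattern = "{table}"        # the --publish_table_pattern default

dataset = dataset_pattern.format(schema="tpch_1")
table = table_pattern.format(table="customer")
print(f"{dataset}.{table}")  # -> lx_tpch_1.customer
```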
## Querying BigQuery Tables

BigQuery Console:

```sql
SELECT * FROM `my-project.lx_tpch_1.customer` LIMIT 10;
```
bq CLI:

```bash
bq query --use_legacy_sql=false \
  'SELECT * FROM `my-project.lx_tpch_1.customer` LIMIT 10'
```
Python (google-cloud-bigquery):

```python
from google.cloud import bigquery

client = bigquery.Client()
query = "SELECT * FROM `my-project.lx_tpch_1.customer` LIMIT 10"
df = client.query(query).to_dataframe()
print(df)
```
pandas-gbq:

```python
import pandas_gbq

query = "SELECT * FROM `my-project.lx_tpch_1.customer` LIMIT 10"
df = pandas_gbq.read_gbq(query, project_id="my-project")
print(df)
```
## Troubleshooting

### Common Issues
"Permission denied" errors:
- Verify the service account has BigQuery Data Editor role
- Check GCS bucket access from the service account
- Ensure project ID in credentials matches the target project
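
A quick probe with the client library can tell a permissions problem apart from a missing dataset. A sketch, assuming the project and dataset names from the query examples above:

```python
from google.api_core.exceptions import Forbidden, NotFound
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project ID
try:
    client.get_dataset("lx_tpch_1")  # dataset name from the examples above
    print("Dataset reachable; credentials and project look correct")
except Forbidden as exc:
    print("Permission problem; check IAM roles:", exc)
except NotFound:
    print("Dataset missing; check the naming pattern and location")
```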
"Dataset not found" errors:
- Check that `location` matches your GCS bucket region
- Cross-region access between GCS and BigQuery may cause issues
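
To compare the two locations directly (the bucket name below is a placeholder; note that BigQuery may report a multi-region such as `US` while the bucket reports a specific region):

```python
from google.cloud import bigquery, storage

project = "my-project"  # assumed project ID
dataset_loc = bigquery.Client(project=project).get_dataset("lx_tpch_1").location
bucket_loc = storage.Client(project=project).get_bucket("my-datalake-bucket").location
print(dataset_loc, bucket_loc)  # e.g. 'US' vs 'US-CENTRAL1' would explain a mismatch
```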
"Invalid table" errors for external tables:
- Verify GCS path contains valid Parquet files
- Check schema mapping for your data types
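
To spot-check the files behind an external table, listing the GCS objects under its path is usually enough. Bucket and prefix below are placeholders for your actual export path:

```python
from google.cloud import storage

client = storage.Client(project="my-project")  # assumed project ID
# Empty output, or object names that are not .parquet files,
# usually explains the "Invalid table" error.
for blob in client.list_blobs("my-datalake-bucket", prefix="public/customer/"):
    print(blob.name, blob.size)
```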
### Verifying Setup

Test BigQuery connectivity:

```bash
lakexpress -a credentials.json \
    --lxdb_auth_id lxdb \
    --source_db_auth_id postgres_prod \
    --source_schema_name public \
    --target_storage_id gcs_datalake \
    --fastbcp_dir_path /path/to/FastBCP \
    --publish_target bigquery_prod \
    --tables customer \
    --dry_run
```
Validate credentials:

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Load an explicit service account key and list datasets as a smoke test
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/service-account.json'
)
client = bigquery.Client(credentials=credentials, project='my-project')
print(list(client.list_datasets()))
```
## See Also
- BigQuery Publishing - Setup and usage guide
- CLI Reference - All command-line options