# Usage Guide
This guide covers how to use MCP BigQuery for SQL validation and dry-run analysis.
## SQL Validation

The `bq_validate_sql` tool validates BigQuery SQL syntax without executing queries.
### Basic Validation
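As an illustrative sketch (the exact request envelope depends on your MCP client), a validation call carries the tool name and the SQL to check; the success payload shown is assumed from the error schema documented below, which reports `"isValid": false`:

```python
import json

# Illustrative bq_validate_sql request; the surrounding JSON-RPC envelope
# depends on your MCP client. Argument names mirror this guide's examples.
request = {
    "tool": "bq_validate_sql",
    "arguments": {
        "sql": "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` LIMIT 10"
    },
}

# Assumed success payload, inferred from the error schema in this guide:
expected_response = {"isValid": True}

print(json.dumps(request, indent=2))
```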
A successful validation returns a response with `"isValid": true`. When the SQL is invalid, the response instead carries an error object.

Error Response:

```json
{
  "isValid": false,
  "error": {
    "code": "INVALID_SQL",
    "message": "Syntax error: Unexpected ',' at [1:12]",
    "location": {
      "line": 1,
      "column": 12
    }
  }
}
```
### Common Syntax Errors

| Error | Example | Fix |
|---|---|---|
| Missing FROM | `SELECT * users` | Add `FROM` keyword |
| Invalid identifier | `SELECT user-name` | Use backticks: `` `user-name` `` |
| Unclosed string | `WHERE name = 'John` | Close the string: `'John'` |
| Reserved keyword | `SELECT select FROM t` | Use backticks: `` `select` `` |
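Two of the fixes above amount to backtick-quoting an identifier. A tiny helper (hypothetical, not part of MCP BigQuery) can apply that rule before you submit SQL for validation:

```python
import re

# Hypothetical helper, not part of MCP BigQuery: backtick-quote an
# identifier when it is not a plain identifier (e.g. contains '-')
# or collides with a reserved word you list yourself.
RESERVED = {"select", "from", "where", "group", "order"}

def quote_ident(name: str) -> str:
    """Backtick-quote `name` if it needs quoting in BigQuery SQL."""
    if name.lower() in RESERVED or not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        return f"`{name}`"
    return name

print(quote_ident("user-name"))   # → `user-name`
print(quote_ident("word_count"))  # → word_count
```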
## Dry-Run Analysis

The `bq_dry_run_sql` tool provides cost estimates and metadata without executing queries.
### Cost Estimation
```json
{
  "tool": "bq_dry_run_sql",
  "arguments": {
    "sql": "SELECT * FROM `bigquery-public-data.samples.shakespeare`",
    "pricePerTiB": 5.0
  }
}
```

Response:

```json
{
  "totalBytesProcessed": 2654199,
  "usdEstimate": 0.013,
  "referencedTables": [
    {
      "project": "bigquery-public-data",
      "dataset": "samples",
      "table": "shakespeare"
    }
  ],
  "schemaPreview": [
    {"name": "word", "type": "STRING", "mode": "NULLABLE"},
    {"name": "word_count", "type": "INT64", "mode": "NULLABLE"},
    {"name": "corpus", "type": "STRING", "mode": "NULLABLE"},
    {"name": "corpus_date", "type": "INT64", "mode": "NULLABLE"}
  ]
}
```
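The `usdEstimate` can be sanity-checked client-side. The sketch below assumes simple on-demand pricing — bytes processed divided by one TiB, times `pricePerTiB` — and ignores any minimum-billing or rounding rules the server may apply:

```python
TIB = 2**40  # bytes per tebibyte

def estimate_usd(total_bytes_processed: int, price_per_tib: float = 5.0) -> float:
    """Naive on-demand cost estimate: bytes scanned, priced per TiB."""
    return total_bytes_processed / TIB * price_per_tib

# 10 GiB scanned at the default $5/TiB:
print(round(estimate_usd(10 * 2**30), 4))  # → 0.0488
```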
### Cost Optimization Tips

- **Use partitioned tables** - Filter by the partition column to reduce the bytes scanned
- **Select only needed columns** - Avoid `SELECT *`
- **Use clustering** - Filter by clustered columns for efficiency
## Using Parameters
Both tools support parameterized queries for safe, reusable SQL.
### Basic Parameters

```json
{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM orders WHERE date = @date AND status = @status",
    "params": {
      "date": "2024-01-01",
      "status": "completed"
    }
  }
}
```
### Array Parameters

```json
{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM products WHERE category IN UNNEST(@categories)",
    "params": {
      "categories": ["electronics", "books", "clothing"]
    }
  }
}
```
### Struct Parameters

```json
{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM users WHERE address.city = @filter.city",
    "params": {
      "filter": {
        "city": "New York",
        "state": "NY"
      }
    }
  }
}
```
Note: In dry-run mode, all parameters are treated as STRING type.
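Requests like these can be assembled programmatically. The helper below is a hypothetical sketch (not part of MCP BigQuery); since dry-run types every parameter as STRING, plain strings — or lists and dicts of strings — are sufficient for the values:

```python
import json

def build_validation_request(sql: str, params: dict) -> dict:
    """Hypothetical helper: assemble a bq_validate_sql call with named parameters.

    In dry-run mode all parameters are treated as STRING, so string
    values (or lists/dicts of strings) are enough here.
    """
    return {"tool": "bq_validate_sql", "arguments": {"sql": sql, "params": params}}

request = build_validation_request(
    "SELECT * FROM orders WHERE date = @date AND status = @status",
    {"date": "2024-01-01", "status": "completed"},
)
print(json.dumps(request, indent=2))
```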
## Advanced Queries

### CTEs and Window Functions

```sql
WITH monthly_sales AS (
  SELECT
    DATE_TRUNC(sale_date, MONTH) as month,
    SUM(amount) as total_sales
  FROM sales
  GROUP BY month
)
SELECT
  month,
  total_sales,
  LAG(total_sales) OVER (ORDER BY month) as previous_month,
  total_sales - LAG(total_sales) OVER (ORDER BY month) as growth
FROM monthly_sales
ORDER BY month DESC
```
### MERGE Statements

```sql
MERGE target_table T
USING source_table S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET T.value = S.value, T.updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  INSERT (id, value, created_at)
  VALUES (S.id, S.value, CURRENT_TIMESTAMP())
```
### DDL Validation

```sql
CREATE OR REPLACE TABLE dataset.new_table
PARTITION BY DATE(created_at)
CLUSTER BY user_id, region
AS
SELECT * FROM dataset.source_table
WHERE created_at >= '2024-01-01'
```
## Working with Different SQL Types

### Standard SQL (Default)

```json
{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM `project.dataset.table` WHERE date > '2024-01-01'"
  }
}
```
### DML Operations

```sql
-- INSERT
INSERT INTO table_name (col1, col2)
VALUES ('value1', 'value2');

-- UPDATE
UPDATE table_name
SET column1 = 'new_value'
WHERE condition;

-- DELETE
DELETE FROM table_name
WHERE date < '2024-01-01';
```
### Scripting

```sql
DECLARE x INT64 DEFAULT 0;
DECLARE y STRING;

SET x = 5;
SET y = 'Hello';

IF x > 0 THEN
  SELECT CONCAT(y, ' World') as greeting;
END IF;
```
## Error Handling

### Understanding Error Responses

All errors follow this structure:
```json
{
  "error": {
    "code": "INVALID_SQL",
    "message": "Detailed error message",
    "location": {
      "line": 1,
      "column": 15
    },
    "details": []
  }
}
```

The optional `details` array carries additional context when available.
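A client can pull the location out of this structure to point at the offending token. A minimal sketch, with field names following the schema above:

```python
import json

# Sample error response in the structure documented above.
response_text = """
{
  "isValid": false,
  "error": {
    "code": "INVALID_SQL",
    "message": "Syntax error: Unexpected ',' at [1:12]",
    "location": {"line": 1, "column": 12}
  }
}
"""

payload = json.loads(response_text)
error = payload.get("error")
if error:
    loc = error.get("location", {})
    # Render a compact "code at line:column - message" diagnostic.
    print(f"{error['code']} at {loc.get('line')}:{loc.get('column')} - {error['message']}")
```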
### Common Error Patterns

- **Table not found**
    - Check `project.dataset.table` format
    - Verify permissions
    - Ensure table exists
- **Column not found**
    - Check column name spelling
    - Verify table schema
    - Use `INFORMATION_SCHEMA` to list columns
- **Type mismatch**
    - Check data types in comparisons
    - Use `CAST()` for type conversion
    - Verify function parameter types
## Best Practices

### 1. Validate Before Execution

Always validate queries before running them:

```python
# Pseudo-code workflow
validation = bq_validate_sql(sql)
if validation.isValid:
    dry_run = bq_dry_run_sql(sql)
    if dry_run.usdEstimate < threshold:
        # Safe to execute
        execute_query(sql)
```
### 2. Monitor Query Costs

Set cost thresholds and alerts:

```python
MAX_QUERY_COST = 10.0  # $10 USD

result = bq_dry_run_sql(sql)
if result.usdEstimate > MAX_QUERY_COST:
    raise Exception(f"Query too expensive: ${result.usdEstimate}")
```
### 3. Use Query Templates

Create reusable, parameterized queries:

```sql
-- query_template.sql
SELECT
  user_id,
  COUNT(*) as event_count
FROM events
WHERE
  event_type = @event_type
  AND date BETWEEN @start_date AND @end_date
GROUP BY user_id
```
### 4. Implement CI/CD Validation

Add SQL validation to your pipeline:

```yaml
# .github/workflows/validate-sql.yml
- name: Validate SQL
  run: |
    for file in queries/*.sql; do
      mcp-bigquery validate "$file"
    done
```
## Performance Tips

- **Partition Pruning** - Always filter by the partition column
- **Clustering Benefits** - Order WHERE clauses by cluster columns
- **Avoid SELECT \*** - Specify only needed columns
- **Use APPROX functions** - For large aggregations when exact values aren't required
- **Materialized Views** - Pre-compute expensive aggregations
- **Query Cache** - Note: disabled in dry-run for accurate estimates
## Next Steps
- See API Reference for complete tool documentation
- Check Examples for real-world use cases
- Read Development Guide for contributing