Usage Guide¶

This guide covers how to use MCP BigQuery's eleven tools for SQL validation, analysis, schema discovery, and performance optimization.

SQL Validation¶

The bq_validate_sql tool validates BigQuery SQL syntax without executing queries.

Basic Validation¶

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT name, age FROM users WHERE age > 18"
  }
}

Valid Response:

{
  "isValid": true
}

Invalid SQL Example:

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT name, FROM users"  // Extra comma
  }
}

Error Response:

{
  "isValid": false,
  "error": {
    "code": "INVALID_SQL",
    "message": "Syntax error: Unexpected ',' at [1:12]",
    "location": {
      "line": 1,
      "column": 12
    }
  }
}

Common Syntax Errors¶

Error	Example	Fix
Missing FROM	`SELECT * users`	Add `FROM` keyword
Invalid identifier	`SELECT user-name`	Use backticks: `user-name`
Unclosed string	`WHERE name = 'John`	Close the string: `'John'`
Reserved keyword	`SELECT select FROM t`	Use backticks: `select`

Dry-Run Analysis¶

The bq_dry_run_sql tool provides cost estimates and metadata without executing queries.

Cost Estimation¶

{
  "tool": "bq_dry_run_sql",
  "arguments": {
    "sql": "SELECT * FROM `bigquery-public-data.samples.shakespeare`",
    "pricePerTiB": 5.0
  }
}

Response:

{
  "totalBytesProcessed": 2654199,
  "usdEstimate": 0.013,
  "referencedTables": [
    {
      "project": "bigquery-public-data",
      "dataset": "samples",
      "table": "shakespeare"
    }
  ],
  "schemaPreview": [
    {"name": "word", "type": "STRING", "mode": "NULLABLE"},
    {"name": "word_count", "type": "INT64", "mode": "NULLABLE"},
    {"name": "corpus", "type": "STRING", "mode": "NULLABLE"},
    {"name": "corpus_date", "type": "INT64", "mode": "NULLABLE"}
  ]
}

Cost Optimization Tips¶

Use partitioned tables - Filter by partition column to reduce scan

SELECT * FROM events 
WHERE date = '2024-01-01'  -- Partition column

Select only needed columns - Avoid SELECT *

-- Bad: Scans all columns
SELECT * FROM large_table

-- Good: Scans only 2 columns
SELECT id, name FROM large_table

Use clustering - Filter by clustered columns for efficiency

SELECT * FROM sales
WHERE region = 'US'  -- Clustered column
AND product_id = 123

Using Parameters¶

Both tools support parameterized queries for safe, reusable SQL.

Basic Parameters¶

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM orders WHERE date = @date AND status = @status",
    "params": {
      "date": "2024-01-01",
      "status": "completed"
    }
  }
}

Array Parameters¶

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM products WHERE category IN UNNEST(@categories)",
    "params": {
      "categories": ["electronics", "books", "clothing"]
    }
  }
}

Struct Parameters¶

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM users WHERE address.city = @filter.city",
    "params": {
      "filter": {
        "city": "New York",
        "state": "NY"
      }
    }
  }
}

Note: In dry-run mode, all parameters are treated as STRING type.

Advanced Queries¶

CTEs and Window Functions¶

WITH monthly_sales AS (
  SELECT 
    DATE_TRUNC(sale_date, MONTH) as month,
    SUM(amount) as total_sales
  FROM sales
  GROUP BY month
)
SELECT 
  month,
  total_sales,
  LAG(total_sales) OVER (ORDER BY month) as previous_month,
  total_sales - LAG(total_sales) OVER (ORDER BY month) as growth
FROM monthly_sales
ORDER BY month DESC

MERGE Statements¶

MERGE target_table T
USING source_table S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET T.value = S.value, T.updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  INSERT (id, value, created_at)
  VALUES (S.id, S.value, CURRENT_TIMESTAMP())

DDL Validation¶

CREATE OR REPLACE TABLE dataset.new_table
PARTITION BY DATE(created_at)
CLUSTER BY user_id, region
AS
SELECT * FROM dataset.source_table
WHERE created_at >= '2024-01-01'

Working with Different SQL Types¶

Standard SQL (Default)¶

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM `project.dataset.table` WHERE date > '2024-01-01'"
  }
}

DML Operations¶

-- INSERT
INSERT INTO table_name (col1, col2)
VALUES ('value1', 'value2')

-- UPDATE
UPDATE table_name
SET column1 = 'new_value'
WHERE condition

-- DELETE
DELETE FROM table_name
WHERE date < '2024-01-01'

Scripting¶

DECLARE x INT64 DEFAULT 0;
DECLARE y STRING;

SET x = 5;
SET y = 'Hello';

IF x > 0 THEN
  SELECT CONCAT(y, ' World') as greeting;
END IF;

Error Handling¶

Understanding Error Responses¶

All errors follow this structure:

{
  "error": {
    "code": "INVALID_SQL",
    "message": "Detailed error message",
    "location": {
      "line": 1,
      "column": 15
    },
    "details": []  // Optional additional context
  }
}

Common Error Patterns¶

Table not found
Check project.dataset.table format
Verify permissions
Ensure table exists
Column not found
Check column name spelling
Verify table schema
Use INFORMATION_SCHEMA to list columns
Type mismatch
Check data types in comparisons
Use CAST() for type conversion
Verify function parameter types

Best Practices¶

1. Validate Before Execution¶

Always validate queries before running them:

# Pseudo-code workflow
validation = bq_validate_sql(sql)
if validation.isValid:
    dry_run = bq_dry_run_sql(sql)
    if dry_run.usdEstimate < threshold:
        # Safe to execute
        execute_query(sql)

2. Monitor Query Costs¶

Set cost thresholds and alerts:

MAX_QUERY_COST = 10.0  # $10 USD

result = bq_dry_run_sql(sql)
if result.usdEstimate > MAX_QUERY_COST:
    raise Exception(f"Query too expensive: ${result.usdEstimate}")

3. Use Query Templates¶

Create reusable, parameterized queries:

-- query_template.sql
SELECT 
  user_id,
  COUNT(*) as event_count
FROM events
WHERE 
  event_type = @event_type
  AND date BETWEEN @start_date AND @end_date
GROUP BY user_id

4. Implement CI/CD Validation¶

Add SQL validation to your pipeline:

# .github/workflows/validate-sql.yml
- name: Validate SQL
  run: |
    for file in queries/*.sql; do
      mcp-bigquery validate "$file"
    done

SQL Analysis¶

Query Structure Analysis¶

The bq_analyze_query_structure tool analyzes SQL complexity and patterns:

{
  "tool": "bq_analyze_query_structure",
  "arguments": {
    "sql": "WITH user_orders AS (SELECT user_id, COUNT(*) as cnt FROM orders GROUP BY user_id) SELECT * FROM user_orders WHERE cnt > 10"
  }
}

Response:

{
  "query_type": "SELECT",
  "has_cte": true,
  "has_aggregations": true,
  "complexity_score": 15,
  "table_count": 1
}

Dependency Extraction¶

Extract table and column dependencies with bq_extract_dependencies:

{
  "tool": "bq_extract_dependencies",
  "arguments": {
    "sql": "SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id"
  }
}

Enhanced Syntax Validation¶

Get detailed validation with suggestions using bq_validate_query_syntax:

{
  "tool": "bq_validate_query_syntax",
  "arguments": {
    "sql": "SELECT * FROM users LIMIT 10"
  }
}

Schema Discovery¶

List Datasets¶

Discover available datasets with bq_list_datasets:

{
  "tool": "bq_list_datasets",
  "arguments": {
    "project_id": "my-project",
    "max_results": 10
  }
}

Browse Tables¶

Explore tables in a dataset with bq_list_tables:

{
  "tool": "bq_list_tables",
  "arguments": {
    "dataset_id": "analytics",
    "table_type_filter": ["TABLE", "VIEW"]
  }
}

Describe Table Schema¶

Get detailed schema information with bq_describe_table:

{
  "tool": "bq_describe_table",
  "arguments": {
    "table_id": "users",
    "dataset_id": "analytics",
    "format_output": true
  }
}

Comprehensive Table Info¶

Access all table metadata with bq_get_table_info:

{
  "tool": "bq_get_table_info",
  "arguments": {
    "table_id": "users",
    "dataset_id": "analytics"
  }
}

INFORMATION_SCHEMA Queries¶

Query Metadata Views¶

Use bq_query_info_schema to safely query INFORMATION_SCHEMA:

{
  "tool": "bq_query_info_schema",
  "arguments": {
    "query_type": "columns",
    "dataset_id": "analytics",
    "table_filter": "users"
  }
}

Available query types: - tables - Table metadata - columns - Column information - table_storage - Storage statistics - partitions - Partition details - views - View definitions - routines - Stored procedures - custom - Custom INFORMATION_SCHEMA queries

Performance Analysis¶

Analyze Query Performance¶

Get performance insights with bq_analyze_query_performance:

{
  "tool": "bq_analyze_query_performance",
  "arguments": {
    "sql": "SELECT * FROM large_table WHERE date > '2024-01-01'"
  }
}

Response includes: - Performance score (0-100) - Performance rating (EXCELLENT, GOOD, FAIR, NEEDS_OPTIMIZATION) - Optimization suggestions with severity levels - Cost estimates and data scan volume

Optimization Recommendations¶

Common optimization suggestions: 1. SELECT_STAR - Replace with specific columns 2. HIGH_DATA_SCAN - Add WHERE clauses or use partitions 3. LIMIT_WITHOUT_ORDER - Add ORDER BY for consistency 4. CROSS_JOIN - Verify necessity, use INNER JOIN if possible 5. SUBQUERY_IN_WHERE - Consider JOIN or WITH clause 6. MANY_TABLES - Create intermediate tables or views

Advanced Features¶

Working with Nested Fields¶

Handle RECORD and ARRAY types:

SELECT 
  user.name,
  user.address.city,
  ARRAY_LENGTH(user.tags) as tag_count
FROM users_with_nested_data

Partitioning and Clustering¶

Identify partitioned tables:

{
  "tool": "bq_describe_table",
  "arguments": {
    "table_id": "events",
    "dataset_id": "analytics"
  }
}

Response includes partitioning details:

{
  "partitioning": {
    "type": "DAY",
    "field": "event_date",
    "require_partition_filter": true
  },
  "clustering_fields": ["user_id", "event_type"]
}

Performance Tips¶

Partition Pruning - Always filter by partition column
Clustering Benefits - Order WHERE clauses by cluster columns
**Avoid SELECT *** - Specify only needed columns
Use APPROX functions - For large aggregations when exact values aren't required
Materialized Views - Pre-compute expensive aggregations
Query Cache - Note: Disabled in dry-run for accurate estimates

Next Steps¶

See API Reference for complete tool documentation
Check Examples for real-world use cases
Read Development Guide for contributing