Usage Guide

This guide covers how to use MCP BigQuery's eleven tools for SQL validation, analysis, schema discovery, and performance optimization.

SQL Validation

The bq_validate_sql tool validates BigQuery SQL syntax without executing queries.

Basic Validation

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT name, age FROM users WHERE age > 18"
  }
}

Valid Response:

{
  "isValid": true
}

Invalid SQL Example:

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT name, FROM users"  // Extra comma
  }
}

Error Response:

{
  "isValid": false,
  "error": {
    "code": "INVALID_SQL",
    "message": "Syntax error: Unexpected ',' at [1:12]",
    "location": {
      "line": 1,
      "column": 12
    }
  }
}

Common Syntax Errors

| Error | Example | Fix |
| ----- | ------- | --- |
| Missing FROM | SELECT * users | Add the FROM keyword |
| Invalid identifier | SELECT user-name | Use backticks: `user-name` |
| Unclosed string | WHERE name = 'John | Close the string: 'John' |
| Reserved keyword | SELECT select FROM t | Use backticks: `select` |
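
The two backtick fixes from the table, written out as complete statements (the table and project names are illustrative):

-- Reserved keyword and hyphenated column name, quoted with backticks
SELECT `select`, `user-name`
FROM `my-project.dataset.t`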

Dry-Run Analysis

The bq_dry_run_sql tool provides cost estimates and metadata without executing queries.

Cost Estimation

{
  "tool": "bq_dry_run_sql",
  "arguments": {
    "sql": "SELECT * FROM `bigquery-public-data.samples.shakespeare`",
    "pricePerTiB": 5.0
  }
}

Response:

{
  "totalBytesProcessed": 2654199,
  "usdEstimate": 0.013,
  "referencedTables": [
    {
      "project": "bigquery-public-data",
      "dataset": "samples",
      "table": "shakespeare"
    }
  ],
  "schemaPreview": [
    {"name": "word", "type": "STRING", "mode": "NULLABLE"},
    {"name": "word_count", "type": "INT64", "mode": "NULLABLE"},
    {"name": "corpus", "type": "STRING", "mode": "NULLABLE"},
    {"name": "corpus_date", "type": "INT64", "mode": "NULLABLE"}
  ]
}
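
The estimate follows directly from the byte count. Assuming the tool computes cost as totalBytesProcessed / 2^40 (bytes per TiB) * pricePerTiB, you can reproduce the figure in BigQuery itself:

-- Sanity-check the cost arithmetic: bytes / 2^40 * price per TiB
SELECT 2654199 / POW(2, 40) * 5.0 AS usd_estimate  -- ~= 0.0000121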

Cost Optimization Tips

  1. Use partitioned tables - Filter by partition column to reduce scan

    SELECT * FROM events 
    WHERE date = '2024-01-01'  -- Partition column
    

  2. Select only needed columns - Avoid SELECT *

    -- Bad: Scans all columns
    SELECT * FROM large_table
    
    -- Good: Scans only 2 columns
    SELECT id, name FROM large_table
    

  3. Use clustering - Filter by clustered columns for efficiency

    SELECT * FROM sales
    WHERE region = 'US'  -- Clustered column
    AND product_id = 123
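
These tips compound. As a sketch, assuming a sales table partitioned by date and clustered by region, a single query can benefit from all three at once:

SELECT id, amount          -- only the needed columns
FROM sales
WHERE date = '2024-01-01'  -- partition pruning
  AND region = 'US'        -- cluster filter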
    

Using Parameters

Both tools support parameterized queries for safe, reusable SQL.

Basic Parameters

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM orders WHERE date = @date AND status = @status",
    "params": {
      "date": "2024-01-01",
      "status": "completed"
    }
  }
}

Array Parameters

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM products WHERE category IN UNNEST(@categories)",
    "params": {
      "categories": ["electronics", "books", "clothing"]
    }
  }
}

Struct Parameters

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM users WHERE address.city = @filter.city",
    "params": {
      "filter": {
        "city": "New York",
        "state": "NY"
      }
    }
  }
}

Note: In dry-run mode, all parameters are treated as STRING type.

Advanced Queries

CTEs and Window Functions

WITH monthly_sales AS (
  SELECT 
    DATE_TRUNC(sale_date, MONTH) as month,
    SUM(amount) as total_sales
  FROM sales
  GROUP BY month
)
SELECT 
  month,
  total_sales,
  LAG(total_sales) OVER (ORDER BY month) as previous_month,
  total_sales - LAG(total_sales) OVER (ORDER BY month) as growth
FROM monthly_sales
ORDER BY month DESC

MERGE Statements

MERGE target_table T
USING source_table S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET T.value = S.value, T.updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  INSERT (id, value, created_at)
  VALUES (S.id, S.value, CURRENT_TIMESTAMP())

DDL Validation

CREATE OR REPLACE TABLE dataset.new_table
PARTITION BY DATE(created_at)
CLUSTER BY user_id, region
AS
SELECT * FROM dataset.source_table
WHERE created_at >= '2024-01-01'

Working with Different SQL Types

Standard SQL (Default)

{
  "tool": "bq_validate_sql",
  "arguments": {
    "sql": "SELECT * FROM `project.dataset.table` WHERE date > '2024-01-01'"
  }
}

DML Operations

-- INSERT
INSERT INTO table_name (col1, col2)
VALUES ('value1', 'value2')

-- UPDATE
UPDATE table_name
SET column1 = 'new_value'
WHERE condition

-- DELETE
DELETE FROM table_name
WHERE date < '2024-01-01'

Scripting

DECLARE x INT64 DEFAULT 0;
DECLARE y STRING;

SET x = 5;
SET y = 'Hello';

IF x > 0 THEN
  SELECT CONCAT(y, ' World') as greeting;
END IF;

Error Handling

Understanding Error Responses

All errors follow this structure:

{
  "error": {
    "code": "INVALID_SQL",
    "message": "Detailed error message",
    "location": {
      "line": 1,
      "column": 15
    },
    "details": []  // Optional additional context
  }
}

Common Error Patterns

  1. Table not found
     - Check project.dataset.table format
     - Verify permissions
     - Ensure table exists

  2. Column not found
     - Check column name spelling
     - Verify table schema
     - Use INFORMATION_SCHEMA to list columns

  3. Type mismatch
     - Check data types in comparisons
     - Use CAST() for type conversion
     - Verify function parameter types

Best Practices

1. Validate Before Execution

Always validate queries before running them:

# Pseudo-code workflow
validation = bq_validate_sql(sql)
if validation.isValid:
    dry_run = bq_dry_run_sql(sql)
    if dry_run.usdEstimate < threshold:
        # Safe to execute
        execute_query(sql)

2. Monitor Query Costs

Set cost thresholds and alerts:

MAX_QUERY_COST = 10.0  # $10 USD

result = bq_dry_run_sql(sql)
if result.usdEstimate > MAX_QUERY_COST:
    raise Exception(f"Query too expensive: ${result.usdEstimate}")

3. Use Query Templates

Create reusable, parameterized queries:

-- query_template.sql
SELECT 
  user_id,
  COUNT(*) as event_count
FROM events
WHERE 
  event_type = @event_type
  AND date BETWEEN @start_date AND @end_date
GROUP BY user_id

4. Implement CI/CD Validation

Add SQL validation to your pipeline:

# .github/workflows/validate-sql.yml
- name: Validate SQL
  run: |
    for file in queries/*.sql; do
      mcp-bigquery validate "$file"
    done

SQL Analysis

Query Structure Analysis

The bq_analyze_query_structure tool analyzes SQL complexity and patterns:

{
  "tool": "bq_analyze_query_structure",
  "arguments": {
    "sql": "WITH user_orders AS (SELECT user_id, COUNT(*) as cnt FROM orders GROUP BY user_id) SELECT * FROM user_orders WHERE cnt > 10"
  }
}

Response:

{
  "query_type": "SELECT",
  "has_cte": true,
  "has_aggregations": true,
  "complexity_score": 15,
  "table_count": 1
}

Dependency Extraction

Extract table and column dependencies with bq_extract_dependencies:

{
  "tool": "bq_extract_dependencies",
  "arguments": {
    "sql": "SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id"
  }
}

Enhanced Syntax Validation

Get detailed validation with suggestions using bq_validate_query_syntax:

{
  "tool": "bq_validate_query_syntax",
  "arguments": {
    "sql": "SELECT * FROM users LIMIT 10"
  }
}

Schema Discovery

List Datasets

Discover available datasets with bq_list_datasets:

{
  "tool": "bq_list_datasets",
  "arguments": {
    "project_id": "my-project",
    "max_results": 10
  }
}

Browse Tables

Explore tables in a dataset with bq_list_tables:

{
  "tool": "bq_list_tables",
  "arguments": {
    "dataset_id": "analytics",
    "table_type_filter": ["TABLE", "VIEW"]
  }
}
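
If you prefer raw SQL, a similar listing is available from INFORMATION_SCHEMA (a sketch; the dataset name matches the call above):

SELECT table_name, table_type
FROM `analytics.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name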

Describe Table Schema

Get detailed schema information with bq_describe_table:

{
  "tool": "bq_describe_table",
  "arguments": {
    "table_id": "users",
    "dataset_id": "analytics",
    "format_output": true
  }
}

Comprehensive Table Info

Access all table metadata with bq_get_table_info:

{
  "tool": "bq_get_table_info",
  "arguments": {
    "table_id": "users",
    "dataset_id": "analytics"
  }
}

INFORMATION_SCHEMA Queries

Query Metadata Views

Use bq_query_info_schema to safely query INFORMATION_SCHEMA:

{
  "tool": "bq_query_info_schema",
  "arguments": {
    "query_type": "columns",
    "dataset_id": "analytics",
    "table_filter": "users"
  }
}
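
For reference, this call is roughly equivalent to running a query like the following yourself (the project name is illustrative):

SELECT table_name, column_name, data_type, is_nullable
FROM `my-project.analytics.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'users'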

Available query types:

- tables - Table metadata
- columns - Column information
- table_storage - Storage statistics
- partitions - Partition details
- views - View definitions
- routines - Stored procedures
- custom - Custom INFORMATION_SCHEMA queries

Performance Analysis

Analyze Query Performance

Get performance insights with bq_analyze_query_performance:

{
  "tool": "bq_analyze_query_performance",
  "arguments": {
    "sql": "SELECT * FROM large_table WHERE date > '2024-01-01'"
  }
}

Response includes:

- Performance score (0-100)
- Performance rating (EXCELLENT, GOOD, FAIR, NEEDS_OPTIMIZATION)
- Optimization suggestions with severity levels
- Cost estimates and data scan volume

Optimization Recommendations

Common optimization suggestions:

1. SELECT_STAR - Replace with specific columns
2. HIGH_DATA_SCAN - Add WHERE clauses or use partitions
3. LIMIT_WITHOUT_ORDER - Add ORDER BY for consistency
4. CROSS_JOIN - Verify necessity, use INNER JOIN if possible
5. SUBQUERY_IN_WHERE - Consider JOIN or WITH clause
6. MANY_TABLES - Create intermediate tables or views

Advanced Features

Working with Nested Fields

Handle RECORD and ARRAY types:

SELECT 
  user.name,
  user.address.city,
  ARRAY_LENGTH(user.tags) as tag_count
FROM users_with_nested_data
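
To iterate over the array elements themselves rather than just counting them, use UNNEST (a sketch against the same hypothetical table):

SELECT
  user.name,
  tag
FROM users_with_nested_data,
  UNNEST(user.tags) AS tag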

Partitioning and Clustering

Identify partitioned tables:

{
  "tool": "bq_describe_table",
  "arguments": {
    "table_id": "events",
    "dataset_id": "analytics"
  }
}

Response includes partitioning details:

{
  "partitioning": {
    "type": "DAY",
    "field": "event_date",
    "require_partition_filter": true
  },
  "clustering_fields": ["user_id", "event_type"]
}
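
Because require_partition_filter is true here, queries against this table must constrain the partition column or BigQuery will reject them. A minimal sketch:

SELECT user_id, event_type
FROM analytics.events
WHERE event_date = '2024-01-01'  -- required partition filter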

Performance Tips

  1. Partition Pruning - Always filter by the partition column
  2. Clustering Benefits - Order WHERE clauses by cluster columns
  3. Avoid SELECT * - Specify only needed columns
  4. Use APPROX functions - For large aggregations when exact values aren't required (see the sketch after this list)
  5. Materialized Views - Pre-compute expensive aggregations
  6. Query Cache - Note: disabled in dry-run for accurate estimates
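
As mentioned in item 4, approximate aggregation functions such as APPROX_COUNT_DISTINCT accept a small, bounded error in exchange for far less memory and compute than their exact counterparts:

-- Exact, but expensive on very large tables
SELECT COUNT(DISTINCT user_id) FROM analytics.events;

-- Approximate, typically much cheaper
SELECT APPROX_COUNT_DISTINCT(user_id) FROM analytics.events;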

Next Steps