Optimizes Snowflake SQL query performance from provided query text. Use when optimizing Snowflake SQL for: (1) User provides or pastes a SQL query and asks to optimize, tune, or improve it (2) Task mentions "slow query", "make faster", "improve performance", "optimize SQL", or "query tuning" (3) Reviewing SQL for performance anti-patterns (function on filter column, implicit joins, etc.) (4) User asks why a query is slow or how to speed it up
apm install @altimateai/optimizing-query-text---
name: optimizing-query-text
description: |
Optimizes Snowflake SQL query performance from provided query text. Use when optimizing Snowflake SQL for:
(1) User provides or pastes a SQL query and asks to optimize, tune, or improve it
(2) Task mentions "slow query", "make faster", "improve performance", "optimize SQL", or "query tuning"
(3) Reviewing SQL for performance anti-patterns (function on filter column, implicit joins, etc.)
(4) User asks why a query is slow or how to speed it up
---
# Optimize Query from SQL Text
## OUTPUT FORMAT
Return ONLY the optimized SQL query. No markdown formatting, no explanations, no bullet points - just pure SQL that can be executed directly in Snowflake.
## CRITICAL: Semantic Preservation Rules
**The optimized query MUST return IDENTICAL results to the original.**
Before returning ANY optimization, verify:
- **Same columns**: Exact same columns in exact same order with exact same aliases
- **Same rows**: Filter conditions must be semantically equivalent
- **Same ordering**: Preserve `ORDER BY` exactly as written
- **Same limits**: If original has `LIMIT N`, keep `LIMIT N`. If no LIMIT, do NOT add one.
**If you cannot guarantee identical results, return the original query unchanged.**
---
## Pattern 1: Function on Filter Column
**Problem**: Functions on columns in WHERE clause prevent partition pruning and index usage.
### CAN Fix
| Original | Optimized | Why Safe |
|----------|-----------|----------|
| `WHERE DATE(ts) = '2024-01-01'` | `WHERE ts >= '2024-01-01' AND ts < '2024-01-02'` | Equivalent range |
| `WHERE YEAR(dt) = 2024` | `WHERE dt >= '2024-01-01' AND dt < '2025-01-01'` | Equivalent range |
| `WHERE MONTH(dt) = 3 AND YEAR(dt) = 2024` | `WHERE dt >= '2024-03-01' AND dt < '2024-04-01'` | Equivalent range |
| `WHERE DATE(ts) >= '2024-01-01' AND DATE(ts) < '2024-02-01'` | `WHERE ts >= '2024-01-01' AND ts < '2024-02-01'` | Same boundaries |
| `WHERE YEAR(dt) BETWEEN 1995 AND 1996` | `WHERE dt >= '1995-01-01' AND dt < '1997-01-01'` | Equivalent range |
### CANNOT Fix
| Pattern | Why Not |
|---------|---------|
| `WHERE YEAR(dt) IN (SELECT year FROM ...)` | Dynamic values, cannot precompute range |
| `WHERE DATE(ts) = DATE(other_col)` | Comparing two columns, both need function |
| `WHERE EXTRACT(DOW FROM dt) = 1` | Day-of-week has no contiguous range |
| `WHERE DATE_TRUNC('month', dt) = '2024-01-01'` in GROUP BY | Needed for grouping logic |
| `SELECT YEAR(dt) AS yr ... GROUP BY YEAR(dt)` | Function in SELECT/GROUP BY is fine, only filter matters |
---
## Pattern 2: Function on JOIN Column
**Problem**: Functions on JOIN columns prevent hash joins, forcing slower nested loop joins.
### CAN Fix
| Original | Optimized | Why Safe |
|----------|-----------|----------|
| `ON CAST(a.id AS VARCHAR) = CAST(b.id AS VARCHAR)` | `ON a.id = b.id` | If both are same type (e.g., INTEGER) |
| `ON UPPER(a.code) = UPPER(b.code)` | `ON a.code = b.code` | If data is already consistently cased |
| `ON TRIM(a.name) = TRIM(b.name)` | `ON a.name = b.name` | If data has no leading/trailing spaces |
### CANNOT Fix
| Pattern | Why Not |
|---------|---------|
| `ON CAST(a.id AS VARCHAR) = b.string_id` | Types genuinely differ, CAST required |
| `ON DATE(a.timestamp) = b.date_col` | Different granularity, DATE() required |
| `ON UPPER(a.code) = b.code` | If b.code might have different case |
| `ON a.id = b.id + 1` | Arithmetic transformation, cannot remove |
---
## Pattern 3: NOT IN Subquery
**Problem**: `NOT IN` has poor performance and unexpected NULL behavior.
### CAN Fix
| Original | Optimized | Why Safe |
|----------|-----------|----------|
| `WHERE id NOT IN (SELECT id FROM t WHERE ...)` | `WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id AND ...)` | Equivalent when subquery column is NOT NULL |
| `WHERE id NOT IN (SELECT id FROM t)` where id has NOT NULL constraint | `WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id)` | NOT NULL guarantees equivalence |
### CANNOT Fix
| Pattern | Why Not |
|---------|---------|
| `WHERE id NOT IN (SELECT nullable_col FROM t)` | If subquery returns NULL, NOT IN returns no rows; NOT EXISTS doesn't |
| `WHERE (a, b) NOT IN (SELECT x, y FROM t)` | Multi-column NOT IN has complex NULL semantics |
**Key Rule**: Only convert NOT IN to NOT EXISTS if you can verify the subquery column cannot be NULL.
---
## Pattern 4: Repeated Subquery
**Problem**: Same subquery executed multiple times causes redundant scans.
### CAN Fix
| Original | Optimized |
|----------|-----------|
| Subquery appears 2+ times identically | Extract to CTE, reference CTE multiple times |
| Same aggregation used in multiple places | Compute once in CTE |
### CANNOT Fix
| Pattern | Why Not |
|---------|---------|
| Correlated subquery (references outer table) | Each execution is different, cannot cache |
| Subqueries with different filters | Not actually the same subquery |
| Subquery in SELECT that depends on current row | Correlation prevents extraction |
---
## Pattern 5: Implicit Comma Joins
**Problem**: Comma-separated tables in FROM clause are harder to read and optimize.
### CAN Fix - Always
Convert `FROM a, b, c WHERE a.id = b.id AND b.id = c.id` to explicit JOIN syntax.
This is always safe - just restructuring, no semantic change.
---
## UNSAFE Optimizations (NEVER apply)
- **UNION to UNION ALL**: UNION deduplicates rows, UNION ALL does not - different results
- **Changing window functions**: Do not modify `SUM(SUM(x)) OVER(...)` or similar nested aggregates
- **Adding redundant filters**: Do not add filters in JOIN ON if same filter exists in WHERE
- **Changing column names**: Copy column names EXACTLY from original - do not "simplify" or rename
- **Changing column aliases**: Keep all aliases exactly as original
- **Adding early filtering in JOINs**: If a filter is in WHERE, do not duplicate it in JOIN ON clause
---
## Principles
1. **Minimal changes**: Make the fewest changes necessary. Simpler optimizations are more reliable.
2. **Preserve structure**: Keep subqueries, CTEs, and overall query structure unless there's a clear benefit.
3. **When in doubt, don't**: If unsure whether a change preserves semantics, skip it.
4. **Copy exactly**: Column names, table aliases, and expressions should be copied character-for-character.
---
## Priority Order
1. **Date/time functions on filter columns** - Highest impact
2. **Implicit joins to explicit JOIN** - Always safe, improves readability
3. **NOT IN to NOT EXISTS** - Only if NULL-safe
---
## Requirements
- **Results must be identical**: Same rows, same columns, same order
- **Valid Snowflake SQL**: Output must execute without errors in Snowflake