
How Our SQL Prettifier Works Under the Hood

A technical walkthrough of our AST-based SQL formatter and the engineering decisions that make it handle edge cases gracefully.

September 18, 2024 · 9 min read

Why SQL Formatting Is Hard

SQL looks simple on the surface but is notoriously difficult to format well. Unlike JSON or YAML, which have rigid, formally specified grammars, SQL has decades of dialects, vendor extensions, and underspecified behaviors. A naive formatter that works on standard SELECT statements will break on window functions, CTEs, lateral joins, or PostgreSQL-specific syntax.

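Regex-based approaches are a dead end. They can handle simple cases but collapse under real-world queries, where keywords show up inside string literals and comments, and where subqueries and parentheses nest to arbitrary depth. Building a proper SQL formatter requires a real parse step.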
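To make the failure mode concrete, here is a minimal sketch of a regex-only "formatter" that uppercases keywords. The function name and keyword list are hypothetical and only serve the illustration; this is not code from our formatter.

```python
import re

# Hypothetical naive formatter: uppercase anything that looks like a keyword.
KEYWORD_RE = re.compile(r"\b(select|from|where)\b", re.IGNORECASE)


def naive_uppercase_keywords(sql: str) -> str:
    """Uppercase every keyword-looking word, with no awareness of context."""
    return KEYWORD_RE.sub(lambda m: m.group(0).upper(), sql)


query = "select note from logs where note = 'copied from backup'"
formatted = naive_uppercase_keywords(query)
print(formatted)
# The regex cannot tell that the last "from" is data inside a string literal,
# so it rewrites the literal and changes what the query matches.
```

The output contains 'copied FROM backup', so the formatted query no longer compares against the original string. A tokenizer that recognizes string literals as a single token avoids this class of bug.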

The Three-Stage Pipeline

Our SQL formatter operates in three distinct stages: a lexer turns the raw query text into tokens, a parser builds an AST from those tokens, and a formatter walks the AST to produce the output.
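As a rough illustration of a three-stage design, the sketch below tokenizes a query, parses the tokens into a small clause tree, and prints that tree with one item per line. All names here (`tokenize`, `parse`, `render`, `Clause`) are hypothetical, and the grammar is deliberately tiny: it assumes only SELECT, FROM, and WHERE clauses with no parentheses or subqueries.

```python
import re
from dataclasses import dataclass, field

# Stage 1 input: string literals, words, numbers, and a few punctuation marks.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<string>'(?:[^']|'')*')"
    r"|(?P<word>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<number>\d+)"
    r"|(?P<punct>[,*=<>]))"
)
CLAUSE_KEYWORDS = {"SELECT", "FROM", "WHERE"}


@dataclass
class Clause:
    """One AST node: a clause keyword plus its comma-separated items."""
    keyword: str
    items: list = field(default_factory=list)


def tokenize(sql: str) -> list:
    """Stage 1: split raw text into tokens, keeping string literals whole."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unsupported character at offset {pos}")
        tokens.append(match.group(match.lastgroup))
        pos = match.end()
    return tokens


def parse(tokens: list) -> list:
    """Stage 2: group tokens into clauses, splitting items on commas."""
    clauses = []
    for tok in tokens:
        if tok.upper() in CLAUSE_KEYWORDS:
            clauses.append(Clause(tok.upper(), [[]]))
        elif not clauses:
            raise ValueError(f"expected a clause keyword, got {tok!r}")
        elif tok == ",":
            clauses[-1].items.append([])
        else:
            clauses[-1].items[-1].append(tok)
    return clauses


def render(clauses: list, indent: str = "  ") -> str:
    """Stage 3: print each clause keyword, then one indented item per line."""
    lines = []
    for clause in clauses:
        lines.append(clause.keyword)
        texts = [" ".join(item) for item in clause.items]
        for i, text in enumerate(texts):
            comma = "," if i < len(texts) - 1 else ""
            lines.append(f"{indent}{text}{comma}")
    return "\n".join(lines)


def format_sql(sql: str) -> str:
    return render(parse(tokenize(sql)))


print(format_sql("select a, b from t where note = 'copied from backup'"))
```

Because the string literal is a single token by the time the parser sees it, the "from" inside the literal never gets mistaken for a clause keyword. Keeping the stages separate also means layout rules live only in `render`.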

Handling Dialects

SQL dialects differ in ways that matter for formatting. PostgreSQL uses :: for type casts. MySQL uses backtick-quoted identifiers. BigQuery supports QUALIFY and STRUCT. T-SQL uses TOP instead of LIMIT.

We handle this by parameterizing the lexer and parser with a dialect configuration. The dialect config specifies which tokens are valid keywords in that dialect, which operators are supported, and how identifier quoting works. This allows a single parse pipeline to handle MySQL, PostgreSQL, SQLite, BigQuery, and T-SQL without branching spaghetti in the core logic.
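One way to picture a dialect configuration is as a small immutable record passed to the lexer. The sketch below is an assumption-laden illustration, not our actual config schema: the `Dialect` fields, the three example dialects, and the token kinds are all hypothetical, and each dialect lists only a handful of operators and keywords.

```python
from dataclasses import dataclass

BASE_KEYWORDS = frozenset({"SELECT", "FROM", "WHERE"})


@dataclass(frozen=True)
class Dialect:
    """Hypothetical dialect config: keywords, operators, identifier quoting."""
    name: str
    extra_keywords: frozenset
    operators: tuple          # listed longest first so "::" wins over shorter ops
    identifier_quotes: tuple  # (opening character, closing character)


POSTGRES = Dialect("postgresql", frozenset(), ("::", "=", ","), ('"', '"'))
MYSQL = Dialect("mysql", frozenset(), ("=", ","), ("`", "`"))
TSQL = Dialect("tsql", frozenset({"TOP"}), ("=", ","), ("[", "]"))


def tokenize(sql: str, dialect: Dialect) -> list:
    """Single lexer whose behavior is driven entirely by the dialect config."""
    keywords = BASE_KEYWORDS | dialect.extra_keywords
    open_q, close_q = dialect.identifier_quotes
    tokens, i = [], 0
    while i < len(sql):
        ch = sql[i]
        if ch.isspace():
            i += 1
        elif ch == open_q:
            end = sql.index(close_q, i + 1)  # ValueError if unterminated
            tokens.append(("QUOTED_IDENT", sql[i + 1:end]))
            i = end + 1
        elif (op := next((o for o in dialect.operators
                          if sql.startswith(o, i)), None)) is not None:
            tokens.append(("OP", op))
            i += len(op)
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(sql) and (sql[j].isalnum() or sql[j] == "_"):
                j += 1
            word = sql[i:j]
            if word.upper() in keywords:
                tokens.append(("KEYWORD", word.upper()))
            else:
                tokens.append(("WORD", word))
            i = j
        else:
            raise ValueError(f"unexpected {ch!r} for dialect {dialect.name}")
    return tokens


print(tokenize("SELECT id::text", POSTGRES))
print(tokenize("SELECT `order` FROM t", MYSQL))
print(tokenize("SELECT TOP 5 id", TSQL))
```

The lexer body contains no `if dialect == ...` checks. Supporting another dialect in this sketch means adding another `Dialect` value, which is the property the parameterized design is after.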

Edge Cases We Had to Get Right


Error Recovery

Real-world SQL is often syntactically incorrect: queries still under development, queries extracted from logs, or bare fragments. A formatter that refuses to format invalid SQL is frustrating.

Our parser implements error recovery: when it encounters an unexpected token, it logs the error, skips tokens until it finds a safe resynchronization point (typically a statement boundary or a known clause keyword), and continues parsing. The resulting AST may be incomplete, but the formatter can still produce useful output for the portions it understood.
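The recovery strategy described above can be sketched as follows. This is a simplified illustration under stated assumptions: the input is an already-tokenized list of strings, the grammar knows only three clause keywords, and the names `parse_with_recovery` and `SYNC_TOKENS` are hypothetical.

```python
CLAUSE_KEYWORDS = {"SELECT", "FROM", "WHERE"}
# Safe resynchronization points: a statement boundary or a known clause keyword.
SYNC_TOKENS = CLAUSE_KEYWORDS | {";"}


def parse_with_recovery(tokens: list) -> tuple:
    """Parse clauses, recording errors and skipping ahead instead of aborting."""
    clauses, errors = [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == ";":
            i += 1  # statement boundary, nothing to parse
        elif tok.upper() in CLAUSE_KEYWORDS:
            i += 1
            body = []
            while i < len(tokens) and tokens[i].upper() not in SYNC_TOKENS:
                body.append(tokens[i])
                i += 1
            if body:
                clauses.append((tok.upper(), body))
            else:
                errors.append(f"{tok.upper()} clause has no body")
        else:
            # Unexpected token: record it, then skip to the next sync point.
            errors.append(f"unexpected token {tok!r} at position {i}")
            while i < len(tokens) and tokens[i].upper() not in SYNC_TOKENS:
                i += 1
    return clauses, errors


clauses, errors = parse_with_recovery(
    ["SELEC", "a", "FROM", "t", ";", "SELECT", "b"]
)
print(clauses)
print(errors)
```

The misspelled `SELEC` and its operand are dropped, but the `FROM t` clause and the second statement still parse, so a formatter working from this partial result can lay out everything after the first sync point.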
