feat(snowflake): switch plugin/parser, advisors, and db driver to the omni parser#20567
feat(snowflake): switch plugin/parser, advisors, and db driver to the omni parser#20567h3n4l wants to merge 20 commits into
Conversation
…vs legacy ;-as-stmt quirk)
…h core 2/7) Rewrite diagnose.go to use github.com/bytebase/omni/snowflake/diagnostics (Analyze) instead of the legacy ANTLR error-listener path; drop the legacy parser + antlr imports from this file. Mirror trino/diagnose.go: map each omni Diagnostic to base.SyntaxError via a byte-offset position mapper (rune-based column) and convert with base.ConvertSyntaxErrorToDiagnostic. Add diagnose_test.go modeled on the Trino precedent, with cases pinned to omni's actual behavior and inline notes on each omni-vs-legacy divergence (trailing-garbage acceptance, empty/;-as-no-error). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…switch core 3/7) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eQuery) to omni (bytebase-switch core 4/7)
Rewrite query_type.go + query.go onto omni's hand-written parser/AST,
mirroring the Trino migration:
- query_type.go: getQueryType(ast.Node) type-switches omni AST nodes to
base.QueryType (SELECT/set-op->Select, INSERT/UPDATE/DELETE/MERGE/COPY INTO
->DML, SHOW/DESCRIBE->SelectInfoSchema, USE/SET->Select, COMMENT/everything
else->DDL), replacing the legacy ANTLR queryTypeListener.
- query.go: validateQuery splits via the omni splitter and classifies each
segment via omni, preserving the legacy SQL-editor contract
(allReadOnly, allReturnData, error) and the read-only set
{SELECT, SHOW, DESCRIBE, SET, EXPLAIN}; SET is read-only but not
data-returning; USE is rejected.
omni's Snowflake parser does not yet support EXPLAIN (parser.Parse errors),
so EXPLAIN is detected lexically and accepted as read-only+data-returning,
matching the legacy other_command.explain branch which never inspected the
inner statement.
The still-legacy query_span_extractor.go is the sole consumer of the old
queryTypeListener; it was relocated there verbatim (no behavior change) so
the extractor keeps compiling on the legacy ANTLR stack until its own
migration. ANTLR/antlr4-go imports are removed from query.go + query_type.go.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…omni does parse EXPLAIN; lexical short-circuit is for legacy parity)
…ase-switch core 5/7) pivot.yaml disabled: omni parses PIVOT/UNPIVOT but does not model the clause in the AST (consume-and-discard), so pivot lineage is unrecoverable until omni gains PIVOT AST nodes. Assertions preserved for re-enable.
… USING/NATURAL collapse, USE/SET empty span (cross-review findings) Four verified bugs in the omni-AST query-span extractor (column lineage feeds data masking): 1. UNION [ALL] BY NAME was merged positionally, attributing the wrong source columns when the branches list columns in different orders. Now merges by case-insensitive column name (right-only columns appended), mirroring omni's own snowflake/analysis by-name merge. NOTE: the legacy ANTLR extractor was ALSO positional here — fixing forward, not a parity regression. 2. CASE WHEN ... THEN and window-function OVER (...) lineage was dropped entirely: omni's generated walker does not descend into CaseExpr.Whens ([]*WhenClause), FuncCallExpr.OrderBy, or FuncCallExpr.Over (non-Node embedded structs). The Inspect callback now recurses into those sub-trees manually; collection into a set keeps it idempotent if the upstream walker is fixed later. 3. JOIN ... USING fell through to the generic append, duplicating the shared column — shifting every later positional mask by one — and NATURAL JOIN collapsed but kept only the left side's lineage. USING now emits each using-column ONCE (using cols first in USING order, then left, then right non-shared, per Snowflake JOIN docs) with merged left+right lineage; NATURAL keeps its column order but merges right lineage into the shared columns. NOTE: legacy had NATURAL-only collapse (left lineage only) and no USING handling — fixing forward to docs-correct behavior. The pre-existing Natural Join yaml case expectations are updated accordingly (right-side lineage was missing). 4. USE / SET are classified base.Select but are not query expressions; they hit extractPseudoTableFromQueryNode's "unsupported query node type" error. They now return an empty-results span with no error, exactly as the legacy selectOnlyListener (which never fired for them) shaped it. Regression yaml cases added for all four (UNION ALL BY NAME swapped columns, CASE WHEN, ROW_NUMBER OVER, JOIN USING star expansion, NATURAL JOIN star expansion, USE DATABASE, SET). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…overs SELECT targets/CTEs/CASE whens/OVER
…igration (ANTLR advisors unaffected) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… (omni divergence — legacy fired via Create_table_like listener gap)
…h teardown) - parseSnowflakeStatements omni-only; OmniAST drops the wrapped ANTLR tree + provider - generic_checker.go ANTLR walker deleted; ANTLR-free BaseRule plumbing kept (base_rule.go) - db/snowflake LIMIT rewriter rewritten via omni Loc text-splicing (test_limit.yaml byte-exact; needed omni fix #297: optional FETCH noise words + TOP-count infix-* bug, both legacy/docs parity) - zero github.com/bytebase/parser/snowflake imports remain; omni bumped to post-#297
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da665ecf22
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Collect the star-token offsets first and emit advices in text order so | ||
| // the advice order matches the legacy depth-first listener walk. | ||
| var starOffsets []int | ||
| omniast.Inspect(node, func(n omniast.Node) bool { |
There was a problem hiding this comment.
** Walk CTE SELECTs when enforcing SELECT ***
When a SELECT * appears inside a CTE or a scalar subquery target, this rule no longer visits it because it relies only on omniast.Inspect; the new query-span walker explicitly notes that the generated walker skips SelectStmt.With and SelectStmt.Targets (backend/plugin/parser/snowflake/query_span_extractor.go:1065-1068). For example, WITH c AS (SELECT * FROM t) SELECT a FROM c now produces no SELECT * advice even though the legacy ANTLR listener walked the full tree and flagged the CTE body, so this SQL review rule can be bypassed in common Snowflake queries.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Verified empirically and this turned out to be a stale-comment artifact: the omni generated walker was fixed upstream (bytebase/omni#294) to descend into SelectStmt.With/Targets (and CASE whens, OVER, etc.) before this PR's pin. WITH c AS (SELECT * FROM t) SELECT a FROM c DOES produce "Avoid using SELECT *." — added a yaml lock case for exactly this shape (statement_select_no_select_all.yaml), and corrected the stale comments in query_span_extractor.go that suggested otherwise. (fed16ac)
| // advice order matches the legacy depth-first listener walk (a CTE body | ||
| // is reported before the main SELECT that follows it). | ||
| var offsets []int | ||
| omniast.Inspect(node, func(n omniast.Node) bool { |
There was a problem hiding this comment.
Walk CTE SELECTs for WHERE-required checks
This traversal misses SELECTs in WITH clauses (and SELECT-list scalar subqueries) for the same reason the query-span code hand-walks those fields: the generated omni walker skips SelectStmt.With and SelectStmt.Targets (backend/plugin/parser/snowflake/query_span_extractor.go:1065-1068). A statement such as WITH c AS (SELECT * FROM t) SELECT * FROM c WHERE id = 1 will only inspect the outer SELECT and will not report the missing WHERE in the CTE body, regressing the documented behavior that this rule checks CTE bodies and subqueries.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Same root as the SELECT * comment: the walker gap was already fixed in bytebase/omni#294 (the extractor comment citing it was stale — now corrected). WITH c AS (SELECT x FROM t) SELECT a FROM c WHERE id = 1 DOES flag the WHERE-less CTE body (code 202) — added a yaml lock case (statement_where_require_select.yaml). (fed16ac)
| // from the AST. The legacy ANTLR extractor handled both. The expected | ||
| // lineage assertions are preserved in pivot.yaml for when omni gains the | ||
| // feature. Tracked as an omni-vs-legacy divergence in the migration ledger. | ||
| // "test-data/query-span/pivot.yaml", |
There was a problem hiding this comment.
Do not silently accept unsupported PIVOT lineage
Disabling the PIVOT/UNPIVOT query-span cases leaves production code to parse FROM t PIVOT(...) AS p as a bare FROM t, as this comment says, so GetQuerySpan will return lineage and result columns for the underlying table instead of the pivoted output. For Snowflake queries that use PIVOT/UNPIVOT, this can corrupt masking/sensitive-column analysis; until omni models the clause, the parser should fail closed or retain the legacy handling rather than silently producing incorrect lineage.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Agreed — fixed fail-closed. bytebase/omni#300 models PIVOT/UNPIVOT on ast.TableRef (Pivot/Unpivot/Nested; also fixes a chained-pivot parse bug and omni-side deparse/analysis), and this PR's extractor now returns an explicit error for pivoted table sources in extractTableSourceFromTableRef instead of silently resolving the bare base table — locked by TestGetQuerySpan_PivotFailsClosed; pivot.yaml stays disabled (assertions preserved) until pivot projection lineage is implemented. (fed16ac)
…y span; lock CTE-body advisor coverage - PIVOT: omni #300 models PIVOT/UNPIVOT on ast.TableRef (+ fixes omni-side parse/deparse/analysis); extractTableSourceFromTableRef now errors on Pivot/Unpivot/Nested (no silent wrong lineage); TestGetQuerySpan_PivotFailsClosed locks it; omni pin bumped. - CTE coverage: the advisors DO flag CTE bodies (walker fixed in omni #294) — added yaml lock cases for SELECT_NO_SELECT_ALL and WHERE_REQUIRE_SELECT; corrected the stale extractor comments that claimed the generated walker skips WITH/targets.
…terals, reword TODO
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2491636145
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if colRef, ok := expr.(*ast.ColumnRef); ok { | ||
| normalizedDatabaseName, normalizedSchemaName, normalizedTableName, normalizedColumnName := normalizeColumnRef(colRef) | ||
| querySpanResult, err := q.getField(normalizedDatabaseName, normalizedSchemaName, normalizedTableName, normalizedColumnName) |
There was a problem hiding this comment.
Preserve positional
$n column lineage
For Snowflake queries that use positional column references such as SELECT $1 FROM t, the legacy extractor explicitly recognized DOLLAR()/Column_position() and resolved $1 to the first field from the current FROM source. This port only treats target expressions as ColumnRef or generic expressions, so a $n reference is either looked up as a literal column name or contributes no source columns, which breaks query-span lineage and masking for those queries. The old positional-resolution branch has no replacement in this function.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 71453ad — *ast.DollarRef (Positional) now resolves to the N-th FROM-scope column (qualified T.$2 included; out-of-range errors match legacy), restoring the legacy DOLLAR/Column_position behavior; additionally $N inside compound expressions and predicates (TO_DATE($1), $1+1 — where legacy was also blind) now contributes lineage via a shared resolvePositionalRef in the expression walker. Locked by two yaml cases (SELECT $1, T.$2 FROM T1 T; SELECT TO_DATE($1) FROM T1).
Restores legacy DOLLAR/Column_position parity for SELECT targets (bare $1 and qualified T.$2 resolve to the N-th FROM-scope column, out-of-range errors) and goes one step further than legacy: $N inside compound expressions (TO_DATE($1), $1+1) and predicates now contributes the positional column's lineage via a shared resolvePositionalRef helper in the expression walker (legacy was blind there too). yaml lock cases added.
|



What
Switch the Snowflake plugin from the legacy ANTLR parser (
github.com/bytebase/parser/snowflake) to the omni hand-written parser (github.com/bytebase/omni/snowflake) — zero legacy imports remain. Follows the Trino cutover precedent (#20517). 23 files: 7 core parser features, 14 SQL-review advisors, the db-driver LIMIT rewriter, plus transition scaffolding (added then removed).How (feature-by-feature, each subagent-implemented + cross-reviewed)
GetOmniNodestyle) via a temporary dual-ASTOmniASTwrapper, so every commit kept the suite green; yaml advice expectations byte-identical throughout.Locbyte-offset text splicing (test_limit.yaml15/15 byte-exact).Cross-review findings fixed along the way (Codex + orchestrator verification per feature)
TOP 125 *mis-lexed).Documented divergences (ledgered; first-principles judged, test-locked where behavior changed)
SplitSQLdrops empty;-only statements (legacy emitted them asempty=falseand spuriously processed them; all consumers skip empties; matches MySQL/Trino-omni). Test-locked.Diagnoseno longer flags trailing-garbage tokens (SELECT * FFROM users) — omni is parse-prefix; follow-up filed omni-side. Unicode: omni correctly rejects unquoted non-ASCII identifiers legacy wrongly accepted; quoted forms fine.migration_compatibility:CREATE TABLE LIKE+ same-batchDROPno longer warns (legacy fired only via an accidental grammar-listener gap; in-batch creations suppress their own drop by design). Test-locked.pivot.yamllineage suite disabled with assertions preserved (omni follow-up needed). NATURAL JOIN star-expansion column order kept un-hoisted (docs silent; lineage merged correctly).Verification
Whole-repo
go build ./backend/...green;parser/snowflake,advisor/snowflake,db/snowflakesuites green (-count=1, hard timeouts); all yaml behavioral gates unchanged except the explicitly locked divergences above;grep "bytebase/parser/snowflake"→ 0. omni pinned at v0.0.0-20260610063901 (post-#294/#297).🤖 Generated with Claude Code