feat(trino): migrate Trino parser plugin from ANTLR to omni by h3n4l · Pull Request #20517 · bytebase/bytebase

h3n4l · 2026-06-05T02:59:08Z

Cuts over bytebase's Trino parser from the legacy ANTLR parser (github.com/bytebase/parser/trino) to the hand-written omni parser (github.com/bytebase/omni/trino/*) — the final step of the Trino omni migration (17 omni nodes already merged to bytebase/omni). Follows the already-shipped doris precedent.

What changed

backend/plugin/parser/trino + backend/plugin/schema/trino rewired; the legacy bytebase/parser/trino import is gone; the ANTLR-only query_span_listener.go / query_span_predicate.go and the ~1700-line CodeCompletionCore completer are deleted. omni bumped to a main version with all trino packages.

P0 handler	now backed by
`SplitSQL`	`trino/parser.Split`
`parseTrinoStatements`	`trino/parser.Parse`
`Diagnose`	`trino/parser.Diagnose`
`validateQuery` / query-type	omni AST inspection + `analysis.Classify`
`GetQuerySpan` (masking/lineage)	`trino/analysis.GetQuerySpan` + metadata column expansion
`GetDatabaseDefinition`	`trino/deparse.GetDatabaseDefinition`
`Completion`	`trino/completion.Complete`

Verification

go build ./backend/... ✅ · go vet ✅ · gofmt clean ✅
go test ./backend/plugin/{parser,schema}/trino/... ✅ (the feature differential — existing trino tests pass)
doris (another omni engine) tests still ✅ with the omni bump.

Documented divergences from the legacy parser (please review)

Completion: the omni completer is intentionally not a c3 port — it emits {Text, Type} candidates only (no Definition/Priority), and column candidates are the in-scope FROM columns (query-span-resolved) rather than the legacy "every column in the default schema, ranked". Tests updated to the omni candidate model.
Query span: omni analysis.GetQuerySpan returns table access + result columns; the extractor expands tables to column-level SourceColumns via bytebase metadata and rebuilds per-result + predicate columns. A couple of query_span test expectations were adjusted to omni's (oracle-validated) behavior — review the diffs.

Reviewer focus: the query-span SourceColumns mapping (masking input) and the completion candidate-model change.

🤖 Generated with Claude Code

Cut over backend/plugin/parser/trino + backend/plugin/schema/trino from the legacy ANTLR parser (github.com/bytebase/parser/trino) to the hand-written omni parser (github.com/bytebase/omni/trino/*), following the doris precedent. The legacy bytebase/parser/trino import is fully removed. The 6 registered P0 handlers are now omni-backed: - SplitSQL → trino/parser.Split - parseTrinoStatements → trino/parser.Parse - Diagnose → trino/parser.Diagnose - validateQuery/queryType → omni AST inspection + analysis.Classify - GetQuerySpan → trino/analysis.GetQuerySpan (+ metadata column expansion) - GetDatabaseDefinition → trino/deparse.GetDatabaseDefinition - Completion → trino/completion.Complete Deleted the ANTLR-only query_span_listener.go / query_span_predicate.go and the ~1700-line CodeCompletionCore completer. omni bumped to a main version that includes all trino packages. Documented divergences from the legacy parser: - Completion: the omni completer is intentionally not a c3 port — it emits {Text,Type} candidates only (no Definition/Priority), and column candidates are the in-scope FROM columns resolved via query-span analysis. - Query span: built from omni analysis table access, expanded to column-level SourceColumns via bytebase metadata; per-result + predicate columns preserved. Feature differential green: go test ./backend/plugin/{parser,schema}/trino/... Whole backend builds; doris (another omni engine) unaffected by the omni bump. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5abec33f40

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

CI: - golangci-lint: drop unused method receivers + an unused test param (0 issues). - SonarCloud duplication: use the shared base.NewByteOffsetPositionMapper for byte->position conversion in split.go/diagnose.go/trino.go and extract the duplicated column-match loop into addMatchingColumns; net -91 lines and these files no longer duplicate the doris adapter's local helpers. Correctness (Claude + Codex cross-family review): - EXPLAIN span unwrap returned empty SourceColumns: ast.NodeLoc only handles ast-package nodes, not parser statement nodes, so the inner query never unwrapped. Use nodeSpan (QueryStmt.Loc / Span()). - Security: EXPLAIN ANALYZE executes its inner statement but getQueryType reported it read-only unconditionally, so EXPLAIN ANALYZE <DML> passed the read-only SQL-editor gate. getQueryType now recurses into the inner statement (oracle-confirmed: Trino 481 runs it), so EXPLAIN ANALYZE UPDATE is rejected. Tracked as follow-up tasks: aliased per-column masking; SELECT * result-column order (positional masking); WITH FUNCTION read-only classification; quoted- identifier case sensitivity in column matching. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

parseTrinoStatements and parseTrinoSQL shared a near-identical per-segment parse loop (split -> parse each -> wrap omniAST -> convert first error), which SonarCloud flagged as duplicated against the doris adapter's equivalent two functions (7.4% duplication on new code, gate <=3%). Extract the common logic into parseSegment(stmt) (*omniAST, error), so the two callers become thin loops with distinct bodies. dupl now reports 0 clone groups at threshold 50 and 70. Behavior is preserved (verified by cross-agent review): empty segments yield a nil-AST placeholder in parseTrinoStatements and are skipped in parseTrinoSQL; the first parse error returns immediately via convertParseError. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

go.mod pins github.com/bytebase/omni at 20260604221654-107b8b4db5bf, but go.sum still carried the leftover hash lines for the prior 20260604130724 revision (not referenced by any module), so `go mod tidy && git diff --exit-code` failed in the go-tests CI job. Remove the two orphaned lines. go mod tidy produces no other changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9c12d7b239

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…rate # Conflicts: # go.mod # go.sum

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f68d44d8fa

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Addresses three masking-correctness issues raised in review of the query span extractor, each in the data-leak (under-mask) direction. The masker applies per-result maskers positionally against the executed result's column order, so lineage order/coverage is security-relevant. 1. SELECT * column order: the star was expanded by ranging a Go map, so result columns came out in nondeterministic order and the positional masker could mask the wrong column. Expansion now uses an ordered slice built in FROM-table then metadata-column order. The ordered slice is NOT deduplicated, so a self-join (FROM users u1 JOIN users u2) emits each instance's columns and stays aligned with Trino's 2N output columns. 2. System / information_schema queries: column expansion resolved Trino's system pseudo-catalogs as Bytebase databases, producing a ResourceNotFoundError that sql_service turns into a hard "failed to mask data" rejection of valid SQL-info queries. System tables are now detected per resolved table (catalog=="system" or schema=="information_schema"), not by a query-text substring scan, so a "system." string literal cannot suppress masking of a real table in the same statement. 3. Aliased columns: an alias-qualified select item (u.email over "users u") was compared against physical table names and dropped, leaving empty lineage so the masker treated a sensitive column as a constant. Alias resolution is now additive (alias -> all physical tables it maps to; match accepts the written name or any mapped table), which resolves reused/shadowed aliases across subquery scopes by over-including (masks more) instead of under-matching. Both fixes were cross-agent reviewed (Claude + Codex); Codex caught the self-join, substring-bypass, and alias-scope edge cases above, which this revision resolves. One documented best-effort residual remains: a mixed SELECT * over a system-table-joined-with-real-table can misalign positions because system tables have no enumerable column metadata (tracked as a follow-up; such queries are already classified SelectInfoSchema). Adds direct extractor tests for column order, self-join column count, aliased and shadowed-alias lineage, the system-substring non-bypass, and the no-NotFoundError behaviour for system queries. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 549f651ec2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…rate

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d51f297fb2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

A qualified star select item such as `SELECT u.* FROM users u` is named "u.*" by omni, so it was not matched by the unqualified "*" expansion and fell through to the column-name path, producing empty SourceColumns. The masker then treated those (possibly sensitive) columns as constants and returned them unmasked. buildResults now detects a genuine qualified star by its source-ref shape (Name ends ".*" and the single source ref's Column equals the qualifier), not just the display name — so a column merely aliased to a quoted name ending in ".*" (e.g. SELECT email AS "u.*") is NOT mistaken for a star. qualifiedStarTargets resolves the star to the exact (database, schema, table) relations it names — via the relation alias when present, otherwise the written catalog/schema/table path, restricted to base tables actually in AccessTables — so it expands only that relation's columns, in order, and distinguishes same-named tables in different schemas/catalogs. Documented best-effort residual: a star over a CTE or derived relation (SELECT * FROM cte / c.*) cannot be expanded correctly because omni does not expose the resolved output projection (the prior ANTLR plugin had the same gap); tracked as a follow-up. Cross-agent reviewed (Claude + Codex) across several iterations; Codex caught the alias-ending-in-".*" false positive, the dotted-star rejection, and the same-named-table mis-target, all fixed here. Adds tests for the alias / bare-table / schema-qualified star forms, same-named-table disambiguation, and the aliased-not-a-star case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rate

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77f2f81c6c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

buildCompletionCatalog fetched full metadata for every database returned by ListDatabaseNames on each completion request. The LSP invokes completion on every keystroke, and Trino instances routinely federate many catalogs, so a single keystroke could fan out into hundreds of metadata loads and stall completion — the legacy completer loaded catalog metadata on demand. Register every catalog NAME cheaply (so catalog-level completion still lists them all), but load full schema/table/column metadata only for catalogs the current statement needs: the session default catalog, and any catalog named in the statement text (a qualified catalog.schema.table reference). A catalog name containing a double quote — the only identifier character Trino re-spells when quoting — is loaded unconditionally so it is never missed. Cross-agent reviewed (Claude + Codex). Adds a test asserting unreferenced catalogs are not loaded and a referenced one is; existing completion behaviour (catalog/schema/table/column/CTE contexts) is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8655c536f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

buildCompletionCatalog created its own context.Background() instead of using the Completion caller's context. The LSP metadata callbacks (ListDatabaseNamesFunc / GetDatabaseMetadataFunc) read the workspace ID from the context via common.GetWorkspaceIDFromContext, so dropping it made object lookups resolve no databases and left LSP completion with keyword-only candidates. Thread the caller's ctx through Completion -> buildCompletionCatalog -> the metadata callbacks. This is the fix recommended by the PR review bot; existing completion tests still pass (the mock getter ignores ctx). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-06-05T10:00:49Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

h3n4l requested a review from a team as a code owner June 5, 2026 02:59

cla-bot Bot added the cla-signed label Jun 5, 2026

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread backend/plugin/parser/trino/query_span_extractor.go Outdated

d-bytebase approved these changes Jun 5, 2026

View reviewed changes

h3n4l requested a review from vsai12 June 5, 2026 06:11

h3n4l and others added 3 commits June 5, 2026 14:18

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread backend/plugin/parser/trino/query_span_extractor.go Outdated

Merge remote-tracking branch 'origin/main' into feat/trino-omni-integ…

f68d44d

…rate # Conflicts: # go.mod # go.sum

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread backend/plugin/parser/trino/query_span_extractor.go Outdated

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread backend/plugin/parser/trino/query_span_extractor.go

Merge remote-tracking branch 'origin/main' into feat/trino-omni-integ…

d51f297

…rate

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread backend/plugin/parser/trino/query_span_extractor.go Outdated

rebelice approved these changes Jun 5, 2026

View reviewed changes

h3n4l and others added 2 commits June 5, 2026 17:37

Merge remote-tracking branch 'origin/main' into feat/trino-omni-integ…

77f2f81

…rate

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread backend/plugin/parser/trino/completion.go

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread backend/plugin/parser/trino/completion.go Outdated

d-bytebase merged commit 2408050 into main Jun 5, 2026
16 checks passed

d-bytebase deleted the feat/trino-omni-integrate branch June 5, 2026 10:42

This was referenced Jun 10, 2026

feat(snowflake): switch plugin/parser, advisors, and db driver to the omni parser #20567

Open

feat: cut BigQuery + Spanner parsing to the omni googlesql parser #20569

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(trino): migrate Trino parser plugin from ANTLR to omni#20517

feat(trino): migrate Trino parser plugin from ANTLR to omni#20517
d-bytebase merged 11 commits into
mainfrom
feat/trino-omni-integrate

h3n4l commented Jun 5, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jun 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

h3n4l commented Jun 5, 2026

What changed

Verification

Documented divergences from the legacy parser (please review)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jun 5, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants