feat(trino): catalog-aware masking lineage — wire GetQuerySpanWithCatalog (BYT-9674..9680)#20565

Merged: d-bytebase merged 2 commits into main from feat/trino-omni-lineage-bump on Jun 10, 2026
@h3n4l h3n4l commented Jun 10, 2026

Member

Summary

Activates the omni-side fixes for the seven audited Trino under-masking vectors (BYT-9674..BYT-9680, sub-issues of BYT-9142). Columns that reach a sensitive base column through indirection (derived tables, CTEs, UNNEST, scalar subqueries, set-operation arms, views, and SELECT * over derived relations) previously had empty or width-wrong lineage. As a result, the positional result masker either fell through to NoneMasker or applied maskers at the wrong positions (a "slide"), and values were returned unmasked.

-- masking policy on customer.phone — ALL of these previously leaked:
SELECT d.x FROM (SELECT phone AS x FROM customer) d;
WITH w AS (SELECT phone AS pp FROM customer) SELECT pp FROM w;
SELECT (SELECT phone FROM customer LIMIT 1) AS sp FROM customer;
SELECT name FROM customer UNION SELECT phone FROM customer;
SELECT * FROM (SELECT phone, name FROM customer) d;   -- positional slide
SELECT phone FROM customer_v;                          -- view
SELECT t.p FROM customer CROSS JOIN UNNEST(phones) AS t(p);
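
To illustrate why empty or width-wrong lineage leaks, here is a minimal sketch of a positional masker. The names `Masker`, `noneMasker`, `fullMasker`, and `maskRow` are hypothetical and are not taken from the Bytebase or omni code; the sketch only assumes that maskers are matched to result columns by index.

```go
package main

import "fmt"

// Masker is a hypothetical stand-in for a per-column masking function.
type Masker func(string) string

func noneMasker(v string) string { return v }
func fullMasker(string) string   { return "******" }

// maskRow applies maskers[i] to row[i]. A column with no lineage entry
// falls through to noneMasker, which is how empty lineage becomes a leak.
func maskRow(row []string, maskers []Masker) []string {
	out := make([]string, len(row))
	for i, v := range row {
		m := Masker(noneMasker)
		if i < len(maskers) && maskers[i] != nil {
			m = maskers[i]
		}
		out[i] = m(v)
	}
	return out
}

func main() {
	// One result row of: SELECT * FROM (SELECT phone, name FROM customer) d
	row := []string{"555-0100", "Alice"}

	aligned := []Masker{fullMasker, noneMasker} // lineage matches projection order
	slid := []Masker{noneMasker, fullMasker}    // lineage off by one position

	fmt.Println(maskRow(row, aligned)) // [****** Alice]
	fmt.Println(maskRow(row, slid))    // [555-0100 ******]
	fmt.Println(maskRow(row, nil))     // [555-0100 Alice]
}
```

In the second and third calls the phone number comes back in clear text even though a masking policy exists, which is the failure mode the queries above reproduce.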

Changes

Safety model

omni resolution is additive: written refs are always retained, so masking can only broaden. Stars expand only when the expansion is provably width- and order-correct; anything uncertain (stale metadata, USING/NATURAL coalescing, unknown relations) stays opaque, and this extractor's existing metadata expansion applies as before. With no metadata available, omni behaves byte-identically to the previous catalog-less call.
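
The "additive" property can be sketched as a set union: resolved references are added to the references as written, and nothing is ever removed. The type `ColumnRef` and the function `additiveSources` below are hypothetical illustrations, not omni's API.

```go
package main

import (
	"fmt"
	"sort"
)

// ColumnRef is a hypothetical column reference, e.g. "d.x" or "customer.phone".
type ColumnRef string

// additiveSources returns the union of the refs written in the query and the
// refs a resolver derived from them. Written refs are never dropped, so the
// result is always a superset of the pre-resolution source set.
func additiveSources(written, resolved []ColumnRef) []ColumnRef {
	seen := map[ColumnRef]bool{}
	var out []ColumnRef
	for _, group := range [][]ColumnRef{written, resolved} {
		for _, r := range group {
			if !seen[r] {
				seen[r] = true
				out = append(out, r)
			}
		}
	}
	// Sort only to make the output deterministic for this example.
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// SELECT d.x FROM (SELECT phone AS x FROM customer) d
	written := []ColumnRef{"d.x"}
	resolved := []ColumnRef{"customer.phone"}

	fmt.Println(additiveSources(written, resolved)) // [customer.phone d.x]
	fmt.Println(additiveSources(written, nil))      // [d.x]
}
```

The second call shows the no-metadata case under this model: with nothing resolved, the source set is exactly what it was before.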

Verification

Cross-review

Every layer of this work was adversarially reviewed by an independent agent (Codex) in a find→fix→re-verify loop. Each of the omni PRs (#286/#296/#295) went through it, as did this wiring patch: the review surfaced the cross-catalog view gap (fixed via transitive catalog loading plus a regression test) and confirmed the USING-join residual analysis below.

Notes for review

  1. Access-control visibility change: omni now surfaces a view definition's base tables in AccessTables, so access checks also consider tables read through views. Flagging for product judgment.
  2. Pre-existing residual (unchanged): the USING/NATURAL-join opaque-star positional trap predates this PR; the newly-appended view-def tables land at the end of AccessTables, beyond the executed result width, so they are never indexed by the masker — verified independently in review.

🤖 Generated with Claude Code

@h3n4l h3n4l requested a review from a team as a code owner June 10, 2026 06:46
@cla-bot cla-bot Bot added the cla-signed label Jun 10, 2026
socket-security Bot commented Jun 10, 2026

No dependency changes detected.

@d-bytebase d-bytebase requested a review from rebelice June 10, 2026 06:57
h3n4l and others added 2 commits June 10, 2026 16:20
…SpanWithCatalog

Activates the omni lineage fixes for the seven audited Trino under-masking
vectors (BYT-9674..BYT-9680, sub-issues of BYT-9142): columns reaching a
sensitive base column through derived tables, CTEs, UNNEST, scalar subqueries,
set-operation arms, views, and SELECT * over derived relations previously had
empty/wrong lineage, so the result masker fell through to NoneMasker and
returned the values unmasked.

- Bump github.com/bytebase/omni to v0.0.0-20260610061900 (bytebase/omni#286
  additive lineage resolver, #296 provably-width-correct star expansion, #295
  catalog-aware view lineage).
- The query-span extractor now calls analysis.GetQuerySpanWithCatalog with a
  catalog built from instance metadata: views carry their defining query (so
  lineage through a view reaches the base-table columns masking config attaches
  to) and tables carry their column lists (so omni expands SELECT * to the
  exact projection — the positional masker stays aligned). Catalogs load
  lazily and TRANSITIVELY: a view definition referencing another catalog pulls
  that catalog in, so cross-catalog views resolve. Metadata fetches reuse the
  extractor's cache.
- completion.go fills catalog.View.Definition too (shared catalog model).
- IsPlainField now keys on the mapped physical source set (the additive
  resolver restates a plain column as written + qualified refs, which dedupe
  back to one column); inert for Trino, documented drift for repeated-column
  expressions.
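
The lazy, transitive catalog loading described in the list above can be sketched as a worklist traversal. This is an illustration under stated assumptions, not the code in this PR: `viewRefs` and `loadTransitively` are hypothetical names, and the map stands in for whatever the extractor learns from view definitions in instance metadata.

```go
package main

import "fmt"

// viewRefs is hypothetical metadata: for each catalog, the other catalogs
// that its view definitions reference.
type viewRefs map[string][]string

// loadTransitively returns every catalog reachable from the start set, in
// the order it was loaded. Each catalog is loaded at most once, so cyclic
// references between catalogs terminate.
func loadTransitively(start []string, refs viewRefs) []string {
	loaded := map[string]bool{}
	var order []string
	worklist := append([]string(nil), start...)
	for len(worklist) > 0 {
		name := worklist[0]
		worklist = worklist[1:]
		if loaded[name] {
			continue
		}
		loaded[name] = true
		order = append(order, name)
		// A view in this catalog may reference other catalogs; queue them.
		worklist = append(worklist, refs[name]...)
	}
	return order
}

func main() {
	refs := viewRefs{
		"sales": {"crm"},
		"crm":   {"raw", "sales"}, // references back to sales: a cycle
	}
	fmt.Println(loadTransitively([]string{"sales"}, refs)) // [sales crm raw]
}
```

Only catalogs that are actually referenced get loaded, which matches the lazy behavior the commit message describes.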

Tests: 7 consumer-level audit regressions (query_span_lineage_test.go, one per
leak vector — including the SELECT*-over-derived positional repro pinned at
exact width/order) + 8 view-lineage tests carried from the superseded #20560
plus a new cross-catalog view test. Full trino parser/schema + api/v1 suites
green.

Notes for review:
- omni now surfaces a view definition's base tables in AccessTables, so
  access checks also consider tables read through views (visibility change).
- The pre-existing USING/NATURAL-join opaque-star positional residual is
  unchanged: appended view-def tables land at the END of AccessTables, beyond
  the executed result width, so they are never indexed by the masker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…span catalog builders

SonarCloud flagged the span-catalog builder as a structural clone of the
completion builder (26.1% duplication on new code). One helper now populates a
catalog's schemas/tables/views (with definitions) for both paths; the span
builder keeps its transitive worklist, completion its per-keystroke shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@h3n4l h3n4l force-pushed the feat/trino-omni-lineage-bump branch from 4028130 to e258f3f Compare June 10, 2026 08:20
@h3n4l h3n4l changed the title from "feat(trino): catalog-aware masking lineage — bump omni, wire GetQuerySpanWithCatalog (BYT-9674..9680)" to "feat(trino): catalog-aware masking lineage — wire GetQuerySpanWithCatalog (BYT-9674..9680)" Jun 10, 2026
h3n4l commented Jun 10, 2026

Member Author

Rebased onto main: the omni version this needs (≥05728e86) is now already pinned by main via #20568, so the go.mod/go.sum changes dropped out — this PR is purely the catalog wiring + regression tests (4 files). Title updated accordingly. All trino/schema suites green on the rebase.

@d-bytebase d-bytebase requested a review from vsai12 June 10, 2026 08:51
@d-bytebase d-bytebase merged commit 8c17ad0 into main Jun 10, 2026
16 checks passed
@d-bytebase d-bytebase deleted the feat/trino-omni-lineage-bump branch June 10, 2026 08:51