Skip to content

Conversation

@chelsea-lin
Copy link
Contributor

@chelsea-lin chelsea-lin commented Feb 10, 2026

No description provided.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Feb 10, 2026
@chelsea-lin chelsea-lin marked this pull request as ready for review February 10, 2026 00:07
@chelsea-lin chelsea-lin requested review from a team as code owners February 10, 2026 00:07
new_conds = tuple(
(
l_cond.remap_column_refs(
new_child_mappings[0], allow_partial_bindings=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need allow_partial_bindings=True ? Doesn't the rewriter remap all variables? Also aren't columns from left and right unique? so why can't just remap_refs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we don't need to allow_partial_bindings here. The failed tpc-h benchmark shows the left and right columns are not unique so the remapped conditions get errors when the column id is not found.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I would like to walk through this case, the JoinNode really depends on left and right having unique ids, so if this constraint is broken, I would like to understand why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, the propagated mapping may have redundancy. Updated with new commit.

@chelsea-lin chelsea-lin changed the title refactor: improve variable remapping for join and in nodes refactor: ensure only valid IDs are propagated in identifier remapping Feb 11, 2026
Corrected remap_variables to only propagate column IDs that are actually present in the current node's output fields. This prevents parent nodes from seeing internal or leaked column IDs from child nodes, which was specifically problematic for aggregate nodes.

Added unit tests to verify correct propagation for AggregateNode with and without grouping.
@chelsea-lin chelsea-lin enabled auto-merge (squash) February 11, 2026 20:56
@chelsea-lin chelsea-lin merged commit fbaba0b into main Feb 11, 2026
20 of 24 checks passed
@chelsea-lin chelsea-lin deleted the main_chelsealin_perf branch February 11, 2026 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants