Skip to content

Support GROUP BY ALL in relational queries#17380

Open
Eliaaazzz wants to merge 2 commits intoapache:masterfrom
Eliaaazzz:f_group_by_all
Open

Support GROUP BY ALL in relational queries#17380
Eliaaazzz wants to merge 2 commits intoapache:masterfrom
Eliaaazzz:f_group_by_all

Conversation

@Eliaaazzz
Copy link
Copy Markdown

Addresses #17337

Description

This PR adds support for GROUP BY ALL in Table Model relational queries.

GROUP BY ALL infers grouping keys from non-aggregate and non-window expressions in the SELECT list, which improves query ergonomics and reduces duplicated grouping expressions in common aggregation queries.

This PR also keeps date_bin_gapfill(...) behavior consistent under GROUP BY ALL, including existing validation for multiple gapfill expressions.


This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.

Key changed/added classes (or packages if there are too many classes) in this PR
  • iotdb-core/relational-grammar/src/main/antlr4/org/apache/iotdb/db/relational/grammar/sql/RelationalSql.g4
  • iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/parser/AstBuilder.java
  • iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/ast/GroupBy.java
  • iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/StatementAnalyzer.java
  • iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/util/SqlFormatter.java
  • iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/GroupByAllTest.java

Tested with:

  • ./mvnw.cmd -pl iotdb-core/datanode -Dtest=GroupByAllTest test

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Table Model support for GROUP BY ALL, allowing grouping keys to be inferred from non-aggregate and non-window expressions in the SELECT list, while preserving existing date_bin_gapfill(...) semantics/validation.

Changes:

  • Extend relational SQL grammar + AST building to parse GROUP BY ALL and represent it in the AST (GroupBy.isAll).
  • Update semantic analysis to infer grouping keys for GROUP BY ALL, including date_bin_gapfill handling.
  • Update SQL formatting and add unit tests covering GROUP BY ALL behavior and edge cases.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
iotdb-core/relational-grammar/src/main/antlr4/org/apache/iotdb/db/relational/grammar/sql/RelationalSql.g4 Adds GROUP BY ALL syntax alternative in the grammar.
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/parser/AstBuilder.java Builds GroupBy nodes with an isAll flag when GROUP BY ALL is used.
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/ast/GroupBy.java Adds isAll to the AST node to represent GROUP BY ALL.
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/StatementAnalyzer.java Implements grouping-key inference logic for GROUP BY ALL during analysis.
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/util/SqlFormatter.java Formats GROUP BY ALL back to SQL text when present in the AST.
iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/GroupByAllTest.java Adds test coverage for inferred grouping, global aggregation, gapfill integration, and backward-compat parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

continue;
}

analyzeExpression(outputExpression, scope);
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In analyzeGroupByAll(), each inferred grouping expression comes from outputExpressions, which have already been analyzed in analyzeSelectSingleColumn/analyzeSelectAllColumns earlier in visitQuerySpecification. Calling analyzeExpression(outputExpression, scope) again here is redundant and can add avoidable analyzer work (and risks duplicating any side-effects if analyzeExpression isn’t strictly idempotent). Consider relying on the existing analysis state (e.g., use analysis.getColumnReferenceFields()/analysis.getType directly) and only analyze when the expression hasn’t been analyzed yet.

Suggested change
analyzeExpression(outputExpression, scope);

Copilot uses AI. Check for mistakes.
Comment on lines +81 to +96
// The optimizer pushes aggregation into the table scan, producing AggregationTableScanNode.
// Verify by walking the plan tree.
PlanTester planTester = new PlanTester();
LogicalQueryPlan plan =
planTester.createPlan(
"SELECT tag1, tag2, tag3, count(s2) FROM table1 GROUP BY ALL");

PlanNode root = plan.getRootNode();
AggregationTableScanNode aggScan = findNode(root, AggregationTableScanNode.class);
assertTrue(
"Expected AggregationTableScanNode with grouping keys [tag1, tag2, tag3]",
aggScan != null
&& aggScan.getGroupingKeys().stream()
.map(s -> s.getName())
.collect(java.util.stream.Collectors.toSet())
.containsAll(ImmutableSet.of("tag1", "tag2", "tag3")));
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test hard-requires the optimizer to produce an AggregationTableScanNode. That’s an implementation detail and can change with planner/optimizer rules, making the test unnecessarily brittle. Consider asserting the inferred grouping keys via PlanMatchPattern (e.g., match either AggregationTableScanNode or an AggregationNode over a TableScan), similar to other analyzer tests in this module.

Suggested change
// The optimizer pushes aggregation into the table scan, producing AggregationTableScanNode.
// Verify by walking the plan tree.
PlanTester planTester = new PlanTester();
LogicalQueryPlan plan =
planTester.createPlan(
"SELECT tag1, tag2, tag3, count(s2) FROM table1 GROUP BY ALL");
PlanNode root = plan.getRootNode();
AggregationTableScanNode aggScan = findNode(root, AggregationTableScanNode.class);
assertTrue(
"Expected AggregationTableScanNode with grouping keys [tag1, tag2, tag3]",
aggScan != null
&& aggScan.getGroupingKeys().stream()
.map(s -> s.getName())
.collect(java.util.stream.Collectors.toSet())
.containsAll(ImmutableSet.of("tag1", "tag2", "tag3")));
// Assert the inferred grouping keys and aggregation via PlanMatchPattern,
// without depending on a specific physical node implementation.
PlanTester planTester = new PlanTester();
LogicalQueryPlan plan =
planTester.createPlan(
"SELECT tag1, tag2, tag3, count(s2) FROM table1 GROUP BY ALL");
assertPlan(
plan,
output(
aggregation(
singleGroupingSet("tag1", "tag2", "tag3"),
ImmutableMap.of(
Optional.empty(),
aggregationFunction("count", ImmutableList.of("s2"))),
ImmutableList.of(),
Optional.empty(),
SINGLE,
tableScan(
"testdb.table1",
ImmutableList.of("tag1", "tag2", "tag3", "s2"),
ImmutableSet.of("tag1", "tag2", "tag3", "s2")))));

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@JackieTien97 JackieTien97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add IT in integration-test module

Comment on lines +2737 to +2742
extractAggregateFunctions(ImmutableList.of(outputExpression));
List<FunctionCall> windowFunctions =
extractWindowFunctions(ImmutableList.of(outputExpression));
if (!aggregates.isEmpty() || !windowFunctions.isEmpty()) {
continue;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about select s1+avg(s2) from t1 group by all? You can try what will duckdb deal with this sql, and IMO, group by all should equal to group by s1 which means group by all should include all non-agg expression in select including sub-expression inside one select item.

Copy link
Copy Markdown
Author

@Eliaaazzz Eliaaazzz Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ed29a53. Now for expressions containing both aggregate and non-aggregate parts, extract column references outside aggregate/window function boundaries instead of skipping the whole expression.

For example, SELECT s1 + avg(s2) FROM t1 GROUP BY ALL is now equivalent to GROUP BY s1.

Implementation: added extractNonAggregateColumnReferences in ExpressionTreeUtils — it walks the expression tree but stops recursing into aggregate/window function subtrees, collecting only the Identifier/DereferenceExpression nodes outside those boundaries.

Also added corresponding unit tests and integration tests to cover this case.

…add IT

- Extract non-aggregate column references from mixed expressions
  (e.g. s1 + avg(s2) -> GROUP BY s1) instead of skipping them
- Add extractNonAggregateColumnReferences in ExpressionTreeUtils
- Add unit tests for mixed expression scenarios
- Add integration tests for GROUP BY ALL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants