[FEATURE] Add Union command in PPL by srikanthpadakanti · Pull Request #5240 · opensearch-project/sql

srikanthpadakanti · 2026-03-17T16:33:26Z

Description

Add union command to PPL that implements SQL-style UNION ALL semantics with Calcite-based type coercion. The command supports combining multiple datasets (indices, patterns, aliases, or subsearches) with automatic schema merging and missing fields are filled with NULL, compatible types are coerced to a common supertype (e.g., int+float --> float), and incompatible types fall back to string. Works both as a first command and mid-pipeline, where the upstream result set is implicitly included as the first dataset.

Related Issues

Resolves #5110
#5110

Check List

[ X ] New functionality includes testing.
[ X ] New functionality has been documented.
[ X ] New functionality has javadoc added.
[ X ] New functionality has a user manual doc added.
[ X ] New PPL command checklist all confirmed.
API changes companion pull request created.
[ X ] Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-03-17T16:35:36Z

PR Reviewer Guide 🔍

(Review updated until commit `467a7d1`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: AST and Parser: Union command grammar and AST node Relevant files: core/src/main/java/org/opensearch/sql/ast/tree/Union.java core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java core/src/main/java/org/opensearch/sql/analysis/Analyzer.java ppl/src/main/antlr/OpenSearchPPLLexer.g4 ppl/src/main/antlr/OpenSearchPPLParser.g4 ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java Sub-PR theme: Calcite execution: Union planning, schema unification, and anonymizer Relevant files: core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLUnionTest.java ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java Sub-PR theme: Integration tests and documentation for Union command Relevant files: integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteUnionCommandIT.java integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java integ-test/src/test/resources/expectedOutput/calcite/explain_union.yaml integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_union.yaml docs/user/ppl/cmd/union.md docs/user/ppl/index.md docs/category.json
⚡ Recommended focus areas for review Case-Sensitive Schema Merge The `buildUnifiedSchemaForUnion` method uses field names as keys in `seenFields` map with a plain `HashMap`, which means field name matching is case-sensitive. If two datasets have the same field with different casing (e.g., `Age` vs `age`), they will be treated as distinct fields rather than merged, potentially producing unexpected NULL-filled columns instead of a unified column. private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) { List<SchemaField> schema = new ArrayList<>(); Map<String, RelDataType> seenFields = new HashMap<>(); for (RelNode node : nodes) { for (RelDataTypeField field : node.getRowType().getFieldList()) { if (!seenFields.containsKey(field.getName())) { schema.add(new SchemaField(field.getName(), field.getType())); seenFields.put(field.getName(), field.getType()); } } } return schema; } FLOAT/DECIMAL Rank Ordering In `getNumericTypeRankForUnion`, DECIMAL (rank 5) is ranked lower than FLOAT (rank 7) and DOUBLE (rank 8). This means INTEGER+FLOAT→FLOAT and DECIMAL+FLOAT→FLOAT, but DECIMAL is a higher-precision exact type. Coercing DECIMAL to FLOAT/DOUBLE can silently lose precision. The ranking of DECIMAL vs floating-point types should be reconsidered. private static int getNumericTypeRankForUnion(SqlTypeName typeName) { return switch (typeName) { case TINYINT -> 1; case SMALLINT -> 2; case INTEGER -> 3; case BIGINT -> 4; case DECIMAL -> 5; case REAL -> 6; case FLOAT -> 7; case DOUBLE -> 8; default -> 0; }; } EmptySource Side Effect In `visitUnion`, each dataset is first processed through `EmptySourcePropagateVisitor` and then visited with `this` (the main visitor). The `prunedDataset.accept(this, context)` call may have side effects on `context` (e.g., modifying shared state in `context.relBuilder`) between iterations. If a dataset visit fails partway through, the context may be left in a partially modified state, potentially corrupting subsequent dataset processing. public RelNode visitUnion(Union node, CalcitePlanContext context) { List<RelNode> inputNodes = new ArrayList<>(); for (UnresolvedPlan dataset : node.getDatasets()) { UnresolvedPlan prunedDataset = dataset.accept(new EmptySourcePropagateVisitor(), null); prunedDataset.accept(this, context); inputNodes.add(context.relBuilder.build()); } if (inputNodes.size() < 2) { throw new IllegalArgumentException( "Union command requires at least two datasets. Provided: " + inputNodes.size()); } List<RelNode> unifiedInputs = SchemaUnifier.buildUnifiedSchemaWithTypeCoercion(inputNodes, context); for (RelNode input : unifiedInputs) { context.relBuilder.push(input); } context.relBuilder.union(true, unifiedInputs.size()); // true = UNION ALL if (node.getMaxout() != null) { context.relBuilder.push( LogicalSystemLimit.create( LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT, context.relBuilder.build(), context.relBuilder.literal(node.getMaxout()))); } return context.relBuilder.peek(); } Regex Keyword List Incomplete The `anonymizeSubsearchCommand` method builds a `keywords` regex pattern that includes the `commandName` parameter but does not include all PPL command keywords (e.g., `append`, `dedup`, `join`, `lookup`, etc.). This could cause legitimate command names to be incorrectly replaced with `identifier` in the anonymized output. String keywords = "source\|fields\|where\|stats\|head\|tail\|sort\|eval\|rename\|" + commandName + "\|search\|table\|identifier\|\\\\\\"; List<String> anonymizedSubsearches = new ArrayList<>(); for (UnresolvedPlan subsearch : subsearches) { String anonymizedSubsearch = anonymizeData(subsearch); anonymizedSubsearch = "search " + anonymizedSubsearch; anonymizedSubsearch = anonymizedSubsearch .replaceAll("\\bsource=\\w+", "source=table") .replaceAll("\\b(?!" + keywords + ")\\w+(?=\\s[<>=!])", "identifier") .replaceAll("\\b(?!" + keywords + ")\\w+(?=\\s,)", "identifier") .replaceAll("fields \\+\\s\\b(?!" + keywords + ")\\w+", "fields + identifier") .replaceAll( "fields \\+\\sidentifier,\\s\\b(?!" + keywords + ")\\w+", "fields + identifier,identifier"); anonymizedSubsearches.add(StringUtils.format("[%s]", anonymizedSubsearch)); } return StringUtils.format("\| %s %s", commandName, String.join(" ", anonymizedSubsearches)); Mutable maxout Field The `Union` class uses both `@RequiredArgsConstructor` and `@AllArgsConstructor`, with `maxout` as a non-final field. This means `maxout` is mutable after construction. Combined with `@EqualsAndHashCode`, two `Union` nodes with the same datasets but different `maxout` values will not be equal, but the field could be changed externally after creation, breaking equality contracts. Consider making `maxout` final. public class Union extends UnresolvedPlan { private final List<UnresolvedPlan> datasets; private Integer maxout; @Override public UnresolvedPlan attach(UnresolvedPlan child) { List<UnresolvedPlan> newDatasets = ImmutableList.<UnresolvedPlan>builder().add(child).addAll(datasets).build(); return new Union(newDatasets, maxout); }

github-actions · 2026-03-17T16:36:31Z

PR Code Suggestions ✨

Latest suggestions up to 467a7d1

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix temporal type widening precedence order The temporal type widening logic has an incorrect precedence: `TIMESTAMP_WITH_LOCAL_TIME_ZONE` should be wider than `TIMESTAMP`, but the current code returns `TIMESTAMP` when either type is `TIMESTAMP`, even if the other is `TIMESTAMP_WITH_LOCAL_TIME_ZONE`. This means `TIMESTAMP` + `TIMESTAMP_WITH_LOCAL_TIME_ZONE` incorrectly resolves to `TIMESTAMP` instead of `TIMESTAMP_WITH_LOCAL_TIME_ZONE`, potentially losing timezone information. core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [378-387] private static SqlTypeName getWiderTemporalTypeForUnion(SqlTypeName type1, SqlTypeName type2) { - if (type1 == SqlTypeName.TIMESTAMP \|\| type2 == SqlTypeName.TIMESTAMP) { - return SqlTypeName.TIMESTAMP; - } if (type1 == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE \|\| type2 == SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE) { return SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE; } + if (type1 == SqlTypeName.TIMESTAMP \|\| type2 == SqlTypeName.TIMESTAMP) { + return SqlTypeName.TIMESTAMP; + } return SqlTypeName.DATE; } Suggestion importance[1-10]: 7 __ Why: This is a genuine bug: when combining `TIMESTAMP` and `TIMESTAMP_WITH_LOCAL_TIME_ZONE`, the current code incorrectly returns `TIMESTAMP` instead of the wider `TIMESTAMP_WITH_LOCAL_TIME_ZONE`, potentially losing timezone information. The fix correctly reorders the checks to prioritize `TIMESTAMP_WITH_LOCAL_TIME_ZONE`.	Medium
Possible issue	Fix potential builder stack state after maxout limit When `maxout` is set, `LogicalSystemLimit` is pushed onto the builder but then `context.relBuilder.peek()` is called, which returns the top of the stack without consuming it. However, the `LogicalSystemLimit` node was pushed via `context.relBuilder.push(...)` after `context.relBuilder.build()` consumed the union node. The final `return context.relBuilder.peek()` is correct only if the limit node is on the stack, but the code path when `maxout` is null also calls `peek()` after `union()` — this is consistent. The real issue is that after `context.relBuilder.union(true, unifiedInputs.size())`, the union result is on the stack, and `peek()` returns it without removing it, which may leave stale state on the builder for subsequent pipeline steps. Consider using `build()` consistently and returning the result directly. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2629-2642] for (RelNode input : unifiedInputs) { context.relBuilder.push(input); } context.relBuilder.union(true, unifiedInputs.size()); // true = UNION ALL if (node.getMaxout() != null) { + RelNode unionNode = context.relBuilder.build(); context.relBuilder.push( LogicalSystemLimit.create( LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT, - context.relBuilder.build(), + unionNode, context.relBuilder.literal(node.getMaxout()))); } return context.relBuilder.peek(); Suggestion importance[1-10]: 4 __ Why: The suggestion identifies a potential inconsistency in how the `relBuilder` stack is managed when `maxout` is set. The improved code explicitly calls `build()` before pushing the `LogicalSystemLimit`, which is cleaner and avoids potential stale state. However, the existing code already calls `context.relBuilder.build()` inside the `LogicalSystemLimit.create()` call, so the practical impact may be limited.	Low
General	Make maxout field final for equality correctness Both `@RequiredArgsConstructor` and `@AllArgsConstructor` are present. `@RequiredArgsConstructor` generates a constructor with only `final` fields (i.e., `datasets`), while `@AllArgsConstructor` generates one with all fields (`datasets` and `maxout`). This means there are two constructors: `Union(List)` and `Union(List,` `Integer)`. However, `maxout` is not `final`, so it won't be included in `@EqualsAndHashCode`, which may cause two `Union` nodes with different `maxout` values to be considered equal. Consider making `maxout` `final` for consistency and correctness. core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27] @Getter @ToString @EqualsAndHashCode(callSuper = false) @RequiredArgsConstructor @AllArgsConstructor public class Union extends UnresolvedPlan { private final List<UnresolvedPlan> datasets; - private Integer maxout; + private final Integer maxout; Suggestion importance[1-10]: 5 __ Why: Making `maxout` `final` is a valid correctness concern since two `Union` nodes with different `maxout` values would incorrectly be considered equal by `@EqualsAndHashCode`. However, making `maxout` `final` would require removing `@RequiredArgsConstructor` or restructuring, since `@RequiredArgsConstructor` only includes `final` fields and `@AllArgsConstructor` would then cover both.	Low
General	Ensure unified schema uses coerced types The unified schema is built using the type from the first node that introduces a field, but after `coerceUnionTypes` runs, the field types in subsequent nodes may have been coerced to a common type. The schema should reflect the coerced (common) type, not just the first-seen type. This can cause type mismatches when projecting fields from nodes where the coerced type differs from the first-seen type, potentially leading to incorrect CASTs or runtime errors. core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205] private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) { List<SchemaField> schema = new ArrayList<>(); Map<String, RelDataType> seenFields = new HashMap<>(); for (RelNode node : nodes) { for (RelDataTypeField field : node.getRowType().getFieldList()) { - if (!seenFields.containsKey(field.getName())) { - schema.add(new SchemaField(field.getName(), field.getType())); - seenFields.put(field.getName(), field.getType()); + String name = field.getName(); + if (!seenFields.containsKey(name)) { + schema.add(new SchemaField(name, field.getType())); + seenFields.put(name, field.getType()); } + // After coercion, all nodes sharing a field name should have the same type; + // no update needed since coerceUnionTypes already unified them. } } return schema; } Suggestion importance[1-10]: 3 __ Why: The suggestion claims a type mismatch issue, but after `coerceUnionTypes` runs, all nodes sharing a field name already have the same coerced type. The `buildUnifiedSchemaForUnion` method correctly picks the first-seen type, which after coercion is the common type. The improved code is functionally identical to the existing code with only a comment added, making this a low-impact suggestion.	Low

Previous suggestions

Suggestions up to commit 5952ff6

Category	Suggestion	Impact
Possible issue	Fix potential builder API ordering issue with maxout literal After calling `context.relBuilder.union(true, unifiedInputs.size())`, the union node is on the builder stack. Then if `maxout` is set, `context.relBuilder.build()` pops the union node and wraps it in a `LogicalSystemLimit`, which is pushed back. However, `context.relBuilder.literal(node.getMaxout())` is called after `context.relBuilder.build()` which already consumed the top of the stack — the literal call should be evaluated before `build()` to avoid potential ordering issues with the builder API. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2634-2642] if (node.getMaxout() != null) { + RelNode unionNode = context.relBuilder.build(); + RexNode maxoutLiteral = context.relBuilder.literal(node.getMaxout()); context.relBuilder.push( LogicalSystemLimit.create( LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT, - context.relBuilder.build(), - context.relBuilder.literal(node.getMaxout()))); + unionNode, + maxoutLiteral)); } return context.relBuilder.peek(); Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that `context.relBuilder.literal(node.getMaxout())` is called after `context.relBuilder.build()` pops the stack, which could cause ordering issues depending on the `RelBuilder` API implementation. Separating the `build()` call and the `literal()` call makes the intent clearer and avoids potential side effects.	Low
Possible issue	Ensure unified schema uses coerced types The unified schema is built using the type from the first node that introduces a field, but after `coerceUnionTypes` runs, the actual field types in the coerced inputs may differ from the original. The schema should be built from the coerced inputs to ensure the projected types match the actual types in the nodes being projected. Otherwise, the cast in `buildProjectionForUnion` may be applied incorrectly or skipped when it shouldn't be. core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205] private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) { List<SchemaField> schema = new ArrayList<>(); Map<String, RelDataType> seenFields = new HashMap<>(); for (RelNode node : nodes) { for (RelDataTypeField field : node.getRowType().getFieldList()) { - if (!seenFields.containsKey(field.getName())) { - schema.add(new SchemaField(field.getName(), field.getType())); - seenFields.put(field.getName(), field.getType()); + String name = field.getName(); + if (!seenFields.containsKey(name)) { + schema.add(new SchemaField(name, field.getType())); + seenFields.put(name, field.getType()); } + // If already seen, the coerceUnionTypes step should have unified the type, + // so no update needed here. } } return schema; } Suggestion importance[1-10]: 3 __ Why: The suggestion claims the schema may use pre-coercion types, but `buildUnifiedSchemaForUnion` is called on `coercedInputs` (after `coerceUnionTypes`), so the types should already be unified. The improved code is functionally identical to the existing code, making this suggestion largely a no-op.	Low
General	Guard against unexpected builder stack state per dataset Each dataset is visited using the shared `context`, which means the `relBuilder` state is shared across iterations. If visiting a dataset modifies the `relBuilder` stack in an unexpected way (e.g., leaves extra nodes), subsequent iterations or the final `union` call may operate on incorrect state. Consider saving and restoring the builder state or using a fresh context per dataset to isolate side effects. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2615-2619] for (UnresolvedPlan dataset : node.getDatasets()) { UnresolvedPlan prunedDataset = dataset.accept(new EmptySourcePropagateVisitor(), null); + int stackSizeBefore = context.relBuilder.size(); prunedDataset.accept(this, context); + if (context.relBuilder.size() != stackSizeBefore + 1) { + throw new IllegalStateException( + "Expected exactly one RelNode pushed for union dataset, but stack changed by " + + (context.relBuilder.size() - stackSizeBefore)); + } inputNodes.add(context.relBuilder.build()); } Suggestion importance[1-10]: 4 __ Why: Adding a stack size assertion is a reasonable defensive check, but `RelBuilder.size()` may not be a standard public API method, and the pattern of visiting and calling `build()` is consistent with how other commands are handled in the codebase. The improvement is minor and may not compile without verifying the API.	Low
General	Clarify single-input early return intent The early return for a single input bypasses type coercion and schema unification entirely. However, the `visitUnion` method already validates that at least 2 inputs exist before calling this method, so the `size() == 1` check is redundant. More importantly, if this method is ever called with 1 input in a future code path, it silently returns without unification, which could cause downstream issues. The validation should be consistent with the caller's contract. core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [164-166] -if (inputs.isEmpty() \|\| inputs.size() == 1) { +if (inputs.isEmpty()) { + return inputs; +} +if (inputs.size() == 1) { + // Single input: no unification needed, but still return as-is return inputs; } Suggestion importance[1-10]: 1 __ Why: The improved code is functionally identical to the existing code — it just splits the condition into two separate `if` blocks with a comment. This is purely cosmetic and adds no functional value.	Low

Suggestions up to commit 3096cd4

Category	Suggestion	Impact
Possible issue	Builder stack inconsistency in maxout branch When `maxout` is null, `context.relBuilder.build()` is never called after the union, so the union `RelNode` remains on the builder stack. The method returns `context.relBuilder.peek()` which is correct for the non-maxout path, but the caller in `visitUnion` also calls `context.relBuilder.build()` to pop the result. If the builder stack is not properly managed, subsequent operations may consume a stale node. Ensure the final node is consistently left on the stack (via `peek`) or consistently consumed (via `build`) regardless of the `maxout` branch. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2629-2642] for (RelNode input : unifiedInputs) { context.relBuilder.push(input); } context.relBuilder.union(true, unifiedInputs.size()); // true = UNION ALL if (node.getMaxout() != null) { + RelNode unionNode = context.relBuilder.build(); context.relBuilder.push( LogicalSystemLimit.create( LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT, - context.relBuilder.build(), + unionNode, context.relBuilder.literal(node.getMaxout()))); } return context.relBuilder.peek(); Suggestion importance[1-10]: 5 __ Why: The suggestion identifies a real potential issue: in the `maxout` branch, `context.relBuilder.build()` is called to pop the union node before passing it to `LogicalSystemLimit.create`, which is correct. The improved code makes this explicit by storing the union node in a variable first, improving clarity and correctness of the builder stack management.	Low
Possible issue	Unified schema uses pre-coercion types incorrectly The unified schema is built using the type from the first node that introduces a field, but after `coerceUnionTypes` runs, subsequent nodes may have had their field types changed to the common supertype. The schema should use the coerced (target) type rather than the original type from the first-seen node, otherwise the projection in `buildProjectionForUnion` may generate unnecessary or incorrect CASTs. Consider building the unified schema after coercion by reading the target type from the coerced inputs, or by passing the `targetTypeMap` computed in `coerceUnionTypes` directly to `buildUnifiedSchemaForUnion`. core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205] private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) { List<SchemaField> schema = new ArrayList<>(); Map<String, RelDataType> seenFields = new HashMap<>(); for (RelNode node : nodes) { for (RelDataTypeField field : node.getRowType().getFieldList()) { - if (!seenFields.containsKey(field.getName())) { - schema.add(new SchemaField(field.getName(), field.getType())); - seenFields.put(field.getName(), field.getType()); + String name = field.getName(); + if (!seenFields.containsKey(name)) { + // After coercion, all nodes sharing this field have the same type; use it directly. + schema.add(new SchemaField(name, field.getType())); + seenFields.put(name, field.getType()); } } } return schema; } Suggestion importance[1-10]: 3 __ Why: The suggestion claims that `buildUnifiedSchemaForUnion` uses pre-coercion types, but it is called with `coercedInputs` (already coerced) in `unifySchemasForUnion`. Since the inputs passed to this method are already coerced, the first-seen type will be the coerced type. The `improved_code` is essentially identical to the `existing_code` with only a comment change, making this suggestion misleading.	Low
General	Mutable field causes inconsistent object state Using both `@RequiredArgsConstructor` and `@AllArgsConstructor` together with a mix of `final` and non-`final` fields can cause confusion: `@RequiredArgsConstructor` generates a constructor with only `datasets` (the `final` field), while `@AllArgsConstructor` generates one with both `datasets` and `maxout`. However, since `maxout` is not `final`, it is mutable and can be set independently, which may lead to inconsistent state. Consider making `maxout` `final` and removing `@RequiredArgsConstructor`, or using a single explicit constructor. core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27] @Getter @ToString @EqualsAndHashCode(callSuper = false) -@RequiredArgsConstructor @AllArgsConstructor public class Union extends UnresolvedPlan { private final List<UnresolvedPlan> datasets; + private final Integer maxout; - private Integer maxout; + public Union(List<UnresolvedPlan> datasets) { + this(datasets, null); + } Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies that mixing `@RequiredArgsConstructor` and `@AllArgsConstructor` with a non-`final` field can be confusing. Making `maxout` `final` and using a single `@AllArgsConstructor` with a convenience constructor improves immutability and clarity of the design.	Low
General	Regex injection risk from unescaped command name When `commandName` is `"union"`, the `keywords` regex alternation includes `"union"` as a keyword to protect from replacement. However, if `commandName` is `"multisearch"`, the original code also protected `"multisearch"`. The refactored code correctly handles this, but the `keywords` string is built by concatenating `commandName` in the middle of the alternation without escaping. If a future command name contains regex special characters, this would break the pattern. Consider using `Pattern.quote(commandName)` for safety. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [806-809] String keywords = "source\|fields\|where\|stats\|head\|tail\|sort\|eval\|rename\|" - + commandName + + java.util.regex.Pattern.quote(commandName) + "\|search\|table\|identifier\|\\\\\\*"; Suggestion importance[1-10]: 3 __ Why: While the suggestion is technically valid for future-proofing against regex special characters in command names, the current command names (`"union"` and `"multisearch"`) contain no regex special characters, making this a low-impact defensive improvement rather than a critical fix.	Low

Suggestions up to commit e03a298

Category	Suggestion	Impact
General	Include all command names in anonymizer keyword list The `keywords` regex negative lookahead is built by concatenating `commandName` directly into the regex pattern. If `commandName` contains regex special characters, this could break the pattern. Additionally, when `commandName` is `"union"`, the keyword list will include `"union"` but not `"multisearch"`, and vice versa. This means field names that happen to match the other command name could be incorrectly anonymized. Both command names should always be included in the keywords list. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [805-809] private String anonymizeSubsearchCommand(String commandName, List<UnresolvedPlan> subsearches) { String keywords = "source\|fields\|where\|stats\|head\|tail\|sort\|eval\|rename\|" - + commandName + + "multisearch\|union" + "\|search\|table\|identifier\|\\\\\\*"; Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that when `commandName` is `"union"`, the keyword `"multisearch"` is excluded from the negative lookahead, and vice versa. This could cause field names matching the other command name to be incorrectly anonymized. Including both command names always is a valid correctness fix.	Low
	Simplify constructor annotations for clarity Both `@RequiredArgsConstructor` and `@AllArgsConstructor` are present. `@RequiredArgsConstructor` generates a constructor with only `datasets` (the `final` field), while `@AllArgsConstructor` generates one with both `datasets` and `maxout`. However, `maxout` is not `final`, so `@RequiredArgsConstructor` will only include `datasets`. This means there are two constructors: `Union(List)` and `Union(List,` `Integer)`. The `attach` method uses `new Union(newDatasets, maxout)` which calls the all-args constructor. This is fine, but having both annotations is redundant since `@AllArgsConstructor` already covers the single-field case when `maxout` is null. Consider removing `@RequiredArgsConstructor` to avoid confusion, or make `maxout` final with a default. core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27] @Getter @ToString @EqualsAndHashCode(callSuper = false) -@RequiredArgsConstructor @AllArgsConstructor public class Union extends UnresolvedPlan { private final List<UnresolvedPlan> datasets; + private final Integer maxout; - private Integer maxout; + public Union(List<UnresolvedPlan> datasets) { + this(datasets, null); + } Suggestion importance[1-10]: 3 __ Why: Having both `@RequiredArgsConstructor` and `@AllArgsConstructor` is redundant but not incorrect since they generate different constructors. The suggestion to consolidate is a minor code quality improvement, but the current code works correctly as-is.	Low
	Preserve insertion order in unified schema map The unified schema is built using only the first occurrence of each field name, ignoring type differences across inputs. After `coerceUnionTypes`, all inputs should have the same type for shared fields, but the schema should reflect the coerced (target) type rather than the first-seen type. If coercion didn't fully align types, the projection in `buildProjectionForUnion` will cast, but the schema type used for NULL literals for missing fields may be wrong. The schema should be derived from the coerced inputs' field types to ensure correctness. core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205] private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) { List<SchemaField> schema = new ArrayList<>(); - Map<String, RelDataType> seenFields = new HashMap<>(); + Map<String, RelDataType> seenFields = new LinkedHashMap<>(); for (RelNode node : nodes) { for (RelDataTypeField field : node.getRowType().getFieldList()) { - if (!seenFields.containsKey(field.getName())) { - schema.add(new SchemaField(field.getName(), field.getType())); - seenFields.put(field.getName(), field.getType()); + String name = field.getName(); + if (!seenFields.containsKey(name)) { + seenFields.put(name, field.getType()); + schema.add(new SchemaField(name, field.getType())); } + // After coercion, all nodes should agree on the type; no update needed } } return schema; } Suggestion importance[1-10]: 2 __ Why: The `improved_code` is functionally identical to the `existing_code` - it still uses `HashMap` in the signature description but the actual logic change (using `LinkedHashMap`) is minor and the comment "After coercion, all nodes should agree on the type; no update needed" doesn't add meaningful behavior change. The suggestion's concern about type correctness is addressed by the prior `coerceUnionTypes` step, making this a very low-impact change.	Low
Possible issue	Avoid nested build inside push argument After pushing the `LogicalSystemLimit` node onto the builder, `context.relBuilder.peek()` is called to return the top node. However, `LogicalSystemLimit` was pushed directly via `context.relBuilder.push(...)` and then `context.relBuilder.build()` was called inside the push argument, which pops the union node. The `LogicalSystemLimit` is pushed but `peek()` should correctly return it. However, if `maxout` is null, `context.relBuilder.peek()` is called after `union(...)` which is correct. Verify that after the `union(true, size)` call, the builder's stack has exactly one node before `peek()` is called, since multiple `push` calls were made for each input. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2634-2642] if (node.getMaxout() != null) { + RelNode unionNode = context.relBuilder.build(); context.relBuilder.push( LogicalSystemLimit.create( LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT, - context.relBuilder.build(), + unionNode, context.relBuilder.literal(node.getMaxout()))); } return context.relBuilder.peek(); Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies a potential issue with calling `context.relBuilder.build()` inside the argument to `context.relBuilder.push()`, which could cause ordering/evaluation issues. Extracting `build()` to a separate variable before passing it to `push()` is cleaner and avoids any side-effect ordering concerns.	Low

Suggestions up to commit f64bacb

Category	Suggestion	Impact
General	Fix regex keyword list to include all command names The `keywords` regex is built by concatenating `commandName` directly into the alternation pattern. If `commandName` contains regex special characters, this could break the pattern. Additionally, when `commandName` is `"union"`, the keyword list won't include `"multisearch"`, and vice versa, potentially causing incorrect anonymization of the other command name if it appears in a subsearch. Consider using a fixed, comprehensive keyword list instead. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [805-809] private String anonymizeSubsearchCommand(String commandName, List<UnresolvedPlan> subsearches) { String keywords = "source\|fields\|where\|stats\|head\|tail\|sort\|eval\|rename\|" - + commandName - + "\|search\|table\|identifier\|\\\\\\"; + + "multisearch\|union\|search\|table\|identifier\|\\\\\\"; Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that the dynamic `keywords` string won't include `"multisearch"` when processing a `"union"` command and vice versa, which could cause incorrect anonymization. Using a fixed comprehensive keyword list is a valid improvement for correctness and maintainability.	Low
	Isolate context per union dataset branch Each dataset is visited using `context`, which is a shared mutable object. If `EmptySourcePropagateVisitor` or the visitor modifies shared state in `context` (such as `relBuilder` stack or other fields), processing one dataset may corrupt the state for subsequent datasets. Consider creating a fresh or isolated context for each dataset to avoid cross-contamination between union branches. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2615-2619] for (UnresolvedPlan dataset : node.getDatasets()) { UnresolvedPlan prunedDataset = dataset.accept(new EmptySourcePropagateVisitor(), null); - prunedDataset.accept(this, context); - inputNodes.add(context.relBuilder.build()); + CalcitePlanContext datasetContext = context.createSubContext(); + prunedDataset.accept(this, datasetContext); + inputNodes.add(datasetContext.relBuilder.build()); } Suggestion importance[1-10]: 4 __ Why: The concern about shared mutable context is valid in principle, but the `improved_code` references a `context.createSubContext()` method that may not exist in the codebase. The existing code uses `context.relBuilder.build()` to pop the built node, which is the standard pattern used elsewhere in the visitor, so the risk may be lower than suggested.	Low
	Prevent accidental null maxout in constructors Both `@RequiredArgsConstructor` and `@AllArgsConstructor` are present. `@RequiredArgsConstructor` generates a constructor with only `final` fields (i.e., `datasets`), while `@AllArgsConstructor` generates one with both `datasets` and `maxout`. However, `maxout` is not `final`, so `@RequiredArgsConstructor` produces a constructor that doesn't set `maxout`, leaving it always `null` when using that constructor. This is intentional for the no-maxout case, but the `attach` method always uses the two-arg constructor, which is correct. Verify that the single-arg constructor usage is intentional and won't silently ignore a `maxout` value set elsewhere. core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27] @Getter @ToString @EqualsAndHashCode(callSuper = false) -@RequiredArgsConstructor @AllArgsConstructor public class Union extends UnresolvedPlan { private final List<UnresolvedPlan> datasets; + private final Integer maxout; - private Integer maxout; + public Union(List<UnresolvedPlan> datasets) { + this(datasets, null); + } Suggestion importance[1-10]: 3 __ Why: The suggestion to make `maxout` final and use a single canonical constructor is a reasonable design improvement, but the current dual-annotation approach (`@RequiredArgsConstructor` + `@AllArgsConstructor`) is intentional and functional. The `attach` method correctly uses the two-arg constructor, so there's no actual bug here.	Low
Possible issue	Ensure unified schema uses coerced types The unified schema is built using the type from the first node that introduces a field, but after `coerceUnionTypes` runs, the field types in subsequent nodes may have been coerced to a common type. The schema should reflect the coerced (common) type, not just the first-seen type. This can cause type mismatches when projecting fields from nodes where the coerced type differs from the first-seen type, leading to incorrect CASTs or runtime errors. core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205] private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) { List<SchemaField> schema = new ArrayList<>(); Map<String, RelDataType> seenFields = new HashMap<>(); for (RelNode node : nodes) { for (RelDataTypeField field : node.getRowType().getFieldList()) { - if (!seenFields.containsKey(field.getName())) { - schema.add(new SchemaField(field.getName(), field.getType())); - seenFields.put(field.getName(), field.getType()); + String name = field.getName(); + if (!seenFields.containsKey(name)) { + schema.add(new SchemaField(name, field.getType())); + seenFields.put(name, field.getType()); } + // After coercion, all nodes sharing a field name should have the same type already. + // No update needed since coerceUnionTypes already aligned types. } } return schema; } Suggestion importance[1-10]: 3 __ Why: The suggestion claims a type mismatch issue, but after `coerceUnionTypes` runs, all nodes sharing a field name already have the same coerced type. The `buildUnifiedSchemaForUnion` method correctly picks the first-seen type, which after coercion is the common type. The 'improved_code' is functionally identical to the 'existing_code' with only a comment added, making this a low-impact suggestion.	Low

Suggestions up to commit 811fc25

Category	Suggestion	Impact
General	Use complete keyword set in anonymizer regex The `keywords` regex is built by concatenating `commandName` directly into the alternation pattern. If `commandName` contains regex special characters (unlikely for current values like "union" or "multisearch", but possible for future commands), this could break the regex. Additionally, when `visitMultisearch` calls this method, the keyword list will include "multisearch" but not "union", and vice versa — meaning the other command name could be incorrectly anonymized as an identifier if it appears as a field name. Consider using a fixed set of all known command keywords rather than dynamically inserting only the current command name. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [805-809] private String anonymizeSubsearchCommand(String commandName, List<UnresolvedPlan> subsearches) { String keywords = "source\|fields\|where\|stats\|head\|tail\|sort\|eval\|rename\|" - + commandName - + "\|search\|table\|identifier\|\\\\\\"; + + "multisearch\|union\|search\|table\|identifier\|\\\\\\"; Suggestion importance[1-10]: 5 __ Why: The suggestion is valid — using a fixed complete set of command keywords prevents cross-command anonymization issues and avoids potential regex injection. The improved code correctly includes both `multisearch` and `union` in the keyword list regardless of which command is being processed.	Low
	Make maxout field immutable for consistency Using both `@RequiredArgsConstructor` and `@AllArgsConstructor` together with a mix of `final` and non-final fields creates two constructors: one with only `datasets` (required) and one with both `datasets` and `maxout`. However, `maxout` is non-final and mutable, which can lead to inconsistent state if the field is modified after construction. Consider making `maxout` final and using only `@AllArgsConstructor`, or using a builder pattern to avoid ambiguity. core/src/main/java/org/opensearch/sql/ast/tree/Union.java [18-27] @Getter @ToString @EqualsAndHashCode(callSuper = false) -@RequiredArgsConstructor @AllArgsConstructor public class Union extends UnresolvedPlan { private final List<UnresolvedPlan> datasets; + private final Integer maxout; - private Integer maxout; - Suggestion importance[1-10]: 4 __ Why: Making `maxout` final and removing `@RequiredArgsConstructor` in favor of only `@AllArgsConstructor` is a valid immutability improvement. The current design with both annotations and a non-final field is somewhat inconsistent, though it works functionally since `attach()` creates a new `Union` instance.	Low
	Ensure unified schema uses coerced common types The unified schema is built using the type from the first node that introduces a field, but after `coerceUnionTypes` runs, the field types in subsequent nodes may have been coerced to a common supertype. The schema should reflect the coerced (common) type, not just the first-seen type. This can cause type mismatches when projecting fields in `buildProjectionForUnion`, leading to unnecessary or incorrect CASTs. Consider building the unified schema by looking up the target type from the coercion map (or by comparing all nodes' types for each field) rather than using the first-seen type. core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java [192-205] private static List<SchemaField> buildUnifiedSchemaForUnion(List<RelNode> nodes) { List<SchemaField> schema = new ArrayList<>(); Map<String, RelDataType> seenFields = new HashMap<>(); for (RelNode node : nodes) { for (RelDataTypeField field : node.getRowType().getFieldList()) { - if (!seenFields.containsKey(field.getName())) { - schema.add(new SchemaField(field.getName(), field.getType())); - seenFields.put(field.getName(), field.getType()); + String name = field.getName(); + if (!seenFields.containsKey(name)) { + schema.add(new SchemaField(name, field.getType())); + seenFields.put(name, field.getType()); } + // After coercion, all nodes sharing a field name should have the same type; + // the first-seen type is the coerced common type. } } return schema; } Suggestion importance[1-10]: 2 __ Why: The suggestion claims a bug where the first-seen type is used instead of the coerced type, but after `coerceUnionTypes` runs, all nodes sharing a field name already have the same coerced type. So the first-seen type IS the coerced type. The `improved_code` is functionally identical to the `existing_code`, making this suggestion incorrect.	Low
Possible issue	Fix inconsistent relBuilder state after union visit After calling `context.relBuilder.build()` inside the `if (node.getMaxout() != null)` block, the built node is passed to `LogicalSystemLimit.create` but then pushed back via `context.relBuilder.push(...)`. However, `context.relBuilder.peek()` is called at the end regardless of whether maxout was applied. If `maxout` is null, `context.relBuilder.peek()` returns the union node, which is correct. But if `maxout` is not null, the `LogicalSystemLimit` node is pushed but never built — `peek()` returns it correctly only if `push` was called. This is fine, but the final `return` `context.relBuilder.peek()` is inconsistent with the pattern used elsewhere (which typically calls `build()`). More critically, callers of `visitUnion` in `visitUnion` use `context.relBuilder.build()` immediately after, so the peek/build mismatch could cause double-build issues. Ensure the method consistently returns the top node without leaving the builder in an ambiguous state, by returning `context.relBuilder.build()` and having the caller re-push if needed, or by aligning with the existing visitor pattern. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2629-2642] for (RelNode input : unifiedInputs) { context.relBuilder.push(input); } context.relBuilder.union(true, unifiedInputs.size()); // true = UNION ALL if (node.getMaxout() != null) { context.relBuilder.push( LogicalSystemLimit.create( LogicalSystemLimit.SystemLimitType.SUBSEARCH_MAXOUT, context.relBuilder.build(), context.relBuilder.literal(node.getMaxout()))); } -return context.relBuilder.peek(); +return context.relBuilder.build(); Suggestion importance[1-10]: 5 __ Why: The suggestion to use `context.relBuilder.build()` instead of `peek()` is a valid concern about consistency with the visitor pattern. Looking at the caller in `visitUnion` which calls `context.relBuilder.build()` after `prunedDataset.accept(this, context)`, using `peek()` vs `build()` could cause issues if the caller tries to build again. This is a moderate correctness concern.	Low

github-actions · 2026-03-17T16:50:06Z

Persistent review updated to latest commit 5fd5e1c

ppl/src/main/antlr/OpenSearchPPLParser.g4

ahkcs · 2026-04-01T19:26:10Z

Please also take a look at the merge conflicts

github-actions · 2026-04-02T03:00:50Z

Persistent review updated to latest commit 23f420d

github-actions · 2026-04-02T03:14:34Z

Persistent review updated to latest commit b6879f9

srikanthpadakanti · 2026-04-02T03:15:15Z

Please also take a look at the merge conflicts

@ahkcs Done. The PR was opened earlier and the branch was behind by 24 commits. It has now been updated. Could you please review it before any new conflicts arise?

github-actions · 2026-04-02T03:24:07Z

Persistent review updated to latest commit 811fc25

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

github-actions · 2026-04-02T03:40:18Z

Persistent review updated to latest commit f64bacb

github-actions · 2026-04-02T18:08:52Z

Persistent review updated to latest commit e03a298

github-actions · 2026-04-05T22:10:53Z

Persistent review updated to latest commit 3096cd4

srikanthpadakanti · 2026-04-06T18:08:55Z

Hello @ahkcs Can you please review this. Thanks.

srikanthpadakanti · 2026-04-06T21:48:13Z

@mengweieric @LantaoJin Please review this, so that it can be merged before I get conflicts. Thanks

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

github-actions · 2026-04-10T03:17:53Z

Persistent review updated to latest commit 5952ff6

srikanthpadakanti · 2026-04-10T03:47:32Z

@mengweieric After merging the upstream changes to my branch, lintchecker is failing and looks like this is missing Related files: scripts/comment-on-duplicates.sh

github-actions · 2026-04-10T17:25:11Z

Persistent review updated to latest commit 467a7d1

srikanthpadakanti force-pushed the feature/union-public branch from 7ab57cb to 5fd5e1c Compare March 17, 2026 16:47

srikanthpadakanti force-pushed the feature/union-public branch 2 times, most recently from dcbc60f to 12b2eb5 Compare March 17, 2026 16:59

ahkcs reviewed Apr 1, 2026

View reviewed changes

ppl/src/main/antlr/OpenSearchPPLParser.g4 Outdated Show resolved Hide resolved

srikanthpadakanti requested a review from songkant-aws as a code owner April 2, 2026 02:58

srikanthpadakanti requested a review from ahkcs April 2, 2026 03:16

Srikanth Padakanti added 3 commits April 1, 2026 22:37

PPL: add union command with type coercion and UNION ALL semantics

5f711e7

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

Address PR review comments and resolve merge conflicts

5206c2b

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

Apply spotless formatting

f64bacb

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

srikanthpadakanti force-pushed the feature/union-public branch from 811fc25 to f64bacb Compare April 2, 2026 03:38

Merge remote-tracking branch 'upstream/main' into feature/union-public

e03a298

Merge remote-tracking branch 'upstream/main' into feature/union-public

3096cd4

ahkcs approved these changes Apr 6, 2026

View reviewed changes

mengweieric requested changes Apr 9, 2026

View reviewed changes

mengweieric self-requested a review April 9, 2026 22:44

mengweieric reviewed Apr 9, 2026

View reviewed changes

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java Show resolved Hide resolved

core/src/main/java/org/opensearch/sql/calcite/SchemaUnifier.java Show resolved Hide resolved

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java Show resolved Hide resolved

mengweieric self-requested a review April 9, 2026 22:47

Merge remote-tracking branch 'upstream/main' into feature/union-public

5952ff6

Merge branch 'opensearch-project:main' into feature/union-public

467a7d1

ahkcs approved these changes Apr 10, 2026

View reviewed changes

Conversation

srikanthpadakanti commented Mar 17, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 467a7d1)

Uh oh!

github-actions bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

Uh oh!

ahkcs commented Apr 1, 2026

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

srikanthpadakanti commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

github-actions bot commented Apr 5, 2026

Uh oh!

srikanthpadakanti commented Apr 6, 2026

Uh oh!

srikanthpadakanti commented Apr 6, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

srikanthpadakanti commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions bot commented Mar 17, 2026 •

edited

Loading

(Review updated until commit `467a7d1`)

github-actions bot commented Mar 17, 2026 •

edited

Loading

srikanthpadakanti commented Apr 2, 2026 •

edited

Loading