fix: HashJoin panic with dictionary-encoded columns in multi-key joins by Tim-53 · Pull Request #20441 · apache/datafusion

Tim-53 · 2026-02-20T00:15:44Z

Which issue does this PR close?

Closes Panic in HashJoin with dictionary-encoded column in multi-column join key #20437

Rationale for this change

flatten_dictionary_array returned only the unique values rather than the full expanded array when called on a DictionaryArray. When building a StructArray, this caused a length-mismatch panic.

What changes are included in this PR?

Replaced array.values() with arrow::compute::cast(array, value_type) in flatten_dictionary_array, which properly expands the dictionary into a full length array matching the row count.
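
To make the length mismatch concrete, here is a minimal sketch using only the Rust standard library. It is a toy model, not arrow's API: `ToyDictionary`, `unique_values`, `expand`, and `build_struct` are hypothetical names introduced for illustration, and nulls are not modeled. `unique_values` plays the role of `array.values()`, while `expand` plays the role that `arrow::compute::cast(array, value_type)` plays in this PR.

```rust
/// Toy stand-in for a dictionary-encoded column (hypothetical type, not
/// arrow's `DictionaryArray`): row `i` holds `values[keys[i]]`.
struct ToyDictionary {
    keys: Vec<usize>,
    values: Vec<String>,
}

impl ToyDictionary {
    /// Models the old behaviour: return only the distinct values,
    /// so the length is the dictionary size, not the row count.
    fn unique_values(&self) -> Vec<String> {
        self.values.clone()
    }

    /// Models the fixed behaviour: look up every key, producing
    /// exactly one value per row.
    fn expand(&self) -> Vec<String> {
        self.keys.iter().map(|&k| self.values[k].clone()).collect()
    }
}

/// Toy stand-in for assembling a struct column: all child columns must
/// have the same number of rows. Returns the row count, or an error
/// where the real code would have panicked.
fn build_struct(columns: &[Vec<String>]) -> Result<usize, String> {
    let rows = match columns.first() {
        Some(first) => first.len(),
        None => return Ok(0),
    };
    for (i, col) in columns.iter().enumerate() {
        if col.len() != rows {
            return Err(format!(
                "column {} has {} rows, expected {}",
                i,
                col.len(),
                rows
            ));
        }
    }
    Ok(rows)
}
```

With four rows encoded over a two-entry dictionary, pairing `unique_values()` with a second four-row key column makes `build_struct` fail, whereas pairing `expand()` with it succeeds. That is the shape of the multi-key join case this PR addresses.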

Are these changes tested?

Yes, both a new unit test and a regression test were added.

Are there any user-facing changes?

Nope

jonathanc-n

This looks good, just some small comments

jonathanc-n · 2026-02-20T05:59:37Z

datafusion/core/tests/sql/joins.rs

 }
+
+// Issue #20437: https://github.com/apache/datafusion/issues/20437
+#[tokio::test]


We want to keep unit tests to a minimum when possible. Use sqllogictests here instead.

adriangb · 2026-02-22T12:53:11Z

datafusion/physical-plan/src/joins/hash_join/inlist_builder.rs

-fn flatten_dictionary_array(array: &ArrayRef) -> ArrayRef {
-    downcast_dictionary_array! {
-        array => {
+fn flatten_dictionary_array(array: &ArrayRef) -> Result<ArrayRef> {


Shall we rename the function?

alamb

Thank you @Tim-53 and @adriangb -- I think this PR is clearly better than the prior version

alamb · 2026-02-23T17:53:20Z

datafusion/physical-plan/src/joins/hash_join/inlist_builder.rs

-fn flatten_dictionary_array(array: &ArrayRef) -> ArrayRef {
-    downcast_dictionary_array! {
-        array => {
+fn flatten_dictionary_array(array: &ArrayRef) -> Result<ArrayRef> {


It seems to me that it still flattens dictionaries. Why would we rename it?

alamb · 2026-02-23T18:39:43Z

datafusion/physical-plan/src/joins/hash_join/inlist_builder.rs

    }
+
+    #[test]
+    fn test_build_multi_column_inlist_with_dictionary() {


I am not sure this unit test adds much value -- it basically just reiterates how the current function works (it is testing some intermediate state).

I double checked that the .slt test fails without the code in this PR

alamb · 2026-02-23T18:46:39Z

datafusion/physical-plan/src/joins/hash_join/inlist_builder.rs

+fn flatten_dictionary_array(array: &ArrayRef) -> Result<ArrayRef> {
+    match array.data_type() {
+        DataType::Dictionary(_, value_type) => {
+            let casted = cast(array, value_type)?;


I messed around with this PR and I don't really understand why the code is flattening arrays at all

I removed the flattening code entirely and all the tests seem to pass. I'll make a follow-on PR.

Just deleting the code seems to have worked

fix: HashJoin panic with String dictionary keys (don't flatten keys) #20505

I should have followed my instincts a bit more. It turns out we found another issue with this code at InfluxData

INNER JOIN with dictionary keys fails when run on parquet with pushdown_filters = true. #20696

Thankfully the following PR fixes it:

fix: HashJoin panic with String dictionary keys (don't flatten keys) #20505

alamb · 2026-02-24T12:10:09Z

I also made a backport PR here

[branch-52] fix: HashJoin panic with dictionary-encoded columns in multi-key joins (#20441) #20512

@Tim-53

…lti-key joins (#20441) (#20512)

- part of #20287
- Closes #20437 on branch-52
- Back port of #20441 from @Tim-53 / @adriangb

Made using

```shell
git cherry-pick ffc5b55
git cherry-pick a18be6f
```

---------

Co-authored-by: Tim-53 <82676248+Tim-53@users.noreply.github.com>

#20505)

## Which issue does this PR close?

- Fixes #20696
- Follow on to #20441

## Rationale for this change

#20441 (review) fixes the special case DictionaryArray handling in Joins. However, I don't think we need to special case DictionaryArrays at all

## What changes are included in this PR?

1. Remove the special case dictionary handling

## Are these changes tested?

Yes by CI

## Are there any user-facing changes?

No (though maybe some queries get faster)

github-actions bot added the core (Core DataFusion crate) and physical-plan (Changes to the physical-plan crate) labels Feb 20, 2026

fix: HashJoin panic with dictionary-encoded columns in multi-key joins

ffc5b55

Tim-53 force-pushed the fix-20437-hash-join-dictionary-panic branch from b1328a2 to ffc5b55 Compare February 20, 2026 00:18

jonathanc-n reviewed Feb 20, 2026

Tim-53 added 2 commits February 20, 2026 16:28

test: move hash join dictionary regression test to sqllogictest

a18be6f

Merge branch 'main' into fix-20437-hash-join-dictionary-panic

3470c4d

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) label and removed the core (Core DataFusion crate) label Feb 20, 2026

jonathanc-n approved these changes Feb 21, 2026

Merge branch 'main' into fix-20437-hash-join-dictionary-panic

9771125

alamb added and then removed the regression (Something that used to work no longer does) label Feb 22, 2026

adriangb reviewed Feb 22, 2026

alamb mentioned this pull request Feb 23, 2026

Patched DF 52.1.0 (revision a) influxdata/arrow-datafusion#90

Closed

alamb approved these changes Feb 23, 2026

This was referenced Feb 23, 2026

fix: HashJoin panic with String dictionary keys (don't flatten keys) #20505

Merged

[branch-52] fix: HashJoin panic with dictionary-encoded columns in multi-key joins (#20441) #20512

Merged

alamb added this pull request to the merge queue Feb 24, 2026

Merged via the queue into apache:main with commit 0dfa542 Feb 24, 2026
32 checks passed

alexanderbianchi mentioned this pull request Feb 25, 2026

[BUG] Fix panic in in list builder for dictionary arrays #20557

Closed

erratic-pattern mentioned this pull request Mar 3, 2026

Patched DF 52.1.0 (revision b) influxdata/arrow-datafusion#91

Open

alamb mentioned this pull request Mar 4, 2026

INNER JOIN with dictionary keys fails when run on parquet with pushdown_filters = true. #20696

Closed

This was referenced Mar 9, 2026

Patched DF 52.1.0 (revision c) - (NOTE: superseded by #93) influxdata/arrow-datafusion#92

Closed

Patched DF 52.1.0 (revision c) influxdata/arrow-datafusion#93

Open

Patched DF 52.1.0 (revision d) influxdata/arrow-datafusion#94

Draft

alamb merged 4 commits into apache:main from Tim-53:fix-20437-hash-join-dictionary-panic
