@aryan-212 aryan-212 commented Jan 25, 2026

Which issue does this PR close?

  • This PR is part of the Utf8View support epic. It adds Utf8View support in the Spark-compat layer.

Rationale for this change

In our internal project we only support Utf8View (because of design constraints), and the current implementation of SparkConcat only supports Utf8. The SparkConcat function should accept Utf8View and mixed string types, in line with the main DataFusion concat. This PR adds that support and follows the same patterns as DataFusion’s concat.

This prevents errors like:

The type of Utf8 AND Utf8View of like physical should be same.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues

from a query like:

select i_item_sk,
       item_info
from
  (select i_item_sk,
          CONCAT('Item: ', i_item_desc) as item_info
   from item) sub
where item_info LIKE 'Item: Electronic%'
order by 1;

What changes are included in this PR?

  • Extend the type signature to accept Utf8View in addition to Utf8 and LargeUtf8 via TypeSignature::Variadic(vec![Utf8View, Utf8, LargeUtf8]) matching DataFusion’s concat.

  • In return_field_from_args, compute the result type with precedence Utf8View > LargeUtf8 > Utf8.

  • In spark_concat, handle Utf8View and LargeUtf8 in scalar paths (zero-argument and all-NULL).
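
The precedence rule described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `StringType` is a hypothetical local enum standing in for the three string variants of Arrow's `DataType`, `concat_return_type` is a made-up helper name, and the Utf8 fallback for an empty argument list is an assumption of the sketch.

```rust
/// Hypothetical stand-in for the three string variants of Arrow's `DataType`,
/// defined locally so the sketch has no external dependencies.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StringType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Pick the result type with precedence Utf8View > LargeUtf8 > Utf8.
/// For an empty argument list this sketch falls back to Utf8 (an assumption).
#[allow(dead_code)]
fn concat_return_type(arg_types: &[StringType]) -> StringType {
    if arg_types.contains(&StringType::Utf8View) {
        // Any Utf8View argument makes the whole result Utf8View.
        StringType::Utf8View
    } else if arg_types.contains(&StringType::LargeUtf8) {
        // Otherwise LargeUtf8 wins over plain Utf8.
        StringType::LargeUtf8
    } else {
        StringType::Utf8
    }
}
```

Under this rule, a call mixing Utf8, LargeUtf8, and Utf8View arguments resolves to Utf8View, so a downstream comparison against a Utf8View column should not need an extra cast.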

Are these changes tested?

Yes.

  • Unit tests: cargo test --package datafusion-spark function::string::concat::tests, including test_concat_utf8view.
  • Sqllogictest: spark/string/concat.slt includes a “Utf8View: no extra CAST in plan” case that uses EXPLAIN and a temporary table to ensure no extra CASTs when using arrow_cast(..., 'Utf8View') with table columns.

Are there any user-facing changes?

  • API: SparkConcat’s signature is extended to include Utf8View in the variadic list. No breaking changes.

used gpt to rephrase some of these points

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Jan 25, 2026
@aryan-212 aryan-212 force-pushed the utf8view-sparkconcat branch from fc92fa6 to e052fa3 Compare January 25, 2026 08:27
@aryan-212 aryan-212 changed the title feat: implement StringView for SparkConcat feat(spark): implement StringView for SparkConcat Jan 25, 2026
Member

@Weijun-H Weijun-H left a comment

LGTM, thanks @aryan-212 👍

It would be better to add some mixed-type tests:

# Test mixed types: Utf8View + Utf8
query T
SELECT concat(arrow_cast('hello', 'Utf8View'), ' world');
----
hello world

# Test all three types mixed together
query T
SELECT concat('a', arrow_cast('b', 'LargeUtf8'), arrow_cast('c', 'Utf8View'));
----
abc

@aryan-212 aryan-212 force-pushed the utf8view-sparkconcat branch 3 times, most recently from d791634 to 9329a43 Compare January 25, 2026 10:28
@aryan-212
Author

Added them @Weijun-H, thanks for reviewing 🙇

@aryan-212 aryan-212 force-pushed the utf8view-sparkconcat branch 2 times, most recently from ad92725 to 7bb3ba0 Compare January 25, 2026 15:10
@aryan-212 aryan-212 requested a review from Jefffrey January 25, 2026 15:10
@aryan-212 aryan-212 force-pushed the utf8view-sparkconcat branch from 7bb3ba0 to 02362fe Compare January 25, 2026 15:21
@aryan-212 aryan-212 force-pushed the utf8view-sparkconcat branch 2 times, most recently from 1c997dc to 28156b8 Compare January 25, 2026 17:16
@aryan-212
Author

@Jefffrey, I made the required test changes. Please have a look 🙇

@aryan-212 aryan-212 force-pushed the utf8view-sparkconcat branch 3 times, most recently from 3715990 to 577b604 Compare January 25, 2026 17:40
@aryan-212 aryan-212 force-pushed the utf8view-sparkconcat branch from 577b604 to 1358178 Compare January 25, 2026 17:40