feat: add Java scalar and table UDF runtime APIs #598

Open
lfkpoa wants to merge 1 commit into duckdb:main from lfkpoa:feature/java-udf-single
lfkpoa commented Mar 9, 2026

Summary

  • Add end-to-end Java UDF runtime support for scalar and table functions, including JNI bindings, Java API types, and registration plumbing.
  • Extend scalar UDF ergonomics with arity/zero-arg/varargs/class-mapped overloads and document usage in UDF.MD and README.md.
  • Add/expand coverage in TestBindings and TestDuckDBJDBC, and include runtime fixes required for Variant and appender/thread-safety integration.

@staticlibs
Collaborator

Hi, thanks for the PR! This is a highly requested feature.

I will need a few days to do a full review. Just a quick question for now: is there a fundamental reason to have functions like register_scalar_udf in duckdb_java.cpp instead of pushing all this logic to the Java side on top of DuckDBBindings calls?

@lfkpoa
Author

lfkpoa commented Mar 10, 2026

Hi! Thank you for the feedback.

I really want this feature, which is why I tried to build it.
I had to make some choices, but feel free to question them or point to another direction.

  • DuckDB executes UDF callbacks from native execution threads and expects C function pointers plus native callback state (extra_info).
  • Java cannot directly provide those C callbacks; JNI glue must own thread attach/detach, global references, callback lifecycle, and error translation.
     
I implemented functions like register_scalar_udf and table-function registration in duckdb_java.cpp because that is where the callback bridge (DuckDB C callback <-> JNI <-> Java objects) can be implemented safely.

But if you would rather have them in the DuckDBBindings *.cpp files, I can try to refactor it, though I think it has to stay on the C side.

@staticlibs
Collaborator

@lfkpoa

Thanks for the details!

> But if you would rather have them in the DuckDBBindings *.cpp files

No, by DuckDBBindings I meant DuckDBBindings.java, which exposes the C API from duckdb.h with as few additions on the C++ JNI side as possible.

> I had to make some choices, but feel free to question them or point to another direction.

Ideally we would do everything (or almost everything) on the Java side, calling C API methods from DuckDBBindings.java. I may be underestimating the amount of plumbing required, and in that case it may indeed be better to keep the logic in the C++ JNI layer, but let's go through this point by point.

> DuckDB executes UDF callbacks from native execution threads and expects C function pointers plus native callback state (extra_info).

I would think we can just attach DuckDB native threads to the JVM? That is, when duckdb_table_function_set_function is called from Java, the binding would wrap the Java callback in a native callback that attaches the thread and handles passing extra_info.
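To make the extra_info part of this idea concrete, here is a minimal Java-side sketch. It assumes the native trampoline would carry only an opaque long handle in extra_info and resolve it to a Java callback after attaching its thread. `CallbackRegistry` and all of its methods are hypothetical names for illustration; none of this is existing driver API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Hypothetical helper, not part of the DuckDB JDBC driver: maps an opaque
// long handle (the value a native trampoline could store in extra_info)
// to the Java callback it stands for.
final class CallbackRegistry<T> {
    private final ConcurrentHashMap<Long, T> callbacks = new ConcurrentHashMap<>();
    // Handles start at 1 so that 0 can mean "no callback" on the native side.
    private final AtomicLong nextHandle = new AtomicLong(1);

    /** Stores the callback and returns the handle to pass as extra_info. */
    long register(T callback) {
        long handle = nextHandle.getAndIncrement();
        callbacks.put(handle, callback);
        return handle;
    }

    /** Resolves a handle; safe to call from any (attached) native thread. */
    T lookup(long handle) {
        T callback = callbacks.get(handle);
        if (callback == null) {
            throw new IllegalStateException("Unknown or released callback handle: " + handle);
        }
        return callback;
    }

    /** Drops the callback, as a native destroy callback would request. */
    boolean release(long handle) {
        return callbacks.remove(handle) != null;
    }

    public static void main(String[] args) {
        CallbackRegistry<LongUnaryOperator> registry = new CallbackRegistry<>();
        long handle = registry.register(x -> x + 1);
        // Stand-in for what the trampoline would do on a DuckDB worker thread.
        System.out.println(registry.lookup(handle).applyAsLong(41));
        registry.release(handle);
    }
}
```

With a scheme like this, the native side would not need to hold a JNI global reference per callback, only a number; whether that trade-off is acceptable here is an open question.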

> Java cannot directly provide those C callbacks; JNI glue must own thread attach/detach, global references, callback lifecycle, and error translation.

On error translation: I would think it may be enough to catch exceptions in a Java wrapper and call duckdb_function_set_error from it. I am not ready to comment on the other points.
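A rough sketch of such a wrapper, under stated assumptions: `ErrorTranslatingCallback`, `UdfBody`, and the `errorSink` parameter are hypothetical names, and the sink stands in for whatever binding would eventually call duckdb_function_set_error. Catching `Throwable` rather than `Exception` is a choice made for this sketch, so that nothing is left to propagate toward the native caller.

```java
import java.util.function.Consumer;

// Hypothetical names throughout; this is not the driver's API.
final class ErrorTranslatingCallback {
    /** User code that may throw, standing in for a UDF body. */
    interface UdfBody {
        void run() throws Exception;
    }

    /**
     * Runs the body and, instead of letting a throwable escape toward native
     * code, forwards a description of it to errorSink. In a real driver the
     * sink would presumably call duckdb_function_set_error via a binding.
     * Returns true when the body completed normally.
     */
    static boolean invoke(UdfBody body, Consumer<String> errorSink) {
        try {
            body.run();
            return true;
        } catch (Throwable t) {
            String message = t.getMessage();
            errorSink.accept(t.getClass().getName() + (message == null ? "" : ": " + message));
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder reported = new StringBuilder();
        boolean ok = invoke(() -> {
            throw new IllegalArgumentException("bad input");
        }, reported::append);
        System.out.println(ok + " " + reported);
    }
}
```

The boolean return value lets the caller skip writing output for a failed invocation; the thread attach and the actual error call would still live in the native trampoline.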

> I implemented functions like register_scalar_udf and table-function registration in duckdb_java.cpp because that is where the callback bridge (DuckDB C callback <-> JNI <-> Java objects) can be implemented safely.

The problem here is that we want to get rid of the native calls in DuckDBNative.java. That is not possible right now, but we certainly don't want new JNI calls there unless absolutely necessary. The ideal case would be to have the plumbing logic in util.hpp/util.cpp (or separate headers) and use it from the bindings_*.cpp calls, keeping the overall logic as close as possible to that of the original C API calls.

Again, it is possible that I don't fully understand the complexity of the required translation; the above is written from the general approach we want for JDBC. So if you can immediately see blockers with this approach, let's look into the details of those blockers.

@lfkpoa
Author

lfkpoa commented Mar 10, 2026

Thanks a lot for the detailed feedback; this is very helpful.
I don't see any blockers.

I understand your point now: the goal is to keep DuckDBBindings.java as the main Java-side surface for C API usage, with minimal JNI-specific additions, and avoid adding new DuckDBNative calls unless absolutely necessary. That direction makes sense to me.

I’ll do a deeper pass on the current UDF registration path and evaluate how much of the orchestration can be moved/refactored toward the DuckDBBindings approach you described, while keeping only the unavoidable native callback plumbing.


2 participants