refactor: migrate Druid HTTP client from Netty 3 to 4#19567
refactor: migrate Druid HTTP client from Netty 3 to 4#19567clintropolis wants to merge 6 commits into
Conversation
changes:
* `NettyHttpClient` now uses netty 4 (along with all associated configs, handlers, etc)
* allocator: uses Netty 4.2's default `AdaptiveByteBufAllocator` (pooled internally, direct by default), replacing Netty 3's unpooled heap allocator. Operators running close to `-XX:MaxDirectMemorySize` may need to bump it. Allocator type is swappable at runtime via `-Dio.netty.allocator.type={adaptive,pooled,unpooled}` (operators might wish to experiment on upgrade).
* fix a leak in `SequenceInputStreamResponseHandler`, copy chunk bytes to `byte[]` before queueing (was wrapping a pooled `ByteBuf` that `SimpleChannelInboundHandler` releases after `channelRead0` returns)
* lots of import changes to swap to netty 4
* cleanup pom, licenses.yaml, OWASP suppressions
* added 'paranoid' leak detection to embedded-tests
| Assert.assertEquals( | ||
| json, | ||
| StringUtils.fromUtf8(ByteStreams.toByteArray(new ChannelBufferInputStream(request.getContent()))) | ||
| StringUtils.fromUtf8(ByteStreams.toByteArray(new ByteBufInputStream(request.getContent()))) |
| Assert.assertEquals( | ||
| "{\"foo\":3}", | ||
| StringUtils.fromUtf8(ByteStreams.toByteArray(new ChannelBufferInputStream(request.getContent()))) | ||
| StringUtils.fromUtf8(ByteStreams.toByteArray(new ByteBufInputStream(request.getContent()))) |
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 1 |
| P2 | 0 |
| P3 | 0 |
| Total | 1 |
Reviewed 164 of 164 changed files.
This is an automated review by Codex GPT-5.5
| method, | ||
| urlFile.isEmpty() ? "/" : urlFile | ||
| urlFile.isEmpty() ? "/" : urlFile, | ||
| request.hasContent() ? request.getContent() : Unpooled.EMPTY_BUFFER |
There was a problem hiding this comment.
[P1] Do not hand the request-owned ByteBuf to Netty
This passes the Request's stored ByteBuf directly into a Netty 4 FullHttpRequest. Netty's HTTP encoder releases FullHttpMessages after encoding, and may also advance the content reader index, so after the first send request.getContent() can be released or consumed. Kerberos' 401 retry path calls request.copy() after the first response, which can then fail or resend an empty body for authenticated POSTs. Please build the outbound request from a retained duplicate/copy, or make Request store bytes and create a fresh ByteBuf per send.
| } | ||
| int contentLength = 0; | ||
| String header; | ||
| while (!(header = in.readLine()).equals("")) { |
| final Lifecycle lifecycle = new Lifecycle(); | ||
| try { | ||
| final HttpClient client = HttpClientInit.createClient(HttpClientConfig.builder().build(), lifecycle); | ||
| final URL url = new URL(StringUtils.format("http://localhost:%d/", serverSocket.getLocalPort())); |
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 1 |
| P2 | 0 |
| P3 | 0 |
| Total | 1 |
Reviewed 52 of 52 changed files.
Findings that could not be attached inline:
- processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorFactory.java:91 - [P1] Projection matching bypasses clustered column capabilities. The clustered index path now depends on clustered-aware column capabilities, but makeCursorHolder tries index.getProjection(spec) before checking index.getClusteredBaseSummary(). For clustered segments, the top-level physical index does not expose the clustered logical columns, so aggregate projection matching can treat clustered filter/group columns as absent from the base table and select a projection that does not carry them. That returns a normal QueryableIndexCursorHolder over the projection instead of entering clustered dispatch, which can silently produce empty or incorrect aggregate results. Projection selection needs to be clustered-aware, or aggregate projection matching should be deferred/skipped until the clustered schema can validate the query.
This is an automated review by Codex GPT-5.5
Description
Third time's a charm? Previous attempts #12032 and #14479 afaict both got stuck trying to resolve failures in the integration tests (the old docker based integration tests). I started fresh for this attempt, but the embedded-tests that have replaced our old frameworks made it a lot easier to dig in and determine the problem (and credit to prior attempts which were used as reference for comparison)
The root problem with prior attempts that resulted in the observed hanging is that Netty 4's
setAutoRead(true)is just a config flag. Netty 3'ssetReadable(true)also kicked an immediate read. For Druid's chunked responses where the server flushes the status line in one TCP write and the body in subsequent writes (e.g.QueryResource), the body bytes sit on the wire for ~5 minutes until connection close. Fix is a one-linectx.read()afterhandleResponseinNettyHttpClientwith a comment explaining the semantic difference.Allocator and direct memory
The HTTP client previously used Netty 3's
HeapChannelBufferFactorywhich is unpooled heap buffers. Netty 4.2's default isAdaptiveByteBufAllocator(pooled internally viaAdaptivePoolingAllocator, direct by default). This is a substantive change in memory characteristics:running close to their
-XX:MaxDirectMemorySizebudget may need to bump it.higher than under Netty 3 even when idle, normal pooled-allocator behavior.
This PR is using Netty 4.2's default allocator (
adaptive) rather than pinning to thePooledByteBufAllocatorthat was the Netty 4 default until 4.2 (or the unpooled heap behavior of Netty 3). Since Druid operators are coming from heap-unpooled (Netty 3), neither Netty 4 choice represents a "known good baseline"; both are new characteristics for this workload. However, using pooled off-heap buffers of some kind should offer a performance improvement so it seems worth the risk.The allocator can be swapped without a code change:
-Dio.netty.allocator.type=…adaptive(default in Netty 4.2)pooledPooledByteBufAllocatorunpooled-Dio.netty.noPreferDirect=trueto also stay on-heapOther useful knobs:
io.netty.noPreferDirectfalsetrue⇒ pooled but heap rather than directio.netty.allocator.maxOrderio.netty.maxDirectMemory-XX:MaxDirectMemorySizeio.netty.leakDetection.levelsimpledisabled/simple/advanced/paranoidOther stuff
InputStreamHolder.fromChannelBuffer(ByteBuf, long)now copies bytes synchronously instead of wrapping aByteBufInputStreamover a pooled buffer thatSimpleChannelInboundHandlerreleases afterchannelRead0returns. The prior PRs hit CodeQL leak warnings onSequenceInputStreamResponseHandler.java; that handler is now leak-safe too, and so areDirectDruidClientandDataServerResponseHandlerwhich all funnel through the same helper.