perf(arrow): zero-copy BFloat16Array::from(Vec<bf16>) via Buffer::from_vec#7614
Open
LuciferYang wants to merge 1 commit into
Open
perf(arrow): zero-copy BFloat16Array::from(Vec<bf16>) via Buffer::from_vec#7614LuciferYang wants to merge 1 commit into
LuciferYang wants to merge 1 commit into
Conversation
…m_vec Follow-up to PR lance-format#7500 (wjones127's suggestion: lance-format#7500 (comment)). The prior fix eliminated the per-element Vec<u8> allocation but still walked the input a second time, copying two bytes per element into a freshly allocated MutableBuffer. Hand the input Vec directly to Buffer::from_vec so the allocation is reused as-is — no per-element copy, no MutableBuffer::extend loop. Buffer::from_vec requires `T: ArrowNativeType`, which bf16 does not implement (arrow-rs registers the trait only for f16/f32/f64 among the float family; Apache Arrow has no bf16 primitive type). Bridge the gap with bytemuck::cast_vec::<bf16, u16>. Soundness comes from three facts: 1. bf16 is #[repr(transparent)] over u16 in half 2.7, so it has the same size, alignment, and layout as u16. 2. half's bytemuck feature (enabled here) derives Zeroable + Pod on bf16, and cast_vec's runtime size/align check therefore always passes. 3. The crate-root #[cfg(not(target_endian = "little"))] compile_error! (added in lance-format#7511) ties the reinterpretation to little-endian hosts, matching the FixedSizeBinary(2) byte order Lance writes elsewhere. Big-endian builds fail to compile. Dependency additions: - half: add "bytemuck" feature (workspace Cargo.toml). - bytemuck: workspace dep with default-features = false + the "extern_crate_alloc" feature that cast_vec requires. bytemuck 1.25.0 is already resolved transitively; this promotes it to a direct dependency. - rust/lance-arrow/Cargo.toml: pull in bytemuck. - python/Cargo.lock and java/lance-jni/Cargo.lock refreshed to record bytemuck_derive (the excluded lockfiles per the workspace's three- lockfile rule). Verified: cargo fmt, cargo clippy -p lance-arrow --tests --benches -- -D warnings, cargo test -p lance-arrow (93 unit + 6 doc tests). Existing test_basics pins BFloat16Array::from(Vec) == from_iter_values, so behavior is regression-tested.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Contributor
Author
|
cc @wjones127 , Refer to the previous discussion: #7500 (comment) , I've temporarily adopted Option 2. If you think Option 1 should be implemented, please let me know. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Follow-up to #7500 (wjones127's suggestion in #7500 (comment)).
BFloat16Array::from(Vec<bf16>)still walked the input a second time, copying two bytes per element into a freshly allocatedMutableBuffer. This hands the inputVecstraight toBuffer::from_vec, so the existing allocation is reused as-is — no per-element copy, noMutableBuffer::extendloop.Buffer::from_vecrequiresT: ArrowNativeType, whichbf16does not implement (arrow-rs registers the trait only forf16/f32/f64among the float family; Apache Arrow has no bf16 primitive type, which is why Lance stores it asFixedSizeBinary(2)). The gap is bridged withbytemuck::cast_vec::<bf16, u16>, which reinterprets the allocation in place.Why it is sound
bf16is#[repr(transparent)]overu16inhalf2.7, so it has the same size (2) and alignment (2) asu16.cast_vectherefore hits its equal-size/equal-align path and reuses the allocation without realloc; a mismatch would panic, never reach UB.half'sbytemuckfeature (enabled here) derivesZeroable + Podonbf16, satisfyingcast_vec's trait bounds.#[cfg(not(target_endian = "little"))] compile_error!added in docs(arrow): add SAFETY comments to lance-arrow unsafe blocks #7511 ties the reinterpretation to little-endian hosts, matching theFixedSizeBinary(2)byte order Lance writes elsewhere. Big-endian builds fail to compile, so the output is byte-identical to the previous per-elementto_le_bytes()path.Changes
From<Vec<bf16>>: replace theMutableBuffercopy loop withbytemuck::cast_vec+Buffer::from_vec.half: add thebytemuckfeature; addbytemuck(default-features = false,extern_crate_alloc) as a direct workspace dependency (already resolved transitively, now promoted).python/Cargo.lockandjava/lance-jni/Cargo.lock(the workspace-excluded lockfiles) to recordbytemuck_derive.as_slice's alignment doc/SAFETY comment previously asserted every value buffer comes fromMutableBuffer(≥32-byte aligned); reworded to also cover the newBuffer::from_vec::<u16>path (2-byte aligned), both of which satisfybf16's 2-byte requirement.Tests
test_basicsnow pins the raw little-endian bytes emitted byFrom<Vec<bf16>>([0x80,0x3F, 0x00,0x40, 0x40,0x40]for[1.0, 2.0, 3.0]) viaFixedSizeBinaryArray::value, so a layout or byte-order regression is caught directly rather than only through Debug formatting. The priorassert_eq!(array, array2)compared two outputs of the sameFromimpl and could not catch a regression.cargo fmt,cargo clippy -p lance-arrow --tests --benches -- -D warnings, andcargo test -p lance-arrow(93 unit + 6 doc tests) are green.