Fix SIGSEGV in Highway kernels: use unaligned Load/Store on std::vector-backed blobs#383
Merged
Merged
Conversation
Aligned hn::Load/hn::Store on std::vector-backed blobs (16-byte aligned) fault on targets with vectors wider than 16 B. Use unaligned LoadU/StoreU in MulAddLoop, DotProductHw, AddHw, AddInplaceHw, ReluHw.
Contributor
Author
Collaborator
|
thank you very much for both PRs — and my sincere apologies for the long delay in responding. I’ve been away from the project for the past week and didn’t get a chance to look at these earlier. I’ll review the code in detail within the next 24–48 hours and will either merge or leave any minor comments. Please bear with me just a little longer. Thanks again for your patience and your excellent contribution! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
highway/module crashes with SIGSEGV on any x86 target ≥ AVX2 (e.g.building with
-march=skylake,-march=haswell,-march=cascadelake). Fiveelementwise kernels in
src/hw_kernels_hw.cppuse Highway's alignedLoad/Storeon blob data that is onlystd::vector<float>-aligned, which isnarrower than the SIMD vector width.
Symptom
fdt::hw::AddHw hw_kernels_hw.cpp:65
... (any pipeline that calls AddHw / DotProductHw / ReluHw / MulAddHw / AddInplaceHw)
SIGSEGV inside
hn::Store(hn::Add(hn::Load(...), hn::Load(...)), ...).Root cause
HwBlob::data_, astd::vector<float>. Its base pointer isonly aligned to the default allocator alignment (16 bytes on x86-64 glibc),
not the SIMD vector width.
Load/Storeare the aligned variants and require the pointer tobe aligned to the vector size — 32 bytes for AVX2, 64 bytes for AVX-512.
satisfy the requirement, so it works. On AVX2+ the aligned move (
vmovaps)hits a 16-byte-but-not-32-byte address →
#GP→ SIGSEGV.PaddedChannels) makes per-pixelPtr()offsets a multipleof the vector size, but the base
data_.data()is still only 16-bytealigned, so it does not help.
This is an inconsistency within the file itself: the other ~150 load/store sites
already use the unaligned
LoadU/StoreU. Only these 5 functions use thealigned variant:
MulAddLoop,DotProductHw,AddHw,AddInplaceHw,ReluHw.