Skip to content

Fix SIGSEGV in Highway kernels: use unaligned Load/Store on std::vector-backed blobs#383

Merged
Wwupup merged 1 commit into
ShiqiYu:masterfrom
pszemus:fix-sigsegv-in-highway-kernels
Jun 28, 2026
Merged

Fix SIGSEGV in Highway kernels: use unaligned Load/Store on std::vector-backed blobs#383
Wwupup merged 1 commit into
ShiqiYu:masterfrom
pszemus:fix-sigsegv-in-highway-kernels

Conversation

@pszemus

@pszemus pszemus commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Summary

The highway/ module crashes with SIGSEGV on any x86 target ≥ AVX2 (e.g.
building with -march=skylake, -march=haswell, -march=cascadelake). Five
elementwise kernels in src/hw_kernels_hw.cpp use Highway's aligned
Load/Store on blob data that is only std::vector<float>-aligned, which is
narrower than the SIMD vector width.

Symptom

fdt::hw::AddHw hw_kernels_hw.cpp:65
... (any pipeline that calls AddHw / DotProductHw / ReluHw / MulAddHw / AddInplaceHw)

SIGSEGV inside hn::Store(hn::Add(hn::Load(...), hn::Load(...)), ...).

Root cause

  • Blob storage is HwBlob::data_, a std::vector<float>. Its base pointer is
    only aligned to the default allocator alignment (16 bytes on x86-64 glibc),
    not the SIMD vector width.
  • Highway's Load/Store are the aligned variants and require the pointer to
    be aligned to the vector size — 32 bytes for AVX2, 64 bytes for AVX-512.
  • On an SSE target (128-bit, 16-byte alignment) the 16-byte base happens to
    satisfy the requirement, so it works. On AVX2+ the aligned move (vmovaps)
    hits a 16-byte-but-not-32-byte address → #GP → SIGSEGV.
  • Channel padding (PaddedChannels) makes per-pixel Ptr() offsets a multiple
    of the vector size, but the base data_.data() is still only 16-byte
    aligned, so it does not help.

This is an inconsistency within the file itself: the other ~150 load/store sites
already use the unaligned LoadU/StoreU. Only these 5 functions use the
aligned variant: MulAddLoop, DotProductHw, AddHw, AddInplaceHw, ReluHw.

Aligned hn::Load/hn::Store on std::vector-backed blobs (16-byte aligned)
fault on targets with vectors wider than 16 B. Use unaligned LoadU/StoreU
in MulAddLoop, DotProductHw, AddHw, AddInplaceHw, ReluHw.
@pszemus

pszemus commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

@ShiqiYu @Wwupup can you guys please look at this?

@Wwupup

Wwupup commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

@ShiqiYu @Wwupup can you guys please look at this?

thank you very much for both PRs — and my sincere apologies for the long delay in responding. I’ve been away from the project for the past week and didn’t get a chance to look at these earlier. I’ll review the code in detail within the next 24–48 hours and will either merge or leave any minor comments. Please bear with me just a little longer.

Thanks again for your patience and your excellent contribution!

@Wwupup Wwupup merged commit acf7b25 into ShiqiYu:master Jun 28, 2026
@pszemus pszemus deleted the fix-sigsegv-in-highway-kernels branch June 29, 2026 04:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants