Fix buffer overflow in k-mer prefilter with highly conserved sequences by mzueva · Pull Request #1091 · soedinglab/MMseqs2

mzueva · 2026-03-23T16:06:44Z

CacheFriendlyOperations::findDuplicates underestimates the output size when checking for buffer overflow. The check uses std::min(elementCount, currBinSize/2), which assumes that at most half of the bin entries are duplicates. This assumption breaks when query k-mers are shared across a large fraction of the target database, as happens with antibody variable region sequences, where conserved framework k-mers match ~70% of targets on consistent diagonals. The result is silent out-of-bounds writes to the foundDiagonals buffer, which cause segfaults during prefiltering. With multiple threads, the corruption typically leads to a crash early in the run (at ~8% progress).
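To make the failure mode concrete, here is a minimal toy model, not the MMseqs2 code. It assumes a duplicate is an entry whose diagonal matches the diagonal last seen for the same target id; the names Hit, countDuplicates, cappedEstimate and makeConservedBin are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one bin entry: a target id and the diagonal of a k-mer hit.
struct Hit {
    std::uint32_t id;
    std::uint8_t diagonal;
};

// Counts entries whose diagonal equals the diagonal last seen for the same target id.
// This is an assumed, simplified notion of "duplicate" for illustration only.
std::size_t countDuplicates(const std::vector<Hit>& bin) {
    std::unordered_map<std::uint32_t, std::uint8_t> lastDiagonal;
    std::size_t elementCount = 0;
    for (const Hit& h : bin) {
        auto it = lastDiagonal.find(h.id);
        if (it != lastDiagonal.end() && it->second == h.diagonal) {
            ++elementCount;
        }
        lastDiagonal[h.id] = h.diagonal;
    }
    return elementCount;
}

// The estimate described in the PR text: capped at half the bin size.
std::size_t cappedEstimate(std::size_t elementCount, std::size_t currBinSize) {
    return std::min(elementCount, currBinSize / 2);
}

// Builds a bin in which every target receives hitsPerTarget hits on the same diagonal,
// mimicking conserved k-mers that match many targets consistently.
std::vector<Hit> makeConservedBin(std::uint32_t targets, std::size_t hitsPerTarget) {
    std::vector<Hit> bin;
    for (std::uint32_t id = 0; id < targets; ++id) {
        for (std::size_t k = 0; k < hitsPerTarget; ++k) {
            bin.push_back(Hit{id, 7});
        }
    }
    return bin;
}
```

In this model, makeConservedBin(4, 3) yields 12 entries of which 8 count as duplicates, while cappedEstimate(8, 12) reports only 6. Any check based on the capped value would therefore allow two more writes than it accounted for.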

To reproduce: run mmseqs easy-search with an antibody query against a multi-million antibody database (e.g. 10M clonotype sequences). Any dataset where conserved k-mers match a large fraction of targets will trigger it.

To fix: replace std::min(elementCount, currBinSize/2) with elementCount for a correct upper bound.
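The difference between the two bounds can be sketched as a pair of guard functions. This is an illustrative sketch under assumptions, not the actual patch: the function names, the written offset parameter and the exact comparison are hypothetical, and only elementCount, currBinSize and the std::min expression come from the description above.

```cpp
#include <algorithm>
#include <cstddef>

// Guard using the capped estimate described in the PR text (hypothetical helper).
// Returns true when the check believes the pending entries fit in the output buffer.
bool fitsWithCappedEstimate(std::size_t written, std::size_t elementCount,
                            std::size_t currBinSize, std::size_t outputSize) {
    const std::size_t estimate = std::min(elementCount, currBinSize / 2);
    return written <= outputSize && estimate <= outputSize - written;
}

// Guard using elementCount directly, i.e. the number of entries that will be written.
bool fitsWithExactCount(std::size_t written, std::size_t elementCount,
                        std::size_t outputSize) {
    return written <= outputSize && elementCount <= outputSize - written;
}
```

With 8 pending entries from a bin of 12 and room for only 7, the capped guard accepts the write (6 <= 7) while the exact guard rejects it, which is the gap the proposed change closes.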

… at most half the bin entries can be duplicates, but with repetitive k-mer hits (e.g. antibody frameworks) elementCount can approach currBinSize, causing out-of-bounds writes. Use elementCount directly for a correct bound.

mzueva mentioned this pull request Mar 23, 2026

Buffer overflow in k-mer prefilter with highly conserved sequences #1092

Open

mzueva wants to merge 1 commit into soedinglab:master from platforma-open:fix-cache-overflow
