mapping performance improvement #773
Open
alexey0308 wants to merge 10 commits into alexdobin:master from
* `stitchWindowAligns()` used to create a new `Transcript` instance regardless of whether the alignment would be successfully stitched to the transcript. An early check now tests for the most common reasons to reject the alignment.
* To ease reading, `WA[iA]` is aliased as `align`.
* Minor formatting.
Owner
Hi Alexey, thanks a lot for this very interesting PR (and the previous one as well). Cheers

Does anybody know whether using a GTF annotation with STAR and the human genome could improve alignment speed? If so, by how much?

Owner
Hi @rherai, using annotations increases accuracy but not speed. Cheers
birdingman0626 added a commit to birdingman0626/STAR-Win that referenced this pull request Apr 13, 2026
Bug fixes from upstream STAR PRs:
- PR alexdobin#2163: Remove OOB write in sjdbInsertJunctions.cpp (SA.writePacked at index nSA is one past the end; memory corruption confirmed by Valgrind)
- PR alexdobin#2676: Fix memory leaks in outputSJ.cpp (sjA, sjFilter, sjChunks arrays allocated but never freed)
- PR alexdobin#535: Fix segfault in SA lookup shortcut in ReadAlign_maxMappableLength2strands.cpp (unreliable shortcut caused unsigned underflow and SIGSEGV on certain genomes)

Performance optimizations:
- PR alexdobin#791: PackedArray bitmask optimization (replace expensive double-shift with single AND in hot operator[]; ~1-2% improvement)
- PR alexdobin#791: Add FastResetVector.h (O(modified) reset instead of O(N) memset; available for winBin array optimization)
- PR alexdobin#773 (partial): Early rejection in stitchWindowAligns.cpp to skip unnecessary Transcript copies when alignment will obviously fail

HTSlib upgrade evaluated and deferred: current 1.3 has minimal security surface (no CRAM, no network I/O, trusted input only).

Note: uniquely mapped count changes slightly due to PR alexdobin#535 fix; the buggy shortcut was incorrectly skipping valid alignments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
During mapping, `stitchWindowAligns` uses recursive calls for each alignment `iA`. The call 1) is performed only if `stitchAlignToTranscript` does not return a bad score.
Since `stitchAlignToTranscript` modifies the `Transcript` object, an additional copy is created, which is expensive.
This pull request attempts to address this performance overhead: an additional, cheaper check is performed, which avoids copy-constructing a new `Transcript` object if it will not be used in the recursion.
The changes comprise:
Progress output with and without the PR code on a test subset:
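
The early-check idea from the PR description can be illustrated with a toy C++ sketch. Everything in it is hypothetical: `Align`, `Transcript`, `canStitch`, `stitch`, `run`, and the simple ordering criterion used as the cheap check are simplified stand-ins, not STAR's actual types, signatures, or scoring. It only shows that testing the rejection condition before copying gives the same result with fewer `Transcript` copies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins; NOT STAR's real data structures.
struct Align { int rStart; int gStart; int length; };

int g_transcriptCopies = 0;  // counts Transcript copy constructions

struct Transcript {
    std::vector<Align> exons;
    Transcript() = default;
    Transcript(const Transcript& o) : exons(o.exons) { ++g_transcriptCopies; }
};

// Assumed cheap criterion: the new align must start at or after the end of
// the last exon, in both read and genome coordinates.
bool canStitch(const Transcript& tr, const Align& a) {
    if (tr.exons.empty()) return true;
    const Align& last = tr.exons.back();
    return a.rStart >= last.rStart + last.length &&
           a.gStart >= last.gStart + last.length;
}

// Recursively tries to include or exclude wa[iA]; returns the best score
// (here simply the total number of aligned bases).
int stitch(const std::vector<Align>& wa, std::size_t iA,
           const Transcript& tr, int score, bool earlyCheck) {
    if (iA == wa.size()) return score;
    const Align& align = wa[iA];
    int best = 0;
    // With earlyCheck, the cheap test runs BEFORE the copy is made.
    if (!earlyCheck || canStitch(tr, align)) {
        Transcript trIncl = tr;            // the expensive copy
        if (canStitch(trIncl, align)) {    // stands in for the full stitching step
            trIncl.exons.push_back(align);
            best = stitch(wa, iA + 1, trIncl, score + align.length, earlyCheck);
        }
    }
    // Also explore the branch that excludes this align (no copy needed here).
    int excl = stitch(wa, iA + 1, tr, score, earlyCheck);
    return best > excl ? best : excl;
}

struct Result { int score; int copies; };

// Runs the recursion from an empty transcript and reports score and copy count.
Result run(const std::vector<Align>& wa, bool earlyCheck) {
    g_transcriptCopies = 0;
    Transcript empty;
    int s = stitch(wa, 0, empty, 0, earlyCheck);
    return {s, g_transcriptCopies};
}
```

Calling `run(wa, false)` and `run(wa, true)` on the same list of aligns should return equal scores, with the second call making fewer copies whenever at least one align fails the cheap check.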