Boot speedup: buffer FASL reads + cache file-scope (~2.6x faster boot)#1775
Open
dg1sbg wants to merge 2 commits into
loadltv read the FASL one byte / one small field at a time straight from the stream (read_u8 -> stream_read_byte, and read_uN/blob reads -> stream_read_byte8), and every such call is a virtual stream dispatch plus an interrupt-park (BEGIN_PARK/END_PARK). Loading the base image at startup goes entirely through this path, so boot was dominated by per-byte stream overhead.

Add a 64 KB read-ahead buffer and route all reads through it (read_u8, read_u16/32/64/80/128, the base/utf8-string and machine-code blob reads, the header, and the bytecode-module read that used cl__read_sequence). The loader owns the stream for the whole load (load_bytecode_stream creates a fresh loader and the caller closes the stream afterwards), so reading ahead is safe; it also now errors cleanly on unexpected EOF instead of reading uninitialized data.

Boot (clasp --non-interactive -e '(core:exit)') drops from ~5.1s to ~2.2s on Apple Silicon; the read() sys time falls from ~2.0s to ~0.1s. Full regression suite is byte-identical (1877 pass, 0 load errors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
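For readers unfamiliar with the technique, the read-ahead buffering described in this commit can be sketched roughly as below. This is a minimal standalone illustration, not the code in loadltv.cc: `ByteSource`, `BufferedReader`, `read_bytes`, and `refill` are hypothetical names, the stream is replaced by a plain callback, and the big-endian byte order in `read_u16` is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the underlying stream: writes up to `max` bytes
// into `dst` and returns how many it produced (0 means end of input).
using ByteSource = std::function<std::size_t(std::uint8_t* dst, std::size_t max)>;

class BufferedReader {
public:
  explicit BufferedReader(ByteSource src, std::size_t capacity = 64 * 1024)
      : src_(std::move(src)), buf_(capacity) {}

  // Single-byte read: touches the source only when the buffer is drained.
  std::uint8_t read_u8() {
    if (pos_ == end_) refill();
    return buf_[pos_++];
  }

  // Multi-byte field built from buffered bytes.
  // Big-endian order is an assumption for this sketch.
  std::uint16_t read_u16() {
    std::uint16_t hi = read_u8();
    std::uint16_t lo = read_u8();
    return static_cast<std::uint16_t>((hi << 8) | lo);
  }

  // Blob read: copies out of the buffer, refilling as often as needed.
  void read_bytes(std::uint8_t* dst, std::size_t n) {
    while (n > 0) {
      if (pos_ == end_) refill();
      std::size_t take = std::min(n, end_ - pos_);
      std::memcpy(dst, buf_.data() + pos_, take);
      pos_ += take;
      dst += take;
      n -= take;
    }
  }

private:
  // One call to the source per buffer-full instead of one per byte.
  // Fails loudly on end of input rather than returning stale bytes.
  void refill() {
    end_ = src_(buf_.data(), buf_.size());
    pos_ = 0;
    if (end_ == 0) throw std::runtime_error("unexpected end of input");
  }

  ByteSource src_;
  std::vector<std::uint8_t> buf_;
  std::size_t pos_ = 0;  // next unread byte in buf_
  std::size_t end_ = 0;  // number of valid bytes in buf_
};
```

The point of the design is that the expensive operation (here the callback, in the PR a virtual stream dispatch plus park) runs once per refill, so its cost is amortized over up to 64 KB of parsed data. Reading ahead like this is only safe when nothing else consumes the same stream during the load, which is the ownership argument the commit message makes.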
di_op_location (loading a bytecode module's source-location debug info) called core__file_scope(path) for every location. The base image has many locations per source file that share the same path literal, and core__file_scope copies the string and does a locked hash intern each time, so this was redundant work on the boot path.

Cache the last (path-literal -> FileScope) in the loader and reuse it on an eq match, falling back to core__file_scope on a miss (so correctness holds even without literal coalescing).

Boot drops a further ~0.25s (to ~1.95s); full regression byte-identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
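The single-entry cache described in this commit can be illustrated with a small standalone sketch. Everything here is hypothetical: `FileScope`, `ScopeTable`, `Loader`, and `scope_for` are illustration-only names, pointer identity stands in for the Lisp eq test, and a mutex-guarded map stands in for the locked hash intern. It is not the code in loadltv.cc.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a file-scope object.
struct FileScope {
  std::string path;
};

// Stand-in for the slow path: a locked intern keyed on the path contents.
class ScopeTable {
public:
  std::shared_ptr<FileScope> intern(const std::string& path) {
    std::lock_guard<std::mutex> lock(mu_);
    ++lookups;  // counts how often the slow path runs
    auto& slot = table_[path];
    if (!slot) slot = std::make_shared<FileScope>(FileScope{path});
    return slot;
  }

  std::size_t lookups = 0;

private:
  std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<FileScope>> table_;
};

// Remembers only the most recent (path object -> scope) pair.
class Loader {
public:
  explicit Loader(ScopeTable& table) : table_(table) {}

  std::shared_ptr<FileScope> scope_for(const std::string* path_literal) {
    // Hit: same path object as last time (identity, not contents).
    if (path_literal == last_path_ && last_scope_) return last_scope_;
    // Miss: fall back to the intern, which compares by contents, so an
    // equal-but-distinct path object still yields the same scope.
    last_scope_ = table_.intern(*path_literal);
    last_path_ = path_literal;
    return last_scope_;
  }

private:
  ScopeTable& table_;
  const std::string* last_path_ = nullptr;  // must outlive its use as a key
  std::shared_ptr<FileScope> last_scope_;
};
```

A one-entry cache fits this access pattern because consecutive locations tend to come from the same source file, so most lookups repeat the previous key. The identity check can only produce extra misses, never a wrong scope, as long as the cached path object stays alive while it is used as the key.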
Bike
reviewed
Jun 2, 2026
Member
Looks good other than the small stuff!
T_sp _lastDebugPath = nil<T_O>();
T_sp _lastDebugScope = nil<T_O>();

// Read buffer. loadltv otherwise pulls the FASL one byte / one small field at
Member
again, describing what used to happen but no longer happens is good for a commit message, not for comments.
SimpleVector_byte8_t_sp bytes = SimpleVector_byte8_t_O::make(len);
cl__read_sequence(bytes, _stream, clasp_make_fixnum(0), nil<T_O>());
// Read the module bytecode straight into the vector through our buffer
// (equivalent to the previous cl__read_sequence, but without per-call stream
FileScope_sp sfi = gc::As<FileScope_sp>(sfi_mv);
FileScope_sp sfi;
if (path == _lastDebugPath && _lastDebugScope.notnilp()) {
sfi = gc::As_unsafe<FileScope_sp>(_lastDebugScope);
Member
We're moving to use T_sp functions instead of gc::As - this should be _lastDebugScope.as_unsafe<FileScope_O>(). and probably as_assert rather than as_unsafe but that's no big deal.
Summary

Two small, independent loader changes that more than halve clasp's boot time. Both touch only src/core/loadltv.cc and are portable C++ (no platform specifics).

1. Buffer FASL reads in loadltv

loadltv was reading the FASL one byte / one small field at a time straight from the stream: read_u8() → stream_read_byte(), and the multi-byte/blob reads → stream_read_byte8(). Every such call is a virtual stream dispatch plus an interrupt-park (BEGIN_PARK/END_PARK). The base image is loaded entirely through this path, so boot was dominated by per-byte stream overhead.

Add a 64 KB read-ahead buffer and route every read through it: read_u8, read_u16/32/64/80/128, the base-string / utf8-string / machine-code blob reads, the header, and the bytecode-module read that was using cl__read_sequence. The loader owns the stream for the whole load (load_bytecode_stream makes a fresh loader and the caller closes the stream afterwards), so reading ahead is safe; it also now errors cleanly on unexpected EOF instead of reading uninitialized data (which closes a latent issue).

2. Cache the last (path → FileScope) in di_op_location

After buffering, the next-biggest chunk of boot was constructing debug-info locations. di_op_location calls core__file_scope(path) per location, which copies the string and does a locked hash intern. The base image has many locations per source file that share the same path literal, so this was redundant.

Cache the last (path-literal → FileScope) in the loader and reuse it on an eq match, falling back to core__file_scope on a miss, so correctness holds even without literal coalescing.

Measured (on Apple Silicon, macOS, LLVM 22, boehmprecise variant)

time clasp --noinform --non-interactive -e '(core:exit)', best of 8 runs:

[Results table not recovered from the page; the only surviving label is "read() sys time".]

~2.6× faster boot. The 16× drop in read() sys time is the direct, machine-independent proof that the per-byte I/O overhead is gone.

Testing

Full clasp regression suite byte-identical to the pre-change baseline after each commit: 1877 pass, 0 load errors / bus errors / segfaults. (The 4 remaining "failures" are the pre-existing known issues listed in set-unexpected-failures.lisp and are unrelated.) Because the suite compiles and loads thousands of FASLs through the changed path, the buffered parser and the file-scope cache are well-exercised.

🤖 Generated with Claude Code