Boot speedup: buffer FASL reads + cache file-scope (~2.6x faster boot) #1775

Open
dg1sbg wants to merge 2 commits into clasp-developers:main from dg1sbg:perf/boot-loadltv

Conversation

Contributor

dg1sbg commented May 26, 2026

Summary

Two small, independent loader changes that more than halve clasp's boot time. Both touch only src/core/loadltv.cc and are portable C++ (no platform specifics).

1. Buffer FASL reads in loadltv

loadltv was reading the FASL one byte or one small field at a time straight from the stream: read_u8() → stream_read_byte(), and the multi-byte/blob reads → stream_read_byte8(). Every such call is a virtual stream dispatch plus an interrupt-park (BEGIN_PARK/END_PARK). The base image is loaded entirely through this path, so boot was dominated by per-byte stream overhead.

Add a 64 KB read-ahead buffer and route every read through it: read_u8, read_u16/32/64/80/128, the base-string / utf8-string / machine-code blob reads, the header, and the bytecode-module read that was using cl__read_sequence. The loader owns the stream for the whole load (load_bytecode_stream makes a fresh loader and the caller closes the stream afterwards), so reading ahead is safe; it also now errors cleanly on unexpected EOF instead of reading uninitialized data (which closes a latent issue).
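For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of a read-ahead buffer of this kind. It is not the code in src/core/loadltv.cc: `BufferedReader`, `read_bytes`, `read_be`, `refill`, and `demo_decode` are hypothetical names, a `std::istream` stands in for the Lisp stream, and big-endian field encoding is assumed purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the loader's read path; not clasp's actual code.
class BufferedReader {
public:
  explicit BufferedReader(std::istream& in, std::size_t capacity = 64 * 1024)
      : in_(in), buf_(capacity) {}

  // Single-byte read: in the common case this touches only the buffer.
  std::uint8_t read_u8() {
    if (pos_ == end_) refill();
    return buf_[pos_++];
  }

  // Multi-byte reads assembled from buffered bytes.
  // Big-endian order is an assumption made for this sketch.
  std::uint16_t read_u16() { return static_cast<std::uint16_t>(read_be(2)); }
  std::uint32_t read_u32() { return static_cast<std::uint32_t>(read_be(4)); }
  std::uint64_t read_u64() { return read_be(8); }

  // Blob read: copy whatever is buffered, refilling as needed.
  void read_bytes(std::uint8_t* dst, std::size_t n) {
    while (n > 0) {
      if (pos_ == end_) refill();
      std::size_t chunk = std::min(n, end_ - pos_);
      std::memcpy(dst, buf_.data() + pos_, chunk);
      pos_ += chunk;
      dst += chunk;
      n -= chunk;
    }
  }

private:
  std::uint64_t read_be(int nbytes) {
    std::uint64_t v = 0;
    for (int i = 0; i < nbytes; ++i) v = (v << 8) | read_u8();
    return v;
  }

  // One underlying stream call per buffer-full instead of one per byte.
  // Throws on end of input rather than handing back stale buffer contents.
  void refill() {
    assert(pos_ == end_);
    in_.read(reinterpret_cast<char*>(buf_.data()),
             static_cast<std::streamsize>(buf_.size()));
    end_ = static_cast<std::size_t>(in_.gcount());
    pos_ = 0;
    if (end_ == 0) throw std::runtime_error("unexpected end of FASL stream");
  }

  std::istream& in_;
  std::vector<std::uint8_t> buf_;
  std::size_t pos_ = 0, end_ = 0;
};

// Usage: decode a u8, a u16 and a 3-byte blob from an in-memory stream,
// with a 4-byte buffer so that the blob read straddles a refill.
std::string demo_decode() {
  std::istringstream in(std::string("\x07\x01\x02" "abc", 6));
  BufferedReader r(in, 4);
  std::uint8_t tag = r.read_u8();
  std::uint16_t len = r.read_u16();
  std::uint8_t blob[3];
  r.read_bytes(blob, sizeof blob);
  return std::to_string(tag) + ":" + std::to_string(len) + ":" +
         std::string(blob, blob + 3);
}
```

Reading ahead like this is only safe when nothing else consumes the same stream during the load, which is the ownership argument made above.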

2. Cache the last (path → FileScope) in di_op_location

After buffering, the next-biggest chunk of boot was constructing debug-info locations. di_op_location calls core__file_scope(path) per location — that copies the string and does a locked hash intern. The base image has many locations per source file that share the same path literal, so this was redundant.

Cache the last (path-literal → FileScope) in the loader and reuse it on an eq match, falling back to core__file_scope on a miss — so correctness holds even without literal coalescing.
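A rough sketch of the single-entry cache idea follows, with stand-in types rather than clasp's. `ScopeTable::intern` plays the role of the slow path (string copy plus locked hash lookup), `Loader::scope_for` plays the role of the lookup inside di_op_location, and pointer equality on a `std::shared_ptr` is used as the analogue of an eq match. All of these names, and the example path, are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

struct FileScope {  // hypothetical stand-in for the real FileScope object
  std::string path;
};

// Stand-in for the slow path: copies the string and interns it under a lock.
class ScopeTable {
public:
  std::shared_ptr<FileScope> intern(const std::string& path) {
    std::lock_guard<std::mutex> guard(mu_);
    ++lookups;
    auto& slot = table_[path];
    if (!slot) slot = std::make_shared<FileScope>(FileScope{path});
    return slot;
  }
  int lookups = 0;  // counts slow-path calls, for demonstration only

private:
  std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<FileScope>> table_;
};

class Loader {
public:
  explicit Loader(ScopeTable& table) : table_(table) {}

  // The path object is compared by identity, not by contents. A miss falls
  // back to the table, so the result is correct even when two distinct
  // objects hold the same path text.
  std::shared_ptr<FileScope>
  scope_for(const std::shared_ptr<const std::string>& path) {
    if (path == last_path_ && last_scope_) return last_scope_;
    last_scope_ = table_.intern(*path);
    last_path_ = path;
    return last_scope_;
  }

private:
  ScopeTable& table_;
  std::shared_ptr<const std::string> last_path_;
  std::shared_ptr<FileScope> last_scope_;
};

// Usage: 100 locations that share one path object cost one slow-path call.
int demo_slow_path_calls() {
  ScopeTable table;
  Loader loader(table);
  auto path = std::make_shared<const std::string>("src/lisp/a.lisp");
  auto first = loader.scope_for(path);
  for (int i = 0; i < 99; ++i) {
    auto again = loader.scope_for(path);
    assert(again == first);
  }
  return table.lookups;
}
```

A one-entry cache is enough here because consecutive locations tend to come from the same source file; a path object that differs by identity simply takes the slow path and replaces the cached entry.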

Measured (on Apple Silicon, macOS, LLVM 22, boehmprecise variant)

time clasp --noinform --non-interactive -e '(core:exit)', best of 8 runs:

| | boot wall | read() sys time |
|---|---|---|
| baseline | 5.14 s | 2.02 s |
| + buffered reads | ~2.20 s | 0.12 s |
| + file-scope cache | ~1.95 s (min 1.94 s) | |

~2.6× faster boot. The roughly 17× drop in read() sys time (2.02 s to 0.12 s) is the direct, machine-independent proof that the per-byte I/O overhead is gone.
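As a quick check of the arithmetic, the ratios implied by the measurements above work out as follows (rounded):

```math
\frac{5.14\,\text{s}}{1.95\,\text{s}} \approx 2.64,
\qquad
\frac{5.14\,\text{s}}{2.20\,\text{s}} \approx 2.34,
\qquad
\frac{2.02\,\text{s}}{0.12\,\text{s}} \approx 16.8
```

The first ratio is the overall boot speedup, the second is the speedup from buffered reads alone, and the third is the reduction in read() sys time.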

Testing

The full clasp regression suite output is byte-identical to the pre-change baseline after each commit: 1877 pass, with 0 load errors, bus errors, or segfaults. (The 4 remaining "failures" are the pre-existing known issues listed in set-unexpected-failures.lisp and are unrelated.) Because the suite compiles and loads thousands of FASLs through the changed path, the buffered parser and the file-scope cache are well exercised.

🤖 Generated with Claude Code

dg1sbg and others added 2 commits June 2, 2026 15:52
loadltv read the FASL one byte / one small field at a time straight from the
stream (read_u8 -> stream_read_byte, and read_uN/blob reads -> stream_read_byte8),
and every such call is a virtual stream dispatch plus an interrupt-park
(BEGIN_PARK/END_PARK). Loading the base image at startup goes entirely through
this path, so boot was dominated by per-byte stream overhead.

Add a 64 KB read-ahead buffer and route all reads through it (read_u8,
read_u16/32/64/80/128, the base/utf8-string and machine-code blob reads, the
header, and the bytecode-module read that used cl__read_sequence). The loader
owns the stream for the whole load (load_bytecode_stream creates a fresh loader
and the caller closes the stream afterwards), so reading ahead is safe; it also
now errors cleanly on unexpected EOF instead of reading uninitialized data.

Boot (clasp --non-interactive -e '(core:exit)') drops from ~5.1s to ~2.2s on
Apple Silicon; the read() sys time falls from ~2.0s to ~0.1s. Full regression
suite is byte-identical (1877 pass, 0 load errors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

di_op_location (loading a bytecode module's source-location debug info) called
core__file_scope(path) for every location. The base image has many locations
per source file that share the same path literal, and core__file_scope copies
the string and does a locked hash intern each time, so this was redundant work
on the boot path.

Cache the last (path-literal -> FileScope) in the loader and reuse it on an eq
match, falling back to core__file_scope on a miss (so correctness holds even
without literal coalescing). Boot drops a further ~0.25s (to ~1.95s); full
regression byte-identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dg1sbg force-pushed the perf/boot-loadltv branch from 613681a to a655ab9 on June 2, 2026 14:01
Member

Bike left a comment

Looks good other than the small stuff!

Comment thread src/core/loadltv.cc
T_sp _lastDebugPath = nil<T_O>();
T_sp _lastDebugScope = nil<T_O>();

// Read buffer. loadltv otherwise pulls the FASL one byte / one small field at

again, describing what used to happen but no longer happens is good for a commit message, not for comments.

Comment thread src/core/loadltv.cc
SimpleVector_byte8_t_sp bytes = SimpleVector_byte8_t_O::make(len);
cl__read_sequence(bytes, _stream, clasp_make_fixnum(0), nil<T_O>());
// Read the module bytecode straight into the vector through our buffer
// (equivalent to the previous cl__read_sequence, but without per-call stream

please fix this comment as well

Comment thread src/core/loadltv.cc
FileScope_sp sfi = gc::As<FileScope_sp>(sfi_mv);
FileScope_sp sfi;
if (path == _lastDebugPath && _lastDebugScope.notnilp()) {
sfi = gc::As_unsafe<FileScope_sp>(_lastDebugScope);

We're moving to use T_sp functions instead of gc::As - this should be _lastDebugScope.as_unsafe<FileScope_O>(). and probably as_assert rather than as_unsafe but that's no big deal.
