Boot speedup: buffer FASL reads + cache file-scope (~2.6x faster boot) by dg1sbg · Pull Request #1775 · clasp-developers/clasp

dg1sbg · 2026-05-26T05:42:36Z

Summary

Two small, independent loader changes that more than halve clasp's boot time. Both touch only src/core/loadltv.cc and are portable C++ (no platform specifics).

1. Buffer FASL reads in `loadltv`

loadltv was reading the FASL one byte / one small field at a time straight from the stream — read_u8() → stream_read_byte(), and the multi-byte/blob reads → stream_read_byte8(). Every such call is a virtual stream dispatch plus an interrupt-park (BEGIN_PARK/END_PARK). The base image is loaded entirely through this path, so boot was dominated by per-byte stream overhead.

Add a 64 KB read-ahead buffer and route every read through it: read_u8, read_u16/32/64/80/128, the base-string / utf8-string / machine-code blob reads, the header, and the bytecode-module read that was using cl__read_sequence. The loader owns the stream for the whole load (load_bytecode_stream makes a fresh loader and the caller closes the stream afterwards), so reading ahead is safe; it also now errors cleanly on unexpected EOF instead of reading uninitialized data (which closes a latent issue).

2. Cache the last (path → FileScope) in `di_op_location`

After buffering, the next-biggest chunk of boot was constructing debug-info locations. di_op_location calls core__file_scope(path) per location — that copies the string and does a locked hash intern. The base image has many locations per source file that share the same path literal, so this was redundant.

Cache the last (path-literal → FileScope) in the loader and reuse it on an eq match, falling back to core__file_scope on a miss — so correctness holds even without literal coalescing.
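
The caching idea can be sketched in isolation as follows. This is an illustration under assumptions, not clasp's implementation: `FileScopeTable::intern` is a hypothetical stand-in for core__file_scope (a string copy plus a locked hash lookup), `Loader::scope_for` is a made-up name, and pointer equality on a `shared_ptr` models the eq test on the path literal.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for clasp's FileScope object.
struct FileScope {
  std::string path;
};

// Stand-in for core__file_scope: every call takes a lock and hashes the path.
class FileScopeTable {
public:
  std::shared_ptr<FileScope> intern(const std::string& path) {
    std::lock_guard<std::mutex> lock(mu_);
    ++lookups_;
    auto& slot = table_[path];
    if (!slot) slot = std::make_shared<FileScope>(FileScope{path});
    return slot;
  }
  std::size_t lookups() const { return lookups_; }

private:
  std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<FileScope>> table_;
  std::size_t lookups_ = 0;
};

using PathLiteral = std::shared_ptr<const std::string>;

// One-entry cache keyed on object identity of the path literal.
class Loader {
public:
  explicit Loader(FileScopeTable& table) : table_(table) {}

  std::shared_ptr<FileScope> scope_for(const PathLiteral& path) {
    // Identity hit: same literal object as last time, so skip the intern.
    if (path == last_path_ && last_scope_) return last_scope_;
    // Miss: fall back to the slow path, then remember the pair.
    last_scope_ = table_.intern(*path);
    last_path_ = path; // holding the pointer keeps the key object alive
    return last_scope_;
  }

private:
  FileScopeTable& table_;
  PathLiteral last_path_;
  std::shared_ptr<FileScope> last_scope_;
};
```

Two distinct literal objects with equal contents miss the cache but still intern to the same scope, which is why the result stays correct when literals are not coalesced; the cache only saves work, it never changes the answer.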

Measured (on Apple Silicon, macOS, LLVM 22, boehmprecise variant)

`time clasp --noinform --non-interactive -e '(core:exit)'`, best of 8 runs:

| | boot wall | `read()` sys time |
|---|---|---|
| baseline | 5.14 s | 2.02 s |
| + buffered reads | ~2.20 s | 0.12 s |
| + file-scope cache | ~1.95 s (min 1.94 s) | - |

~2.6× faster boot (5.14 s down to ~1.95 s). The roughly 17× drop in read() sys time (2.02 s down to 0.12 s) is direct, largely machine-independent evidence that the per-byte I/O overhead is gone.

Testing

Full clasp regression suite byte-identical to the pre-change baseline after each commit: 1877 pass, 0 load errors / bus errors / segfaults. (The 4 remaining "failures" are the pre-existing known issues listed in set-unexpected-failures.lisp — unrelated.) Because the suite compiles and loads thousands of FASLs through the changed path, the buffered parser and the file-scope cache are well-exercised.

🤖 Generated with Claude Code

loadltv read the FASL one byte / one small field at a time straight from the stream (read_u8 -> stream_read_byte, and read_uN/blob reads -> stream_read_byte8), and every such call is a virtual stream dispatch plus an interrupt-park (BEGIN_PARK/END_PARK). Loading the base image at startup goes entirely through this path, so boot was dominated by per-byte stream overhead.

Add a 64 KB read-ahead buffer and route all reads through it (read_u8, read_u16/32/64/80/128, the base/utf8-string and machine-code blob reads, the header, and the bytecode-module read that used cl__read_sequence). The loader owns the stream for the whole load (load_bytecode_stream creates a fresh loader and the caller closes the stream afterwards), so reading ahead is safe; it also now errors cleanly on unexpected EOF instead of reading uninitialized data.

Boot (clasp --non-interactive -e '(core:exit)') drops from ~5.1s to ~2.2s on Apple Silicon; the read() sys time falls from ~2.0s to ~0.1s. Full regression suite is byte-identical (1877 pass, 0 load errors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

di_op_location (loading a bytecode module's source-location debug info) called core__file_scope(path) for every location. The base image has many locations per source file that share the same path literal, and core__file_scope copies the string and does a locked hash intern each time, so this was redundant work on the boot path.

Cache the last (path-literal -> FileScope) in the loader and reuse it on an eq match, falling back to core__file_scope on a miss (so correctness holds even without literal coalescing).

Boot drops a further ~0.25s (to ~1.95s); full regression byte-identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bike

Looks good other than the small stuff!

Bike · 2026-06-02T15:36:28Z

+  T_sp _lastDebugPath = nil<T_O>();
+  T_sp _lastDebugScope = nil<T_O>();
+
+  // Read buffer. loadltv otherwise pulls the FASL one byte / one small field at


again, describing what used to happen but no longer happens is good for a commit message, not for comments.

Bike · 2026-06-02T15:40:18Z

    SimpleVector_byte8_t_sp bytes = SimpleVector_byte8_t_O::make(len);
-    cl__read_sequence(bytes, _stream, clasp_make_fixnum(0), nil<T_O>());
+    // Read the module bytecode straight into the vector through our buffer
+    // (equivalent to the previous cl__read_sequence, but without per-call stream


please fix this comment as well

Bike · 2026-06-02T15:46:06Z

-    FileScope_sp sfi = gc::As<FileScope_sp>(sfi_mv);
+    FileScope_sp sfi;
+    if (path == _lastDebugPath && _lastDebugScope.notnilp()) {
+      sfi = gc::As_unsafe<FileScope_sp>(_lastDebugScope);


We're moving to use T_sp functions instead of gc::As - this should be _lastDebugScope.as_unsafe<FileScope_O>(). and probably as_assert rather than as_unsafe but that's no big deal.

dg1sbg mentioned this pull request May 26, 2026

Security hardening: bound FASL sizes, refuse unsafe socket buffers, shell-quote save-lisp paths (S2–S10 except S1/S6) #1776

Open

dg1sbg and others added 2 commits June 2, 2026 15:52

dg1sbg force-pushed the perf/boot-loadltv branch from 613681a to a655ab9 Compare June 2, 2026 14:01

Bike reviewed Jun 2, 2026

dg1sbg wants to merge 2 commits into clasp-developers:main from dg1sbg:perf/boot-loadltv
