Skip to content

Dx/order journal#884

Draft
ZanCorDX wants to merge 10 commits intodevelopfrom
dx/order-journal
Draft

Dx/order journal#884
ZanCorDX wants to merge 10 commits intodevelopfrom
dx/order-journal

Conversation

@ZanCorDX
Copy link
Collaborator

📝 Summary

Adds a new clickhouse table to store all orders seen by the builder.

✅ I have completed the following steps:

  • Run make lint
  • Run make test
  • Added tests (if applicable)

@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @ZanCorDX's task in 10m 53s —— View job


Review of PR #884 — Order Journal

  • Read repo guidelines
  • Analyze PR diff and all changed files
  • Review code quality, bugs, security, and performance
  • Post review feedback with inline comments

Summary

This PR adds order journal tracking by introducing an OrderJournalObserver trait and a ClickHouse-backed OrderJournalWriter, plus a refactoring of how the prefetcher receives simulated orders (switching from broadcast::Receiver to std::sync::mpsc::Receiver). The architecture and separation of concerns are reasonable. Below are the issues found.


Bug: OrderIntakeStore::consume_next_batch ignores channel close

File: crates/rbuilder/src/building/builders/parallel_builder/order_intake_store.rs:29-33

blocking_consume_next_commands() returns Ok(None) when the channel is closed, signaling the builder should stop. However, consume_next_batch discards this return value and always returns Ok(true):

pub fn consume_next_batch(&mut self) -> eyre::Result<bool> {
    self.order_consumer.blocking_consume_next_commands()?;  // None is ignored
    self.order_consumer.apply_new_commands(&mut self.order_sink);
    Ok(true)  // always true
}

The caller at parallel_builder/mod.rs:257 checks if !ok { break 'building; }, so the parallel builder loop will never exit on channel close. This was likely a pre-existing issue that became more visible with the new SimulatedOrderJournalCommand wrapping.

Fix:

pub fn consume_next_batch(&mut self) -> eyre::Result<bool> {
    let should_continue = self.order_consumer.blocking_consume_next_commands()?.is_some();
    self.order_consumer.apply_new_commands(&mut self.order_sink);
    Ok(should_continue)
}

Fix this →


Stale doc comments on enum values

File: crates/rbuilder-operator/src/clickhouse.rs:134-148

Doc comments reference 'Tx' = 1, 'Bundle' = 2 and 'Add' = 1, 'Remove' = 2, but the actual values are 0, 1 (changed in commit c097884). The schema comment block on lines 157-159 has the same issue. Either the comments need updating, or the values need to match the actual ClickHouse schema. See inline comment.


Other inline comments posted

  • clickhouse.rs:606usize to u32 truncation on sequence_number. See comment.
  • building/mod.rs:228-240 — Per-event String::clone + syscall on the hot order delivery path. See comment.
  • flashbots_config.rs:378-392 — Shared DiskBackup between block writer and journal writer. See comment.

Other observations (not filed as inline comments)

  1. Prefetcher refactoring is a net positive. Switching from broadcast::Receiver (with potential Lagged errors) to mpsc::Receiver (which is lossless and simpler) is a good change. The new fan-out pattern in start_building_job (one thread receiving from tokio mpsc, then distributing to broadcast for builders + sync mpsc for prefetcher + observer) is clean.

  2. BackupNotTooBigRelaySubmissionPolicy::should_submit fix. The change to use .with_label_values(&[BLOCKS_TABLE_NAME]) instead of .get() on the IntGaugeVec is a correctness fix — the old code would have panicked or returned a meaningless value since IntGaugeVec doesn't have a .get() without labels. Good catch.

  3. No tests for the new OrderJournalWriter. The PR checklist notes "Added tests" is unchecked. The journal observer trait has a NullOrderJournalObserver for testing, which is good, but there are no unit tests verifying the OrderJournalWriter logic (correct mapping of SimulatedOrderCommand to OrderJournalRow, correct enum values, etc.).

Comment on lines +134 to +141
/// Enum matching ClickHouse `Enum8('Tx' = 1, 'Bundle' = 2)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOrderType {
Tx = 0,
Bundle = 1,
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale doc comments: The comments reference 'Tx' = 1, 'Bundle' = 2 and 'Add' = 1, 'Remove' = 2, but the actual Rust values were changed to 0, 1 in a subsequent commit. Please update the doc comments to match the actual enum values (or fix the enum values if the ClickHouse schema truly expects 1-based values).

The same issue applies to the schema comment on lines 157-159 (order_type Enum8('Tx' = 1, 'Bundle' = 2) and operation_type Enum8('Add' = 1, 'Remove' = 2)).

order_type,
order_hash: order_id.fixed_bytes(),
operation_type,
sequence_number: command.sequence_number() as u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JournalSequenceNumber is usize (64-bit on 64-bit platforms) but is cast to u32 here. If the sequence number ever exceeds u32::MAX, this silently wraps. While unlikely within a single slot, consider using u32::try_from().unwrap_or(u32::MAX) or at least documenting the assumption.

Comment on lines 228 to 240
let mut next_journal_sequence_number = 0;
while let Some(input) = input.orders.blocking_recv() {
// Failing is not critical.
let _ = root_hasher_prefetcher_sender.try_send(input.clone());
let journal_command =
SimulatedOrderJournalCommand::new(input, next_journal_sequence_number);
next_journal_sequence_number += 1;
order_journal_observer.order_delivered(&slot_data_copy, &journal_command);
// we don't create new subscribers to the broadcast so here we can be sure that err means end of receivers
if broadcast_input.send(input).is_err() {
if broadcast_input.send(journal_command).is_err() {
trace!("Cancelling simulated orders send job, destination stopped");
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor performance note: order_delivered (in OrderJournalWriter) clones self.builder_name (a String) and calls OffsetDateTime::now_utc() (syscall) for every order event on this hot path. For thousands of orders per slot this adds up. Consider using Arc<str> for the builder name to avoid per-event allocations.

Comment on lines 378 to 392
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both BuiltBlocksWriter and OrderJournalWriter share the same disk_backup instance (via Clone on line 382). If DiskBackup uses the same underlying storage path, both writers' backup data could interleave and compete for the same disk space quota. Is this intentional, or should each writer have its own backup instance with separate paths?

@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @ZanCorDX's task in 6m 34s —— View job


Review of PR #884 — Order Journal (updated after sync)

  • Read repo guidelines
  • Analyze updated PR diff against develop
  • Review changed files for bugs, security, and performance issues
  • Check if previous review issues were addressed
  • Post review feedback with inline comments

Critical: build_new_block contains a reachable panic!

File: crates/rbuilder/src/building/builders/parallel_builder/block_building_result_assembler.rs:257-260

The newly added panic!("TODO: next_journal_sequence_number not set ...") is reachable — the loop above breaks normally when all order groups are exhausted, then set_trace_fill_time runs, then the panic fires. The #[allow(unreachable_code)] annotation suppresses the compiler warning but does not make the code unreachable. If the parallel builder's build_new_block is ever called in production, the builder will crash. This must be resolved before merging. See inline comment.


Bug: OrderIntakeStore::consume_next_batch ignores channel close (still unfixed)

File: crates/rbuilder/src/building/builders/parallel_builder/order_intake_store.rs:29-33

This issue from the previous review remains: blocking_consume_next_commands() returns Ok(None) when the channel closes, but consume_next_batch discards this and always returns Ok(true). The parallel builder loop will never exit on channel close.

Fix:

pub fn consume_next_batch(&mut self) -> eyre::Result<bool> {
    let should_continue = self.order_consumer.blocking_consume_next_commands()?.is_some();
    self.order_consumer.apply_new_commands(&mut self.order_sink);
    Ok(should_continue)
}

Stale doc comments on enum values (still unfixed)

File: crates/rbuilder-operator/src/clickhouse.rs:135-148

Comments reference 'Tx' = 1, 'Bundle' = 2 and 'Add' = 1, 'Remove' = 2, but the actual Rust enum values are 0 and 1. The schema doc block on lines 157-159 has the same mismatch. See inline comment.


Previous review issues — status

Issue Status
consume_next_batch ignores channel close Still open
Stale doc comments on enum values Still open
usize to u32 truncation on sequence_number (line 612→525) Still presentcommand.sequence_number() as u32 at line 612
Per-event String::clone + syscall on hot path Still presentself.builder_name.clone() at line 606
Shared DiskBackup between writers Still presentdisk_backup.clone() at flashbots_config.rs:381

New observation: next_journal_sequence_number tracking

The next_journal_sequence_number field added to BuiltBlockTrace and BlockRow tracks the journal watermark at block build time. This is correctly threaded through the ordering builder path (ordering_builder.rs:135-148block_building_helper.rs:289). However, the parallel builder path hardcodes 0 at block_building_result_assembler.rs:191 and 269, making the field meaningless for parallel-built blocks. The TODO panic (discussed above) is an acknowledgment of this gap but is the wrong solution.

Comment on lines +257 to +260
panic!(
"TODO: next_journal_sequence_number not set in BlockBuildingHelperFromProvider::new"
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This panic! will crash the builder in production if build_new_block is ever called. The #[allow(unreachable_code)] annotation hides the fact that this code IS reachable — the loop above breaks normally when all groups are exhausted, falling through to set_trace_fill_time and then hitting this panic. This needs to be resolved before merging: either properly propagate next_journal_sequence_number through the parallel builder's OrderIntakeStore, or return an error instead of panicking.

Comment on lines +135 to +141
/// Enum matching ClickHouse `Enum8('Tx' = 1, 'Bundle' = 2)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOrderType {
Tx = 0,
Bundle = 1,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale doc comments: Comments reference 'Tx' = 1, 'Bundle' = 2 and 'Add' = 1, 'Remove' = 2, but the actual Rust enum values are 0 and 1. The schema doc block on lines 157-159 has the same mismatch. Either update the comments to match the code, or fix the enum values to match the ClickHouse schema.

Suggested change
/// Enum matching ClickHouse `Enum8('Tx' = 1, 'Bundle' = 2)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOrderType {
Tx = 0,
Bundle = 1,
}
/// Enum matching ClickHouse `Enum8('Tx' = 0, 'Bundle' = 1)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOrderType {
Tx = 0,
Bundle = 1,
}
/// Enum matching ClickHouse `Enum8('Add' = 0, 'Remove' = 1)`.

@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @ZanCorDX's task in 11m 19s —— View job


Review of PR #884 — Order Journal (re-review after sync)

  • Read repo guidelines
  • Analyze latest PR diff against develop
  • Check if previous review issues were addressed
  • Review new/changed code for bugs, security, and performance
  • Post review feedback with inline comments

Critical: build_new_block contains a reachable panic!

File: crates/rbuilder/src/building/builders/parallel_builder/block_building_result_assembler.rs:257-259

The panic!("TODO: next_journal_sequence_number not set ...") will crash the builder in production. The #[allow(unreachable_code)] annotation suppresses the compiler warning but does not make the code unreachable — the loop above breaks normally when all groups are exhausted, then set_trace_fill_time runs, then the panic fires. This must be resolved before merging. See inline comment.


Bug: OrderIntakeStore::consume_next_batch ignores channel close (still unfixed)

File: crates/rbuilder/src/building/builders/parallel_builder/order_intake_store.rs:29-33

This issue from previous reviews remains. blocking_consume_next_commands() returns Ok(None) when the channel closes, but consume_next_batch discards this and always returns Ok(true). The caller at parallel_builder/mod.rs:257 checks if !ok { break 'building; }, so the parallel builder loop will never exit on channel close — it will spin calling blocking_consume_next_commands() which keeps returning Ok(None).

Fix:

pub fn consume_next_batch(&mut self) -> eyre::Result<bool> {
    let should_continue = self.order_consumer.blocking_consume_next_commands()?.is_some();
    self.order_consumer.apply_new_commands(&mut self.order_sink);
    Ok(should_continue)
}

Fix this →


Stale doc comments on enum values (still unfixed)

File: crates/rbuilder-operator/src/clickhouse.rs:137-151, 160-162

Comments reference 'Tx' = 1, 'Bundle' = 2 and 'Add' = 1, 'Remove' = 2, but the actual Rust enum values are 0 and 1. The schema doc block has the same mismatch. See inline comments: enums, schema.


Previous review issues — status

Issue Status
build_new_block reachable panic Still openinline
consume_next_batch ignores channel close Still open (not in diff; noted above)
Stale doc comments on enum values Still openinline
usize to u32 truncation on sequence_number Still presentinline
Per-event String::clone on hot path Still presentself.builder_name.clone() at clickhouse.rs:645
Shared DiskBackup between writers Not an issueDiskBackup uses Arc<RwLock<redb::Database>> and is designed to be shared across tables

Observations

  1. Prefetcher refactoring is sound. The switch from broadcast::Receiver to std::sync::mpsc::Receiver for the root hash prefetcher eliminates Lagged errors and simplifies the flow. The fan-out pattern in start_building_job (line 238-254) cleanly distributes to broadcast (builders) + sync mpsc (prefetcher) + observer.

  2. BackupNotTooBigRelaySubmissionPolicy::should_submit now correctly sums both tables. The change to use .with_label_values(&[BLOCKS_TABLE_NAME]) and .with_label_values(&[ORDER_JOURNAL_TABLE_NAME]) properly aggregates disk backup size across both the blocks and journal tables.

  3. OrderJournalWriter Drop impl correctly removes the slot from active_slots, preventing resource leaks.

  4. No tests for OrderJournalWriter. The PR checklist notes "Added tests" is unchecked. The NullOrderJournalObserver exists for test isolation, but there are no unit tests verifying the mapping logic in OrderJournalWriter::order_delivered.

Comment on lines +257 to +259
panic!(
"TODO: next_journal_sequence_number not set in BlockBuildingHelperFromProvider::new"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: Reachable panic! in production code

This panic! is reachable — the loop above breaks normally when all order groups are exhausted, set_trace_fill_time runs, and then this fires. The #[allow(unreachable_code)] on line 182 suppresses the compiler warning but doesn't make the code actually unreachable.

If the parallel builder is enabled, build_new_block will crash the builder process. This must be resolved before merging. Options:

  1. Properly propagate next_journal_sequence_number through the parallel builder's OrderIntakeStore (it already wraps OrderConsumer which tracks it).
  2. Return an Err(...) instead of panicking if you want to defer the implementation.
  3. At minimum, use eyre::bail!("...") so it returns an error instead of crashing.

Comment on lines +137 to +151
/// Enum matching ClickHouse `Enum8('Tx' = 1, 'Bundle' = 2)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOrderType {
Tx = 0,
Bundle = 1,
}

/// Enum matching ClickHouse `Enum8('Add' = 1, 'Remove' = 2)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOperationType {
Add = 0,
Remove = 1,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale doc comments: enum values mismatch

The comments say 'Tx' = 1, 'Bundle' = 2 and 'Add' = 1, 'Remove' = 2, but the actual Rust values are 0 and 1. The same mismatch exists in the schema doc block on lines 160-162. Either update the comments to match the code, or fix the enum values to match the ClickHouse schema.

Suggested change
/// Enum matching ClickHouse `Enum8('Tx' = 1, 'Bundle' = 2)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOrderType {
Tx = 0,
Bundle = 1,
}
/// Enum matching ClickHouse `Enum8('Add' = 1, 'Remove' = 2)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOperationType {
Add = 0,
Remove = 1,
}
/// Enum matching ClickHouse `Enum8('Tx' = 0, 'Bundle' = 1)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOrderType {
Tx = 0,
Bundle = 1,
}
/// Enum matching ClickHouse `Enum8('Add' = 0, 'Remove' = 1)`.
#[derive(Debug, Clone, Copy, Serialize_repr, Deserialize_repr)]
#[repr(i8)]
pub enum OrderJournalOperationType {
Add = 0,
Remove = 1,
}

Comment on lines +160 to +162
/// order_type Enum8('Tx' = 1, 'Bundle' = 2)
/// order_hash FixedString(32)
/// operation_type Enum8('Add' = 1, 'Remove' = 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These schema doc values (= 1, = 2) don't match the actual enum values (= 0, = 1). Needs to be updated for consistency.

triggering_bid_relay,
bid_to_beat_seen_time,
bid_to_beat_seen_relay,
next_journal_sequence_number: built_block_trace.next_journal_sequence_number as u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JournalSequenceNumber is usize (64-bit on x86_64), cast to u32 here. If the sequence number ever exceeds u32::MAX, this silently wraps. Consider u32::try_from(...).unwrap_or(u32::MAX) or changing the ClickHouse column to UInt64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments