sweep: don't ratchet starting fee rate on non-fee failures by jtobin · Pull Request #10898 · lightningnetwork/lnd

jtobin · 2026-06-11T13:23:59Z

Supplementary to #10897 (and replaces #10816).

The sweeper responds to a failed sweep attempt by ratcheting the input's fee rate for the next one. For fee-related failures this is the right response, but the same machinery also fires on resource failures, e.g. ErrNotEnoughInputs or ErrNotEnoughBudget. These failures carry no fee rate signal, so ratcheting the fee rate can't resolve them.

Each spurious ratchet raises the bar the input's budget must clear, and once the starting rate exceeds an input's budget, the aggregator filters it out of every future input set and the sweep is silently stranded. This PR skips the ratchet for failures with no fee dimension, preserving the input's existing starting fee rate across retries.

The TxUnknownSpend path also attaches a retry fee rate. It's deliberately left untouched here, since an unknown spend can reflect a genuine fee race with a counterparty; with #10897, that residual ratchet can no longer strand small-budget inputs.

gemini-code-assist · 2026-06-11T13:24:11Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where the sweeper would incorrectly increase the starting fee rate for sweep attempts that failed due to resource constraints rather than fee-related issues. By distinguishing between fee-related failures and resource-related failures, the system now avoids ratcheting the fee rate when it cannot resolve the underlying problem, ensuring that inputs are not unnecessarily stranded.

Highlights

Prevent unnecessary fee rate ratcheting: Modified the sweeper logic to skip fee rate increments when encountering resource-related failures like ErrNotEnoughInputs or ErrNotEnoughBudget, which do not have a fee-rate dimension.
Preserve existing fee rates: Updated the sweeper to preserve the existing starting fee rate when a failure is reported with a zero fee rate, preventing inputs from being stranded due to artificially inflated fee requirements.
Comprehensive testing: Added new unit tests in both fee_bumper_test.go and sweeper_test.go to verify that resource failures do not trigger fee rate increments and that existing rates are correctly preserved.

New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request modifies the sweeper's fee bumping logic to avoid ratcheting up the starting fee rate on resource-related failures (such as ErrNotEnoughInputs and ErrNotEnoughBudget). Instead, these failures preserve the existing starting fee rate to prevent inputs from being permanently stranded. Corresponding unit tests have been added and updated to verify this behavior. A review comment suggests correcting a grammatical error in a warning log message from 'Fail to' to 'Failed to'.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-11T13:25:53Z

+		errors.Is(err, ErrNotEnoughInputs) {

-	// Calculate the next fee rate for the retry.
+		log.Warnf("Fail to fee bump tx %v: %v", oldTx.TxHash(), err)


The log message uses the incorrect verb tense "Fail to". It should be "Failed to" to be grammatically correct and consistent with other log messages in this file (e.g., "Failed to bump tx").

Suggested change

log.Warnf("Fail to fee bump tx %v: %v", oldTx.TxHash(), err)

log.Warnf("Failed to fee bump tx %v: %v", oldTx.TxHash(), err)

github-actions · 2026-06-11T13:38:57Z

PR Severity: CRITICAL. All changed files are in sweep/* (fee_bumper.go and sweeper.go), which handles output sweeping, fund recovery, and fee bumping — a CRITICAL package. 2 non-test files, 86 non-test lines changed (well under bump thresholds). Test files excluded: fee_bumper_test.go, sweeper_test.go. To override, add a severity-override-{critical,high,medium,low} label.

When a sweep fails to publish, the sweeper computes a higher retry fee rate and stores it as the input's new starting rate. That's the right move for fee-related failures, where the broadcast was rejected for not paying enough, but the same machinery also fires on resource failures (no wallet UTXO available, the input set's budget exceeded), which carry no fee-rate signal. Repeatedly ratcheting the rate on these can push it past an input's intrinsic budget, after which the aggregator silently skips the input and the sweep is stranded forever. Distinguish fee-rate-bearing failures from resource ones at the point where the bump result is produced, and skip the rate ratchet for the latter: the input's existing starting fee rate is preserved across non-fee failures, so the next retry starts at the rate the original sweep request specified.

lightninglabs-deploy · 2026-06-18T15:08:29Z

@jtobin, remember to re-request review from reviewers when ready

saubyk · 2026-06-25T00:52:51Z

/gateway review

lightninglabs-gateway · 2026-06-25T00:53:05Z

✅ Review posted: #10898 (review)

6 finding(s); 6 inline, 0 in body.

🔁 Need a re-review after pushing changes? Reply with /gateway re-review.
Maintainers can also /gateway dismiss <id> to silence specific findings, or anyone can /gateway explain <id> for elaboration.

lightninglabs-gateway

This PR stops the sweeper from ratcheting an input's starting fee rate on resource failures (ErrNotEnoughInputs, ErrNotEnoughBudget), which previously could push the starting rate past an input's budget and silently strand the sweep. The intent is sound and the fix is well-targeted, and the new tests directly assert the no-ratchet behavior on both the initial and replacement paths.

The mechanism, however, is a magic value: a BumpResult with FeeRate == 0 now means "preserve the existing starting rate," consumed by markInputsPublishFailed. This is only correct if zero is produced exclusively by the two resource-failure branches and never by a legitimate fee computation — and the retained ErrMaxPosition branch and the "all other failures" branch in handleReplacementTxError both still attach calculateRetryFeeRate(r) to a TxFailed result. The correctness of the entire change rests on that disjointness, which the visible diff does not establish. On a CRITICAL fund-recovery path this needs to be confirmed (or the intent carried explicitly) before merge. Secondary concerns: the two sibling error handlers now classify the same error set with divergent code shapes, and a comment carried over from the old shared-case structure now misdescribes ErrMaxPosition.

Findings: 🔴 0 Blocker · 🟠 2 Major · 🟡 4 Minor · 🔵 0 Nit

lightninglabs-gateway · 2026-06-25T00:57:54Z

+		// FeeRate at zero; preserving the input's existing starting
+		// rate in that case avoids stranding inputs whose intrinsic
+		// budget can't accommodate a higher rate.
+		if feeRate == 0 {


🟠 Major F1: FeeRate==0 overloaded as a 'preserve' sentinel; collision with a legitimate zero is unproven

markInputsPublishFailed now treats feeRate == 0 as "no fee dimension — preserve the input's existing StartingFeeRate" and continues without writing it. This invariant holds only if a successful fee computation can never yield zero. It is not closed in the changed code: the retained ErrMaxPosition case in handleInitialTxError (sweep/fee_bumper.go:1118) and the "all other fee-bump failures" tail of handleReplacementTxError both assign result.FeeRate = calculateRetryFeeRate(r) to a TxFailed result, and handleInitialTxError assigns that value even when calculateRetryFeeRate returned an error (only Event/Err flip to fatal). BumpResult.Validate() requires a non-zero FeeRate only for TxConfirmed, so a zero on TxFailed passes silently. If calculateRetryFeeRate or feeFunction.FeeRate() can legitimately return 0 (a fee floor of zero, a zero-weight/zero-deadline edge, or the pre-first-ratchet starting rate), that result is silently reinterpreted as "preserve," freezing the input at a stale rate — the same class of mis-pricing this PR is trying to eliminate. calculateRetryFeeRate and FeeRate() are not present in the provided file_contents (the file was truncated), so I cannot confirm reachability; this is a blocker if zero is reachable on any non-resource path. The robust fix is to not overload the value channel — carry the intent explicitly, e.g. fn.Option[chainfee.SatPerKWeight] or a preserveStartingFeeRate bool on BumpResult, so "no fee signal" and "a computed rate of zero" are distinguishable.

lightninglabs-gateway · 2026-06-25T00:57:54Z

+	case errors.Is(err, ErrNotEnoughInputs),
+		errors.Is(err, ErrNotEnoughBudget):
+
+		result.Event = TxFailed


🟠 Major F2: Preserving the rate on budget failure may strand a deadline-sensitive sweep

For ErrNotEnoughBudget/ErrNotEnoughInputs the input now retries indefinitely at its original starting fee rate. For a budget-exhausted input that is the intended fix, but the same branch is reachable for inputs with a confirmation deadline (e.g. an outgoing-HTLC timeout sweep) whose budget is momentarily insufficient while the deadline still approaches. If deadline pressure escalates the fee only through the now-skipped ratchet path, such a sweep can sit at a low rate and miss its deadline — a fund-safety regression and a potential griefing lever for a counterparty that can keep a channel near its budget ceiling. I cannot tell from the diff whether a separate deadline-driven escalation (independent of this ratchet) keeps time-sensitive sweeps moving; the relevant fee-function/deadline code is not in the loaded context. Confirm that preserving the starting rate on a budget failure cannot strand a deadline-bound sweep before relying on this behavior; if it can, the resource-failure handling needs to distinguish deadline-bound inputs.

lightninglabs-gateway · 2026-06-25T00:57:54Z

 	// so these inputs can be retried with a different group in the next
-	// block.
+	// block. The error is fee-related (the linear fee function has run
+	// past its budget), so we still calculate a retry fee rate.


🟡 Minor F3: Retained ErrMaxPosition ratchet branch lacks an asserting test

The refactor splits the former fallthrough so ErrMaxPosition keeps calling calculateRetryFeeRate while the resource errors no longer do. This divergence is the load-bearing semantic of the PR, and the deleted fallthrough is easy to misread. The new TestHandleInitialTxErrorNoRatchetOnResourceFailure covers the resource branch, but the diff shows no test asserting that ErrMaxPosition still ratchets (still attaches a non-zero FeeRate). Add coverage so a future edit can't silently fold ErrMaxPosition back into the no-ratchet set.

lightninglabs-gateway · 2026-06-25T00:57:54Z

@@ -1114,20 +1114,13 @@ func (t *TxPublisher) handleInitialTxError(r *monitorRecord, err error) {



🟡 Minor F4: ErrMaxPosition comment conflates position exhaustion with budget exhaustion

The rewritten comment on case errors.Is(err, ErrMaxPosition) opens "When the error is due to budget being used up" — wording carried over from when this case shared a block with ErrNotEnoughBudget. Now that the budget errors live in a separate case below, this conflates the fee-function position error with a budget error and will mislead the next reader. Reword to describe position exhaustion of the linear fee function.

lightninglabs-gateway · 2026-06-25T00:57:54Z

+	// at zero signals the sweeper to preserve the input's existing
+	// starting fee rate; otherwise repeated resource failures would
+	// ratchet the rate past the input's intrinsic budget and strand it.
+	if errors.Is(err, ErrNotEnoughBudget) ||


🟡 Minor F5: Resource-failure classification duplicated across two handlers with divergent shapes

errors.Is(err, ErrNotEnoughInputs) || errors.Is(err, ErrNotEnoughBudget), plus a near-identical multi-line justification, now appears in both handleInitialTxError (as a switch case) and handleReplacementTxError (as an if/early-return). This is the second reorganization of this classification in ~4 months (blame: PR #9447 2025-02-12, PR #9627 2025-03-20), so the contract is still churning — and the two sites must stay in lockstep. Extract a single predicate (e.g. isResourceFailure(err error) bool) and reference it from both, so a future third resource error can't be added to one handler and missed in the other.

lightninglabs-gateway · 2026-06-25T00:57:54Z

+		errors.Is(err, ErrNotEnoughInputs) {

-	// Calculate the next fee rate for the retry.
+		log.Warnf("Fail to fee bump tx %v: %v", oldTx.TxHash(), err)


🟡 Minor F6: Relocated log line keeps grammar error 'Fail to fee bump'

The relocated warning logs "Fail to fee bump tx %v: %v", inconsistent with the sibling log.Errorf("Failed to bump tx %v: %v", ...) a few lines down (also already flagged in review). Since the surrounding block is being rewritten anyway, fix it in this pass.

Suggested change

log.Warnf("Fail to fee bump tx %v: %v", oldTx.TxHash(), err)

log.Warnf("Failed to fee bump tx %v: %v", oldTx.TxHash(), err)

lightninglabs-gateway · 2026-06-25T00:57:57Z

🤖 gateway audit metadata for this PR — auto-generated, please don't edit.

jtobin self-assigned this Jun 11, 2026

jtobin mentioned this pull request Jun 11, 2026

sweep: fix utxo contention and spurious fee ratchet #10816

Closed

jtobin linked an issue Jun 11, 2026 that may be closed by this pull request

[bug]: After Taproot Assets channel force-closure, funds are not swept. lightninglabs/taproot-assets#1888

Closed

gemini-code-assist Bot reviewed Jun 11, 2026

View reviewed changes

jtobin linked an issue Jun 11, 2026 that may be closed by this pull request

[flake]: suspected custom_channels_htlc_force_close_MPP flake lightninglabs/taproot-assets#1976

Closed

github-actions Bot added the severity-critical Requires expert review - security/consensus critical label Jun 11, 2026

jtobin added 2 commits June 11, 2026 13:14

docs: add release note

b1da0b2

jtobin force-pushed the sweep-fee-ratchet branch from 893025b to b1da0b2 Compare June 11, 2026 15:48

jtobin requested a review from ziggie1984 June 12, 2026 11:35

lightninglabs-gateway Bot suggested changes Jun 25, 2026

View reviewed changes

lightninglabs-gateway Bot added the gateway-active label Jun 25, 2026

	log.Warnf("Fail to fee bump tx %v: %v", oldTx.TxHash(), err)
	log.Warnf("Failed to fee bump tx %v: %v", oldTx.TxHash(), err)

		@@ -1114,20 +1114,13 @@ func (t TxPublisher) handleInitialTxError(r monitorRecord, err error) {

Uh oh!

Conversation

jtobin commented Jun 11, 2026

Uh oh!

gemini-code-assist Bot commented Jun 11, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

lightninglabs-deploy commented Jun 18, 2026

Uh oh!

saubyk commented Jun 25, 2026

Uh oh!

lightninglabs-gateway Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lightninglabs-gateway Bot left a comment

Choose a reason for hiding this comment

Uh oh!

lightninglabs-gateway Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lightninglabs-gateway Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lightninglabs-gateway Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lightninglabs-gateway Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lightninglabs-gateway Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lightninglabs-gateway Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lightninglabs-gateway Bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lightninglabs-gateway Bot commented Jun 25, 2026 •

edited

Loading