
PR3 Next Hop improvements #10735

Closed
NomDeTom wants to merge 29 commits into meshtastic:develop from NomDeTom:PR3-tmm-nexthop

Conversation


@NomDeTom NomDeTom commented Jun 17, 2026

Collaborator

This pull request introduces several improvements and a robust multi-hop relay system.

It is built on top of #10719 and inherits that PR's requirements.

Testing and validation:

  • Introduced a comprehensive hardware-level test (test_nexthop_multihop_recovery.py) for multi-hop NextHop directed-message delivery and relay recovery. The test checks that messages traverse relays correctly and that delivery recovers after relay outages; it runs only when a suitable topology is detected.

🤝 Attestations

  • I have tested that my proposed changes behave as described.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other (please specify below)

@NomDeTom NomDeTom requested a review from Copilot June 17, 2026 13:07

github-actions Bot commented Jun 17, 2026

Contributor

⚡ Try this PR in the Web Flasher


Warning

This is an automated, unreviewed CI test build. Back up your device configuration
before flashing, and only flash devices you are able to recover.

Supported boards built by this PR (24)

| Device | Board | Platform |
| --- | --- | --- |
| Crowpanel Adv 3.5 TFT | elecrow-adv-35-tft | esp32-s3 |
| Heltec HT62 | heltec-ht62-esp32c3-sx1262 | esp32-c3 |
| Heltec Mesh Node 096 | heltec-mesh-node-t096 | nrf52840 |
| Heltec Mesh Node T1 | heltec-mesh-node-t1 | nrf52840 |
| Heltec Mesh Node T114 | heltec-mesh-node-t114 | nrf52840 |
| Heltec V3 | heltec-v3 | esp32-s3 |
| Heltec V4 | heltec-v4 | esp32-s3 |
| Raspberry Pi Pico | pico | rp2040 |
| Raspberry Pi Pico W | picow | rp2040 |
| RAK WisMesh Tag | rak_wismeshtag | nrf52840 |
| RAK WisBlock 11200 | rak11200 | esp32 |
| RAK WisBlock 11310 | rak11310 | rp2040 |
| RAK3312 | rak3312 | esp32-s3 |
| RAK WisBlock 4631 | rak4631 | nrf52840 |
| Seeed Wio Tracker L1 | seeed_wio_tracker_L1 | nrf52840 |
| Seeed Xiao NRF52840 Kit | seeed_xiao_nrf52840_kit | nrf52840 |
| Seeed Xiao ESP32-S3 | seeed-xiao-s3 | esp32-s3 |
| Station G2 | station-g2 | esp32-s3 |
| Station G3 | station-g3 | esp32-s3 |
| LILYGO T-Deck | t-deck-tft | esp32-s3 |
| LILYGO T-Echo | t-echo | nrf52840 |
| LILYGO T-Echo Plus | t-echo-plus | nrf52840 |
| LilyGo T3-C6 | tlora-c6 | esp32-c6 |
| Seeed SenseCAP T1000-E | tracker-t1000-e | nrf52840 |

Build artifacts expire on 2026-07-18. Updated for 3e3182c.

@NomDeTom NomDeTom changed the title from "Next Hop improvements" to "PR3 Next Hop improvements" Jun 17, 2026
@github-actions github-actions Bot added the needs-review (Needs human review) and enhancement (New feature or request) labels Jun 17, 2026

Copilot AI left a comment

Contributor


Pull request overview

This PR improves directed-message (DM) delivery reliability in dense/volatile meshes by hardening NextHop routing against 1-byte last-byte collisions, adding route health decay/recovery behavior, and preserving PKI keys for long-tail nodes via a new warm-tier store. It also extends Traffic Management and position broadcast policy to better control airtime usage, and adds both unit and hardware-level validation (including a multi-hop bench test).

Changes:

  • Add warm-tier identity/key storage (WarmNodeStore) and route PKI key lookups through hot+warm tiers to keep DMs decryptable after NodeDB evictions.
  • Improve NextHop routing correctness and recovery (ambiguity-aware last-byte resolution, route-health TTL/failure decay, ACK-based success refresh) and integrate traceroute + traffic management hints.
  • Add new unit tests (warm store, NodeDB blocked retention, NextHop routing, position policy) plus a hardware multi-hop recovery test; add nRF52840 linker/guardrails for the reserved warm-store flash region.

Reviewed changes

Copilot reviewed 41 out of 44 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| variants/nrf52840/nrf52840.ini | Use capped linker script to keep firmware below warm-store flash region. |
| variants/nrf52840/nrf52.ini | Add post-link guard to enforce warm-store region is unused. |
| userPrefs.jsonc | Document compile-time overrides for Traffic Management hop trimming. |
| test/test_warm_store/test_main.cpp | New unit tests for WarmNodeStore admission/eviction/persistence behavior. |
| test/test_position_module/test_main.cpp | New unit tests for PositionModule broadcast-policy helpers. |
| test/test_nodedb_blocked/test_main.cpp | New unit tests for NodeDB demotion + favorite/ignored retention semantics. |
| test/test_nexthop_routing/test_main.cpp | New unit tests for NextHop reliability mitigations (M1–M3) and hop-preserve logic. |
| src/platform/nrf52/nrf52840_s140_v7.ld | Cap FLASH length to avoid warm-store raw-flash pages. |
| src/platform/nrf52/nrf52840_s140_v6.ld | New v6 SoftDevice linker script with the same warm-store cap. |
| src/modules/TrafficManagementModule.h | Refactor unified cache layout + add next-hop hint cache hooks and test clock. |
| src/modules/TraceRouteModule.cpp | Mirror traceroute-derived next-hop into Traffic Management overflow cache. |
| src/modules/PositionModule.h | Expose pure helpers and add stationary-detection helper signature. |
| src/modules/PositionModule.cpp | Add stationary/fixed-position floor for position broadcast interval. |
| src/modules/AdminModule.cpp | Use protected-flag API for favorite/ignore; allow blocking unknown nodes; optional warm-tier dump. |
| src/mesh/WarmNodeStore.h | Introduce warm-tier record format and persistence backend contract. |
| src/mesh/WarmNodeStore.cpp | Implement warm-tier store with nRF52840 raw-flash ring + file backend elsewhere. |
| src/mesh/Router.cpp | Harden hop-preserve against last-byte collisions; use warm-tier keys for PKI decrypt/encrypt paths. |
| src/mesh/ReliableRouter.cpp | Refresh route-health on end-to-end ACK success. |
| src/mesh/PositionPrecision.h | Expose truncateCoordinate helpers for reuse/testing. |
| src/mesh/PositionPrecision.cpp | Make truncateCoordinate public + handle precision 0/≥32 safely. |
| src/mesh/PacketHistory.cpp | Clarify byte-domain semantics and where collision risk is mitigated. |
| src/mesh/NodeDBLegacyMigration.cpp | Sanitize UTF-8 strings during legacy NodeDB migration to avoid encode failures. |
| src/mesh/NodeDB.h | Add warm-tier hooks, protected-node cap API, and last-byte resolution primitives. |
| src/mesh/NextHopRouter.h | Add route-health table + optional early-flood gate and test visibility. |
| src/mesh/NextHopRouter.cpp | Enforce unique-neighbor next-hop gating; decay stale routes; mirror confirmed hops into TMM cache. |
| src/mesh/MeshTypes.h | Define freshness window constant for last-byte neighbor resolution. |
| src/mesh/mesh-pb-constants.h | Add warm-tier sizing + satellite caps; change Traffic Management defaults. |
| src/mesh/generated/meshtastic/module_config.pb.h | Regenerated: add TrafficManagementConfig hop-trim/apply-to-private fields. |
| src/mesh/generated/meshtastic/localonly.pb.h | Regenerated size updates due to module config growth. |
| src/mesh/generated/meshtastic/deviceonly.pb.h | Regenerated size updates due to preferences growth. |
| src/mesh/FloodingRouter.cpp | Prevent “higher-hopcount upgrade” from undoing hop trimming. |
| src/mesh/Default.h | Add stationary position broadcast floor + hop-trim grace defaults and API. |
| src/mesh/Default.cpp | Implement hop-trim grace derivation by role/portnum. |
| src/mesh/Channels.h | Add isWellKnownChannel helper. |
| src/mesh/Channels.cpp | Implement well-known-channel detection across preset display names. |
| src/graphics/draw/MenuHandler.cpp | Route ignore/favorite through protected-flag logic and avoid unnecessary saves on refusal. |
| mcp-server/tests/mesh/test_nexthop_multihop_recovery.py | New hardware multi-hop DM delivery + relay outage recovery test (topology-gated). |
| extra_scripts/nrf52_warm_region.py | Post-link check to fail builds that overlap reserved warm-store flash pages. |
| docs/nexthop-routing-reliability.md | Add in-repo design/analysis doc for NextHop reliability mitigations. |
| .github/copilot-instructions.md | Document warm tier, satellite caps, and on-boot self-care expectations for contributors. |

Comment thread src/mesh/mesh-pb-constants.h
Comment thread src/mesh/WarmNodeStore.h
Comment thread docs/nexthop-routing-reliability.md Outdated
Comment thread src/modules/PositionModule.cpp Outdated
@NomDeTom NomDeTom force-pushed the PR3-tmm-nexthop branch 2 times, most recently from c810627 to cc7762e June 17, 2026 13:50

github-actions Bot commented Jun 17, 2026

Contributor

Firmware Size Report

22 targets | vs develop: 22 increased, net +195,484 (+190.9 KB)

| Target | Size | vs develop |
| --- | --- | --- |
| t-deck-tft | 3,801,472 | 📈 +12,080 (+11.8 KB) |
| rak3312 | 2,262,304 | 📈 +12,064 (+11.8 KB) |
| seeed-xiao-s3 | 2,266,240 | 📈 +12,064 (+11.8 KB) |
| t-eth-elite | 2,481,232 | 📈 +11,536 (+11.3 KB) |
| heltec-vision-master-e213-inkhud | 2,215,168 | 📈 +11,488 (+11.2 KB) |
| rak11200 | 1,851,104 | 📈 +11,136 (+10.9 KB) |
| elecrow-adv-35-tft | 3,406,576 | 📈 +10,864 (+10.6 KB) |
| station-g3 | 2,255,632 | 📈 +10,704 (+10.5 KB) |
| heltec-v3 | 2,253,664 | 📈 +10,640 (+10.4 KB) |
| tlora-c6 | 2,358,224 | 📈 +10,048 (+9.8 KB) |
| heltec-ht62-esp32c3-sx1262 | 2,124,768 | 📈 +10,000 (+9.8 KB) |
| picow | 1,236,148 | 📈 +9,912 (+9.7 KB) |
| pico | 774,824 | 📈 +9,344 (+9.1 KB) |
| rak11310 | 797,448 | 📈 +9,344 (+9.1 KB) |
| seeed_xiao_rp2040 | 773,016 | 📈 +9,336 (+9.1 KB) |
| pico2w | 1,212,108 | 📈 +9,332 (+9.1 KB) |
| pico2 | 762,192 | 📈 +8,768 (+8.6 KB) |
| seeed_xiao_rp2350 | 760,336 | 📈 +8,760 (+8.6 KB) |
| wio-e5 | 235,708 | 📈 +2,488 (+2.4 KB) |
| rak3172 | 183,228 | 📈 +2,200 (+2.1 KB) |
| heltec-v4 | 2,265,488 | 📈 +1,952 (+1.9 KB) |
| station-g2 | 2,255,632 | 📈 +1,424 (+1.4 KB) |

Updated for dc21903

@NomDeTom NomDeTom force-pushed the PR3-tmm-nexthop branch 2 times, most recently from 17c791e to ebe7be2 June 17, 2026 23:35
NomDeTom and others added 18 commits June 18, 2026 10:32
…store

Reworks the TrafficManagementModule cache layer (policing behaviour unchanged
from upstream) and adds a routing-hint overflow store:

- Flatten the ring: replace the cuckoo-hashed unified cache and the bucketed
  PSRAM NodeInfo index with plain flat arrays + linear scan (same idiom as
  WarmNodeStore). At LoRa packet rates an O(n) scan of the cache is negligible,
  and it removes a large amount of hashing/displacement complexity. The cache
  entry is 11 B; timestamps use a uniform +1 presence-offset so a 0 byte always
  means "empty" across every sub-store. Adds rebaseEpoch() so cached state
  survives the ~19 h relative-timestamp horizon instead of being flushed.

- Next-hop overflow cache: setNextHop/getNextHopHint store a confirmed last-byte
  relay for a destination, written only from NextHopRouter's ACK-confirmed
  decision (and mirrored from TraceRoute). NextHopRouter::getNextHop falls back
  to this cache when the hot NodeDB has no hint, so DMs/relays to long-tail
  nodes keep routing after the node ages out of NodeInfoLite.

- Persistence: preloadNextHopsFromNodeDB warm-starts the cache from persisted
  NodeInfoLite hints on first maintenance pass; next_hop entries are kept alive
  across the maintenance sweep (no TTL) and never clobbered by a stale preload.
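
The presence-offset and linear-scan idea described above can be illustrated with a minimal sketch. It assumes an 8-bit relative tick per slot; `FlatEntry`, `encodeTick`, `decodeTick`, `isPresent` and `findEntry` are hypothetical names, and the struct is not the real 11 B cache entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 8-bit relative-timestamp slot: 0 = empty, 1..255 = stored ticks.
constexpr uint8_t kEmptySlot = 0;

// Store tick + 1 so a zero-initialised slot can never look like a real timestamp.
// The top value saturates instead of wrapping back to the "empty" byte.
constexpr uint8_t encodeTick(uint8_t tick)
{
    return static_cast<uint8_t>(tick == 255 ? 255 : tick + 1);
}

constexpr bool isPresent(uint8_t stored) { return stored != kEmptySlot; }

// Only meaningful when isPresent(stored) is true.
constexpr uint8_t decodeTick(uint8_t stored) { return static_cast<uint8_t>(stored - 1); }

// Illustrative entry only; the real cache entry layout is not reproduced here.
struct FlatEntry {
    uint32_t node;
    uint8_t seenTick; // presence-offset timestamp
};

// Plain O(n) scan over a flat array; returns the slot index or -1 when absent.
constexpr int findEntry(const FlatEntry *entries, size_t count, uint32_t node)
{
    for (size_t i = 0; i < count; ++i) {
        if (isPresent(entries[i].seenTick) && entries[i].node == node)
            return static_cast<int>(i);
    }
    return -1;
}
```

With this encoding a freshly zeroed array is entirely "empty" without a separate validity flag, which is presumably why the commit applies the same offset across every sub-store.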

All packet-policing logic (rate limit, position dedup, unknown-packet drop,
NodeInfo direct response, hop exhaustion) is the existing upstream behaviour,
untouched. HAS_TRAFFIC_MANAGEMENT defaults on so the module is compiled in (see note).

Tests: upstream policing suite now actually runs (adds the MeshTypes.h include
that gates HAS_TRAFFIC_MANAGEMENT) plus 4 next-hop tests. Role-aware throttles,
politeness, precision clamp, port-interval and mesh-radius gating — and the
rate-limit >255 saturation fix — are deferred to the advanced-TMM branch.

Note: the default dedup movement grid moves to ~91 m, which also means about 1.5 km of travel is required to end up with the same signature position again; the grid is coarser, so that distance is further than before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`node` in preloadNextHopsFromNodeDB() is never written through — mark
it const to satisfy cppcheck's constVariablePointer check in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Position dedup in TrafficManagementModule::handleReceived is gated on
channels.isWellKnownChannel(mp.channel). The test helper
installWellKnownPrimaryChannel() sets up channelFile/config.lora so that
gate is true, but it was defined and never called — so the dedup path was
never reached. test_tm_positionDedup_dropsDuplicateWithinWindow therefore
failed (duplicate forwarded -> CONTINUE instead of STOP), and
test_tm_positionDedup_allowsMovedPosition passed only vacuously.

Call installWellKnownPrimaryChannel() in both dedup tests so the dedup
path is genuinely exercised.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-0

Copilot review (PR meshtastic#10706):
- preloadNextHopsFromNodeDB() now returns bool; runOnce only latches
  nextHopPreloaded once the preload actually ran (retries if nodeDB wasn't
  ready), instead of skipping it forever.
- Remove the empty `#if HAS_VARIABLE_HOPS` blocks in the test.
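
The retry-until-success latch described in the first bullet might look roughly like the sketch below. `PreloadLatch` and its `preload(bool)` stand-in are hypothetical; only the names `nextHopPreloaded`, `runOnce` and the bool return of the preload come from the commit message.

```cpp
#include <cassert>

struct PreloadLatch {
    bool nextHopPreloaded = false;
    int attempts = 0;

    // Hypothetical stand-in for preloadNextHopsFromNodeDB():
    // reports false while the node database is not ready yet.
    bool preload(bool nodeDbReady)
    {
        ++attempts;
        return nodeDbReady;
    }

    // Latch only once the preload actually ran; otherwise retry on the next pass.
    void runOnce(bool nodeDbReady)
    {
        if (!nextHopPreloaded)
            nextHopPreloaded = preload(nodeDbReady);
    }
};
```

Before this change the flag was, per the commit message, set regardless of the outcome, so an early pass with no NodeDB skipped the preload permanently.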

Test correctness:
- Three more position-dedup tests were missing installWellKnownPrimaryChannel()
  (dropsDuplicate/allowsMoved were fixed earlier; allowsDuplicateAfterInterval,
  cacheFlush, priorRateState were not) — without the well-known-channel gate the
  dedup path never runs, so their STOP assertions failed.

Fake-time injection (no more real sleeps):
- Add TrafficManagementModule::s_testNowMs + nowMs(), mirroring HopScalingModule;
  route all TMM tick/time reads through nowMs(). Tests advance a virtual clock via
  s_testNowMs instead of testDelay() sleeping real 5-6 min across a tick — the
  suite drops from ~15 min to ~30 s. Production behaviour is unchanged (nowMs()
  inlines to millis()).
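
A minimal sketch of the injectable clock, under the assumption that a zero override means "use the real clock". `ClockSketch`, `platformMillis` and `windowExpired` are hypothetical; the actual module may select the test clock at compile time instead.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Host-side stand-in for the firmware's millis(); hypothetical, for this sketch only.
inline uint32_t platformMillis()
{
    using namespace std::chrono;
    return static_cast<uint32_t>(duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}

struct ClockSketch {
    // Assumption: 0 means "no override"; any other value is the virtual time in ms.
    static uint32_t s_testNowMs;
    static uint32_t nowMs() { return s_testNowMs != 0 ? s_testNowMs : platformMillis(); }
};
uint32_t ClockSketch::s_testNowMs = 0;

// Example consumer: has `windowMs` elapsed since `startMs`? Unsigned subtraction
// keeps the comparison correct across a 32-bit wrap.
inline bool windowExpired(uint32_t startMs, uint32_t windowMs)
{
    return static_cast<uint32_t>(ClockSketch::nowMs() - startMs) >= windowMs;
}
```

A test then advances `s_testNowMs` by several minutes in one assignment instead of sleeping, which is the source of the runtime drop the commit reports.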

Fingerprint-0 fix:
- computePositionFingerprint() never returns 0 now (remap 0 -> 0xFF, mirroring
  getLastByteOfNodeNum), so a real position that hashes to 0 doesn't collide with
  the "no position seen" sentinel and its duplicates dedup correctly.
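
The zero remap can be sketched as follows. The mixing step in `positionFingerprintSketch` is a placeholder assumption, not the real computePositionFingerprint(); only the final 0 -> 0xFF remap reflects the commit message.

```cpp
#include <cassert>
#include <cstdint>

// 0 is reserved as the "no position seen" sentinel, so a real fingerprint must never be 0.
constexpr uint8_t remapZero(uint8_t fp)
{
    return fp == 0 ? uint8_t{0xFF} : fp;
}

// Hypothetical fingerprint: fold the truncated coordinates down to one byte.
constexpr uint8_t positionFingerprintSketch(int32_t latTrunc, int32_t lonTrunc)
{
    uint32_t x = static_cast<uint32_t>(latTrunc) * 31u + static_cast<uint32_t>(lonTrunc);
    uint8_t fp = static_cast<uint8_t>(x ^ (x >> 8) ^ (x >> 16) ^ (x >> 24));
    return remapZero(fp);
}
```

The cost is that inputs hashing to 0 and to 0xFF share one fingerprint, a small loss of resolution in exchange for an unambiguous sentinel.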

test_traffic_management: 34/34 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The TMM relay dedup suppresses other nodes' duplicate positions for ~11h; mirror
that on the originator so we don't emit identical positions that get dropped anyway.

- Hold position broadcasts to a 12h floor when fixed_position is set (any role).
- Hold to the same floor when our position is unchanged beyond the broadcast
  precision (the user/channel-max resolution the on-wire position is truncated to).
- Genuine movement beyond that resolution keeps the normal interval, and the
  smart-broadcast branch still sends early on sub-interval movement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PositionModule (stationary check) and TrafficManagementModule (dedup fingerprint)
each had their own coordinate-truncation primitive. Promote PositionPrecision's
truncateCoordinate to a shared, declared function and route both at it.

- Un-static truncateCoordinate; fold in the precision 0/>=32 guard so it's safe on
  the TMM dedup path that previously relied on truncateLatLon's guard.
- Add a uint8_t-precision overload (forwards to the uint32_t one) so TMM's uint8_t
  precision calls need no cast; the return stays int32_t (it's a coordinate).
- Remove TMM's duplicate truncateLatLon; PositionModule compares truncated coords
  directly instead of round-tripping through Position structs.

Core PositionModule no longer reaches into the optional TMM module for this.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the two broadcast-policy decisions out of runOnce/positionUnchangedSinceLastSend
into pure static helpers so they're unit-testable without the module or a fake clock:

- positionWithinPrecisionCell(): two coords truncate to the same precision grid cell
  (stationary); precision 0 or >=32 never suppresses.
- effectiveBroadcastIntervalMs(): stationary positions are held to the 12h floor when
  that's the longer interval, else the normal configured interval.
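
The two helpers could be sketched as below. The signatures are assumptions, and `truncateCoordinateSketch` simply keeps the top `precision` bits, which may differ from the real truncateCoordinate (for example in how it centres the cell).

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kStationaryFloorMs = 12u * 60u * 60u * 1000u; // 12 h

// Assumption: truncation keeps the top `precision` bits of the coordinate.
constexpr int32_t truncateCoordinateSketch(int32_t coord, uint32_t precision)
{
    if (precision == 0 || precision >= 32)
        return coord; // guard: nothing to truncate
    return static_cast<int32_t>(static_cast<uint32_t>(coord) & (UINT32_MAX << (32 - precision)));
}

// True when both positions fall in the same precision grid cell (stationary).
constexpr bool positionWithinPrecisionCell(int32_t latA, int32_t lonA, int32_t latB, int32_t lonB, uint32_t precision)
{
    if (precision == 0 || precision >= 32)
        return false; // these precisions never suppress a broadcast
    return truncateCoordinateSketch(latA, precision) == truncateCoordinateSketch(latB, precision) &&
           truncateCoordinateSketch(lonA, precision) == truncateCoordinateSketch(lonB, precision);
}

// Stationary positions wait for the 12 h floor only when it is the longer interval.
constexpr uint32_t effectiveBroadcastIntervalMs(bool stationary, uint32_t configuredMs)
{
    return (stationary && kStationaryFloorMs > configuredMs) ? kStationaryFloorMs : configuredMs;
}
```

Because both functions are pure, they can be checked with plain value assertions and no module instance or clock.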

test/test_position_module covers jitter-stays/move-leaves the cell, the 0 and >=32
precision guards, and the floor/interval selection. No fake time needed — the time
delta is the existing interval mechanism; only the floor decision is new and it's pure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keyed on the originating node's advertised role (NodeDB lookup of p->from). Both
exceptions only relax filtering, never tighten it past the operator's config.

- Tracker / TAK tracker: cap the position dedup window at 1 hour so a stationary
  tracker may refresh a duplicate position hourly instead of every ~11h.
- Lost-and-found: throttle only to the shortest tick window (one kPosTimeTickMs),
  and skip the relayed-position precision clamp entirely (no anti-dox).
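
The "only relax, never tighten" rule can be sketched as a minimum over the configured window and a per-role cap. `Role`, `dedupWindowSecs` and the `tickSecs` parameter are hypothetical; the 1 hour tracker cap comes from the commit message, while the length of one kPosTimeTickMs is not stated there and is passed in.

```cpp
#include <cassert>
#include <cstdint>

enum class Role { Client, Tracker, TakTracker, LostAndFound }; // illustrative subset

constexpr uint32_t kTrackerCapSecs = 60u * 60u; // 1 hour

// Returns the dedup window for a sender; the result never exceeds the configured window.
constexpr uint32_t dedupWindowSecs(Role role, uint32_t configuredSecs, uint32_t tickSecs)
{
    uint32_t cap = configuredSecs;
    if (role == Role::Tracker || role == Role::TakTracker)
        cap = kTrackerCapSecs;
    else if (role == Role::LostAndFound)
        cap = tickSecs; // shortest tick window
    return cap < configuredSecs ? cap : configuredSecs;
}
```

Taking the minimum means an operator who already configured a window shorter than the cap keeps that shorter window.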

New cap default lives in Default.h (default_traffic_mgmt_tracker_position_min_interval_secs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the TMM role exceptions on the originator side, but fixed position is still
held to the 12h floor for every role — a tracker or lost-and-found that pins itself
isn't doing its job, so it gets no exception.

- Fixed position (any role): 12h floor, unchanged.
- Lost-and-found (not fixed): never treated as stationary — broadcasts freely.
- Tracker / TAK tracker (not fixed): movement judged at the node's own configured
  (unclamped) precision instead of the on-wire public-clamped precision, so finer
  moves still trigger a send; floored only when stationary at that finer resolution.

positionUnchangedSinceLastSend() gains a useConfiguredPrecision flag for this.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- TrafficManagementModule.cpp: fix pos_fingerprint comment — zero is not
  "astronomically unlikely" but actively remapped to 0xFF; update comment
  to state the actual invariant
- AdminModule.cpp: log a warning for remote set_favorite_node and
  set_ignored_node requests that are refused at the protected-node cap
  (previously a silent no-op for non-local callers)
- WarmNodeStore.h: MIGRATION_VERBOSE default already set to 0 in PR1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change bare PositionModule.h to modules/PositionModule.h — build_flags
sets -Isrc, not -Isrc/modules, so the bare form fails to resolve in
the native PlatformIO test env.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Default::hopTrimGrace(role, portnum) derives all tiers from one base grace
(default 2): infra and on-specialty senders +1, deprecated roles -1, else base.
userPrefs: USERPREFS_TMM_HOP_TRIM_DISABLE and _GRACE_BASE.
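
A rough sketch of the tier derivation, assuming the sender has already been classified. The real Default::hopTrimGrace takes (role, portnum); `SenderClass` and `hopTrimGraceSketch` are hypothetical and skip that classification step.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pre-classified sender categories.
enum class SenderClass { Infra, OnSpecialty, Deprecated, Other };

constexpr uint8_t kHopTrimGraceBase = 2; // default base grace per the commit message

// Every tier is derived from the single base value.
constexpr uint8_t hopTrimGraceSketch(SenderClass cls, uint8_t base = kHopTrimGraceBase)
{
    switch (cls) {
    case SenderClass::Infra:
    case SenderClass::OnSpecialty:
        return static_cast<uint8_t>(base + 1);
    case SenderClass::Deprecated:
        return static_cast<uint8_t>(base > 0 ? base - 1 : 0); // clamp at zero (assumption)
    case SenderClass::Other:
        break;
    }
    return base;
}
```

Deriving the tiers from one base lets a single override such as the _GRACE_BASE preference shift all of them together.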

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alterReceived clamps a relayed position/telemetry broadcast's reach to the local
hop-scaling cap + role grace (hop_start adjusted to keep hopsAway honest; far end
stops with no final hop). FloodingRouter refuses the higher-hopcount upgrade for
trim-eligible packets via wouldHopTrim, so it can't undo the trim. Compile-time,
on by default, killed by USERPREFS_TMM_HOP_TRIM_DISABLE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
NomDeTom and others added 11 commits June 18, 2026 10:33
8 cases warming getLastRequiredHop() through the real HopScaling sampling path
(fake time + sender roles in NodeDB): near/far clamp, grace tiers, cold no-op,
never-raise, and wouldHopTrim gating. 42/42 traffic-management cases pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ming)

- NodeDB.cpp addFromContact: only clear favourite bit and erase
  satellites when setProtectedFlag(IS_IGNORED) succeeds; log warning
  on cap refusal without side-effecting the entry
- PositionModule.cpp: replace (lat==0 && lon==0) sentinel with
  lastGpsSend==0 — the coordinate (0,0) is valid and would permanently
  disable stationary detection for nodes there
- mesh-pb-constants.h: add per-platform TRAFFIC_MANAGEMENT_CACHE_SIZE
  caps (STM32WL=0, nRF52840=200, ESP32-S3/portduino=2000, generic=1000)
  to avoid overallocating ~10 KB on RAM-constrained targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bility

- Introduced a new test suite for multi-hop NextHop directed-message delivery and relay recovery in `test_nexthop_multihop_recovery.py`. This includes tests for end-to-end delivery and recovery after relay drop.
- Implemented unit tests in `test_main.cpp` for NextHop routing reliability mitigations, covering:
  - M1: Ambiguity-aware last-byte resolution.
  - M2: NextHopRouter's strict-neighbor gate and hop limit checks.
  - M3: Route-health freshness and failure decay.
- Enhanced mock classes to facilitate controlled testing of node behaviors and routing logic.
- docs/nexthop-routing-reliability.md: update status from "no code
  changes yet" to reflect that mitigations and tests are implemented

RAM pressure and MIGRATION_VERBOSE concerns addressed upstream in
PR2.5 (per-platform TRAFFIC_MANAGEMENT_CACHE_SIZE) and PR2 (verbose
default=0) respectively; (0,0) sentinel fixed in PR2.5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NextHopRouter.cpp: qualify two RouteHealth *h locals as const — only
  read for stale-route checks, never mutated through the pointer
- Router.cpp: qualify meshtastic_NodeInfoLite *node as const in
  shouldDecrementHopLimit — only read for favorite/role predicate
- test_position_module/test_main.cpp: change bare PositionModule.h to
  modules/PositionModule.h — build_flags sets -Isrc, not -Isrc/modules,
  so the bare form fails to resolve in the native PlatformIO test env

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NomDeTom NomDeTom closed this Jun 19, 2026
thebentern added a commit that referenced this pull request Jun 20, 2026
* TrafficManagement: flat unified cache + persistent next-hop overflow store

Reworks the TrafficManagementModule cache layer (policing behaviour unchanged
from upstream) and adds a routing-hint overflow store:

- Flatten the ring: replace the cuckoo-hashed unified cache and the bucketed
  PSRAM NodeInfo index with plain flat arrays + linear scan (same idiom as
  WarmNodeStore). At LoRa packet rates an O(n) scan of the cache is negligible,
  and it removes a large amount of hashing/displacement complexity. The cache
  entry is 11 B; timestamps use a uniform +1 presence-offset so a 0 byte always
  means "empty" across every sub-store. Adds rebaseEpoch() so cached state
  survives the ~19 h relative-timestamp horizon instead of being flushed.

- Next-hop overflow cache: setNextHop/getNextHopHint store a confirmed last-byte
  relay for a destination, written only from NextHopRouter's ACK-confirmed
  decision (and mirrored from TraceRoute). NextHopRouter::getNextHop falls back
  to this cache when the hot NodeDB has no hint, so DMs/relays to long-tail
  nodes keep routing after the node ages out of NodeInfoLite.

- Persistence: preloadNextHopsFromNodeDB warm-starts the cache from persisted
  NodeInfoLite hints on first maintenance pass; next_hop entries are kept alive
  across the maintenance sweep (no TTL) and never clobbered by a stale preload.
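
The fallback order described in the second bullet can be sketched like this. `NextHopCacheSketch`, `resolveNextHop`, the four-slot size and the use of 0 as "no hint" are assumptions for illustration; only the setNextHop/getNextHopHint names come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint8_t kNoHint = 0; // assumption: 0 means "no next-hop preference"

struct NextHopCacheSketch {
    static constexpr size_t kSize = 4; // tiny, for illustration only
    uint32_t dest[kSize] = {};
    uint8_t hop[kSize] = {}; // kNoHint marks an empty slot

    // Store or update a confirmed last-byte relay; false when the value is invalid or the cache is full.
    bool setNextHop(uint32_t d, uint8_t h)
    {
        if (h == kNoHint)
            return false;
        for (size_t i = 0; i < kSize; ++i)
            if (hop[i] != kNoHint && dest[i] == d) { hop[i] = h; return true; }
        for (size_t i = 0; i < kSize; ++i)
            if (hop[i] == kNoHint) { dest[i] = d; hop[i] = h; return true; }
        return false;
    }

    uint8_t getNextHopHint(uint32_t d) const
    {
        for (size_t i = 0; i < kSize; ++i)
            if (hop[i] != kNoHint && dest[i] == d)
                return hop[i];
        return kNoHint;
    }
};

// The hot NodeDB hint wins; the overflow cache is consulted only when it is absent.
inline uint8_t resolveNextHop(uint8_t hotHint, const NextHopCacheSketch &cache, uint32_t d)
{
    return hotHint != kNoHint ? hotHint : cache.getNextHopHint(d);
}
```

When both sources are empty the result is still `kNoHint`, in which case the router would presumably fall back to flooding as it does today.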

All packet-policing logic (rate limit, position dedup, unknown-packet drop,
NodeInfo direct response, hop exhaustion) is the existing upstream behaviour,
untouched. HAS_TRAFFIC_MANAGEMENT defaults on so the module is compiled in (see note).

Tests: upstream policing suite now actually runs (adds the MeshTypes.h include
that gates HAS_TRAFFIC_MANAGEMENT) plus 4 next-hop tests. Role-aware throttles,
politeness, precision clamp, port-interval and mesh-radius gating — and the
rate-limit >255 saturation fix — are deferred to the advanced-TMM branch.

Note: the default dedup movement grid moves to ~91 m, which also means about 1.5 km of travel is required to end up with the same signature position again; the grid is coarser, so that distance is further than before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* TrafficManagement: fix cppcheck constVariablePointer warning

`node` in preloadNextHopsFromNodeDB() is never written through — mark
it const to satisfy cppcheck's constVariablePointer check in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add multi-hop NextHop recovery tests and unit tests for routing reliability

- Introduced a new test suite for multi-hop NextHop directed-message delivery and relay recovery in `test_nexthop_multihop_recovery.py`. This includes tests for end-to-end delivery and recovery after relay drop.
- Implemented unit tests in `test_main.cpp` for NextHop routing reliability mitigations, covering:
  - M1: Ambiguity-aware last-byte resolution.
  - M2: NextHopRouter's strict-neighbor gate and hop limit checks.
  - M3: Route-health freshness and failure decay.
- Enhanced mock classes to facilitate controlled testing of node behaviors and routing logic.
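
The M1 idea of ambiguity-aware last-byte resolution can be illustrated with a minimal sketch: a one-byte relay id is trusted only when exactly one known neighbor ends in that byte. `resolveLastByte` is a hypothetical helper, and the freshness window mentioned in the review summary is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the full node number when exactly one neighbor matches the last byte,
// or 0 when there is no match or the byte is ambiguous (a collision).
constexpr uint32_t resolveLastByte(const uint32_t *neighbors, size_t count, uint8_t lastByte)
{
    uint32_t match = 0;
    int hits = 0;
    for (size_t i = 0; i < count; ++i) {
        if ((neighbors[i] & 0xFFu) == lastByte) {
            match = neighbors[i];
            ++hits;
        }
    }
    return hits == 1 ? match : 0;
}
```

Treating a collision the same as "unknown" gives up a possible shortcut but avoids steering a directed message to the wrong relay.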

* grafting fixed

* Address Copilot review for PR #10735 (NextHop improvements)

- docs/nexthop-routing-reliability.md: update status from "no code
  changes yet" to reflect that mitigations and tests are implemented

RAM pressure and MIGRATION_VERBOSE concerns addressed upstream in
PR2.5 (per-platform TRAFFIC_MANAGEMENT_CACHE_SIZE) and PR2 (verbose
default=0) respectively; (0,0) sentinel fixed in PR2.5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* CI: fix cppcheck constVariablePointer and test include path

- NextHopRouter.cpp: qualify two RouteHealth *h locals as const — only
  read for stale-route checks, never mutated through the pointer
- Router.cpp: qualify meshtastic_NodeInfoLite *node as const in
  shouldDecrementHopLimit — only read for favorite/role predicate
- test_position_module/test_main.cpp: change bare PositionModule.h to
  modules/PositionModule.h — build_flags sets -Isrc, not -Isrc/modules,
  so the bare form fails to resolve in the native PlatformIO test env

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* WarmStore: cache device role + protected category in last_heard low bits

Steal the low 6 bits of WarmNodeEntry.last_heard to carry an evicted node's
device role (4 bits) and a protected category (2 bits) for the hop-trim path,
at zero record-size cost (entry stays 40 B; no RAM/flash growth). The high bits
remain a real unix-seconds timestamp, quantised to 64 s — ample for warm LRU
ordering of long-tail nodes.

- absorb() packs role/protectedCat; place()/ring replay store the raw word so
  metadata round-trips through flash. LRU compares masked time (warmTimeOf).
- take() rehydration masks the metadata bits and restores the cached role so a
  re-admitted node isn't stuck at CLIENT until its next NodeInfo.
- NodeDB classifies the category (favorite/ignored/verified -> Flag;
  tracker/sensor/tak_tracker -> Role) at each eviction site.
- WarmNodeStore::lookupMeta() exposes role/category to consumers.
- Bump WARM_RING_MAGIC (WRNG->WRN2): old rings read as erased and rebuild;
  warm data is a non-critical evictee cache, so discard-on-upgrade is safe.
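
The bit-stealing scheme can be sketched as follows. The field order (role in bits 0..3, protected category in bits 4..5) and the helper names other than warmTimeOf are assumptions; the commit message only fixes the widths and the 64 s quantisation.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kWarmMetaMask = 0x3F; // low 6 bits carry metadata
constexpr uint32_t kWarmRoleMask = 0x0F; // 4-bit device role
constexpr uint32_t kWarmCatMask = 0x03;  // 2-bit protected category
constexpr uint32_t kWarmCatShift = 4;

// Pack a unix-seconds timestamp with role and category; the time loses its low 6 bits.
constexpr uint32_t packLastHeard(uint32_t unixSecs, uint8_t role, uint8_t protectedCat)
{
    return (unixSecs & ~kWarmMetaMask) | ((protectedCat & kWarmCatMask) << kWarmCatShift) | (role & kWarmRoleMask);
}

// Masked time used for LRU comparison (64 s granularity).
constexpr uint32_t warmTimeOf(uint32_t packed) { return packed & ~kWarmMetaMask; }

constexpr uint8_t warmRoleOf(uint32_t packed) { return static_cast<uint8_t>(packed & kWarmRoleMask); }

constexpr uint8_t warmCatOf(uint32_t packed)
{
    return static_cast<uint8_t>((packed >> kWarmCatShift) & kWarmCatMask);
}
```

For example, a timestamp of 1781000037 packs to a stored time of 1781000000, so two entries heard within the same 64 s bucket compare as equally recent.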

Tests: test_warm_store 11/11 (new meta round-trip + quantisation-aware ordering);
NodeDB compiles (test_nodedb_blocked 4/4).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WarmStore: migrate v1 rings/files by discarding last_heard, not the data

Previously the WRNG->WRN2 magic bump treated old rings as erased, discarding all
warm entries — including the PKI public keys that let evicted nodes keep
decrypting DMs. Instead, read v1 (WRNG / WRM1) records and keep each node's
identity + public key, discarding only last_heard (its low bits would otherwise
be misread as the new role/protected metadata). Records re-rank and re-learn
their role on next contact.

- Ring backend (nRF52840): ringReadHeader accepts both magics and reports v1 via
  an out-param; replay zeroes last_heard for v1 records. If the active head page
  is v1, force a rotation so new v2 records never land in a v1-headered page
  (which would discard their freshly-set role on the next load). Legacy pages
  convert to v2 as the ring rotates.
- File backend (warm.dat): bump WARM_STORE_MAGIC WRM1->WRM2; accept WRM1, verify
  CRC against the stored bytes, then discard last_heard and mark dirty so the
  next save rewrites as v2.

Tests: test_warm_store 12/12 (adds test_ws_v1_migration_discardsLastHeard:
key survives, role/protected reset).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WarmStore: guard role bit-width + test eviction carries role/protected

- static_assert that the device role enum still fits the 4-bit warm metadata
  field (WARM_ROLE_MASK); fails the build loudly if a new role is added past 15
  rather than silently truncating role on eviction. (Max role today = 12.)
- Add test_migration_carriesRoleAndProtectedIntoWarm: a demoted TRACKER lands in
  the warm tier with its key, role=TRACKER and protected category=Role; a demoted
  CLIENT carries role=CLIENT/None. Exercises the NodeDB eviction path +
  warmProtectedCategory classification (the warm-store unit tests only cover
  absorb() directly).
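
The build-time guard can be sketched as below. `kMaxDeviceRole` and `roleFitsWarmField` are hypothetical; the value 12 and the 4-bit limit of 15 come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kWarmRoleMaskBits = 0x0F; // 4-bit role field, values 0..15
constexpr uint32_t kMaxDeviceRole = 12;      // "Max role today = 12" per the commit message

constexpr bool roleFitsWarmField(uint32_t maxRole)
{
    return maxRole <= kWarmRoleMaskBits;
}

// Fails the build if a role is ever added past 15, instead of truncating it on eviction.
static_assert(roleFitsWarmField(kMaxDeviceRole), "device role no longer fits the 4-bit warm metadata field");
```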

Tests: test_nodedb_blocked 5/5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix copilot comments

* fix(test): restore #if HAS_TRAFFIC_MANAGEMENT guard in TMM test

The rebase onto PR1.5 lost the top-level HAS_TRAFFIC_MANAGEMENT guard
that PR1.5 introduced, leaving the #else/#endif tail orphaned and
causing compile errors on non-TMM builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Ben Meadors <benmmeadors@gmail.com>

Labels

enhancement (New feature or request), needs-review (Needs human review)


2 participants