Pr2.5 tmm hop trimming#10719
// display name (e.g. a channel named "LongFast" counts even while the
// radio runs MediumFast). Broader than isDefaultChannel, which only
// matches the current preset's name and PSK byte 1.
bool isWellKnownChannel(ChannelIndex chIndex);
Would there be a way to use position dedup in a regional channel? We defaulted to AQ== as the key, but there is still time to change that.
Well-known key = well-known channel makes sense to me, but I'm late to this discussion.
There's a compile-time flag to allow some parts to apply to private channels. I'll have a think about how to extend that.
@luivicur I've extended the "applies to private channels" userpref, and added it to the protobufs required. Fingers crossed...
Pull request overview
This PR extends the Traffic Management Module (TMM) work (stacking on #10706) by adding hop-trimming for relayed public-channel broadcasts, plus a larger set of supporting changes to make long-tail routing and PKI identity more resilient on constrained targets (notably nRF52840).
Changes:
- Add warm-tier node identity storage (`WarmNodeStore`) so evicted NodeDB entries can retain `{nodenum, last_heard, public_key}` and keep PKI DMs working for long-tail peers.
- Add TMM routing-hint overflow cache (next-hop hints) and wire it into `NextHopRouter` and `TraceRouteModule`.
- Add TMM relay shaping features: hop-trimming for relayed broadcast telemetry/position (public channels by default) and a relayed position precision clamp.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| variants/nrf52840/nrf52840.ini | Sets an nRF52840 base linker script to keep the app below the warm-store flash region. |
| variants/nrf52840/nrf52.ini | Adds a post-link guard script to fail builds that overlap the warm-store flash region. |
| userPrefs.jsonc | Documents new build-time toggles for applying TMM to private channels and disabling/tuning hop trimming. |
| test/test_warm_store/test_main.cpp | New unit tests for WarmNodeStore admission/eviction, take/rehydrate, and persistence behavior. |
| test/test_traffic_management/test_main.cpp | Expands TMM tests for well-known-channel gating, virtual clock ticks, next-hop cache, and hop trimming (variable hops). |
| test/test_nodedb_blocked/test_main.cpp | New tests for NodeDB migration/demotion behavior and protected-node (favorite/ignored) retention. |
| src/platform/nrf52/nrf52840_s140_v7.ld | Shrinks FLASH region to end before the warm-store raw-flash ring. |
| src/platform/nrf52/nrf52840_s140_v6.ld | Adds a v6 SoftDevice linker script with the same warm-store flash cap. |
| src/modules/TrafficManagementModule.h | Updates cache model docs; adds hop-trim API, next-hop hint API, and a test clock hook. |
| src/modules/TrafficManagementModule.cpp | Implements hop-trimming, relayed position precision clamp, next-hop preload, and flat cache/tick timestamping. |
| src/modules/TraceRouteModule.cpp | Mirrors traceroute-derived next-hop hints into the TMM overflow cache. |
| src/modules/AdminModule.cpp | Routes favorite/ignore through NodeDB protected-cap enforcement; adds optional warm-tier debug dump. |
| src/mesh/WarmNodeStore.h | Introduces warm-tier data model, persistence strategy, and nRF52840 raw-flash ring layout. |
| src/mesh/WarmNodeStore.cpp | Implements warm-tier storage (file backend and nRF52840 raw-flash ring backend). |
| src/mesh/Router.cpp | Uses NodeDB copyPublicKey() (hot + warm tier) for PKI decrypt/encrypt key lookup. |
| src/mesh/NodeDBLegacyMigration.cpp | Sanitizes migrated legacy node names as UTF-8 to avoid nanopb encode failures. |
| src/mesh/NodeDB.h | Adds warm-tier integration, protected-node cap enforcement API, and new self-care helpers. |
| src/mesh/NodeDB.cpp | Implements self-care pass, warm-tier demotion/rehydration, satellite caps, protected-node cap, and always-on TMM defaults. |
| src/mesh/NextHopRouter.cpp | Writes confirmed next-hop into TMM overflow cache and consults it as a fallback. |
| src/mesh/mesh-pb-constants.h | Introduces WARM_NODE_COUNT and satellite caps, changes default node caps, and enables TMM by default. |
| src/mesh/FloodingRouter.cpp | Suppresses “higher-hopcount wins” upgrades when hop-trimming would be undone by the upgrade. |
| src/mesh/Default.h | Updates TMM defaults and adds hop-trim grace-base configuration. |
| src/mesh/Default.cpp | Implements grace calculation by sender role/portnum for hop-trimming. |
| src/mesh/Channels.h | Adds Channels::isWellKnownChannel() API. |
| src/mesh/Channels.cpp | Implements well-known-channel detection by PSK shape and preset display name matching. |
| src/graphics/draw/MenuHandler.cpp | Uses protected-cap enforcement for ignore; avoids saving/observer churn when a cap refusal occurs. |
| extra_scripts/nrf52_warm_region.py | New post-link guard ensuring the firmware image doesn’t overlap the warm-store reserved flash pages. |
| .github/copilot-instructions.md | Documents warm tier, satellite caps, and self-care behaviors for contributors/agents. |
…store

Reworks the TrafficManagementModule cache layer (policing behaviour unchanged from upstream) and adds a routing-hint overflow store:

- Flatten the ring: replace the cuckoo-hashed unified cache and the bucketed PSRAM NodeInfo index with plain flat arrays + linear scan (same idiom as WarmNodeStore). At LoRa packet rates an O(n) scan of the cache is negligible, and it removes a large amount of hashing/displacement complexity. The cache entry is 11 B; timestamps use a uniform +1 presence-offset so a 0 byte always means "empty" across every sub-store. Adds rebaseEpoch() so cached state survives the ~19 h relative-timestamp horizon instead of being flushed.
- Next-hop overflow cache: setNextHop/getNextHopHint store a confirmed last-byte relay for a destination, written only from NextHopRouter's ACK-confirmed decision (and mirrored from TraceRoute). NextHopRouter::getNextHop falls back to this cache when the hot NodeDB has no hint, so DMs/relays to long-tail nodes keep routing after the node ages out of NodeInfoLite.
- Persistence: preloadNextHopsFromNodeDB warm-starts the cache from persisted NodeInfoLite hints on first maintenance pass; next_hop entries are kept alive across the maintenance sweep (no TTL) and never clobbered by a stale preload.

All packet-policing logic (rate limit, position dedup, unknown-packet drop, NodeInfo direct response, hop exhaustion) is the existing upstream behaviour, untouched. HAS_TRAFFIC_MANAGEMENT defaults on so the module is compiled in (see note).

Tests: upstream policing suite now actually runs (adds the MeshTypes.h include that gates HAS_TRAFFIC_MANAGEMENT) plus 4 next-hop tests.

Role-aware throttles, politeness, precision clamp, port-interval and mesh-radius gating, as well as the rate-limit >255 saturation fix, are deferred to the advanced-TMM branch.

Note: default dedup movement grid moves to ~91 m, which also means 1.5 km required to end up with the same signature position - coarser and therefore further than before.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
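The flat-array idiom and the +1 presence offset described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `HintEntry`, `FlatHintCache`, `kSlots`, the field layout, and the out-parameter signature of `getNextHopHint` are all assumptions made for the example, and eviction is left out entirely.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified entry; this is NOT the PR's 11-byte layout.
struct HintEntry {
    uint32_t node = 0;    // destination node number
    uint8_t nextHop = 0;  // last byte of the confirmed relay
    uint8_t seenTick = 0; // stored as tick + 1, so 0 always means "empty"
};

class FlatHintCache
{
  public:
    static constexpr std::size_t kSlots = 8; // illustrative size only

    // Record a confirmed relay for `node`. Returns false if the cache is full.
    bool setNextHop(uint32_t node, uint8_t nextHop, uint8_t tick)
    {
        HintEntry *slot = find(node);
        if (!slot)
            slot = findEmpty();
        if (!slot)
            return false; // a real cache would evict here; omitted in this sketch
        slot->node = node;
        slot->nextHop = nextHop;
        slot->seenTick = encodeTick(tick);
        return true;
    }

    // Linear scan over the flat array; writes the hint to `out` when found.
    bool getNextHopHint(uint32_t node, uint8_t &out) const
    {
        for (const HintEntry &e : entries) {
            if (e.seenTick != 0 && e.node == node) {
                out = e.nextHop;
                return true;
            }
        }
        return false;
    }

  private:
    // Map any tick onto 1..255 so a stored 0 is never a valid timestamp.
    static uint8_t encodeTick(uint8_t tick) { return static_cast<uint8_t>((tick % 255u) + 1u); }

    HintEntry *find(uint32_t node)
    {
        for (HintEntry &e : entries)
            if (e.seenTick != 0 && e.node == node)
                return &e;
        return nullptr;
    }

    HintEntry *findEmpty()
    {
        for (HintEntry &e : entries)
            if (e.seenTick == 0)
                return &e;
        return nullptr;
    }

    std::array<HintEntry, kSlots> entries{};
};
```

Because presence is carried by the timestamp byte itself, a zero-initialised array is already a valid empty cache and no separate "in use" flag is needed.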
`node` in preloadNextHopsFromNodeDB() is never written through; mark it const to satisfy cppcheck's constVariablePointer check in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Position dedup in TrafficManagementModule::handleReceived is gated on channels.isWellKnownChannel(mp.channel). The test helper installWellKnownPrimaryChannel() sets up channelFile/config.lora so that gate is true, but it was defined and never called, so the dedup path was never reached. test_tm_positionDedup_dropsDuplicateWithinWindow therefore failed (duplicate forwarded -> CONTINUE instead of STOP), and test_tm_positionDedup_allowsMovedPosition passed only vacuously.

Call installWellKnownPrimaryChannel() in both dedup tests so the dedup path is genuinely exercised.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-0

Copilot review (PR meshtastic#10706):
- preloadNextHopsFromNodeDB() now returns bool; runOnce only latches nextHopPreloaded once the preload actually ran (retries if nodeDB wasn't ready), instead of skipping it forever.
- Remove the empty `#if HAS_VARIABLE_HOPS` blocks in the test.

Test correctness:
- Three more position-dedup tests were missing installWellKnownPrimaryChannel() (dropsDuplicate/allowsMoved were fixed earlier; allowsDuplicateAfterInterval, cacheFlush, priorRateState were not). Without the well-known-channel gate the dedup path never runs, so their STOP assertions failed.

Fake-time injection (no more real sleeps):
- Add TrafficManagementModule::s_testNowMs + nowMs(), mirroring HopScalingModule; route all TMM tick/time reads through nowMs(). Tests advance a virtual clock via s_testNowMs instead of testDelay() sleeping real 5-6 min across a tick, so the suite drops from ~15 min to ~30 s. Production behaviour is unchanged (nowMs() inlines to millis()).

Fingerprint-0 fix:
- computePositionFingerprint() never returns 0 now (remap 0 -> 0xFF, mirroring getLastByteOfNodeNum), so a real position that hashes to 0 doesn't collide with the "no position seen" sentinel and its duplicates dedup correctly.

test_traffic_management: 34/34 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
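As a rough illustration of the two ideas in this commit, the sketch below shows a test-overridable clock and a fingerprint that never yields the sentinel value 0. Several things here are assumptions rather than the PR's code: `TestableClock` and `hostMillis()` are hypothetical stand-ins (the firmware reads `millis()`), treating `s_testNowMs == 0` as "no override" is a guess, and the XOR fold in `positionFingerprint()` is only a placeholder for whatever hash `computePositionFingerprint()` really uses. Only the 0 -> 0xFF remap comes from the commit message.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical host-side stand-in for Arduino millis().
static uint32_t hostMillis()
{
    using namespace std::chrono;
    return static_cast<uint32_t>(duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}

struct TestableClock {
    // Assumption: 0 means "no override, use the real clock".
    static uint32_t s_testNowMs;

    // All time reads go through here so tests can advance a virtual clock.
    static uint32_t nowMs() { return s_testNowMs != 0 ? s_testNowMs : hostMillis(); }
};
uint32_t TestableClock::s_testNowMs = 0;

// Placeholder 8-bit fingerprint: XOR-fold the two coordinates, then remap
// 0 to 0xFF so that 0 stays reserved for "no position seen".
uint8_t positionFingerprint(int32_t lat, int32_t lon)
{
    const uint32_t mixed = static_cast<uint32_t>(lat) ^ static_cast<uint32_t>(lon);
    const uint8_t fp = static_cast<uint8_t>(mixed ^ (mixed >> 8) ^ (mixed >> 16) ^ (mixed >> 24));
    return fp == 0 ? 0xFF : fp;
}
```

A test can then set `TestableClock::s_testNowMs` to a starting value and add six minutes to it to cross a tick boundary without sleeping.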
The TMM relay dedup suppresses other nodes' duplicate positions for ~11h; mirror that on the originator so we don't emit identical positions that get dropped anyway.

- Hold position broadcasts to a 12h floor when fixed_position is set (any role).
- Hold to the same floor when our position is unchanged beyond the broadcast precision (the user/channel-max resolution the on-wire position is truncated to).
- Genuine movement beyond that resolution keeps the normal interval, and the smart-broadcast branch still sends early on sub-interval movement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PositionModule (stationary check) and TrafficManagementModule (dedup fingerprint) each had their own coordinate-truncation primitive. Promote PositionPrecision's truncateCoordinate to a shared, declared function and route both at it.

- Un-static truncateCoordinate; fold in the precision 0/>=32 guard so it's safe on the TMM dedup path that previously relied on truncateLatLon's guard.
- Add a uint8_t-precision overload (forwards to the uint32_t one) so TMM's uint8_t precision calls need no cast; the return stays int32_t (it's a coordinate).
- Remove TMM's duplicate truncateLatLon; PositionModule compares truncated coords directly instead of round-tripping through Position structs.

Core PositionModule no longer reaches into the optional TMM module for this.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the two broadcast-policy decisions out of runOnce/positionUnchangedSinceLastSend into pure static helpers so they're unit-testable without the module or a fake clock:

- positionWithinPrecisionCell(): two coords truncate to the same precision grid cell (stationary); precision 0 or >=32 never suppresses.
- effectiveBroadcastIntervalMs(): stationary positions are held to the 12h floor when that's the longer interval, else the normal configured interval.

test/test_position_module covers jitter-stays/move-leaves the cell, the 0 and >=32 precision guards, and the floor/interval selection. No fake time needed: the time delta is the existing interval mechanism; only the floor decision is new and it's pure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
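The two pure helpers named in this commit could look something like the sketch below. The function names come from the commit message, but the parameter lists are guesses, `truncateToCell()` is a hypothetical simplification that only masks the top `precision` bits (the shared `truncateCoordinate` may do more), and `kStationaryFloorMs` is an illustrative name for the 12h floor.

```cpp
#include <cstdint>

// Hypothetical simplification: keep only the top `precision` bits.
// Precision 0 or >= 32 leaves the coordinate untouched.
int32_t truncateToCell(int32_t coord, uint32_t precision)
{
    if (precision == 0 || precision >= 32)
        return coord;
    const uint32_t mask = UINT32_MAX << (32 - precision);
    return static_cast<int32_t>(static_cast<uint32_t>(coord) & mask);
}

// True when both positions fall in the same grid cell at this precision.
// Precision 0 or >= 32 never suppresses, so it always reports "moved".
bool positionWithinPrecisionCell(int32_t latA, int32_t lonA, int32_t latB, int32_t lonB, uint32_t precision)
{
    if (precision == 0 || precision >= 32)
        return false;
    return truncateToCell(latA, precision) == truncateToCell(latB, precision) &&
           truncateToCell(lonA, precision) == truncateToCell(lonB, precision);
}

// 12 hours in milliseconds (illustrative constant name).
constexpr uint32_t kStationaryFloorMs = 12u * 60u * 60u * 1000u;

// Stationary senders are held to the floor when it is the longer interval;
// everything else keeps the configured interval.
uint32_t effectiveBroadcastIntervalMs(uint32_t configuredMs, bool stationary)
{
    if (stationary && kStationaryFloorMs > configuredMs)
        return kStationaryFloorMs;
    return configuredMs;
}
```

Both functions depend only on their arguments, which is what lets the tests cover them without a module instance or a fake clock.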
Keyed on the originating node's advertised role (NodeDB lookup of p->from). Both exceptions only relax filtering, never tighten it past the operator's config.

- Tracker / TAK tracker: cap the position dedup window at 1 hour so a stationary tracker may refresh a duplicate position hourly instead of every ~11h.
- Lost-and-found: throttle only to the shortest tick window (one kPosTimeTickMs), and skip the relayed-position precision clamp entirely (no anti-dox).

New cap default lives in Default.h (default_traffic_mgmt_tracker_position_min_interval_secs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the TMM role exceptions on the originator side, but fixed position is still held to the 12h floor for every role: a tracker or lost-and-found that pins itself isn't doing its job, so it gets no exception.

- Fixed position (any role): 12h floor, unchanged.
- Lost-and-found (not fixed): never treated as stationary; broadcasts freely.
- Tracker / TAK tracker (not fixed): movement judged at the node's own configured (unclamped) precision instead of the on-wire public-clamped precision, so finer moves still trigger a send; floored only when stationary at that finer resolution.

positionUnchangedSinceLastSend() gains a useConfiguredPrecision flag for this.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- TrafficManagementModule.cpp: fix pos_fingerprint comment. Zero is not "astronomically unlikely" but actively remapped to 0xFF; update comment to state the actual invariant.
- AdminModule.cpp: log a warning for remote set_favorite_node and set_ignored_node requests that are refused at the protected-node cap (previously a silent no-op for non-local callers).
- WarmNodeStore.h: MIGRATION_VERBOSE default already set to 0 in PR1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change bare PositionModule.h to modules/PositionModule.h. build_flags sets -Isrc, not -Isrc/modules, so the bare form fails to resolve in the native PlatformIO test env.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Default::hopTrimGrace(role, portnum) derives all tiers from one base grace (default 2): infra and on-specialty senders +1, deprecated roles -1, else base.

userPrefs: USERPREFS_TMM_HOP_TRIM_DISABLE and _GRACE_BASE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
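The tiering can be pictured with the sketch below. The commit message gives only the rule (+1, -1, else base) and the default base of 2; which roles count as "infra", "on-specialty", or "deprecated" is not spelled out here, so the `Role` and `Port` enums and the three classifications are assumptions chosen purely for illustration, as is flooring the result at 0.

```cpp
// Hypothetical role and port enums; the firmware uses its protobuf types.
enum class Role { Client, Router, Tracker, Sensor, RouterClient };
enum class Port { Position, Telemetry, Other };

constexpr int kHopTrimGraceBase = 2; // default base grace from the commit message

int hopTrimGrace(Role role, Port port, int base = kHopTrimGraceBase)
{
    // Assumption: routers count as infrastructure.
    const bool infra = (role == Role::Router);
    // Assumption: "on-specialty" means a role sending its own kind of data.
    const bool onSpecialty = (role == Role::Tracker && port == Port::Position) ||
                             (role == Role::Sensor && port == Port::Telemetry);
    // Assumption: RouterClient stands in for the deprecated roles.
    const bool deprecated = (role == Role::RouterClient);

    if (infra || onSpecialty)
        return base + 1;
    if (deprecated)
        return base > 0 ? base - 1 : 0; // never go below zero in this sketch
    return base;
}
```

Deriving every tier from one base means a single build-time override shifts all tiers together.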
alterReceived clamps a relayed position/telemetry broadcast's reach to the local hop-scaling cap + role grace (hop_start adjusted to keep hopsAway honest; far end stops with no final hop). FloodingRouter refuses the higher-hopcount upgrade for trim-eligible packets via wouldHopTrim, so it can't undo the trim.

Compile-time, on by default, killed by USERPREFS_TMM_HOP_TRIM_DISABLE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
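One plausible reading of the clamp, sketched under stated assumptions: "reach" is taken to mean total hops from the originator, so the remaining allowance is `localCap + grace` minus the hops already travelled. `RelayedPacket`, `allowedRemaining()`, and `trimHops()` are hypothetical names, and the real `alterReceived`/`wouldHopTrim` also check channel and portnum eligibility, which is omitted here.

```cpp
#include <cstdint>

// Hypothetical two-field view of a packet header.
struct RelayedPacket {
    uint8_t hopStart; // hop budget the packet started with
    uint8_t hopLimit; // hops it may still travel
};

// Hops the packet may still travel once total reach is clamped to
// localCap + grace. Caller guarantees hopStart >= hopLimit.
static unsigned allowedRemaining(const RelayedPacket &p, uint8_t localCap, uint8_t grace)
{
    const unsigned hopsAway = static_cast<unsigned>(p.hopStart - p.hopLimit);
    const unsigned reach = static_cast<unsigned>(localCap) + grace;
    return reach > hopsAway ? reach - hopsAway : 0u;
}

// Predicate only: would trimming shorten this packet's remaining hops?
bool wouldHopTrim(const RelayedPacket &p, uint8_t localCap, uint8_t grace)
{
    if (p.hopStart < p.hopLimit)
        return false; // malformed header; leave it alone
    return p.hopLimit > allowedRemaining(p, localCap, grace);
}

// Lower hopLimit to the allowance and lower hopStart by the same amount,
// so hopStart - hopLimit (hops away) is unchanged. Never raises hopLimit.
bool trimHops(RelayedPacket &p, uint8_t localCap, uint8_t grace)
{
    if (!wouldHopTrim(p, localCap, grace))
        return false;
    const uint8_t allowed = static_cast<uint8_t>(allowedRemaining(p, localCap, grace));
    const uint8_t delta = static_cast<uint8_t>(p.hopLimit - allowed);
    p.hopLimit = allowed;
    p.hopStart = static_cast<uint8_t>(p.hopStart - delta);
    return true;
}
```

For a packet with hopStart 7 and hopLimit 6, a local cap of 2 and grace of 2, this leaves hopLimit 3 and hopStart 4, so the packet still reads as one hop away.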
8 cases warming getLastRequiredHop() through the real HopScaling sampling path (fake time + sender roles in NodeDB): near/far clamp, grace tiers, cold no-op, never-raise, and wouldHopTrim gating. 42/42 traffic-management cases pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…PLY_TO_PRIVATE_CHANNELS
…e/apply_to_private)
…ming)

- NodeDB.cpp addFromContact: only clear favourite bit and erase satellites when setProtectedFlag(IS_IGNORED) succeeds; log warning on cap refusal without side-effecting the entry.
- PositionModule.cpp: replace (lat==0 && lon==0) sentinel with lastGpsSend==0. The coordinate (0,0) is valid and would permanently disable stationary detection for nodes there.
- mesh-pb-constants.h: add per-platform TRAFFIC_MANAGEMENT_CACHE_SIZE caps (STM32WL=0, nRF52840=200, ESP32-S3/portduino=2000, generic=1000) to avoid overallocating ~10 KB on RAM-constrained targets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This builds on the previous work in #10706 and adds a gentle-ish reduction of hops on public channels.