
Pr2.5 tmm hop trimming #10719

Draft
NomDeTom wants to merge 25 commits into meshtastic:develop from NomDeTom:PR2.5-tmm-hop-trimming

Conversation

@NomDeTom

Collaborator

This builds on the previous work in #10706 and adds a gentle-ish reduction of hops on public channels.

🤝 Attestations

  • I have tested that my proposed changes behave as described.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other (please specify below)

@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

⚡ Try this PR in the Web Flasher


Warning

This is an automated, unreviewed CI test build. Back up your device configuration
before flashing, and only flash devices you are able to recover.

Supported boards built by this PR (24)
Device Board Platform
Crowpanel Adv 3.5 TFT elecrow-adv-35-tft esp32-s3
Heltec HT62 heltec-ht62-esp32c3-sx1262 esp32-c3
Heltec Mesh Node 096 heltec-mesh-node-t096 nrf52840
Heltec Mesh Node T1 heltec-mesh-node-t1 nrf52840
Heltec Mesh Node T114 heltec-mesh-node-t114 nrf52840
Heltec V3 heltec-v3 esp32-s3
Heltec V4 heltec-v4 esp32-s3
Raspberry Pi Pico pico rp2040
Raspberry Pi Pico W picow rp2040
RAK WisMesh Tag rak_wismeshtag nrf52840
RAK WisBlock 11200 rak11200 esp32
RAK WisBlock 11310 rak11310 rp2040
RAK3312 rak3312 esp32-s3
RAK WisBlock 4631 rak4631 nrf52840
Seeed Wio Tracker L1 seeed_wio_tracker_L1 nrf52840
Seeed Xiao NRF52840 Kit seeed_xiao_nrf52840_kit nrf52840
Seeed Xiao ESP32-S3 seeed-xiao-s3 esp32-s3
Station G2 station-g2 esp32-s3
Station G3 station-g3 esp32-s3
LILYGO T-Deck t-deck-tft esp32-s3
LILYGO T-Echo t-echo nrf52840
LILYGO T-Echo Plus t-echo-plus nrf52840
LilyGo T3-C6 tlora-c6 esp32-c6
Seeed SenseCAP T1000-E tracker-t1000-e nrf52840

Build artifacts expire on 2026-07-18. Updated for d6945e0.

@github-actions github-actions Bot added the needs-review (Needs human review) and enhancement (New feature or request) labels Jun 15, 2026
@NomDeTom NomDeTom force-pushed the PR2.5-tmm-hop-trimming branch 2 times, most recently from a99a8c8 to 647cc6f Compare June 15, 2026 02:15
@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

Firmware Size Report

22 targets | vs develop: 22 increased, net +167,940 (+164.0 KB)

Target Size vs develop
rak3312 2,260,928 📈 +10,688 (+10.4 KB)
seeed-xiao-s3 2,264,848 📈 +10,672 (+10.4 KB)
t-deck-tft 3,800,048 📈 +10,656 (+10.4 KB)
t-eth-elite 2,479,808 📈 +10,112 (+9.9 KB)
heltec-vision-master-e213-inkhud 2,213,760 📈 +10,080 (+9.8 KB)
station-g3 2,254,864 📈 +9,936 (+9.7 KB)
rak11200 1,849,616 📈 +9,648 (+9.4 KB)
elecrow-adv-35-tft 3,405,168 📈 +9,456 (+9.2 KB)
heltec-v3 2,252,256 📈 +9,232 (+9.0 KB)
picow 1,234,836 📈 +8,600 (+8.4 KB)
tlora-c6 2,356,752 📈 +8,576 (+8.4 KB)
heltec-ht62-esp32c3-sx1262 2,123,216 📈 +8,448 (+8.2 KB)
pico 773,560 📈 +8,080 (+7.9 KB)
rak11310 796,184 📈 +8,080 (+7.9 KB)
seeed_xiao_rp2040 771,760 📈 +8,080 (+7.9 KB)
pico2w 1,210,780 📈 +8,004 (+7.8 KB)
seeed_xiao_rp2350 759,096 📈 +7,520 (+7.3 KB)
pico2 760,928 📈 +7,504 (+7.3 KB)
wio-e5 235,036 📈 +1,816 (+1.8 KB)
rak3172 182,580 📈 +1,552 (+1.5 KB)
station-g2 2,254,864 📈 +656
heltec-v4 2,264,080 📈 +544

Updated for 0712348

@NomDeTom NomDeTom requested review from GUVWAF and thebentern June 15, 2026 08:21
Comment thread src/mesh/Channels.h
// display name (e.g. a channel named "LongFast" counts even while the
// radio runs MediumFast). Broader than isDefaultChannel, which only
// matches the current preset's name and PSK byte 1.
bool isWellKnownChannel(ChannelIndex chIndex);

@luivicur luivicur Jun 16, 2026

Would there be a way to use position dedup in a regional channel? We defaulted to AQ== as the key, but we still have time to change that.

Well-known key = well-known channel makes sense to me, but I'm late to this discussion.

Collaborator Author

There's a compile-time flag to allow some parts to apply to private channels. I'll have a think about how to extend that.

Collaborator Author

@luivicur I've extended the "applies to private channels" userpref, and added it to the protobufs required. Fingers crossed...

@NomDeTom NomDeTom force-pushed the PR2.5-tmm-hop-trimming branch 2 times, most recently from be8bed5 to 815fb3b Compare June 17, 2026 08:33
@NomDeTom NomDeTom requested a review from Copilot June 17, 2026 08:38

Copilot AI left a comment

Contributor

Pull request overview

This PR extends the Traffic Management Module (TMM) work (stacking on #10706) by adding hop-trimming for relayed public-channel broadcasts, plus a larger set of supporting changes to make long-tail routing and PKI identity more resilient on constrained targets (notably nRF52840).

Changes:

  • Add warm-tier node identity storage (WarmNodeStore) so evicted NodeDB entries can retain {nodenum, last_heard, public_key} and keep PKI DMs working for long-tail peers.
  • Add TMM routing-hint overflow cache (next-hop hints) and wire it into NextHopRouter and TraceRouteModule.
  • Add TMM relay shaping features: hop-trimming for relayed broadcast telemetry/position (public channels by default) and a relayed position precision clamp.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
variants/nrf52840/nrf52840.ini Sets an nRF52840 base linker script to keep the app below the warm-store flash region.
variants/nrf52840/nrf52.ini Adds a post-link guard script to fail builds that overlap the warm-store flash region.
userPrefs.jsonc Documents new build-time toggles for applying TMM to private channels and disabling/tuning hop trimming.
test/test_warm_store/test_main.cpp New unit tests for WarmNodeStore admission/eviction, take/rehydrate, and persistence behavior.
test/test_traffic_management/test_main.cpp Expands TMM tests for well-known-channel gating, virtual clock ticks, next-hop cache, and hop trimming (variable hops).
test/test_nodedb_blocked/test_main.cpp New tests for NodeDB migration/demotion behavior and protected-node (favorite/ignored) retention.
src/platform/nrf52/nrf52840_s140_v7.ld Shrinks FLASH region to end before the warm-store raw-flash ring.
src/platform/nrf52/nrf52840_s140_v6.ld Adds a v6 SoftDevice linker script with the same warm-store flash cap.
src/modules/TrafficManagementModule.h Updates cache model docs; adds hop-trim API, next-hop hint API, and a test clock hook.
src/modules/TrafficManagementModule.cpp Implements hop-trimming, relayed position precision clamp, next-hop preload, and flat cache/tick timestamping.
src/modules/TraceRouteModule.cpp Mirrors traceroute-derived next-hop hints into the TMM overflow cache.
src/modules/AdminModule.cpp Routes favorite/ignore through NodeDB protected-cap enforcement; adds optional warm-tier debug dump.
src/mesh/WarmNodeStore.h Introduces warm-tier data model, persistence strategy, and nRF52840 raw-flash ring layout.
src/mesh/WarmNodeStore.cpp Implements warm-tier storage (file backend and nRF52840 raw-flash ring backend).
src/mesh/Router.cpp Uses NodeDB copyPublicKey() (hot + warm tier) for PKI decrypt/encrypt key lookup.
src/mesh/NodeDBLegacyMigration.cpp Sanitizes migrated legacy node names as UTF-8 to avoid nanopb encode failures.
src/mesh/NodeDB.h Adds warm-tier integration, protected-node cap enforcement API, and new self-care helpers.
src/mesh/NodeDB.cpp Implements self-care pass, warm-tier demotion/rehydration, satellite caps, protected-node cap, and always-on TMM defaults.
src/mesh/NextHopRouter.cpp Writes confirmed next-hop into TMM overflow cache and consults it as a fallback.
src/mesh/mesh-pb-constants.h Introduces WARM_NODE_COUNT and satellite caps, changes default node caps, and enables TMM by default.
src/mesh/FloodingRouter.cpp Suppresses “higher-hopcount wins” upgrades when hop-trimming would be undone by the upgrade.
src/mesh/Default.h Updates TMM defaults and adds hop-trim grace-base configuration.
src/mesh/Default.cpp Implements grace calculation by sender role/portnum for hop-trimming.
src/mesh/Channels.h Adds Channels::isWellKnownChannel() API.
src/mesh/Channels.cpp Implements well-known-channel detection by PSK shape and preset display name matching.
src/graphics/draw/MenuHandler.cpp Uses protected-cap enforcement for ignore; avoids saving/observer churn when a cap refusal occurs.
extra_scripts/nrf52_warm_region.py New post-link guard ensuring the firmware image doesn’t overlap the warm-store reserved flash pages.
.github/copilot-instructions.md Documents warm tier, satellite caps, and self-care behaviors for contributors/agents.

Comment thread src/modules/TrafficManagementModule.cpp
Comment thread src/modules/TrafficManagementModule.cpp Outdated
@NomDeTom NomDeTom force-pushed the PR2.5-tmm-hop-trimming branch from 815fb3b to b69e0c7 Compare June 17, 2026 10:59
@NomDeTom NomDeTom requested a review from Copilot June 17, 2026 11:25

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 34 out of 37 changed files in this pull request and generated 6 comments.

Comment thread src/modules/TrafficManagementModule.cpp
Comment thread src/modules/TrafficManagementModule.cpp Outdated
Comment thread src/mesh/WarmNodeStore.cpp
Comment thread src/mesh/mesh-pb-constants.h
Comment thread src/modules/TrafficManagementModule.cpp
Comment thread src/modules/TrafficManagementModule.cpp Outdated

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 34 out of 37 changed files in this pull request and generated 2 comments.

Comment thread src/modules/PositionModule.cpp
Comment thread src/mesh/NodeDB.cpp Outdated
@NomDeTom NomDeTom force-pushed the PR2.5-tmm-hop-trimming branch 4 times, most recently from 019ed7a to e534b80 Compare June 17, 2026 23:35
NomDeTom and others added 3 commits June 18, 2026 10:32
…store

Reworks the TrafficManagementModule cache layer (policing behaviour unchanged
from upstream) and adds a routing-hint overflow store:

- Flatten the ring: replace the cuckoo-hashed unified cache and the bucketed
  PSRAM NodeInfo index with plain flat arrays + linear scan (same idiom as
  WarmNodeStore). At LoRa packet rates an O(n) scan of the cache is negligible,
  and it removes a large amount of hashing/displacement complexity. The cache
  entry is 11 B; timestamps use a uniform +1 presence-offset so a 0 byte always
  means "empty" across every sub-store. Adds rebaseEpoch() so cached state
  survives the ~19 h relative-timestamp horizon instead of being flushed.

- Next-hop overflow cache: setNextHop/getNextHopHint store a confirmed last-byte
  relay for a destination, written only from NextHopRouter's ACK-confirmed
  decision (and mirrored from TraceRoute). NextHopRouter::getNextHop falls back
  to this cache when the hot NodeDB has no hint, so DMs/relays to long-tail
  nodes keep routing after the node ages out of NodeInfoLite.

- Persistence: preloadNextHopsFromNodeDB warm-starts the cache from persisted
  NodeInfoLite hints on first maintenance pass; next_hop entries are kept alive
  across the maintenance sweep (no TTL) and never clobbered by a stale preload.
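
The flat-array idea and the +1 presence offset described above can be sketched roughly as follows. This is a minimal illustration only: `CacheEntry`, `FlatCache`, `touch` and `lastSeen` are hypothetical names, the entry layout is not the PR's 11-byte layout, and the oldest-first eviction is an assumption rather than the module's actual policy.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical entry: a stored tick of 0 always means "empty slot",
// so real ticks are stored as (tick + 1).
struct CacheEntry {
    uint32_t node = 0;
    uint8_t tickPlusOne = 0;
};

template <size_t N> class FlatCache
{
  public:
    // Record that `node` was seen at relative tick `tick` (saturates at 254).
    void touch(uint32_t node, uint8_t tick)
    {
        CacheEntry *slot = find(node);
        if (!slot)
            slot = pickVictim();
        slot->node = node;
        slot->tickPlusOne = static_cast<uint8_t>((tick < 255 ? tick : 254) + 1);
    }

    // Returns true and writes the stored tick if `node` is cached.
    bool lastSeen(uint32_t node, uint8_t &tickOut)
    {
        const CacheEntry *e = find(node);
        if (!e)
            return false;
        tickOut = static_cast<uint8_t>(e->tickPlusOne - 1);
        return true;
    }

  private:
    // Plain O(n) linear scan; no hashing or displacement.
    CacheEntry *find(uint32_t node)
    {
        for (auto &e : entries)
            if (e.tickPlusOne != 0 && e.node == node)
                return &e;
        return nullptr;
    }

    // Prefer an empty slot, otherwise evict the oldest entry (assumed policy).
    CacheEntry *pickVictim()
    {
        CacheEntry *oldest = &entries[0];
        for (auto &e : entries) {
            if (e.tickPlusOne == 0)
                return &e;
            if (e.tickPlusOne < oldest->tickPlusOne)
                oldest = &e;
        }
        return oldest;
    }

    std::array<CacheEntry, N> entries{};
};
```

Because zero is reserved for "empty", a node seen at tick 0 is still distinguishable from a slot that was never written.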

All packet-policing logic (rate limit, position dedup, unknown-packet drop,
NodeInfo direct response, hop exhaustion) is the existing upstream behaviour,
untouched. HAS_TRAFFIC_MANAGEMENT defaults on so the module is compiled in. (see note).

Tests: upstream policing suite now actually runs (adds the MeshTypes.h include
that gates HAS_TRAFFIC_MANAGEMENT) plus 4 next-hop tests. Role-aware throttles,
politeness, precision clamp, port-interval and mesh-radius gating — and the
rate-limit >255 saturation fix — are deferred to the advanced-TMM branch.

Note: default dedup movement grid moves to ~91m, which also means 1.5km required to end up with the same signature position - coarser and therefore further than before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`node` in preloadNextHopsFromNodeDB() is never written through — mark
it const to satisfy cppcheck's constVariablePointer check in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NomDeTom and others added 22 commits June 18, 2026 10:32
Position dedup in TrafficManagementModule::handleReceived is gated on
channels.isWellKnownChannel(mp.channel). The test helper
installWellKnownPrimaryChannel() sets up channelFile/config.lora so that
gate is true, but it was defined and never called — so the dedup path was
never reached. test_tm_positionDedup_dropsDuplicateWithinWindow therefore
failed (duplicate forwarded -> CONTINUE instead of STOP), and
test_tm_positionDedup_allowsMovedPosition passed only vacuously.

Call installWellKnownPrimaryChannel() in both dedup tests so the dedup
path is genuinely exercised.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-0

Copilot review (PR meshtastic#10706):
- preloadNextHopsFromNodeDB() now returns bool; runOnce only latches
  nextHopPreloaded once the preload actually ran (retries if nodeDB wasn't
  ready), instead of skipping it forever.
- Remove the empty `#if HAS_VARIABLE_HOPS` blocks in the test.
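
The retry-until-success latch described for `preloadNextHopsFromNodeDB()` might look roughly like the sketch below. `PreloadLatch` and `maintenancePass` are hypothetical names used only for illustration; the real module keeps this state inside `runOnce`.

```cpp
#include <cstdint>

// Hypothetical helper: latch the "preloaded" flag only when the preload
// callback reports that it actually ran.
struct PreloadLatch {
    bool preloaded = false;
    uint32_t attempts = 0;

    template <typename PreloadFn> void maintenancePass(PreloadFn preload)
    {
        if (preloaded)
            return; // already done, never run again
        ++attempts;
        preloaded = preload(); // false (e.g. node DB not ready) means retry next pass
    }
};
```

With a void preload the flag would be set on the first pass regardless of outcome, which is the "skipping it forever" failure the commit describes.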

Test correctness:
- Three more position-dedup tests were missing installWellKnownPrimaryChannel()
  (dropsDuplicate/allowsMoved were fixed earlier; allowsDuplicateAfterInterval,
  cacheFlush, priorRateState were not) — without the well-known-channel gate the
  dedup path never runs, so their STOP assertions failed.

Fake-time injection (no more real sleeps):
- Add TrafficManagementModule::s_testNowMs + nowMs(), mirroring HopScalingModule;
  route all TMM tick/time reads through nowMs(). Tests advance a virtual clock via
  s_testNowMs instead of testDelay() sleeping real 5-6 min across a tick — the
  suite drops from ~15 min to ~30 s. Production behaviour is unchanged (nowMs()
  inlines to millis()).
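
A virtual-clock hook of this kind could be sketched as follows. The names `s_testNowMs` and `nowMs()` come from the commit message; everything else (the `Clock` wrapper, `realMillis()`, `tickElapsed()`, and using 0 as the "use the real clock" sentinel) is an assumption for a host build, not the module's code.

```cpp
#include <chrono>
#include <cstdint>

// Host stand-in for the firmware's millis() (assumption for this sketch).
static uint32_t realMillis()
{
    using namespace std::chrono;
    return static_cast<uint32_t>(duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}

struct Clock {
    static uint32_t s_testNowMs; // 0 = no override, read the real clock
    static uint32_t nowMs() { return s_testNowMs ? s_testNowMs : realMillis(); }
};
uint32_t Clock::s_testNowMs = 0;

// Every time read goes through nowMs(), so tests can cross a tick boundary
// by assigning s_testNowMs instead of sleeping.
bool tickElapsed(uint32_t lastTickMs, uint32_t tickLenMs)
{
    return static_cast<uint32_t>(Clock::nowMs() - lastTickMs) >= tickLenMs;
}
```

A test would set `Clock::s_testNowMs` to a start value, run the code under test, then bump it by one tick length and run again.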

Fingerprint-0 fix:
- computePositionFingerprint() never returns 0 now (remap 0 -> 0xFF, mirroring
  getLastByteOfNodeNum), so a real position that hashes to 0 doesn't collide with
  the "no position seen" sentinel and its duplicates dedup correctly.
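
The sentinel remap can be illustrated with a small sketch. Only the `0 -> 0xFF` remap is taken from the commit message; the 8-bit width and the hash in `rawFingerprint()` are placeholders chosen for this example, not the PR's actual fingerprint function.

```cpp
#include <cstdint>

// Placeholder 8-bit hash of two truncated coordinates (not the PR's hash).
uint8_t rawFingerprint(int32_t latTrunc, int32_t lonTrunc)
{
    uint32_t x = static_cast<uint32_t>(latTrunc) ^ (static_cast<uint32_t>(lonTrunc) * 0x9E3779B1u);
    x ^= x >> 16;
    x ^= x >> 8;
    return static_cast<uint8_t>(x);
}

// 0 is reserved for "no position seen", so a real position that hashes
// to 0 is remapped to 0xFF.
uint8_t computePositionFingerprint(int32_t latTrunc, int32_t lonTrunc)
{
    const uint8_t fp = rawFingerprint(latTrunc, lonTrunc);
    return fp == 0 ? 0xFF : fp;
}
```

The cost is that 0xFF becomes slightly more likely than other values, which is harmless next to the alternative of a valid position never deduplicating.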

test_traffic_management: 34/34 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The TMM relay dedup suppresses other nodes' duplicate positions for ~11h; mirror
that on the originator so we don't emit identical positions that get dropped anyway.

- Hold position broadcasts to a 12h floor when fixed_position is set (any role).
- Hold to the same floor when our position is unchanged beyond the broadcast
  precision (the user/channel-max resolution the on-wire position is truncated to).
- Genuine movement beyond that resolution keeps the normal interval, and the
  smart-broadcast branch still sends early on sub-interval movement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PositionModule (stationary check) and TrafficManagementModule (dedup fingerprint)
each had their own coordinate-truncation primitive. Promote PositionPrecision's
truncateCoordinate to a shared, declared function and route both at it.

- Un-static truncateCoordinate; fold in the precision 0/>=32 guard so it's safe on
  the TMM dedup path that previously relied on truncateLatLon's guard.
- Add a uint8_t-precision overload (forwards to the uint32_t one) so TMM's uint8_t
  precision calls need no cast; the return stays int32_t (it's a coordinate).
- Remove TMM's duplicate truncateLatLon; PositionModule compares truncated coords
  directly instead of round-tripping through Position structs.

Core PositionModule no longer reaches into the optional TMM module for this.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the two broadcast-policy decisions out of runOnce/positionUnchangedSinceLastSend
into pure static helpers so they're unit-testable without the module or a fake clock:

- positionWithinPrecisionCell(): two coords truncate to the same precision grid cell
  (stationary); precision 0 or >=32 never suppresses.
- effectiveBroadcastIntervalMs(): stationary positions are held to the 12h floor when
  that's the longer interval, else the normal configured interval.

test/test_position_module covers jitter-stays/move-leaves the cell, the 0 and >=32
precision guards, and the floor/interval selection. No fake time needed — the time
delta is the existing interval mechanism; only the floor decision is new and it's pure.
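
The two pure helpers named above could be sketched like this. The function names follow the commit message, but the signatures and the mask-based truncation (keep the top `precision` bits) are assumptions made for illustration.

```cpp
#include <cstdint>

// Assumed truncation: keep the top `precision` bits of the coordinate.
// Precision 0 or >= 32 leaves the coordinate untouched.
int32_t truncateCoordinate(int32_t coord, uint32_t precision)
{
    if (precision == 0 || precision >= 32)
        return coord;
    const uint32_t mask = UINT32_MAX << (32 - precision);
    return static_cast<int32_t>(static_cast<uint32_t>(coord) & mask);
}

// True when both points fall in the same precision grid cell (stationary).
// Precision 0 or >= 32 never suppresses.
bool positionWithinPrecisionCell(int32_t lat1, int32_t lon1, int32_t lat2, int32_t lon2, uint32_t precision)
{
    if (precision == 0 || precision >= 32)
        return false;
    return truncateCoordinate(lat1, precision) == truncateCoordinate(lat2, precision) &&
           truncateCoordinate(lon1, precision) == truncateCoordinate(lon2, precision);
}

// Stationary positions are held to the floor when the floor is the longer
// interval; otherwise the configured interval applies.
uint32_t effectiveBroadcastIntervalMs(bool stationary, uint32_t configuredMs, uint32_t floorMs)
{
    return (stationary && floorMs > configuredMs) ? floorMs : configuredMs;
}
```

Neither helper reads a clock or module state, which is why they can be unit-tested directly.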

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keyed on the originating node's advertised role (NodeDB lookup of p->from). Both
exceptions only relax filtering, never tighten it past the operator's config.

- Tracker / TAK tracker: cap the position dedup window at 1 hour so a stationary
  tracker may refresh a duplicate position hourly instead of every ~11h.
- Lost-and-found: throttle only to the shortest tick window (one kPosTimeTickMs),
  and skip the relayed-position precision clamp entirely (no anti-dox).

New cap default lives in Default.h (default_traffic_mgmt_tracker_position_min_interval_secs).
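
One way to read the "only relax, never tighten" rule is as a `min` against the operator's configured window. The sketch below assumes that reading; the `Role` enum, function names and parameters are hypothetical, and the tick length is passed in rather than guessed.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical subset of device roles, for illustration only.
enum class Role { Client, Tracker, TakTracker, LostAndFound };

// Effective position dedup window for a packet from a sender with `role`.
// Exceptions can only shorten the window, never lengthen it.
uint32_t positionDedupWindowSecs(Role role, uint32_t configuredSecs, uint32_t trackerCapSecs, uint32_t tickSecs)
{
    switch (role) {
    case Role::Tracker:
    case Role::TakTracker:
        return std::min(configuredSecs, trackerCapSecs); // e.g. a 1 hour cap
    case Role::LostAndFound:
        return std::min(configuredSecs, tickSecs); // shortest tick window
    default:
        return configuredSecs;
    }
}

// Lost-and-found senders skip the relayed-position precision clamp.
bool appliesPrecisionClamp(Role role)
{
    return role != Role::LostAndFound;
}
```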

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the TMM role exceptions on the originator side, but fixed position is still
held to the 12h floor for every role — a tracker or lost-and-found that pins itself
isn't doing its job, so it gets no exception.

- Fixed position (any role): 12h floor, unchanged.
- Lost-and-found (not fixed): never treated as stationary — broadcasts freely.
- Tracker / TAK tracker (not fixed): movement judged at the node's own configured
  (unclamped) precision instead of the on-wire public-clamped precision, so finer
  moves still trigger a send; floored only when stationary at that finer resolution.

positionUnchangedSinceLastSend() gains a useConfiguredPrecision flag for this.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- TrafficManagementModule.cpp: fix pos_fingerprint comment — zero is not
  "astronomically unlikely" but actively remapped to 0xFF; update comment
  to state the actual invariant
- AdminModule.cpp: log a warning for remote set_favorite_node and
  set_ignored_node requests that are refused at the protected-node cap
  (previously a silent no-op for non-local callers)
- WarmNodeStore.h: MIGRATION_VERBOSE default already set to 0 in PR1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change bare PositionModule.h to modules/PositionModule.h — build_flags
sets -Isrc, not -Isrc/modules, so the bare form fails to resolve in
the native PlatformIO test env.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Default::hopTrimGrace(role, portnum) derives all tiers from one base grace
(default 2): infra and on-specialty senders +1, deprecated roles -1, else base.
userPrefs: USERPREFS_TMM_HOP_TRIM_DISABLE and _GRACE_BASE.
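
A rough sketch of the tiering follows. Only the arithmetic (base 2, +1, -1, else base) comes from the commit message. Which roles count as "infra", "deprecated" or "on-specialty" is an assumption here, as are the `Role` and `Port` enums and the helper names.

```cpp
#include <cstdint>

// Hypothetical role and port sets, for illustration only.
enum class Role { Client, Router, RouterLate, Repeater, RouterClient, Tracker, Sensor };
enum class Port { Position, Telemetry, Other };

static bool isInfra(Role r) { return r == Role::Router || r == Role::RouterLate; }
static bool isDeprecated(Role r) { return r == Role::Repeater || r == Role::RouterClient; }

// Assumed meaning of "on-specialty": the sender's role matches the traffic
// it is sending (tracker sending position, sensor sending telemetry).
static bool isOnSpecialty(Role r, Port p)
{
    return (r == Role::Tracker && p == Port::Position) || (r == Role::Sensor && p == Port::Telemetry);
}

// All tiers derive from one base grace value.
uint8_t hopTrimGrace(Role role, Port port, uint8_t base = 2)
{
    if (isDeprecated(role))
        return base > 0 ? static_cast<uint8_t>(base - 1) : 0;
    if (isInfra(role) || isOnSpecialty(role, port))
        return static_cast<uint8_t>(base + 1);
    return base;
}
```

Deriving every tier from a single base means one build-time override moves all tiers together.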

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alterReceived clamps a relayed position/telemetry broadcast's reach to the local
hop-scaling cap + role grace (hop_start adjusted to keep hopsAway honest; far end
stops with no final hop). FloodingRouter refuses the higher-hopcount upgrade for
trim-eligible packets via wouldHopTrim, so it can't undo the trim. Compile-time,
on by default, killed by USERPREFS_TMM_HOP_TRIM_DISABLE.
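
The clamp can be read as limiting a packet's total reach (hops already travelled plus hops remaining) to the local cap plus grace. The sketch below implements that reading only; `PacketHops`, `remainingAllowed` and `trimHops` are hypothetical, and the real `alterReceived` and `wouldHopTrim` may differ in detail.

```cpp
#include <cstdint>

// Hypothetical view of the two hop fields on a relayed packet.
struct PacketHops {
    uint8_t hopStart; // hop limit the packet started with
    uint8_t hopLimit; // hops it may still travel
};

static uint8_t hopsAwayOf(const PacketHops &p)
{
    return p.hopStart >= p.hopLimit ? static_cast<uint8_t>(p.hopStart - p.hopLimit) : 0;
}

// Hops the packet may still travel under (cap + grace), given how far it has come.
static uint8_t remainingAllowed(const PacketHops &p, uint8_t cap, uint8_t grace)
{
    const unsigned allowed = static_cast<unsigned>(cap) + grace;
    const unsigned away = hopsAwayOf(p);
    return allowed > away ? static_cast<uint8_t>(allowed - away) : 0;
}

// True when the packet's remaining hops exceed what the clamp allows.
bool wouldHopTrim(const PacketHops &p, uint8_t cap, uint8_t grace)
{
    return p.hopLimit > remainingAllowed(p, cap, grace);
}

// Lowers hopLimit (never raises it) and moves hopStart by the same amount,
// so hopStart - hopLimit (hops away) stays unchanged.
bool trimHops(PacketHops &p, uint8_t cap, uint8_t grace)
{
    if (!wouldHopTrim(p, cap, grace))
        return false;
    const uint8_t away = hopsAwayOf(p);
    const uint8_t newLimit = remainingAllowed(p, cap, grace);
    p.hopLimit = newLimit;
    p.hopStart = static_cast<uint8_t>(away + newLimit);
    return true;
}
```

A router-side check such as `wouldHopTrim` lets the flooding path decline a higher-hopcount copy of a packet that the trim would have shortened anyway.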

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8 cases warming getLastRequiredHop() through the real HopScaling sampling path
(fake time + sender roles in NodeDB): near/far clamp, grace tiers, cold no-op,
never-raise, and wouldHopTrim gating. 42/42 traffic-management cases pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ming)

- NodeDB.cpp addFromContact: only clear favourite bit and erase
  satellites when setProtectedFlag(IS_IGNORED) succeeds; log warning
  on cap refusal without side-effecting the entry
- PositionModule.cpp: replace (lat==0 && lon==0) sentinel with
  lastGpsSend==0 — the coordinate (0,0) is valid and would permanently
  disable stationary detection for nodes there
- mesh-pb-constants.h: add per-platform TRAFFIC_MANAGEMENT_CACHE_SIZE
  caps (STM32WL=0, nRF52840=200, ESP32-S3/portduino=2000, generic=1000)
  to avoid overallocating ~10 KB on RAM-constrained targets
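
Per-platform caps of this kind are typically expressed as a preprocessor ladder. The sketch below uses the sizes from the commit message, but the platform macros (`ARCH_STM32WL`, `ARCH_NRF52`, `CONFIG_IDF_TARGET_ESP32S3`, `ARCH_PORTDUINO`) are assumptions about what the build defines, and the 11-byte entry size is borrowed from the earlier cache commit to show where a figure of roughly 10 KB would come from.

```cpp
#include <cstddef>

#ifndef TRAFFIC_MANAGEMENT_CACHE_SIZE
#if defined(ARCH_STM32WL)
#define TRAFFIC_MANAGEMENT_CACHE_SIZE 0 // no cache on the smallest targets
#elif defined(ARCH_NRF52)
#define TRAFFIC_MANAGEMENT_CACHE_SIZE 200
#elif defined(CONFIG_IDF_TARGET_ESP32S3) || defined(ARCH_PORTDUINO)
#define TRAFFIC_MANAGEMENT_CACHE_SIZE 2000
#else
#define TRAFFIC_MANAGEMENT_CACHE_SIZE 1000 // generic default
#endif
#endif

// Assumed entry size (11 B, per the flat-cache commit) for a RAM estimate.
constexpr size_t kCacheEntryBytes = 11;

constexpr size_t cacheBytes(size_t entries)
{
    return entries * kCacheEntryBytes;
}
```

Under these assumptions the generic 1000-entry cache costs 11,000 bytes, while the nRF52840 cap of 200 entries costs 2,200 bytes.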

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NomDeTom NomDeTom force-pushed the PR2.5-tmm-hop-trimming branch from e534b80 to d6945e0 Compare June 18, 2026 10:13

@luivicur luivicur left a comment

My extremely rookie approval

@NomDeTom NomDeTom marked this pull request as draft June 19, 2026 13:17

Labels

enhancement New feature or request needs-review Needs human review

3 participants