
spec: entity registration MQTT API revamp #4233

Open
albinsuresh wants to merge 1 commit into thin-edge:main from albinsuresh:feat/mqtt-api-revamp

@albinsuresh albinsuresh commented Jun 25, 2026

Contributor

Proposed changes

A design doc detailing the existing issues with the entity registration MQTT APIs and a proposal to address them, with minimal backward compatibility breakage.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s. You can activate automatic signing by running just prepare-dev once)
  • I ran just format as mentioned in CODING_GUIDELINES
  • I used just check as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

@albinsuresh albinsuresh marked this pull request as ready for review June 25, 2026 19:24
codecov Bot commented Jun 25, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@rina23q rina23q left a comment

Member

Thank you for your accurate analysis and problem breakdown; it is greatly appreciated.

The proposal is debatable, as I hope you expected. I have left my opinions on it below.

Comment on lines +203 to +204
* On a valid registration message, the agent re-publishes the normalized entity **with** the marker
like `@source: "tedge-agent"`.

Member

From the client's perspective (I'm imagining custom cloud mapper development), this is a hassle, because the client always has to check @source in the payload to tell agent-marked entities apart from raw requests.
If the registered channel were separate from the request channel, it would be easy for clients, as they would only need to subscribe to the registered entity topic. However, I see this idea was rejected in Proposal C.
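
To make the client-side cost concrete, here is a minimal sketch of the check a custom mapper would have to run on every message from the shared topic. The `@source` key and the `tedge-agent` value come from the quoted spec lines; the function name `is_agent_validated` and the sample payloads are hypothetical.

```python
import json

AGENT_SOURCE = "tedge-agent"  # marker value taken from the quoted spec lines


def is_agent_validated(payload: bytes) -> bool:
    """Return True only for entity messages carrying the agent's @source marker."""
    if not payload:
        return False  # an empty retained payload carries no entity at all
    try:
        entity = json.loads(payload)
    except ValueError:
        return False  # malformed JSON can never be a validated entity
    return isinstance(entity, dict) and entity.get("@source") == AGENT_SOURCE


# Hypothetical payloads: a raw client request and the agent's re-published version.
raw_request = b'{"@type": "child-device"}'
normalized = b'{"@type": "child-device", "@source": "tedge-agent"}'
```

With a separate registered-entity channel, this check would be replaced by a plain subscription to the agent-owned topic, which is the trade-off raised in this comment.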

Contributor Author

I wouldn't say rejected. The duplication of the same data across 2 topic channels was my concern. We can't change the existing request channels and their "retained" requirement. So, the clients will continue publishing retained registration messages, and if the agent duplicates the same (or a normalised version) on a different channel, the original retained request is now redundant and stays with the broker with no value. That was my concern with proposal C and the only reason why I put proposal A as the preferred one. If we weren't limited by backward compatibility, I also would have picked proposal C.

We can explore that proposal further if the duplication isn't a concern OR if the agent is allowed to clear the registration message published by the users once the normalised version is published to its own channel. Even with proposal A, the agent is updating the user's request messages. So, clearing them can also be treated as another mutation. These are all options and nothing is accepted/rejected until we all agree. So, let's discuss, debate and finalise.

Comment on lines +203 to +204
* On a valid registration message, the agent re-publishes the normalized entity **with** the marker
like `@source: "tedge-agent"`.

Member

I have a suggestion: add a @last-updated timestamp to validated entity messages.

Suppose a custom mapper has its own entity store to manage its device IDs. When the mapper restarts and receives a burst of retained messages, it can compare each timestamp against its own cache to decide whether to update or skip.
Without a timestamp, it may need to compare every key and value in the cache to check whether they are up to date.
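
A rough sketch of that comparison, assuming `@last-updated` is a numeric Unix timestamp (the field does not exist yet and its format is an assumption) and that the mapper's cache is a plain dict keyed by topic. `should_update` is a hypothetical helper name.

```python
def should_update(cache: dict, topic: str, entity: dict) -> bool:
    """Decide whether a retained entity message is newer than the cached copy."""
    cached = cache.get(topic)
    if cached is None:
        return True  # never seen before: always store it
    # Assumed field: a numeric timestamp stamped by the agent on each update.
    return entity["@last-updated"] > cached["@last-updated"]


# Hypothetical cache content and two incoming retained messages.
cache = {"te/device/child01//": {"@type": "child-device", "@last-updated": 100}}
stale = {"@type": "child-device", "@last-updated": 100}
fresh = {"@type": "child-device", "name": "renamed", "@last-updated": 250}
```

The single comparison replaces a key-by-key equality check between the cached entity and the received one.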

Contributor Author

The mappers currently don't maintain a persistent entity store. What they have is a transient entity cache and it is rebuilt on every restart from the retained messages on the broker, as that's the source of truth. But yeah, once the agent becomes the owner of those retained entity metadata messages, a @last-updated field could be useful to persistent clients. Especially if we make the mappers persistent again, the way they used to be in the past. Earlier they were relying on the equality of the entity metadata between the entity store state and the message received.

Contributor

> What they have is a transient entity cache and it is rebuilt on every restart from the retained messages on the broker, as that's the source of truth.

Could we change this, though, so that the mappers request the entity store information via the HTTP interface provided by the tedge-agent?

Member

While we're at it, it would also be nice to provide both @created-at and @last-updated-at, to tell whether the entity was created earlier and later updated, or has only just been created (when both timestamps are equal).

Contributor Author

Yes. That would make the startup logic of the mapper more straightforward, as it wouldn't have to deal with any of the ordering problems between parent-child entities during that initial surge. The only issue here would be the added strong dependency on the agent process, in addition to mosquitto, during mapper startups, as the mapper won't be able to do anything until the agent is also available. TBH, for startup I don't think that's a big deal.
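
For reference, the ordering problem during the retained-message surge can be sketched as follows: a child may arrive before its parent, so the mapper has to park it until the parent is known. This is an illustration only, not the mapper's actual implementation; `register_in_order` and the `(entity, parent)` tuples are hypothetical.

```python
from collections import defaultdict


def register_in_order(messages):
    """Return entity IDs parent-first from an arbitrarily ordered burst."""
    known = set()
    pending = defaultdict(list)  # parent id -> children waiting for that parent
    order = []

    def admit(entity_id):
        known.add(entity_id)
        order.append(entity_id)
        # Release every child that was parked while waiting for this entity.
        for child in pending.pop(entity_id, []):
            admit(child)

    for entity_id, parent in messages:
        if parent is None or parent in known:
            admit(entity_id)
        else:
            pending[parent].append(entity_id)
    return order


# Worst case: the burst arrives in child-first order.
burst = [("svc", "child"), ("child", "main"), ("main", None)]
order = register_in_order(burst)
```

Fetching a consistent snapshot from the agent at startup would remove the need for this buffering, at the cost of the extra dependency mentioned above.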


## Proposal C - User publishes to entity metadata topic; Agent owns separate entity metadata channel

Not refined further as the duplication of the same data across multiple topic trees isn't ideal.

Member

Actually, I like Proposal C. Is it really a problem to have two channels, one for registration and the other for the source of truth? Their data might be the same, but might differ if the registration message is malformed. I'm pretty sure that third-party services want to subscribe only to the source-of-truth channel.

Contributor Author

I don't think leaving an already processed request retained on the broker would be acceptable. That's just wasted resources. I'd say proposal C is feasible if the agent is allowed to clear the registration messages published by the clients once the normalised version is retained on its own channel.
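
A toy model of that variant of proposal C, assuming the agent is allowed to clear the client's request. The `RetainedStore` class stands in for the broker, the `agent/` prefix is a made-up placeholder for the agent-owned channel (the spec does not name one), and the `@type` check is a stand-in for real validation.

```python
import json


class RetainedStore:
    """Toy stand-in for the broker's retained-message table."""

    def __init__(self):
        self.retained = {}

    def publish(self, topic: str, payload: bytes) -> None:
        if payload == b"":
            self.retained.pop(topic, None)  # empty retained payload clears the topic
        else:
            self.retained[topic] = payload


AGENT_PREFIX = "agent/"  # hypothetical prefix for the agent-owned channel


def handle_registration(broker: RetainedStore, topic: str, payload: bytes) -> bool:
    """Publish the normalised entity on the agent channel, then clear the request."""
    entity = json.loads(payload)
    if "@type" not in entity:  # toy validation rule, assumed for this sketch
        return False
    normalized = json.dumps(entity, sort_keys=True).encode()
    broker.publish(AGENT_PREFIX + topic, normalized)
    broker.publish(topic, b"")  # the processed request no longer stays retained
    return True


broker = RetainedStore()
request_topic = "te/device/child01//"
broker.publish(request_topic, b'{"@type": "child-device"}')
accepted = handle_registration(broker, request_topic, broker.retained[request_topic])
```

After the call, only the agent-owned copy is retained, so the duplication concern goes away, but the agent has cleared a message it did not author.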

so an HTTP-created entity can be overwritten by any MQTT client afterwards.
HTTP buys validated *writes*, not a protected *source of truth*.

## Catalogue of issues

Contributor

I would have thought we also have some issues with the order in which messages are processed when sent to the cloud.

For instance, the ordering of service registration and status updates can result in the status sometimes being ignored.

@Bravo555 Bravo555 left a comment

Member

After thinking on it a bit, I'm convinced by Proposal A. Adding a marker property that only later clients expect and verify is an elegant way to solve the problem with as few breaking changes as possible.
Though I do expect most of the value to come from resolving the inconsistencies between the MQTT and HTTP APIs and updating the tests to cover these missing cases.
If the benefits were ever large enough for clients to update, we could also provide additional mechanisms under a meta topic like te/device/d_id/service/s_id/meta, but "providing additional registration/patch methods that eliminate some of the footguns" is probably not a big enough benefit to justify the complexity of having to support and document both these methods.
For the same reason, I considered an A+B solution, where, instead of removing registration at the entity metadata topic, we could still process those messages while also allowing registration at a different topic like te/device/d_id/service/s_id/meta/register. One advantage of that approach is that new clients could opt into this new topic, and their registration wouldn't trigger a duplicate "registration" message for older clients that observe this topic but aren't aware of the @source property. But that would also probably be a doubtful proposition, bringing in more complexity than it removes.

Comment on lines +53 to +62
1. **Rejected registrations leave broken retained messages.**
A bad registration message, rejected by the agent, stays with the broker even after the rejection.
There is no compensating publish.
The broker keeps serving a message the agent refuses to honour,
so the broker's view and the agent's view disagree permanently and never reconverge.

2. **A valid retained message can be replaced by an invalid one.**
Because the entity metadata topic is writable by anyone,
a second publisher can clobber a good registration with a bad one.
The agent rejects the new message but, per issue 1, the bad message stays retained.
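
The two issues above can be reproduced with a toy model of the broker's retained table and the agent's entity store. This is only a sketch: the `@type` requirement is an assumed validation rule, and `publish_registration` is a hypothetical name.

```python
import json

broker_retained = {}  # what the broker replays to every new subscriber
agent_entities = {}   # what the agent has actually accepted


def publish_registration(topic: str, payload: bytes) -> bool:
    """Model a retained publish followed by the agent's validation."""
    broker_retained[topic] = payload  # the broker retains unconditionally
    try:
        entity = json.loads(payload)
    except ValueError:
        return False  # rejected, and nothing is published to compensate
    if not isinstance(entity, dict) or "@type" not in entity:
        return False  # toy validation rule, assumed for this sketch
    agent_entities[topic] = entity
    return True


topic = "te/device/child01//"
first = publish_registration(topic, b'{"@type": "child-device"}')  # valid
second = publish_registration(topic, b'{"type": "oops"}')          # invalid
```

After the second publish, the agent still holds the valid entity while the broker serves the invalid payload to every new subscriber.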

Member

question: would the agent overwriting these topics on bad input by itself, without adding @source: agent, be sufficient to solve these issues?

Indeed, without @source: agent mappers wouldn't know if the entity is valid, so they could send invalid entities to the cloud, but if these entities were then overwritten by the agent to clear the topic, they could be immediately removed. This is suboptimal to be sure, but the problem described here is only about leftover broken retained messages, which I think could be solved that way?

In proposal A, there's a note:

> Restoring or clearing a retained message that the agent did not author is a fundamental change. It must be documented and communicated heavily on roll-out.

Could this potentially be a breaking change?

Asking just for clarity of which portions of the proposal address which issues and how the change could be decomposed into a number of smaller action items.

Contributor Author

> question: would the agent overwriting these topics on bad input by itself, without adding @source: agent, be sufficient to solve these issues?
>
> Indeed, without @source: agent mappers wouldn't know if the entity is valid, so they could send invalid entities to the cloud, but if these entities were then overwritten by the agent to clear the topic, they could be immediately removed. This is suboptimal to be sure, but the problem described here is only about leftover broken retained messages, which I think could be solved that way?

If we're re-using the same te/+/+/+/+ topic both as the request channel and as the agent's authoritative channel, then the @source field is critical to prevent the mappers or clients from reacting to an invalid request. Yes, the agent would eventually replace the invalid message with a valid one as per proposal 1. But what if the agent is down when the request is sent? The mapper would have already reacted to the original request, and it wouldn't get that "correction message" until the agent comes back up. By making the mappers and other clients react only to messages with the @source field, we prevent such issues.

> In proposal A, there's a note:
>
> > Restoring or clearing a retained message that the agent did not author is a fundamental change. It must be documented and communicated heavily on roll-out.
>
> Could this potentially be a breaking change?

This is one part that I'm not fully sure about. I don't expect any normal "publishing clients" that register themselves and push data or receive commands (which is likely the majority of clients) to be affected at all by this change, as they are less likely to subscribe to the te/+/+/+/+ topics that they are publishing to.

But if there are other clients, like mappers, that were tracking the registration messages, they would definitely get notified. Here too, I'm not sure it can be considered a "breaking behaviour", as those clients should have been prepared to receive entity update messages on the same channel, since we've supported that (though in a broken manner). So, I'd expect those clients to treat the addition of the @source field as yet another update.

> Asking just for clarity of which portions of the proposal address which issues and how the change could be decomposed into a number of smaller action items.

Sure, I'll add a section or a table after each proposal capturing how it addresses each and every listed issue.

Comment on lines +76 to +80
4. **Empty payload is overloaded as "delete".**
An entity can never have an intentionally empty registration,
and any accidental empty retained publish (a common scripting mistake)
deletes the entity and its entire subtree, with no confirmation.
Combined with issue 2 (anyone can publish), this is a sharp footgun.

Member

As far as I understand, we can't do anything about this issue for the entity metadata topic that wouldn't be a breaking change, so the solution is to just use HTTP API.
Or, for clients who for some reason can't/won't use HTTP, we could consider providing additional methods for registering/updating entity metadata under some new te/device/d_id/service/s_id/meta topic.

Contributor Author

> As far as I understand, we can't do anything about this issue for the entity metadata topic that wouldn't be a breaking change, so the solution is to just use HTTP API.

The fix considered for this, over the MQTT API, was for the agent to restore the original entity registration message if the cleared (deleted) entity had any children that are not deleted yet. So, the clear message would be considered as a delete request only for "leaf" entities. But yeah, this clearly is a breaking change and it would introduce yet another point of divergence from the HTTP API.
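
The leaf-only rule described here could look roughly like this. It is a sketch under assumptions, not the agent's code: `on_clear` is a hypothetical handler, `entities` is the agent's store keyed by entity topic ID, and `retained` models the broker's retained table.

```python
def on_clear(entities: dict, retained: dict, topic: str) -> str:
    """Treat a clear as a delete only for leaf entities; otherwise restore it."""
    if topic not in entities:
        return "unknown"
    has_children = any(meta.get("@parent") == topic for meta in entities.values())
    if has_children:
        # The agent re-publishes the original registration instead of deleting.
        retained[topic] = entities[topic]
        return "restored"
    entities.pop(topic)
    retained.pop(topic, None)
    return "deleted"


# Hypothetical state right after a client cleared the child device's topic.
entities = {
    "device/child01//": {"@type": "child-device"},
    "device/child01/service/app": {"@type": "service", "@parent": "device/child01//"},
}
retained = {"device/child01/service/app": entities["device/child01/service/app"]}
parent_result = on_clear(entities, retained, "device/child01//")
leaf_result = on_clear(entities, retained, "device/child01/service/app")
```

The parent survives the clear while the leaf is deleted, which is exactly the behavioural difference from the HTTP API that makes this a breaking change.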

> Or, for clients who for some reason can't/won't use HTTP, we could consider providing additional methods for registering/updating entity metadata under some new te/device/d_id/service/s_id/meta topic.

This was proposal B. But, introducing a new channel for registration is a strict no-go as we will have to support registrations/deregistrations over the existing te/+/+/+/+ topic anyway for backward compatibility.
