Apply server timeout globally, remove hardcoded datastore timeout by the-glu · Pull Request #1397 · interuss/dss

the-glu · 2026-03-18T13:02:20Z

This PR enforce the value of the timeout flag for all HTTP calls, as it was applied inconsistently through the codebase, especially in SCD where the value was not used at all.

It use an HTTP handle, so it's applied on every call.

There was also another timeout in the code on transactions, hardcoded to 10s. I removed it as it's doesn't seems to make sense anymore with the global timeout (and was also not always applied).

This is potentially disruptive as some slow calls that passed before are now going to be canceled.

This remove a TODO in the code and potentially fix #1393 issue.

mickmis

Agreed that a unified timeout on HTTP endpoints do make sense. Now I'm not sure it implies that timeouts on DB calls become irrelevant (however the middle ones on handlers are).

I'm notably thinking here about two things:

operations initiated not through an HTTP call (e.g. cleanup job in routine or CLI invokation)
that we might want to return a '409' if we have a timeout on DB calls (and also when the max no. of retries have been reached):

        '409':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
          description: >-
            ...
            * Despite repeated attempts, the DSS was unable to complete the
            update because of other simultaneous changes.

So I'd propose here to keep and uniformize timeouts on DB (i.e. ones with DefaultTimeout) in addition to HTTP handlers timeouts.
(and we definitely need to factorize store logic)

And for 2., I don't believe the return of a 409 for a retryable error is implemented at all? If you agree, could open an issue to track this? This is actually quite important as clients would rely on that to implement a retry mechanism.

pkg/rid/store/datastore/store_test.go

the-glu · 2026-03-18T14:43:34Z

1. operations initiated not through an HTTP call (e.g. cleanup job in routine or CLI invocation)

Yes correct, however I would just add theses to the CLI/cleanup job no? I don't think that should be the responsibility of the datastore layer enforce a timeout, especially since they are doing shorts call only in those cases.

2. that we might want to return a '409' if we have a timeout on DB calls (and also when the max no. of retries have been reached):

I'm not sure the case for a generic timeout on the DB layer exist, as by default the HTTP one is enough, and there won't be any retries for normal cases.

Also for retries (in the case of a timeout, not a deadlock), does a case exists where just canceling a query & retrying make sense? Why not just wait longer?

So I'd propose here to keep and uniformize timeouts on DB (i.e. ones with DefaultTimeout) in addition to HTTP handlers timeouts. (and we definitely need to factorize store logic)

Should we do that, I suggest we move it to another PR no? It will need "major" changes (like a new setting, setting validation (HTTP timeout will need to be greater than DBs timeout).

We can keep the cleanup here or not, but right now with default values it's never raised (10s for both, but the DB one will start later anyways), so I would go for cleanup and start with a clean state in the next PR.

And for 2., I don't believe the return of a 409 for a retryable error is implemented at all? If you agree, could open an issue to track this? This is actually quite important as clients would rely on that to implement a retry mechanism.

It seems to be used, but not in case of database timeout, but thinks like version mismatch, etc.

However, does 409 really mean retryable? It mean conflict, meaning it may not be possible anymore to do a specific query.

Edit:

Looking at the code, I think that the transaction timeout is never used to perform retries, and I even more don't see the need for a different timeout on that side. Especially that a timeout on the DB doesn't always mean that it's dues to a conflict / other changes, network may be slow/broken.

The "global" timeout (http + cli|job) is including what is wanted with the DB timeout (DB stop responding) no?

the-glu force-pushed the improve_timeout branch 2 times, most recently from a9b96cc to c005582 Compare March 18, 2026 13:10

mickmis reviewed Mar 18, 2026

View reviewed changes

pkg/rid/store/datastore/store_test.go Outdated Show resolved Hide resolved

Apply server timeout globally, remove hardcoded datastore timeout

17f08ae

the-glu force-pushed the improve_timeout branch from c005582 to 17f08ae Compare March 18, 2026 14:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Apply server timeout globally, remove hardcoded datastore timeout#1397

Apply server timeout globally, remove hardcoded datastore timeout#1397
the-glu wants to merge 1 commit intointeruss:masterfrom
Orbitalize:improve_timeout

the-glu commented Mar 18, 2026 •

edited

Loading

Uh oh!

mickmis left a comment

Uh oh!

Uh oh!

the-glu commented Mar 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

the-glu commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mickmis left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

the-glu commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

the-glu commented Mar 18, 2026 •

edited

Loading

the-glu commented Mar 18, 2026 •

edited

Loading