fix(build-index): return 404 on backend unavailability to unblock docker pushes by thijmv · Pull Request #631 · uber/kraken

thijmv · 2026-06-01T14:31:06Z

This PR (2/3) makes the following changes:

Tag store: Return 404 instead of 500 when the backend returns an error during tag resolution. This prevents a Docker push flow that issues a HEAD before PUT, e.g. the containerd snapshotter, from being aborted by a server error. On the subsequent PUT, the tag is written to disk and flushed to the backend once it becomes available.
Build-index nginx: Drop the 1s cache on tag lookups, which was causing poisoned tag resolutions for subsequent pulls after the previous HEAD returned 404.

Anton-Kalpakchiev · 2026-06-01T16:31:40Z

    proxy_cache         tags;
    proxy_cache_methods GET;
    proxy_cache_valid   200 5m;
-    proxy_cache_valid   any 1s;


Do we understand why this cache was here in the first place? Do we envision any negative impact by removing it?

It has been there since the beginning of Kraken, so I assume it was a base configuration, or used to limit thundering herd. Since the 200 requests are still cached, I do not envisage noticeable negative impact under normal load. However, it is something we should monitor. If we experience a lot of server errors, those requests will not be cached anymore, and would all go to the build-index.

Removing this cache seems like a risk. Can we avoid removing it? I'd rather not fiddle with configs that we don't fully understand.

The 404 is cached because of the additional HEAD. When a GET for the tag is then made, it incorrectly returns the 404, even though the tag now exists. This means the subsequent pull following the push fails during the integration tests due to cache poisoning.

While we could add a small delay between the two in the integration tests, any real production scenario where the pull follows the push within a second would still fail.

We still have the proxy_cache_lock in place to avoid having to process identical requests arriving at the same time.

Anton-Kalpakchiev · 2026-06-01T16:34:57Z

 			log.With("tag", tag).Debug("Tag not found in backend")
-			return core.Digest{}, ErrTagNotFound
+		} else {
+			// If we experience a backend error, propagating a 5xx would abort


Isn't the client responsible for retrying upon a server error? What is the issue if the push is aborted by the containerd snapshotter? Isn't it hacky to return 404 when in actuality the blob may be available, but we saw a backend error?

The containerd snapshotter makes a HEAD request before trying to push an image. If that request fails with a server error, the push is aborted, and the user is unable to push said image.

We cannot return 500 to the snapshotter, as pushes will then fail when the backend is temporarily unavailable. I agree that this approach seems hacky, but I do not see a better alternative, given the snapshotter's behaviour.

Why is it a problem if the snapshotter abandons the push upon a server error?

Because when the backend is unavailable, we still expect the tag to be pushed to the build-index's disk, and flushed to the backend once it becomes available again. Without the snapshotter, this behaviour happened correctly. However, since the snapshotter precedes the tag push with an additional HEAD, continuing to return a server error when the backend is unavailable would mean no images or tags can be pushed if the backend is unavailable, which is the scenario we are actively trying to avoid.

For more context, see this integration test.

hmm, makes sense. I still am not sure about the hacky approach. Is there a way to configure the containerd snapshotter to continue after HEAD failure or tweak it in a different way?

Alternatively, can we just avoid using the containerd snapshotter altogether and test OCI manifest parsing in a different way?

The containerd snapshotter cannot be configured to ignore HEAD server errors. Even if it could, environments with the default containerd snapshotter enabled would still encounter said errors.

Regarding your second point, the snapshotter is the only way to exercise the full OCI push and pull flow within the integration tests.

Additionally, given that the containerd snapshotter and OCI manifest themselves are becoming the default, I think it is worth testing against it directly rather than working around it.

…ker pushes

github-actions Bot added the size/xs label Jun 1, 2026

Anton-Kalpakchiev requested changes Jun 1, 2026

View reviewed changes

thijmv force-pushed the thijmv/oci-manifest-support branch from ac8edc6 to a6edeae Compare June 2, 2026 08:39

fix(build-index): return 404 on backend unavailability to unblock doc…

343ca94

…ker pushes

thijmv force-pushed the thijmv/oci-manifest-support_2 branch from 76eeab8 to 343ca94 Compare June 2, 2026 14:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(build-index): return 404 on backend unavailability to unblock docker pushes#631

fix(build-index): return 404 on backend unavailability to unblock docker pushes#631
thijmv wants to merge 1 commit into
thijmv/oci-manifest-supportfrom
thijmv/oci-manifest-support_2

thijmv commented Jun 1, 2026 •

edited

Loading

Uh oh!

Anton-Kalpakchiev Jun 1, 2026

Uh oh!

thijmv Jun 2, 2026

Uh oh!

Anton-Kalpakchiev Jun 2, 2026

Uh oh!

thijmv Jun 2, 2026 •

edited

Loading

Uh oh!

Anton-Kalpakchiev Jun 1, 2026

Uh oh!

thijmv Jun 2, 2026

Uh oh!

Anton-Kalpakchiev Jun 2, 2026

Uh oh!

thijmv Jun 2, 2026

Uh oh!

Anton-Kalpakchiev Jun 2, 2026

Uh oh!

thijmv Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

thijmv commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thijmv Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

thijmv commented Jun 1, 2026 •

edited

Loading

thijmv Jun 2, 2026 •

edited

Loading