Skip to content

storage: fix some bugs in the additional layer store#695

Merged
mtrmac merged 3 commits intocontainers:mainfrom
giuseppe:image-layer-store-fixes
Mar 19, 2026
Merged

storage: fix some bugs in the additional layer store#695
mtrmac merged 3 commits intocontainers:mainfrom
giuseppe:image-layer-store-fixes

Conversation

@giuseppe
Copy link
Copy Markdown
Member

more details in each commit

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@github-actions github-actions bot added the storage Related to "storage" package label Mar 12, 2026
Copy link
Copy Markdown
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

storage/store.go Outdated
adriver = a
return nil
}(); err != nil {
if err := s.startUsingGraphDriver(); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graphLock does not protect the graph driver state against concurrent layer creation / deletion / …— see how almost all code here releases it in getLayerStore and the like before doing ~anything.

That’s really the job of the layer store lock.

Yes, the shutdown logic is weird (we can ~legitimately have two concurrent layer store objects for the same store, and that happens in some code paths)

AFAICS Shutdown holds the layer store lock as well, for the whole duration, so holding the layer store lock after obtaining a driver object should be sufficient for safety. But, really, keeping to the normal layering model and indirecting the driver call through a layerStore would be much more idiomatic and much more clearly “not wrong, or wrong the same way as everything else”.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to do more here?

Right now it seems that the implementation of the lookup is safe enough to run without any locking (anyway the outcome can be invalidated after we unlock and this function returns). But architecturally it’s unclean, generally the drivers trust the higher layers to enforce RW locking.

I don’t feel strongly either way, the other parts of this PR are valuable on their own.

@@ -0,0 +1,96 @@
//go:build linux
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having any test coverage here is always good…

Given how this hard-codes specifics of the overlay format, moving the test to drivers/overlay might be cleaner. That would also allow us to use a test-only override for the FUSE checks, without affecting runtime behavior.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've dropped the controversial FUSE checks patch

// filesystem. The GC notification protocol (notifyUseAdditionalLayer /
// notifyReleaseAdditionalLayer) only works when the additional layer store is a
// FUSE filesystem that intercepts the operations and returns ENOENT.
func isAdditionalLayerStoreFUSE(p string) bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m rather unhappy about this run-time condition.

I think the ALS feature makes little sense without FUSE (it requires both diff and blob, storing everything twice otherwise!). And the FUSE interface is already limiting us (e.g. no version negotiation, very annoying to pass more than one parameter) so I think moving towards the direction of static filesystems is not where I’d like this to go.

[Well I’d like to nuke ALS from the codebase, but that’s not happening.]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this change as I thought it could be useful to have a static list of OCI layers, that could be used without any additional change (or to require a FUSE filesystem). Well, they are already usable, we just get a warning

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that works until you try to push… Yes, that could be useful to some people, but it’s another gotcha for everyone to keep track of.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well the assumption is that the list of layers can only grow and that layers are never deleted, since there is no refcounting, but that is true also for additional images stores.

In any case I don't have any immediate use case, and the cost of not doing it is just a silly warning, so I've just dropped it.

…rors

When LookupAdditionalLayer successfully looks up a layer from the
additional layer store but then fails on Info() or JSON decoding, the
drivers.AdditionalLayer is never released. This leaks the reference
counter that was incremented during lookup, preventing the additional
layer store from garbage collecting the layer. Call Release() on
error paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the image-layer-store-fixes branch from 0188ec8 to 6ade24d Compare March 12, 2026 21:10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the image-layer-store-fixes branch from 6ade24d to adbda5a Compare March 12, 2026 21:26
Copy link
Copy Markdown
Member

@TomSweeneyRedHat TomSweeneyRedHat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
and happy green test buttons

@giuseppe
Copy link
Copy Markdown
Member Author

@containers/container-libs-maintainers PTAL

Copy link
Copy Markdown
Member

@Honny1 Honny1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mtrmac mtrmac merged commit 1fe5b30 into containers:main Mar 19, 2026
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

storage Related to "storage" package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants