Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cache refresh on darwin #249

Merged
merged 3 commits into from
Feb 6, 2025
Merged

Conversation

elezar
Copy link
Contributor

@elezar elezar commented Feb 4, 2025

These changes fix running the unit tests on MacOS and make some minor golangci-lint changes.

The core of the problem is that file creation using os.Rename on darwin generates CREATE events and not WRITE events. This change also responds to CREATE events when updating the cache.

@elezar elezar requested a review from klihub February 4, 2025 22:53
@elezar elezar self-assigned this Feb 4, 2025
klihub
klihub previously approved these changes Feb 5, 2025
Copy link
Contributor

@klihub klihub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

nit: I think a git rebase 081e75b3d69ed0911640e92d9b0af41181934a25 --onto 327c7f88092fbe9e378b8b3b5ef87a6633bd4f48 before pushing would have made it more sha1-obvious that #248 and #249 indeed share the bottom-most commit.

@elezar
Copy link
Contributor Author

elezar commented Feb 5, 2025

I was going to rebase #248 on this once it merges.

@elezar elezar mentioned this pull request Feb 5, 2025
@@ -129,6 +130,9 @@ devices:
}

func TestDefaultCacheRefresh(t *testing.T) {
if runtime.GOOS == "darwin" {
Copy link
Contributor Author

@elezar elezar Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails with:

=== RUN   TestDefaultCacheRefresh/empty_cache,_add_one_Spec
    default-cache_test.go:292:
                Error Trace:    default-cache_test.go:292
                Error:          Not equal:
                                expected: []string{"vendor1.com/device=dev1"}
                                actual  : []string(nil)

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,4 +1,2 @@
                                -([]string) (len=1) {
                                - (string) (len=23) "vendor1.com/device=dev1"
                                -}
                                +([]string) <nil>

                Test:           TestDefaultCacheRefresh/empty_cache,_add_one_Spec
=== RUN   TestDefaultCacheRefresh/two_Specs,_remove_one
--- FAIL: TestDefaultCacheRefresh (0.05s)
    --- FAIL: TestDefaultCacheRefresh/empty_cache,_add_one_Spec (0.03s)
    --- PASS: TestDefaultCacheRefresh/two_Specs,_remove_one (0.03s)

@@ -225,6 +226,9 @@ devices:
}

func TestRefreshCache(t *testing.T) {
if runtime.GOOS == "darwin" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fails with:

=== RUN   TestRefreshCache/empty_cache,_add_one_Spec
    cache_test.go:581:
                Error Trace:    cache_test.go:581
                Error:          Not equal:
                                expected: []string{"vendor1.com/device=dev1"}
                                actual  : []string(nil)

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,4 +1,2 @@
                                -([]string) (len=1) {
                                - (string) (len=23) "vendor1.com/device=dev1"
                                -}
                                +([]string) <nil>

                Test:           TestRefreshCache/empty_cache,_add_one_Spec
=== RUN   TestRefreshCache/one_Spec,_add_another,_no_shadowing,_no_conflicts
    cache_test.go:581:
                Error Trace:    cache_test.go:581
                Error:          Not equal:
                                expected: []string{"vendor1.com/device=dev1", "vendor1.com/device=dev2"}
                                actual  : []string{"vendor1.com/device=dev1"}

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,4 +1,3 @@
                                -([]string) (len=2) {
                                - (string) (len=23) "vendor1.com/device=dev1",
                                - (string) (len=23) "vendor1.com/device=dev2"
                                +([]string) (len=1) {
                                + (string) (len=23) "vendor1.com/device=dev1"
                                 }
                Test:           TestRefreshCache/one_Spec,_add_another,_no_shadowing,_no_conflicts
=== RUN   TestRefreshCache/two_Specs,_remove_one
=== RUN   TestRefreshCache/one_Spec,_add_another,_shadowing
    cache_test.go:587:
                Error Trace:    cache_test.go:587
                Error:          Not equal:
                                expected: 0
                                actual  : 1
                Test:           TestRefreshCache/one_Spec,_add_another,_shadowing
=== RUN   TestRefreshCache/one_Spec,_add_another,_conflicts
    cache_test.go:581:
                Error Trace:    cache_test.go:581
                Error:          Not equal:
                                expected: []string{"vendor1.com/device=dev2", "vendor1.com/device=dev3"}
                                actual  : []string{"vendor1.com/device=dev1", "vendor1.com/device=dev2"}

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,4 +1,4 @@
                                 ([]string) (len=2) {
                                - (string) (len=23) "vendor1.com/device=dev2",
                                - (string) (len=23) "vendor1.com/device=dev3"
                                + (string) (len=23) "vendor1.com/device=dev1",
                                + (string) (len=23) "vendor1.com/device=dev2"
                                 }
                Test:           TestRefreshCache/one_Spec,_add_another,_conflicts
--- FAIL: TestRefreshCache (0.70s)
    --- FAIL: TestRefreshCache/empty_cache,_add_one_Spec (0.11s)
    --- FAIL: TestRefreshCache/one_Spec,_add_another,_no_shadowing,_no_conflicts (0.13s)
    --- PASS: TestRefreshCache/two_Specs,_remove_one (0.15s)
    --- FAIL: TestRefreshCache/one_Spec,_add_another,_shadowing (0.15s)
    --- FAIL: TestRefreshCache/one_Spec,_add_another,_conflicts (0.16s)

@@ -529,17 +530,24 @@ func (w *watch) watch(fsw *fsnotify.Watcher, m *sync.Mutex, refresh func() error
if watch == nil {
return
}

eventMask := fsnotify.Rename | fsnotify.Remove | fsnotify.Write
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh after your probing I spent some more time investigating this, and noted that the events that are generated on darwin are CREATE events and not WRITE events.

I thought it safer to not modify existing behaviour on non-darwin platforms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigation. I'm wondering why CREATE was not considered on Linux. I believe it's also generated when file is created. Do we intentionally skip it?

Copy link
Contributor Author

@elezar elezar Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klihub since you were the one who implemented this, maybe you could comment.

Note that fsnotify uses two different sources for the events on Linux and MacOS systems. It may also be that the os.Rename calls that are use in the tests trigger different events. There are also a number of issues such as fsnotify/fsnotify#54 calling out inconsistent behaviour on MacOS.

Copy link
Contributor

@klihub klihub Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigation. I'm wondering why CREATE was not considered on Linux. I believe it's also generated when file is created. Do we intentionally skip it?

On Linux, when you

  • create a non-empty file you always get an fsnotify.Write event (in addition to fsnotify.Create)
  • rename a file, you always get an fsnotify.Rename event (plus an fsnotify.Create if the renamed file is in the same directory)
    Therefore, it should not be necessary to trigger a rescan for a fsnotify.Create.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some more time investigating this, and noted that the events that are generated on darwin are CREATE events and not WRITE events

Nice.

two different sources for the events on Linux and MacOS systems.

Trivia. macOS itself provides two different mechanisms: FSEvents, kqueue. fsnotify seems to use kqueue.

Linux has its own mechanism, called inotify. Here is a bikeshed thread comparing both interfaces.

I like the docs of https://pythonhosted.org/watchdog/installation.html#supported-platforms-and-caveats.

Interesting: https://watchexec.github.io/docs/macos-fsevents.html

@elezar elezar changed the title Minor test fixes Fix cache refresh on darwin Feb 5, 2025
@elezar elezar requested review from klihub and bart0sh February 5, 2025 14:37
eventMask := fsnotify.Rename | fsnotify.Remove | fsnotify.Write
// On macOS, we also need to watch for Create events.
if runtime.GOOS == "darwin" {
eventMask |= fsnotify.Create
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if Darwin behaves the same way as Linux, IOW a rename to the same directory generates fsnotify.Rename+fsnotify.Create and a rename to a different directory generates an fsnotify.Rename, then I think it should not be necessary to watch for fsnotify.Create.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, does it mean that we don't need this if statement at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klihub events are the same, but their order is different. On Linux it generates fsnotify.Rename+fsnotify.Create, on Darwin: fsnotify.Create+fsnotify.Rename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific issue in the tests is that we:

  1. Start a watch on a directory
  2. Create a file foo.yaml.tmp in that folder
  3. Write to foo.yaml.tmp.
  4. Rename that file to foo.yaml

This was not triggering the refresh in the auto refresh case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume the following example (adapted from the test code):

package main

import (
	"log"
	"os"
	"path/filepath"
)

func main() {
	dir := "/tmp/fsnotify"
	filename := "test.txt"

	data := "hello world"

	file := filepath.Join(dir, filename)
	tmp := file + ".tmp"
	if err := os.WriteFile(tmp, []byte(data), 0644); err != nil {
		log.Fatalf("failed to write file %q: %w", tmp, err)
	}
	if err := os.Rename(tmp, file); err != nil {
		log.Fatalf("failed to rename %q to %q: %w", tmp, file, err)
	}
}

Using the example from: https://github.com/fsnotify/fsnotify/tree/main/cmd/fsnotify I run:

./fsnotify watch /tmp/fsnotify

When I run the example above on darwin, I see:

./fsnotify watch /tmp/fsnotify
22:14:04.7182 ready; press ^C to exit
22:17:11.7594   1 CREATE        "/tmp/fsnotify/test.txt"

Repeating in a linux container, I see:

root@7ff8b7d75a3c:/work# ./fsnotify watch /tmp/fsnotify
21:22:25.3845 ready; press ^C to exit
21:22:52.6097   1 CREATE        "/tmp/fsnotify/test.txt.tmp"
21:22:52.6097   2 WRITE         "/tmp/fsnotify/test.txt.tmp"
21:22:52.6098   3 RENAME        "/tmp/fsnotify/test.txt.tmp"
21:22:52.6098   4 CREATE        "/tmp/fsnotify/test.txt" ← "/tmp/fsnotify/test.txt.tmp"

Note that when looking at the event loop, we see:

			if !ok {
				return
			}

			if (event.Op & eventMask) == 0 {
				continue
			}
			if event.Op == fsnotify.Write || event.Op == fsnotify.Create {
				if ext := filepath.Ext(event.Name); ext != ".json" && ext != ".yaml" {
					continue
				}
			}

			m.Lock()
			if event.Op == fsnotify.Remove && w.tracked[event.Name] {
				w.update(dirErrors, event.Name)
			} else {
				w.update(dirErrors)
			}
			_ = refresh()
			m.Unlock()

meaning that it's the RENAME that is triggering the w.update(dirErrors) unconditionally in this case.

Note that the CREATE added in the darwin case is what is needed to trigger the refresh there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unix.Renameat2 generates the same events as os.Rename: fsnotify.Rename+fsnotify.Create.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elezar

There are no WRITE events genereated on darwin

Yep, but it's generated if the file already exists. We should probably consider testing it with the cache API calls. It could also make sense to modify it to use os.Create before ioutil.WriteFile to force generation of the WRITE event if we want the same code to work on Darwin and Linux.

BTW, since then CDI supports Darwin? I was under impression that only Linux is supported. We should explicitly mention this in the docs, I believe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't control HOW users write specs to the spec directories, so cannot make assumptions about which APIs are being used.

My main motivator for this modification was to allow me to run the unit tests on my Mac.

Copy link
Contributor

@klihub klihub Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elezar I am fine with the current state where on Darwin we also watch for the create event, if that helps your daily workflow which I gather is on MacOS/Darwin. IMO, this is not necessary on Linux and not helpful either (it would only generate an extra event if someone created an empty file without ever writing any content to it).

About the rest of the discussion related to trying to write/create files differently to generate a different set, or order, of inotify/fsnotify events... I think that is a bit futile. As @elezar pointed out, we don't control how users write Spec files and neither should we try. Spec reading/parsing is supposed to be fault tolerant for multiple reasons, the uncontrollability and thus unpredictability of the set and order of events generated being one of them. Therefore, as long as we make sure that we never fail to act on an event eventually, and we don't delay acting on events too long (this is the reason why we don't have an timer-based event filtering/deduplication) we should be fine... IMO.

Copy link
Contributor

@klihub klihub Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klihub events are the same, but their order is different. On Linux it generates fsnotify.Rename+fsnotify.Create, on Darwin: fsnotify.Create+fsnotify.Rename.

On linux, both events are generated after the file has been renamed. On Darwin, I'm not sure, although one would hope that a rename(2) and related variants would be atomic (IIRC, required for POSIX-compliance) which should mean that both are generated after the fact, otherwise it is not really atomic if an intermediate state is externally observable. Then again, since the tests did fail on Darwin and I assume the failing tests check the right thing to decide success/failure, some of this mut be false on Darwin... but then based on @elezar's experience and fix it looks like adding an extra watch on Darwin for fsnotify.Create fixes things there... so I think this should be now fine there, too.

Copy link
Contributor

@klihub klihub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@elezar I have a small questio-nit, but even with that considered this looks good to me.

@bart0sh bart0sh merged commit a76f272 into cncf-tags:main Feb 6, 2025
8 checks passed
@elezar elezar deleted the fix-darwin-tests branch February 17, 2025 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants