-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cache refresh on darwin #249
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to rebase #248 on this once it merges. |
pkg/cdi/default-cache_test.go
Outdated
@@ -129,6 +130,9 @@ devices: | |||
} | |||
|
|||
func TestDefaultCacheRefresh(t *testing.T) { | |||
if runtime.GOOS == "darwin" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails with:
=== RUN TestDefaultCacheRefresh/empty_cache,_add_one_Spec
default-cache_test.go:292:
Error Trace: default-cache_test.go:292
Error: Not equal:
expected: []string{"vendor1.com/device=dev1"}
actual : []string(nil)
Diff:
--- Expected
+++ Actual
@@ -1,4 +1,2 @@
-([]string) (len=1) {
- (string) (len=23) "vendor1.com/device=dev1"
-}
+([]string) <nil>
Test: TestDefaultCacheRefresh/empty_cache,_add_one_Spec
=== RUN TestDefaultCacheRefresh/two_Specs,_remove_one
--- FAIL: TestDefaultCacheRefresh (0.05s)
--- FAIL: TestDefaultCacheRefresh/empty_cache,_add_one_Spec (0.03s)
--- PASS: TestDefaultCacheRefresh/two_Specs,_remove_one (0.03s)
pkg/cdi/cache_test.go
Outdated
@@ -225,6 +226,9 @@ devices: | |||
} | |||
|
|||
func TestRefreshCache(t *testing.T) { | |||
if runtime.GOOS == "darwin" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fails with:
=== RUN TestRefreshCache/empty_cache,_add_one_Spec
cache_test.go:581:
Error Trace: cache_test.go:581
Error: Not equal:
expected: []string{"vendor1.com/device=dev1"}
actual : []string(nil)
Diff:
--- Expected
+++ Actual
@@ -1,4 +1,2 @@
-([]string) (len=1) {
- (string) (len=23) "vendor1.com/device=dev1"
-}
+([]string) <nil>
Test: TestRefreshCache/empty_cache,_add_one_Spec
=== RUN TestRefreshCache/one_Spec,_add_another,_no_shadowing,_no_conflicts
cache_test.go:581:
Error Trace: cache_test.go:581
Error: Not equal:
expected: []string{"vendor1.com/device=dev1", "vendor1.com/device=dev2"}
actual : []string{"vendor1.com/device=dev1"}
Diff:
--- Expected
+++ Actual
@@ -1,4 +1,3 @@
-([]string) (len=2) {
- (string) (len=23) "vendor1.com/device=dev1",
- (string) (len=23) "vendor1.com/device=dev2"
+([]string) (len=1) {
+ (string) (len=23) "vendor1.com/device=dev1"
}
Test: TestRefreshCache/one_Spec,_add_another,_no_shadowing,_no_conflicts
=== RUN TestRefreshCache/two_Specs,_remove_one
=== RUN TestRefreshCache/one_Spec,_add_another,_shadowing
cache_test.go:587:
Error Trace: cache_test.go:587
Error: Not equal:
expected: 0
actual : 1
Test: TestRefreshCache/one_Spec,_add_another,_shadowing
=== RUN TestRefreshCache/one_Spec,_add_another,_conflicts
cache_test.go:581:
Error Trace: cache_test.go:581
Error: Not equal:
expected: []string{"vendor1.com/device=dev2", "vendor1.com/device=dev3"}
actual : []string{"vendor1.com/device=dev1", "vendor1.com/device=dev2"}
Diff:
--- Expected
+++ Actual
@@ -1,4 +1,4 @@
([]string) (len=2) {
- (string) (len=23) "vendor1.com/device=dev2",
- (string) (len=23) "vendor1.com/device=dev3"
+ (string) (len=23) "vendor1.com/device=dev1",
+ (string) (len=23) "vendor1.com/device=dev2"
}
Test: TestRefreshCache/one_Spec,_add_another,_conflicts
--- FAIL: TestRefreshCache (0.70s)
--- FAIL: TestRefreshCache/empty_cache,_add_one_Spec (0.11s)
--- FAIL: TestRefreshCache/one_Spec,_add_another,_no_shadowing,_no_conflicts (0.13s)
--- PASS: TestRefreshCache/two_Specs,_remove_one (0.15s)
--- FAIL: TestRefreshCache/one_Spec,_add_another,_shadowing (0.15s)
--- FAIL: TestRefreshCache/one_Spec,_add_another,_conflicts (0.16s)
Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
d760681
to
d68aa49
Compare
@@ -529,17 +530,24 @@ func (w *watch) watch(fsw *fsnotify.Watcher, m *sync.Mutex, refresh func() error | |||
if watch == nil { | |||
return | |||
} | |||
|
|||
eventMask := fsnotify.Rename | fsnotify.Remove | fsnotify.Write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bart0sh after your probing I spent some more time investigating this, and noted that the events that are generated on darwin
are CREATE
events and not WRITE
events.
I thought it safer to not modify existing behaviour on non-darwin platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for investigation. I'm wondering why CREATE was not considered on Linux. I believe it's also generated when file is created. Do we intentionally skip it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klihub since you were the one who implemented this, maybe you could comment.
Note that fsnotify
uses two different sources for the events on Linux and MacOS systems. It may also be that the os.Rename
calls that are use in the tests trigger different events. There are also a number of issues such as fsnotify/fsnotify#54 calling out inconsistent behaviour on MacOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for investigation. I'm wondering why CREATE was not considered on Linux. I believe it's also generated when file is created. Do we intentionally skip it?
On Linux, when you
- create a non-empty file you always get an fsnotify.Write event (in addition to fsnotify.Create)
- rename a file, you always get an fsnotify.Rename event (plus an fsnotify.Create if the renamed file is in the same directory)
Therefore, it should not be necessary to trigger a rescan for a fsnotify.Create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some more time investigating this, and noted that the events that are generated on darwin are CREATE events and not WRITE events
Nice.
two different sources for the events on Linux and MacOS systems.
Trivia. macOS itself provides two different mechanisms: FSEvents, kqueue. fsnotify seems to use kqueue.
Linux has its own mechanism, called inotify. Here is a bikeshed thread comparing both interfaces.
I like the docs of https://pythonhosted.org/watchdog/installation.html#supported-platforms-and-caveats.
Interesting: https://watchexec.github.io/docs/macos-fsevents.html
eventMask := fsnotify.Rename | fsnotify.Remove | fsnotify.Write | ||
// On macOS, we also need to watch for Create events. | ||
if runtime.GOOS == "darwin" { | ||
eventMask |= fsnotify.Create |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if Darwin behaves the same way as Linux, IOW a rename to the same directory generates fsnotify.Rename+fsnotify.Create and a rename to a different directory generates an fsnotify.Rename, then I think it should not be necessary to watch for fsnotify.Create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, does it mean that we don't need this if
statement at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klihub events are the same, but their order is different. On Linux it generates fsnotify.Rename+fsnotify.Create, on Darwin: fsnotify.Create+fsnotify.Rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specific issue in the tests is that we:
- Start a watch on a directory
- Create a file
foo.yaml.tmp
in that folder - Write to
foo.yaml.tmp
. - Rename that file to
foo.yaml
This was not triggering the refresh in the auto refresh case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assume the following example (adapted from the test code):
package main
import (
"log"
"os"
"path/filepath"
)
func main() {
dir := "/tmp/fsnotify"
filename := "test.txt"
data := "hello world"
file := filepath.Join(dir, filename)
tmp := file + ".tmp"
if err := os.WriteFile(tmp, []byte(data), 0644); err != nil {
log.Fatalf("failed to write file %q: %w", tmp, err)
}
if err := os.Rename(tmp, file); err != nil {
log.Fatalf("failed to rename %q to %q: %w", tmp, file, err)
}
}
Using the example from: https://github.com/fsnotify/fsnotify/tree/main/cmd/fsnotify I run:
./fsnotify watch /tmp/fsnotify
When I run the example above on darwin
, I see:
./fsnotify watch /tmp/fsnotify
22:14:04.7182 ready; press ^C to exit
22:17:11.7594 1 CREATE "/tmp/fsnotify/test.txt"
Repeating in a linux container, I see:
root@7ff8b7d75a3c:/work# ./fsnotify watch /tmp/fsnotify
21:22:25.3845 ready; press ^C to exit
21:22:52.6097 1 CREATE "/tmp/fsnotify/test.txt.tmp"
21:22:52.6097 2 WRITE "/tmp/fsnotify/test.txt.tmp"
21:22:52.6098 3 RENAME "/tmp/fsnotify/test.txt.tmp"
21:22:52.6098 4 CREATE "/tmp/fsnotify/test.txt" ← "/tmp/fsnotify/test.txt.tmp"
Note that when looking at the event loop, we see:
if !ok {
return
}
if (event.Op & eventMask) == 0 {
continue
}
if event.Op == fsnotify.Write || event.Op == fsnotify.Create {
if ext := filepath.Ext(event.Name); ext != ".json" && ext != ".yaml" {
continue
}
}
m.Lock()
if event.Op == fsnotify.Remove && w.tracked[event.Name] {
w.update(dirErrors, event.Name)
} else {
w.update(dirErrors)
}
_ = refresh()
m.Unlock()
meaning that it's the RENAME
that is triggering the w.update(dirErrors)
unconditionally in this case.
Note that the CREATE
added in the darwin case is what is needed to trigger the refresh there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unix.Renameat2 generates the same events as os.Rename: fsnotify.Rename+fsnotify.Create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no WRITE events genereated on darwin
Yep, but it's generated if the file already exists. We should probably consider testing it with the cache API calls. It could also make sense to modify it to use os.Create
before ioutil.WriteFile
to force generation of the WRITE
event if we want the same code to work on Darwin and Linux.
BTW, since then CDI supports Darwin? I was under impression that only Linux is supported. We should explicitly mention this in the docs, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't control HOW users write specs to the spec directories, so cannot make assumptions about which APIs are being used.
My main motivator for this modification was to allow me to run the unit tests on my Mac.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar I am fine with the current state where on Darwin we also watch for the create event, if that helps your daily workflow which I gather is on MacOS/Darwin. IMO, this is not necessary on Linux and not helpful either (it would only generate an extra event if someone created an empty file without ever writing any content to it).
About the rest of the discussion related to trying to write/create files differently to generate a different set, or order, of inotify/fsnotify events... I think that is a bit futile. As @elezar pointed out, we don't control how users write Spec files and neither should we try. Spec reading/parsing is supposed to be fault tolerant for multiple reasons, the uncontrollability and thus unpredictability of the set and order of events generated being one of them. Therefore, as long as we make sure that we never fail to act on an event eventually, and we don't delay acting on events too long (this is the reason why we don't have an timer-based event filtering/deduplication) we should be fine... IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klihub events are the same, but their order is different. On Linux it generates fsnotify.Rename+fsnotify.Create, on Darwin: fsnotify.Create+fsnotify.Rename.
On linux, both events are generated after the file has been renamed. On Darwin, I'm not sure, although one would hope that a rename(2) and related variants would be atomic (IIRC, required for POSIX-compliance) which should mean that both are generated after the fact, otherwise it is not really atomic if an intermediate state is externally observable. Then again, since the tests did fail on Darwin and I assume the failing tests check the right thing to decide success/failure, some of this mut be false on Darwin... but then based on @elezar's experience and fix it looks like adding an extra watch on Darwin for fsnotify.Create fixes things there... so I think this should be now fine there, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@elezar I have a small questio-nit, but even with that considered this looks good to me.
These changes fix running the unit tests on MacOS and make some minor golangci-lint changes.
The core of the problem is that file creation using
os.Rename
ondarwin
generatesCREATE
events and notWRITE
events. This change also responds to CREATE events when updating the cache.