Describe the bug
Hi! We were trying to use the filesystem backend as a local on-disk cache for camera frame data — frames published as zenoh PUTs at ~30 fps, with a 48 h retention janitor pruning old files. Things worked fine for a couple of days while the cache was small, then zenohd started OOMing on every restart.
I dug into the source and I think I found the culprit. When storage_manager starts a storage, it calls get_all_entries() to seed its latest_updates map. get_all_entries walks every file and calls read_file() on each, but read_file does a full read_to_end() of the payload, even though get_all_entries only uses the timestamp and discards the payload and encoding. You end up with every payload in the cache held in memory at once, and zenohd crashes.
The timestamp is already available without reading the payload, via data_info_mgr.get_encoding_and_timestamp(). Switching get_all_entries to that path should fix the issue. Happy to put in a PR if y'all agree with the approach.
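To make the difference concrete, here's a minimal standalone sketch of the two startup scans. It is not the backend's actual code: MetadataIndex, Scan, scan_reading_payloads and scan_metadata_only are hypothetical names, the timestamp is modelled as a plain u64, and a HashMap stands in for whatever data_info_mgr really stores. It only shows that both scans produce the same (path, timestamp) list while the metadata-only one reads zero payload bytes.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the backend's metadata store:
/// maps a file path to its timestamp (modelled here as a plain u64).
pub type MetadataIndex = HashMap<PathBuf, u64>;

/// Result of a startup scan: (path, timestamp) pairs plus the number of
/// payload bytes that were read from disk to produce them.
pub struct Scan {
    pub entries: Vec<(PathBuf, u64)>,
    pub payload_bytes_read: u64,
}

/// Eager scan (the pattern described above): load each payload in full,
/// then keep only the timestamp.
pub fn scan_reading_payloads(dir: &Path, index: &MetadataIndex) -> io::Result<Scan> {
    let mut scan = Scan { entries: Vec::new(), payload_bytes_read: 0 };
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue;
        }
        // fs::read pulls the whole file into a Vec<u8>, only for it to be discarded.
        let payload = fs::read(&path)?;
        scan.payload_bytes_read += payload.len() as u64;
        if let Some(&ts) = index.get(&path) {
            scan.entries.push((path, ts));
        }
    }
    Ok(scan)
}

/// Metadata-only scan (the proposed pattern): list the files and look the
/// timestamp up in the index, never opening the payload.
pub fn scan_metadata_only(dir: &Path, index: &MetadataIndex) -> io::Result<Scan> {
    let mut scan = Scan { entries: Vec::new(), payload_bytes_read: 0 };
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue;
        }
        if let Some(&ts) = index.get(&path) {
            scan.entries.push((path, ts));
        }
    }
    Ok(scan)
}
```

In this sketch the cost of the eager scan grows with the total payload size in the directory, while the metadata-only scan grows only with the number of files. That matches what we'd want at startup for a store of ~1.3M files.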
To reproduce
- Configure zenohd with storage_manager and a filesystem-backend storage, e.g.:
  {
    "plugins": {
      "storage_manager": {
        "volumes": { "fs": {} },
        "storages": {
          "cache": {
            "key_expr": "demo/cache/**",
          }
        }
      }
    }
  }
- Publish enough data under demo/cache/** to exceed the host's available memory.
- Restart zenohd.
Expected: zenohd starts up and serves queries against demo/cache/** from disk.
Actual: zenohd hangs after starting the storage manager and the process is OOM-killed.
System info
- Platform: NixOS 25.11 on aarch64
- zenohd: 1.9.0
- Store size at time of crash: ~130 GB, ~1.3M files
- Available host memory: 64 GB