Conversation
karlseguin
left a comment
There was a problem hiding this comment.
This needs unit tests. I realize that testing the cache, that uses the file system, is going to be a challenge. But there's just too much code and cases not to have this well covered. We do have unit tests which make http requests to the internal test http server. Need to add routes that return the desired headers. On startup (testing.zig tests:beforeAll) wipe some temp folder and set that as the cache.
| self.dir.close(); | ||
| } | ||
|
|
||
| pub fn cache(self: *FsCache) Cache { |
| const HASHED_TMP_PATH_LEN = HASHED_PATH_LEN + 4; | ||
|
|
||
| fn hashKey(key: []const u8) [HASHED_KEY_LEN]u8 { | ||
| const h = std.hash.Wyhash.hash(0, key); |
There was a problem hiding this comment.
Wyhash is a general purpose hashing function, you'll get collisions, which could result in the wrong cached page being returned.
| return hex; | ||
| } | ||
|
|
||
| fn serializeMeta(writer: *std.Io.Writer, meta: *const CachedMetadata) !void { |
There was a problem hiding this comment.
A lot of writing / parsing code. Did you consider either using zon or, if space is an issue, a json array?
Whatever you do, I'd add a version number as the first field in the header.
| meta_file.close(); | ||
| } | ||
| errdefer self.dir.deleteFile(&meta_tmp_path) catch {}; | ||
| try self.dir.rename(&meta_tmp_path, &meta_path); |
There was a problem hiding this comment.
I think there are error (and panic)cases where the meta file sticks around. Not sure how problematic it is to have a meta but no body file.
What are the advantages of splitting this up into 2 files? I feel that a single file would have less edge-cases (e.g. successful write to one file, but not the other) and could result in fewer system calls.
| const body_file = try self.dir.createFile(&body_tmp_path, .{}); | ||
| errdefer { | ||
| body_file.close(); | ||
| self.dir.deleteFile(&body_tmp_path) catch {}; |
There was a problem hiding this comment.
Can writeAll result in a partial write? Because, if you do partially write, and then this fails, you now have garbage in the file that you might fetch in get
| } | ||
|
|
||
| // no-store: must not be stored | ||
| if (cc.no_store) return null; |
There was a problem hiding this comment.
we could exit the loop immediately if that's the cache? Don't even need to store this in CacheControl:
cc = CacheControl.parse(hdr.value) orelse return null;That's true for all exit paths.
| if (has_authorization and !cc.is_public and cc.s_maxage == null) return null; | ||
|
|
||
| // Only cache 200 for now. Technically, we can cache others. | ||
| switch (status) { |
There was a problem hiding this comment.
Entire body was buffered and headers parsed..this should be done much earlier.
| } | ||
|
|
||
| pub fn deinit(self: CachedMetadata, allocator: std.mem.Allocator) void { | ||
| allocator.free(self.url); |
There was a problem hiding this comment.
Arena, ideally from the arenapool.
| } else if (std.ascii.eqlIgnoreCase(directive, "must-revalidate")) { | ||
| cc.must_revalidate = true; | ||
| } else if (std.ascii.eqlIgnoreCase(directive, "immutable")) { | ||
| cc.immutable = true; |
There was a problem hiding this comment.
We don't use a a lot of these. Should we? If not, should we bother parsing them and storing them in the meta?
| if (cc.max_age == null) return null; | ||
|
|
||
| // Set-Cookie without explicit public | ||
| if (has_set_cookie and !cc.is_public) return null; |
There was a problem hiding this comment.
We should never cache anything that isn't public. While we're the user-agent, the cache doesn't have any isolation and acts much more like a reverse proxy's cache.
| } else if (std.ascii.startsWithIgnoreCase(directive, "s-maxage=")) { | ||
| cc.s_maxage = std.fmt.parseInt(u64, directive[9..], 10) catch null; | ||
| // s-maxage takes precedence over max-age | ||
| cc.max_age = cc.s_maxage orelse cc.max_age; |
There was a problem hiding this comment.
we only serialize max_age, so if the header is defined as s-maxage=1000,max-age=0 it will do the opposite and max-age will take precedence over s-maxage.
| // cache-control | ||
| try writer.print("{d}\n", .{meta.cache_control.max_age orelse 0}); | ||
| try writer.print("{}\n{}\n{}\n{}\n", .{ | ||
| meta.cache_control.max_age != null, |
There was a problem hiding this comment.
do we ever cache something with no max-age? What does a max-age of 0 mean?
| }; | ||
| } | ||
|
|
||
| pub fn cacheDir(self: *const Config) ?[]const u8 { |
There was a problem hiding this comment.
This should have a default folder, something like $XDG_CACHE_DIR/lightpanda
| // self.debug_transfer_intercept_state = @intFromEnum(transfer._intercept_state); | ||
| // self.debug_transfer_auth_challenge = transfer._auth_challenge != null; | ||
| // self.debug_transfer_easy_id = if (transfer._conn) |c| @intFromPtr(c.easy) else 0; | ||
| // } |
There was a problem hiding this comment.
Why are you removing this log?
| } | ||
| }; | ||
|
|
||
| pub const LiveTransfer = struct { |
There was a problem hiding this comment.
At least to reduce the number of conflicts
This adds caching at the HTTP level, allowing the use of a local directory as a RFC 9111 cache.