Skip to content

HTTP Caching#1934

Open
mookums wants to merge 15 commits intomainfrom
http-cache
Open

HTTP Caching#1934
mookums wants to merge 15 commits intomainfrom
http-cache

Conversation

@mookums
Copy link
Contributor

@mookums mookums commented Mar 20, 2026

This adds caching at the HTTP level, allowing the use of a local directory as a RFC 9111 cache.

@mookums mookums marked this pull request as ready for review March 20, 2026 09:10
Copy link
Collaborator

@karlseguin karlseguin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs unit tests. I realize that testing the cache, that uses the file system, is going to be a challenge. But there's just too much code and cases not to have this well covered. We do have unit tests which make http requests to the internal test http server. Need to add routes that return the desired headers. On startup (testing.zig tests:beforeAll) wipe some temp folder and set that as the cache.

self.dir.close();
}

pub fn cache(self: *FsCache) Cache {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used?

const HASHED_TMP_PATH_LEN = HASHED_PATH_LEN + 4;

fn hashKey(key: []const u8) [HASHED_KEY_LEN]u8 {
const h = std.hash.Wyhash.hash(0, key);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wyhash is a general purpose hashing function, you'll get collisions, which could result in the wrong cached page being returned.

return hex;
}

fn serializeMeta(writer: *std.Io.Writer, meta: *const CachedMetadata) !void {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of writing / parsing code. Did you consider either using zon or, if space is an issue, a json array?

Whatever you do, I'd add a version number as the first field in the header.

meta_file.close();
}
errdefer self.dir.deleteFile(&meta_tmp_path) catch {};
try self.dir.rename(&meta_tmp_path, &meta_path);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are error (and panic)cases where the meta file sticks around. Not sure how problematic it is to have a meta but no body file.

What are the advantages of splitting this up into 2 files? I feel that a single file would have less edge-cases (e.g. successful write to one file, but not the other) and could result in fewer system calls.

const body_file = try self.dir.createFile(&body_tmp_path, .{});
errdefer {
body_file.close();
self.dir.deleteFile(&body_tmp_path) catch {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can writeAll result in a partial write? Because, if you do partially write, and then this fails, you now have garbage in the file that you might fetch in get

}

// no-store: must not be stored
if (cc.no_store) return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could exit the loop immediately if that's the cache? Don't even need to store this in CacheControl:

cc = CacheControl.parse(hdr.value) orelse return null;

That's true for all exit paths.

if (has_authorization and !cc.is_public and cc.s_maxage == null) return null;

// Only cache 200 for now. Technically, we can cache others.
switch (status) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entire body was buffered and headers parsed..this should be done much earlier.

}

pub fn deinit(self: CachedMetadata, allocator: std.mem.Allocator) void {
allocator.free(self.url);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arena, ideally from the arenapool.

} else if (std.ascii.eqlIgnoreCase(directive, "must-revalidate")) {
cc.must_revalidate = true;
} else if (std.ascii.eqlIgnoreCase(directive, "immutable")) {
cc.immutable = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use a a lot of these. Should we? If not, should we bother parsing them and storing them in the meta?

if (cc.max_age == null) return null;

// Set-Cookie without explicit public
if (has_set_cookie and !cc.is_public) return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should never cache anything that isn't public. While we're the user-agent, the cache doesn't have any isolation and acts much more like a reverse proxy's cache.

} else if (std.ascii.startsWithIgnoreCase(directive, "s-maxage=")) {
cc.s_maxage = std.fmt.parseInt(u64, directive[9..], 10) catch null;
// s-maxage takes precedence over max-age
cc.max_age = cc.s_maxage orelse cc.max_age;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only serialize max_age, so if the header is defined as s-maxage=1000,max-age=0 it will do the opposite and max-age will take precedence over s-maxage.

// cache-control
try writer.print("{d}\n", .{meta.cache_control.max_age orelse 0});
try writer.print("{}\n{}\n{}\n{}\n", .{
meta.cache_control.max_age != null,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we ever cache something with no max-age? What does a max-age of 0 mean?

};
}

pub fn cacheDir(self: *const Config) ?[]const u8 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a default folder, something like $XDG_CACHE_DIR/lightpanda

// self.debug_transfer_intercept_state = @intFromEnum(transfer._intercept_state);
// self.debug_transfer_auth_challenge = transfer._auth_challenge != null;
// self.debug_transfer_easy_id = if (transfer._conn) |c| @intFromPtr(c.easy) else 0;
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing this log?

}
};

pub const LiveTransfer = struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least to reduce the number of conflicts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants