feat: fetch add wait_until parameter for page loads options #1896

Merged
karlseguin merged 3 commits into lightpanda-io:main from shaewe180:feat/fetch-wait-options
Mar 20, 2026

Conversation

@shaewe180
Contributor

@shaewe180 shaewe180 commented Mar 18, 2026

Overview
Currently, the fetch command uses a hardcoded 5000ms timeout and a fixed "wait until idle" strategy. This limits the tool's effectiveness when dealing with slow-loading pages, heavy Single Page Applications (SPAs), or scenarios requiring specific lifecycle events (e.g., only waiting for the DOM to become interactive).

This PR introduces two new command-line arguments to the fetch command: --wait_ms and --wait_until.

Key Changes

  1. Configurable Wait Time: Replaced the hardcoded 5000ms limit with the --wait_ms flag (defaults to 5000ms to maintain backward compatibility).
  2. Flexible Wait Strategies: Introduced the --wait_until flag, allowing users to choose the condition for returning the DOM:
    • load (default): Waits for the window.onload event to fire.
    • domcontentloaded: Returns immediately after the HTML is parsed and scripts have finished running, without waiting for subresources like images to load.
    • networkidle: Waits until there are no active HTTP requests and the page is in a stable state.
    • fixed: Forces the engine to wait for the exact duration specified by --wait_ms, regardless of what events the page triggers.
  3. Engine Refactoring: Updated src/browser/Session.zig to support these strategies in the core wait loop, so the loop yields or returns based on the requested event and any pending JavaScript macrotasks.
  4. System-wide Consistency: Updated all internal call sites (tests, MCP tools, and CDP domains) to align with the new Session.wait function signature.
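
The four strategies above can be read as one completion predicate that the wait loop checks on each pass. The following is a minimal sketch under stated assumptions, not the code in Session.zig: the `isDone` helper and the `parsing` variant of `LoadState` are hypothetical, and the conditions mirror the first revision of this PR.

```zig
const std = @import("std");

// Strategy names as listed in the PR description.
const WaitUntil = enum { load, domcontentloaded, networkidle, fixed };

// Assumption: a simplified load state; only `.load` and `.complete`
// are taken from the PR, `.parsing` is a placeholder for "earlier".
const LoadState = enum { parsing, load, complete };

// Hypothetical helper: true when the wait loop may return early.
fn isDone(until: WaitUntil, state: LoadState, http_active: usize) bool {
    return switch (until) {
        // `fixed` never returns early; only the timeout ends the wait.
        .fixed => false,
        .domcontentloaded => state == .load or state == .complete,
        .load => state == .complete,
        .networkidle => state == .complete and http_active == 0,
    };
}

pub fn main() void {
    std.debug.print("{}\n", .{isDone(.networkidle, .complete, 2)});
}
```

With two requests still in flight, `networkidle` keeps waiting while `load` would already return.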

Benefits:

  • Support for Slow-loading Websites: Users can now scrape pages that take longer than 5 seconds to load (when used in conjunction with --http_timeout).
  • Performance Optimization: For simple pages, using --wait_until domcontentloaded skips waiting for subresources such as images, which can speed up the dump process.
  • Predictability: The fixed strategy is highly valuable for debugging pages with complex, non-deterministic background loading logic.

Usage Examples:

# Wait up to 10 seconds until the network is completely idle
./lightpanda fetch --dump html --wait_ms 10000 --wait_until networkidle https://example.com

# Quick dump immediately when the DOM is ready
./lightpanda fetch --dump html --wait_until domcontentloaded https://example.com

Add `--wait_until` and `--wait_ms` CLI arguments to configure session wait behavior. Updates `Session.wait` to evaluate specific page load states (`load`, `domcontentloaded`, `networkidle`, `fixed`) before completing the wait loop.
@github-actions

github-actions bot commented Mar 18, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@shaewe180
Contributor Author

I have read the CLA Document and I hereby sign the CLA

src/Config.zig Outdated
networkidle,
fixed,

pub const js_enum_from_string = true;
Collaborator


Suggested change
- pub const js_enum_from_string = true;

This is only needed for enums that are serialized to v8/js.
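
For context, a plain Zig enum can already be parsed from a command-line string with the standard library, so no JS-related declaration is needed for that. A minimal sketch, assuming a hypothetical `parseWaitUntil` helper (the actual parsing code in Config.zig may differ):

```zig
const std = @import("std");

const WaitUntil = enum { load, domcontentloaded, networkidle, fixed };

// Hypothetical helper: maps a CLI value to the enum, null if unknown.
fn parseWaitUntil(arg: []const u8) ?WaitUntil {
    return std.meta.stringToEnum(WaitUntil, arg);
}

pub fn main() void {
    // Fall back to the documented default (`load`) on unknown input.
    const parsed: WaitUntil = parseWaitUntil("networkidle") orelse .load;
    std.debug.print("{s}\n", .{@tagName(parsed)});
}
```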

var page = &(self.page orelse return .no_page);
while (true) {
- const wait_result = self._wait(page, wait_ms) catch |err| {
+ const wait_result = self._wait(&page, wait_ms, wait_until) catch |err| {
Collaborator


There's no reason to pass a pointer to the pointer of the page.

Suggested change
- const wait_result = self._wait(&page, wait_ms, wait_until) catch |err| {
+ const wait_result = self._wait(page, wait_ms, wait_until) catch |err| {

}

- fn _wait(self: *Session, page: *Page, wait_ms: u32) !WaitResult {
+ fn _wait(self: *Session, page: **Page, wait_ms: u32, wait_until: lp.Config.WaitUntil) !WaitResult {
Collaborator


Suggested change
- fn _wait(self: *Session, page: **Page, wait_ms: u32, wait_until: lp.Config.WaitUntil) !WaitResult {
+ fn _wait(self: *Session, page: *Page, wait_ms: u32, wait_until: lp.Config.WaitUntil) !WaitResult {

Related to the above comment: all of the page.X accesses that were changed to page.*.X will have to be changed back.
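
To illustrate the difference in access syntax: Zig auto-dereferences field access through a single pointer, but a pointer to a pointer needs an explicit `.*`. A minimal sketch with a stand-in `Page` type (the struct, field, and function names here are hypothetical, not Lightpanda's):

```zig
const std = @import("std");

const LoadState = enum { parsing, load, complete };
const Page = struct { load_state: LoadState };

fn isCompleteSingle(page: *Page) bool {
    // Field access auto-dereferences one pointer level.
    return page.load_state == .complete;
}

fn isCompleteDouble(page: **Page) bool {
    // The outer pointer must be dereferenced explicitly with `.*`.
    return page.*.load_state == .complete;
}

pub fn main() void {
    var p = Page{ .load_state = .parsing };
    p.load_state = .complete;
    var ptr: *Page = &p;
    std.debug.print("{} {}\n", .{ isCompleteSingle(ptr), isCompleteDouble(&ptr) });
}
```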

.fixed => false,
.domcontentloaded => (page.*._load_state == .load or page.*._load_state == .complete),
.load => (page.*._load_state == .complete),
.networkidle => (page.*._load_state == .complete and http_active == 0),
Collaborator


What about using page._notified_network_idle == .done here? It would introduce a 500ms delay to allow pending work (e.g. a setTimeout) to start new connections. On the flip side, that could result in some telemetry/beacon request keeping the page alive longer than desired.
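
The trade-off described here is a quiet-window check: idle only counts once no request has been active for a full 500ms. A minimal sketch of that idea, assuming a hypothetical `IdleTracker` that is fed the current time and the in-flight request count (this is not how `_notified_network_idle` is implemented):

```zig
const std = @import("std");

// Hypothetical tracker for illustration only.
const IdleTracker = struct {
    quiet_ms: u64 = 500,
    idle_since_ms: ?u64 = null,

    /// Returns true once the network has stayed idle for `quiet_ms`.
    fn update(self: *IdleTracker, now_ms: u64, http_active: usize) bool {
        if (http_active > 0) {
            // Any activity (e.g. a late beacon) resets the window.
            self.idle_since_ms = null;
            return false;
        }
        const since = self.idle_since_ms orelse {
            // First idle observation: start the window.
            self.idle_since_ms = now_ms;
            return false;
        };
        return now_ms - since >= self.quiet_ms;
    }
};

pub fn main() void {
    var tracker = IdleTracker{};
    _ = tracker.update(0, 0);
    std.debug.print("{}\n", .{tracker.update(500, 0)});
}
```

A request that starts inside the window (for example from a setTimeout) resets it, which is also why a periodic telemetry call could delay the result.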

with_base: bool = false,
with_frames: bool = false,
strip: dump.Opts.Strip = .{},
wait_ms: u32 = 5000,
Collaborator


Could you add the help text for these in the fetch section of the printUsageAndExit function?

- Refactor `wait` and `_wait` to handle `page` as `*Page` instead of `**Page`, preventing stale references during navigations.
- Update `networkidle` wait condition to use `_notified_network_idle == .done`.
- Document `--wait_ms` and `--wait_until` options in `Config.zig` help text.
@shaewe180
Contributor Author

Thank you for your suggestion. I have made the modifications.

@shaewe180 shaewe180 requested a review from karlseguin March 19, 2026 01:54
karlseguin added a commit that referenced this pull request Mar 19, 2026
Small tweaks to #1896

Improve the wait ergonomics with an Option with default parameter. Revert
page pointer logic to original (don't think that change was necessary).
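
One common Zig shape for "an Option with default parameter" is a struct whose fields carry defaults, so call sites only name what they override. A minimal sketch, assuming hypothetical names (`WaitOpts`, `timeout_ms`, `until`, `resolve`); the defaults are the ones documented in this PR (5000ms, `load`):

```zig
const std = @import("std");

const WaitUntil = enum { load, domcontentloaded, networkidle, fixed };

// Hypothetical options struct; field names are assumptions.
const WaitOpts = struct {
    timeout_ms: u32 = 5000,
    until: WaitUntil = .load,
};

// Stand-in for a wait function: returns the options it would use.
fn resolve(opts: WaitOpts) WaitOpts {
    return opts;
}

pub fn main() void {
    const defaults = resolve(.{});
    const custom = resolve(.{ .timeout_ms = 10_000, .until = .networkidle });
    std.debug.print("{d} {s}\n", .{ defaults.timeout_ms, @tagName(defaults.until) });
    std.debug.print("{d} {s}\n", .{ custom.timeout_ms, @tagName(custom.until) });
}
```

Existing call sites that want the old behavior can then pass `.{}` instead of spelling out every argument.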
@karlseguin karlseguin merged commit e2be852 into lightpanda-io:main Mar 20, 2026
1 check passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 20, 2026
2 participants