feat: support alerts across all drivers and languages #295

Open
ABNclearroute wants to merge 1 commit into alumnium-hq:main from ABNclearroute:feat/support-alerts

Conversation

ABNclearroute commented Apr 7, 2026

Summary

Resolves #209

  • Detect alerts: Selenium checks switch_to.alert; Playwright captures dialogs via page.on('dialog'); Appium handles WebView alerts via switch_to.alert
  • Inject synthetic nodes: When an alert is present, synthetic alertdialog, OK (accept), and Cancel (dismiss) button nodes are added to the accessibility tree so the LLM can see and interact with them
  • Route clicks to alert API: When the LLM clicks an alert button, the driver intercepts it and calls alert.accept() or alert.dismiss() instead of trying to find a DOM element
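
For illustration, the three steps above could be sketched roughly as follows. This is not the code from the PR: `FakeAlert`, `synthetic_alert_nodes`, and `route_click` are hypothetical names, and `FakeAlert` only stands in for the object Selenium returns from `driver.switch_to.alert` (which exposes `text`, `accept()`, and `dismiss()`).

```python
from dataclasses import dataclass, field


@dataclass
class FakeAlert:
    """Stand-in for Selenium's Alert object (text, accept(), dismiss())."""

    text: str
    calls: list = field(default_factory=list)

    def accept(self):
        self.calls.append("accept")

    def dismiss(self):
        self.calls.append("dismiss")


def synthetic_alert_nodes(alert, first_id):
    """Build an alertdialog node plus OK/Cancel button nodes (hypothetical shape)."""
    return [
        {"id": first_id, "role": "alertdialog", "name": alert.text},
        {"id": first_id + 1, "role": "button", "name": "OK", "alert_action": "accept"},
        {"id": first_id + 2, "role": "button", "name": "Cancel", "alert_action": "dismiss"},
    ]


def route_click(node, alert):
    """Return True if the click was handled through the alert API."""
    action = node.get("alert_action")
    if action == "accept":
        alert.accept()
    elif action == "dismiss":
        alert.dismiss()
    else:
        return False  # ordinary node: the caller falls back to a normal DOM click
    return True


alert = FakeAlert("Delete item?")
nodes = synthetic_alert_nodes(alert, first_id=100)
handled = route_click(nodes[2], alert)  # "click" the Cancel button
```

Nodes without an `alert_action` fall through, so regular clicks keep their existing path.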

Driver-specific behavior

| Driver | Behavior |
| --- | --- |
| Selenium | Alert stays open. LLM sees buttons in the tree and clicks to accept/dismiss. |
| Playwright | Dialog is auto-accepted (required by Playwright API). LLM sees captured dialog info in the next tree for assertion purposes. |
| Appium | Native alerts are already in the page source. WebView alerts are handled via switch_to.alert. |

Files changed (11)

Python (6 files):

  • accessibility_element.py — added alert_action field
  • chromium_accessibility_tree.py — store/read _alert_action in XML tree
  • selenium_driver.py — alert detection, synthetic nodes, click routing, resilient tab-switch decorator
  • playwright_driver.py — dialog handler, captured dialog nodes, click acknowledgement
  • playwright_async_driver.py — same as sync, with async-safe dialog acceptance
  • appium_driver.py — basic alert routing in click

TypeScript (5 files):

  • AccessibilityElement.ts — added alertAction field
  • ChromiumAccessibilityTree.ts — store/read _alert_action in XML tree
  • SeleniumDriver.ts — alert detection, synthetic nodes, click routing, resilient decorator
  • PlaywrightDriver.ts — dialog handler, captured dialog nodes, click acknowledgement
  • AppiumDriver.ts — basic alert routing in click

Test plan

  • Selenium: Trigger alert() → verify LLM sees alertdialog + OK/Cancel → click OK → alert dismissed
  • Selenium: Trigger confirm() → click Cancel → confirm returns false
  • Playwright: Trigger alert() → verify dialog info appears in tree → LLM can assert on text
  • Playwright: Trigger confirm() → verify auto-accept behavior
  • Appium: WebView alert → verify accept/dismiss routing
  • Verify normal (non-alert) click/type operations remain unaffected

Contributor

p0deje commented Apr 9, 2026

Thank you for the PR. A few questions came up as I was looking through it:

  1. I think the decision to accept or dismiss the alert should be left to the user. The idea was to provide alerts as synthetic nodes in the accessibility tree that the user can act on through the tools in the actor agent. This would allow the user both to inspect the alert with get/check and to handle it with do:
al.do("click on the button that triggers alert")
# accessibility tree now has
# <alert id="1">
#   <button id="2" text="Accept" />
#   <button id="3" text="Dismiss" />
# </alert>
al.check("alert is shown")
al.do("accept alert")  # clicks button with id=2 -> accepts alert
  2. Is Selenium fully blocked when an alert is present? If not, maybe we can add a driver.alert method that returns alert properties and use it when constructing a synthetic node.
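
A minimal sketch of what such a `driver.alert` accessor might look like, purely as an assumption about the shape of the idea: `SketchDriver` and `AlertInfo` are hypothetical names, and the stub classes at the bottom only imitate `switch_to.alert`. In Selenium's Python bindings, reading `switch_to.alert` raises `NoAlertPresentException` when no alert is open; a local fallback class keeps the sketch runnable without Selenium installed.

```python
from dataclasses import dataclass

try:
    from selenium.common.exceptions import NoAlertPresentException
except ImportError:  # fallback so the sketch runs without Selenium installed
    class NoAlertPresentException(Exception):
        pass


@dataclass
class AlertInfo:
    text: str


class SketchDriver:
    """Hypothetical wrapper; `webdriver` is anything exposing `switch_to.alert`."""

    def __init__(self, webdriver):
        self.webdriver = webdriver

    @property
    def alert(self):
        """Return alert properties, or None when no alert is open."""
        try:
            return AlertInfo(text=self.webdriver.switch_to.alert.text)
        except NoAlertPresentException:
            return None


# Stubs that imitate the two states of a real WebDriver.
class _OpenAlert:
    text = "Are you sure?"


class _SwitchTo:
    def __init__(self, alert):
        self._alert = alert

    @property
    def alert(self):
        if self._alert is None:
            raise NoAlertPresentException()
        return self._alert


class _StubWebDriver:
    def __init__(self, alert=None):
        self.switch_to = _SwitchTo(alert)


with_alert = SketchDriver(_StubWebDriver(_OpenAlert())).alert
without_alert = SketchDriver(_StubWebDriver()).alert
```

The synthetic node could then be built from `AlertInfo` only when the accessor returns a value.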

@ABNclearroute
Author

@p0deje Thanks for the feedback! Here's what I've changed:

Selenium — CDP commands work even with an alert open, so we now return the full page tree + alert nodes appended at the end. The alert stays open for the user to accept/dismiss.

Playwright — Auto-accept is unavoidable (Playwright freezes if the dialog handler doesn't resolve immediately). Added clear docstrings explaining this constraint. Dialog info is still exposed in the tree for check()/get().
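
As a rough sketch of the capture-then-accept pattern described here (not the PR's implementation): `DialogRecorder` is a hypothetical name, and `FakeDialog` stands in for Playwright's `Dialog` object, which exposes `type`, `message`, and `accept()`.

```python
class DialogRecorder:
    """Keeps the most recent dialog so it can be exposed in the next tree."""

    def __init__(self):
        self.last_dialog_info = None

    def attach(self, page):
        # `page.on("dialog", ...)` is the Playwright event named in the PR summary.
        page.on("dialog", self.on_dialog)

    def on_dialog(self, dialog):
        # Record first, then resolve the dialog so the page is not left waiting.
        self.last_dialog_info = {"type": dialog.type, "message": dialog.message}
        dialog.accept()


class FakeDialog:
    """Stand-in for a Playwright Dialog (type, message, accept())."""

    def __init__(self, type, message):
        self.type = type
        self.message = message
        self.accepted = False

    def accept(self):
        self.accepted = True


recorder = DialogRecorder()
dialog = FakeDialog("confirm", "Delete item?")
recorder.on_dialog(dialog)  # what Playwright would call when a dialog opens
```

After the handler runs, `recorder.last_dialog_info` holds what a later check()/get() would need to see.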

Node structure — Buttons are now children of alertdialog via childIds:
<alertdialog name="Alert message">
  <button name="Accept" />
  <button name="Dismiss" />
</alertdialog>
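
To illustrate how a `childIds` link turns into that nesting, here is a small sketch using only the standard library. The node dictionaries and `to_element` are assumptions for the example, not the structures used in `ChromiumAccessibilityTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat node list, linked parent-to-child through childIds.
nodes = {
    1: {"role": "alertdialog", "name": "Alert message", "childIds": [2, 3]},
    2: {"role": "button", "name": "Accept", "childIds": []},
    3: {"role": "button", "name": "Dismiss", "childIds": []},
}


def to_element(node_id, nodes):
    """Recursively convert a node and its children into an XML element."""
    node = nodes[node_id]
    element = ET.Element(node["role"], {"name": node["name"]})
    for child_id in node["childIds"]:
        element.append(to_element(child_id, nodes))
    return element


root = to_element(1, nodes)
xml = ET.tostring(root, encoding="unicode")
```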

Contributor

p0deje commented Apr 12, 2026

We've just merged a rewrite of the core into TypeScript - can you please rebase your PR?

Let's also add some tests in both Python and TypeScript. Something simple where Alumnium clicks a button to trigger an alert, then checks that the alert is present, and finally closes it.

- Selenium: detect alerts via WebDriver, get full page tree via CDP
  (which works even with alert open), and append synthetic alertdialog
  nodes with Accept/Dismiss buttons as children
- Playwright: auto-accept dialogs (required by Playwright API to prevent
  page freeze), capture dialog info, and expose as synthetic nodes for
  check()/get() verification
- Add alert_action/alertAction to AccessibilityElement and handle in
  ChromiumAccessibilityTree for both Python and TypeScript
- Route click() on alert buttons to accept/dismiss the alert
- Add test HTML fixture and test cases for both Python and TypeScript

Closes alumnium-hq#209

Made-with: Cursor
current_handles = self.driver.window_handles
try:
    current_handles = self.driver.window_handles
except Exception:
Contributor

Can we catch a concrete exception class here (e.g. UnexpectedAlertPresentException)?
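
A sketch of the narrower handler, assuming that reading `window_handles` is what raises `UnexpectedAlertPresentException` while an alert is open. `safe_window_handles` and the stub driver are hypothetical, and the local fallback class only keeps the example runnable without Selenium installed.

```python
try:
    from selenium.common.exceptions import UnexpectedAlertPresentException
except ImportError:  # fallback so the sketch runs without Selenium installed
    class UnexpectedAlertPresentException(Exception):
        pass


def safe_window_handles(driver, fallback):
    """Return the driver's window handles, or `fallback` if an alert blocks the call."""
    try:
        return driver.window_handles
    except UnexpectedAlertPresentException:
        # Only the alert case is swallowed; any other error still propagates.
        return fallback


class _BlockedDriver:
    """Stub driver whose window_handles always fails because of an open alert."""

    @property
    def window_handles(self):
        raise UnexpectedAlertPresentException()


handles = safe_window_handles(_BlockedDriver(), fallback=["tab-1"])
```

Compared with `except Exception`, this keeps unrelated failures (a dead session, for instance) visible.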

new_handles = self.driver.window_handles
try:
    new_handles = self.driver.window_handles
except Exception:
Contributor

Ditto about UnexpectedAlertPresentException.

try:
    ActionChains(self.driver).move_to_element(element).click().perform()
except ElementNotInteractableException:
    # Fallback to direct click if ActionChains fails (e.g. for <option> elements)
Contributor

Please revert.

Contributor

Please revert formatting changes!

Contributor

Please revert formatting changes!

Contributor

Please revert formatting changes!

Contributor

These tests need to be disabled for Appium; please see the other tests for how that is done.

Contributor

These tests need to be disabled for Appium; please see the other tests for how that is done.

self.autoswitch_to_new_tab = True
self.full_page_screenshot = FULL_PAGE_SCREENSHOT
self._last_dialog_info: dict | None = None
self.page.on("dialog", self._on_dialog)
Contributor

This needs to be applied to all new pages - let's move to _setup_page_tracking.
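
One way this could look, sketched with stand-in objects: the handler is attached to every page that already exists and to each page the context reports later. `setup_page_tracking` here is a hypothetical free function (the real `_setup_page_tracking` may be structured differently), and `FakeEmitter`/`FakeContext` only imitate the `on(event, handler)` interface and the `pages` list that Playwright pages and browser contexts expose.

```python
class FakeEmitter:
    """Minimal event emitter imitating `on(event, handler)`."""

    def __init__(self):
        self.handlers = {}

    def on(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def emit(self, event, *args):
        for handler in self.handlers.get(event, []):
            handler(*args)


class FakeContext(FakeEmitter):
    """Stand-in for a browser context: an emitter plus a list of open pages."""

    def __init__(self, pages):
        super().__init__()
        self.pages = pages


def setup_page_tracking(context, on_dialog):
    def attach(page):
        page.on("dialog", on_dialog)

    for page in context.pages:  # pages that are already open
        attach(page)
    context.on("page", attach)  # pages opened later (new tabs, popups)


seen = []
first_page = FakeEmitter()
context = FakeContext([first_page])
setup_page_tracking(context, seen.append)

new_page = FakeEmitter()
context.emit("page", new_page)  # simulate a new tab opening
new_page.emit("dialog", "dialog from new tab")
```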

Development

Successfully merging this pull request may close these issues.

Support alerts

2 participants