Commit ab4f2ad

add human trajectories
1 parent 039f934 commit ab4f2ad

3 files changed: +14 -5 lines changed
README.md

Lines changed: 1 addition & 2 deletions
@@ -21,10 +21,9 @@
 
 ![Overview](media/overview.png)
 
-## Roadmap
-- [ ] Support more agents with different prompting mechanisms such as [ASH](https://arxiv.org/pdf/2305.14257.pdf).
 
 ## News
+* [12/21/2023] We release the recording of trajectories performed by human annotators on ~170 tasks. Check out the [resource page](./resources/README.md#12212023-human-trajectories) for more details.
 * [11/3/2023] Multiple features!
     * Uploaded newest [execution trajectories](./resources/README.md#1132023-execution-traces-from-our-experiments-v2)
     * Added [Amazon Machine Image](./environment_docker/README.md#pre-installed-amazon-machine-image) that pre-installed all websites so that you don't have to!

resources/README.md

Lines changed: 8 additions & 0 deletions
@@ -1,4 +1,12 @@
 # WebArena Resources
+## [12/21/2023] Human Trajectories
+We collected human trajectories on 179 tasks and the recording files are [here](https://drive.google.com/drive/folders/1NrN_sawtYK2V_uHnmmS8ugmGIKUAsPgt?usp=sharing).
+
+We sample one task from each template or templates that share similar task semantic. Each file is named as `<task_id>.zip`, and the corresponding template id can be found in the [task config file](../config_files/test.raw.json). The trajectories are presented as playwright trace files. You can view the concrete HTML, network traffic etc by `playwright show-trace <example_idx>.zip`.
+
+Human task success rate: 78.24%
+
+
 ## [11/3/2023] Execution Traces from Our Experiments (v2)
 ![v2 results](../media/v2_result.png)
 The results on the release v2 can be found in this [folder](https://drive.google.com/drive/folders/1H4wkzDkY2ufiC63DISMXllri0j-ipWcs?usp=sharing). It contains
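
The added README text says each trace is named `<task_id>.zip` and that the template id lives in the task config file. A minimal sketch of that lookup follows. It assumes the config is a JSON list of task objects carrying `task_id` and `intent_template_id` fields; those field names, the helper names, and the sample values are illustrative assumptions, not something this commit shows.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_tasks(config_path):
    """Read the task config, assumed here to be a JSON list of task objects."""
    return json.loads(Path(config_path).read_text())


def template_for_trace(trace_name, tasks):
    """Return the template id for a trace file named `<task_id>.zip`, or None."""
    task_id = int(Path(trace_name).stem)  # "7.zip" -> 7
    for task in tasks:
        if task["task_id"] == task_id:
            return task["intent_template_id"]  # assumed field name
    return None


def tasks_by_template(tasks):
    """Group task ids by template id, to see which tasks share a template."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task["intent_template_id"]].append(task["task_id"])
    return {template: sorted(ids) for template, ids in groups.items()}


# Made-up stand-in for the real config so the sketch runs without the repo.
sample_tasks = [
    {"task_id": 0, "intent_template_id": 1000},
    {"task_id": 1, "intent_template_id": 1000},
    {"task_id": 7, "intent_template_id": 1001},
]

print(template_for_trace("7.zip", sample_tasks))
print(tasks_by_template(sample_tasks))
```

Against a checkout of the repository, `load_tasks("config_files/test.raw.json")` would replace `sample_tasks`, provided the file has the shape assumed above.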

scripts/collect_obs.py

Lines changed: 5 additions & 3 deletions
@@ -22,7 +22,7 @@
 
 def gen_tmp_storage_state() -> None:
     with open(f"scripts/tmp_storage_state.json", "w") as f:
-        json.dump({"storage_state": ".auth/gitlab_state.json"}, f)
+        json.dump({"storage_state": ".auth/shopping_admin_state.json"}, f)
 
 
 def get_observation(
@@ -32,10 +32,12 @@ def get_observation(
         observation_type=observation_type,
         current_viewport_only=current_viewport_only,
         headless=HEADLESS,
+        sleep_after_execution=2.0,
     )
     env.reset(options={"config_file": f"scripts/tmp_storage_state.json"})
-    s = f"""page.goto("{GITLAB}/byteblaze/a11y-syntax-highlighting")
-    page.scroll(down)
+    s = f"""page.goto("http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780/admin/admin/dashboard/")
+    page.get_by_label("", exact=True).fill("reviews")
+    page.get_by_label("", exact=True).press("Enter")
     page.scroll(down)"""
     action_seq = s.split("\n")
 
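
The hunk above builds a newline-separated action script and splits it with `s.split("\n")`. The removed line interpolated a `GITLAB` constant, while the added line hard-codes a host name. A small sketch of the same split with the host kept in a variable is shown below. `BASE_URL`, its value, and the `split_actions` helper are hypothetical, and stripping whitespace from each action is an assumption about later processing that this diff does not show.

```python
def split_actions(script: str) -> list[str]:
    """Split a newline-separated action script into stripped, non-empty actions."""
    return [line.strip() for line in script.split("\n") if line.strip()]


# Hypothetical placeholder; the commit itself hard-codes an EC2 host name.
BASE_URL = "http://localhost:7780"

script = f"""page.goto("{BASE_URL}/admin/admin/dashboard/")
    page.get_by_label("", exact=True).fill("reviews")
    page.get_by_label("", exact=True).press("Enter")
    page.scroll(down)"""

actions = split_actions(script)
for action in actions:
    print(action)
```

Keeping the host in one variable would let the same script run against a different deployment without editing the action string.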