feat: O3DE test benchmark #426

jmatejcz · 2025-02-19T08:16:41Z

Purpose

Add Test Benchmark for O3DE simulation
Add framework for easier benchmark creation

Proposed Changes

Most of the file changes are formatting only; the main changes are in the src/rai_bench package and in src/rai_sim.

This PR adds a new package, rai_bench. It introduces the classes Task, Scenario and Benchmark (src/rai_bench/rai_bench/benchmark_model.py), which form a framework for creating benchmarks. It also contains the O3DE Test Benchmark, a test implementation of a benchmark for the O3DE engine simulation.

The O3DE Test Benchmark (src/rai_bench/rai_bench/o3de_test_bench/) contains 2 Tasks (tasks/), GrabCarrotTask and PlaceCubesTask, which implement the score calculation, and 4 scenes (configs/) for the O3DE robotic arm simulation.

An example of how scenarios and a benchmark can be defined and run can be found in rai_bench/rai_bench/main.py

Additional changes:

Subclass of Bridge for arm manipulation -> O3DEngineArmManipulationBridge
Validation that the required services are available -> _is_robotic_stack_ready() function in SimulationBridge
Fix for a bug where get_transform returned the same values every time -> ROS2ARIConnector.get_transform returns the same object positions every time #430
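
The readiness validation mentioned above can be illustrated with a minimal sketch. It is not the code of _is_robotic_stack_ready(); the function name, the `list_available` callable, and the `retries`/`delay_s` parameters are assumptions made for illustration. The idea is to poll the names the robotic stack currently exposes and succeed only when every required name is present.

```python
import time
from typing import Callable, Iterable


def is_stack_ready(
    list_available: Callable[[], Iterable[str]],
    required: Iterable[str],
    retries: int = 5,
    delay_s: float = 0.0,
) -> bool:
    """Return True as soon as every required name is reported as available.

    `list_available` is a hypothetical callable that returns the names
    (services, topics or actions) currently visible to the bridge.
    """
    required_names = set(required)
    for _ in range(retries):
        # Recompute on every attempt so a name that vanished counts as missing.
        missing = required_names - set(list_available())
        if not missing:
            return True
        time.sleep(delay_s)
    return False


# Usage with a fake source of available services.
visible = ["/spawn_entity", "/delete_entity"]
ready = is_stack_ready(lambda: visible, ["/spawn_entity", "/delete_entity"])
not_ready = is_stack_ready(lambda: visible, ["/manipulator_move_to"], retries=2)
```

Passing the lookup as a callable keeps the check independent of ROS 2, so it can be exercised without a running stack.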

Issues

Testing

The simulation binary can be downloaded from here: humble

To run this benchmark you need an O3DE bridge config file at src/rai_bench/rai_bench/o3de_test_bench/configs/o3de_config.yaml. The config file should include these fields:

binary_path: ...
robotic_stack_command: ros2 launch examples/manipulation-demo-no-binary.launch.py
required_services:
  - /grounding_dino_classify
  - /grounded_sam_segment
  - /manipulator_move_to
  - /spawn_entity
  - /delete_entity
required_topics:
  - /color_image5
  - /depth_image5
  - /color_camera_info5
required_actions: []
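
As a rough illustration of the fields listed above, the sketch below checks a parsed config mapping (for example, the result of loading the YAML file) for the expected keys. The `BridgeConfig` dataclass and `parse_bridge_config` helper are hypothetical and are not part of rai_sim; the binary path used here is a placeholder.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Mapping


@dataclass
class BridgeConfig:
    """Hypothetical container mirroring the fields of o3de_config.yaml."""

    binary_path: str
    robotic_stack_command: str
    required_services: List[str]
    required_topics: List[str]
    required_actions: List[str]


def parse_bridge_config(raw: Mapping[str, Any]) -> BridgeConfig:
    """Build a BridgeConfig, failing early if any expected field is absent."""
    expected = [f.name for f in fields(BridgeConfig)]
    missing = [name for name in expected if name not in raw]
    if missing:
        raise ValueError(f"config is missing fields: {missing}")
    return BridgeConfig(**{name: raw[name] for name in expected})


raw_config = {
    "binary_path": "/path/to/binary",  # placeholder, not a real path
    "robotic_stack_command": "ros2 launch examples/manipulation-demo-no-binary.launch.py",
    "required_services": ["/spawn_entity", "/delete_entity"],
    "required_topics": ["/color_image5", "/depth_image5", "/color_camera_info5"],
    "required_actions": [],
}
config = parse_bridge_config(raw_config)
```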

to install and run:

poetry install --with openset
colcon build --symlink-install
source setup_shell.sh
python src/rai_bench/rai_bench/main.py

Logs about the progress of the benchmark can be found in 'src/rai_bench/o3de_test_bench/benchmark_agent.log'

You should be able to see:

properly started simulation
4 defined scenarios
logs about the actions taken by agent and scores achieved in each scenario
after all scenarios finish, the simulation should close, and all nodes and services should shut down within 30 seconds

rachwalk

Overall good job, I have left a few comments regarding code structure, and I am unsure if all the committed files should be included.

imgui.ini

src/rai_bench/README.md

src/rai_bench/rai_bench/benchmark_model.py

rachwalk · 2025-02-19T15:17:02Z

src/rai_bench/o3de_test_bench/main.py

+from pathlib import Path
+
+
+class GrabCarrotTask(Task):


It would be a good idea to create a module with Tasks so they can be imported into different benchmark runtimes.

Do you mean creating a separate folder still inside o3de_test_bench, or defining specific tasks in the rai_bench package?

Yes, I mean creating a separate folder with __init__.py (i.e. a submodule) inside o3de_test_bench.

I moved the tasks to a separate folder and also moved the whole o3de_test_bench into the rai_bench package, so the benchmark and tasks can be imported across the project. The main.py that sets up and runs the benchmark moved to src/rai_bench/main.py, so it is still outside the package.

src/rai_bench/o3de_test_bench/scene1.yaml

src/rai_bench/o3de_test_bench/o3de_config.yaml

rachwalk · 2025-02-19T15:18:11Z

src/rai_bench/o3de_test_bench/main.py

+    one_carrot_scene_config = O3DExROS2SimulationConfig.load_config(
+        base_config_path=Path("src/rai_bench/o3de_test_bench/scene1.yaml"),
+        connector_config_path=Path("src/rai_bench/o3de_test_bench/o3de_config.yaml"),
+    )
+    multiple_carrot_scene_config = O3DExROS2SimulationConfig.load_config(
+        base_config_path=Path("src/rai_bench/o3de_test_bench/scene2.yaml"),
+        connector_config_path=Path("src/rai_bench/o3de_test_bench/o3de_config.yaml"),
+    )
+    red_cubes_scene_config = O3DExROS2SimulationConfig.load_config(
+        base_config_path=Path("src/rai_bench/o3de_test_bench/scene3.yaml"),
+        connector_config_path=Path("src/rai_bench/o3de_test_bench/o3de_config.yaml"),
+    )
+    multiple_cubes_scene_config = O3DExROS2SimulationConfig.load_config(
+        base_config_path=Path("src/rai_bench/o3de_test_bench/scene4.yaml"),
+        connector_config_path=Path("src/rai_bench/o3de_test_bench/o3de_config.yaml"),
+    )
+    # combine different scene configs with the tasks to create various scenarios
+    scenarios = [
+        Scenario(
+            task=GrabCarrotTask(logger=bench_logger),
+            scene_config=one_carrot_scene_config,
+        ),
+        Scenario(
+            task=GrabCarrotTask(logger=bench_logger),
+            scene_config=multiple_carrot_scene_config,
+        ),
+        Scenario(
+            task=GrabCarrotTask(logger=bench_logger),
+            scene_config=red_cubes_scene_config,
+        ),
+        Scenario(
+            task=RedCubesTask(logger=bench_logger), scene_config=red_cubes_scene_config
+        ),
+        Scenario(
+            task=RedCubesTask(logger=bench_logger),
+            scene_config=multiple_cubes_scene_config,
+        ),
+    ]


It would be a good idea to have these load dynamically to allow for easy configuration of scenes/tasks.

do you mean loading from cmd arguments, from config file, or what?

I mean a functionality to auto create scenarios given a list of Tasks and a list of scene paths
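
One way to read this request is as a Cartesian product of tasks and scene paths. The sketch below uses a simplified, hypothetical `Scenario` that holds only a task name and a scene path (the real class takes a task instance and a loaded scene config), and `create_scenarios` is an illustrative name, not the helper that was eventually added.

```python
from dataclasses import dataclass
from itertools import product
from pathlib import Path
from typing import List, Sequence


@dataclass(frozen=True)
class Scenario:
    """Simplified stand-in: pairs a task name with a scene config path."""

    task_name: str
    scene_path: Path


def create_scenarios(
    task_names: Sequence[str], scene_paths: Sequence[Path]
) -> List[Scenario]:
    """Combine every task with every scene path."""
    return [Scenario(task, path) for task, path in product(task_names, scene_paths)]


scenarios = create_scenarios(
    ["GrabCarrotTask", "RedCubesTask"],
    [Path("scene1.yaml"), Path("scene2.yaml")],
)
```

A real implementation would probably also skip combinations where the scene does not contain the objects a task needs.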

MagdalenaKotynia

Thank you for this PR. I have left some comments.
Apart from that:

Please apply the new naming convention for simulation connectors (see feat: rai_sim #415): SimulationConnector was renamed to SimulationBridge to avoid confusion with the BaseConnector and its subclasses.
Please fix typing where possible

Ideas for the future improvement:

I think it would be very beneficial to save the camera output (from the whole scenario, or at least from its beginning and end) so that metrics can be verified against human assessment. It could also be useful for presenting and reporting the results.
Once the final implementation of the benchmark interface is established, the next step could be to design and implement a structure for saving and storing the results (results + info about what was benchmarked + camera images + metadata).
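
A possible shape for such a results structure is sketched below. All names (`ScenarioResult`, its fields, `dump_results`) are hypothetical and only illustrate bundling a score with information about what was benchmarked, image paths, and metadata.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ScenarioResult:
    """Hypothetical record for one benchmark scenario."""

    task: str
    scene_config_path: str
    score: float
    camera_images: List[str] = field(default_factory=list)  # paths to saved frames
    metadata: Dict[str, str] = field(default_factory=dict)


def dump_results(results: List[ScenarioResult]) -> str:
    """Serialize results to a JSON string that can be written to a file."""
    return json.dumps([asdict(result) for result in results], indent=2)


serialized = dump_results(
    [
        ScenarioResult(
            task="GrabCarrotTask",
            scene_config_path="scene1.yaml",
            score=0.5,
            camera_images=["start.png", "end.png"],
            metadata={"simulation": "O3DE"},
        )
    ]
)
```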

MagdalenaKotynia · 2025-02-19T16:32:04Z

src/rai_bench/rai_bench/benchmark_model.py

+        super().__init__(message)
+
+
+class Task(ABC, Generic[SimulationConnectorT]):


Task class should not be bound with any specific SimulationBridge, it should be reused with different implementations of SimulationBridge. Please apply appropriate typing for this class and the calculate_result method.

Okay, so I will modify the Task to take SimulationConfig as an argument, and from this I will extract the initial_positions of objects.

Second thing: as spawned_entities are unique to O3DE, methods of SimulationBridge should return Entities, not spawnedEntities.

I think that Task does not need to take SimulationConfig as an argument. Scenario uses SimulationConfig and binds it with Task (btw, please rename scene_config to simulation_config according to the changes in the naming convention). Task itself should be SimulationConfig-agnostic; the current interface is ok, only the typing is inappropriate.

Second, yes, good point. I noticed it yesterday when reviewing your PR and decided to move spawned_entities to the base SimulationBridge class, because it is needed for all simulations if we want to compare them. Please rebase onto these changes.

I refactored it to match your new code; Task now uses the spawned_entities attribute from Bridges.
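
The typing discussed in this thread can be sketched roughly as follows. This is an illustration under assumptions, not the code in benchmark_model.py: `Entity`, `BridgeLike`, `CountCarrotsTask`, and `FakeBridge` are hypothetical names, and the only requirement placed on the bridge is a `spawned_entities` attribute.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Entity:
    """Hypothetical simulation-agnostic entity."""

    name: str
    prefab_name: str


class BridgeLike(Protocol):
    """Anything that exposes the entities it has spawned."""

    spawned_entities: List[Entity]


class Task(ABC):
    """A task depends only on the base bridge interface, not a specific engine."""

    @abstractmethod
    def calculate_result(self, simulation_bridge: BridgeLike) -> float:
        ...


class CountCarrotsTask(Task):
    """Toy task: the score is the number of spawned carrots."""

    def calculate_result(self, simulation_bridge: BridgeLike) -> float:
        carrots = [
            entity
            for entity in simulation_bridge.spawned_entities
            if entity.prefab_name == "carrot"
        ]
        return float(len(carrots))


@dataclass
class FakeBridge:
    """Stand-in bridge used only to exercise the task."""

    spawned_entities: List[Entity]


bridge = FakeBridge([Entity("carrot1", "carrot"), Entity("cube1", "red_cube")])
score = CountCarrotsTask().calculate_result(bridge)
```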

src/rai_bench/rai_bench/benchmark_model.py

src/rai_bench/o3de_test_bench/main.py

MagdalenaKotynia · 2025-02-19T18:59:43Z

src/rai_bench/o3de_test_bench/main.py

+                "Number of initially spawned entities does not match number of entities present at the end."
+            )
+
+        for ini_carrot in initial_carrots:


This check strictly matches our simulation, where we know how the table is located with respect to the origin. Maybe it's overkill for now, but I would consider parametrizing it somehow. Maybe the calculate_result method should take some kwargs, so the task can be reused for simulations where the parameters that are key for computing the result differ (e.g. in this case, the coordinates and size of the table top).

I don't know how the coordinates work in other simulation engines, so it's difficult to make it work. I think we should leave it as it is for now; maybe in the future, when we work with different engines, I will take this into consideration.

I didn't mean other simulation engines, but e.g. another O3DE robotic arm simulation where the table is located differently with respect to the origin than in the simulation we currently use, e.g. (x,y) = 0 is not in the middle of the table as is assumed in this example. I think we can leave it for now; telling left from right in general is quite a hard problem IMO. Please just leave a NOTE saying that this is an example for a particular demo.
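
To make the suggested parametrization concrete, here is a small sketch. The function name, its parameters, and the convention that positive y means "left" are all assumptions for illustration; the actual task code may use a different axis or sign.

```python
def side_of_table(
    y: float,
    table_center_y: float = 0.0,
    left_is_positive_y: bool = True,
) -> str:
    """Classify a y coordinate as left or right of the table center.

    Both the center and the sign convention are parameters, so the check
    is not tied to a simulation where (x, y) = 0 is the middle of the table.
    """
    offset = y - table_center_y
    if offset == 0:
        return "center"
    is_left = (offset > 0) == left_is_positive_y
    return "left" if is_left else "right"


default_side = side_of_table(0.3)
shifted_side = side_of_table(0.3, table_center_y=0.5)
```

With the default center, y = 0.3 is classified as left; with the table center shifted to y = 0.5, the same coordinate falls on the right.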

src/rai_bench/o3de_test_bench/main.py

jmatejcz · 2025-02-20T07:31:59Z

Overall good job, I have left a few comments regarding code structure, and I am unsure if all the committed files should be included.

The additional files probably come from rebases, but you are right, they are not part of this PR, so I will remove them.

jmatejcz · 2025-02-20T09:09:07Z

add new occurring issue to the main comment of this PR

ros2/rclpy#1142

```
Exception in thread Thread-1 (spin):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 294, in spin
    self.spin_once()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 794, in spin_once
    self._spin_once_impl(timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 786, in _spin_once_impl
    self._executor.submit(handler)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
```
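
The last frames of this traceback come from Python's concurrent.futures, and the same error can be reproduced in isolation. The snippet below is only a standard-library illustration of the error; it does not reproduce the rclpy issue itself.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown()  # after this call the pool accepts no new work

try:
    # Submitting to a pool that was already shut down raises RuntimeError.
    executor.submit(print, "never runs")
except RuntimeError as error:
    message = str(error)
```

This suggests the spin thread was still trying to schedule a handler after its executor had been shut down.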

Co-authored-by: Jakub Matejczyk <[email protected]>
chore: naming changes
Co-authored-by: Jakub Matejczyk <[email protected]>
chore: naming changes
Co-authored-by: Jakub Matejczyk <[email protected]>
chore: redefine benchmark model
Co-authored-by: Jakub Matejczyk <[email protected]>
Co-authored-by: Kacper Dąbrowski <[email protected]>
refactor: SceneConfig to BaseModel class
feat: add SceneSetup to store initial scene setup
build: poetry initialization of rai_benchmarks and rai_simulations
chore: add licence lines
build: create packages from rai_benchmarks and rai_simulations
chore: removed mistakenly added file
feat: scene config implementation
Add O3DEEngineConnector features
Signed-off-by: Kacper Dąbrowski <[email protected]>
Remove an unused file
Signed-off-by: Kacper Dąbrowski <[email protected]>
Add binary path caching
Signed-off-by: Kacper Dąbrowski <[email protected]>
Add two example scenes
Signed-off-by: Kacper Dąbrowski <[email protected]>
feat: replace binary path with ros2 launch command + binary path
refactor: renamed rai_sim and rai_bench
fix: fixed shutdown of binary
chore: make pose mandatory
chore: remove rai_bench because it is developed on another branch
ci: add missing license lines
laoding and spawning frm benchmark rabsed naming change grab xyz benchmark benchmarks run new sim every scenario for now


jmatejcz · 2025-02-20T18:23:11Z

Found a new issue with getting object positions from the simulation. Part of it was due to changes introduced in feat/benchmarking, resolved here: 702bb00

Now the transform is correct, but the reported position of an object is always its starting position, even after the object was moved in the simulation.

jmatejcz · 2025-02-21T11:58:28Z

Found a new issue with getting object positions from the simulation. Part of it was due to changes introduced in feat/benchmarking, resolved here: 702bb00

Now the transform is correct, but the reported position of an object is always its starting position, even after the object was moved in the simulation.

fixed here: 871f707

rachwalk

When attempting to run the benchmark according to the PR description, I get an error:

Traceback (most recent call last):
  File "/home/krachwal/projects/internal/rai/src/rai_bench/rai_bench/main.py", line 21, in <module>
    from rai_bench.benchmark_model import (
ModuleNotFoundError: No module named 'rai_bench'

It seems that poetry install doesn't install the new package.

jmatejcz · 2025-02-22T17:35:57Z

When attempting to run the benchmark according to the PR description, I get an error:
Traceback (most recent call last):
  File "/home/krachwal/projects/internal/rai/src/rai_bench/rai_bench/main.py", line 21, in <module>
    from rai_bench.benchmark_model import (
ModuleNotFoundError: No module named 'rai_bench'
It seems that poetry install doesn't install the new package.

added package to pyproject.toml

rachwalk

LGTM

jmatejcz marked this pull request as ready for review February 19, 2025 13:51

jmatejcz requested a review from rachwalk February 19, 2025 13:51

rachwalk requested changes Feb 19, 2025

MagdalenaKotynia requested changes Feb 19, 2025

jmatejcz force-pushed the jm/feat/benchmark-metrics branch from 28b2a5b to ae2b33d Compare February 20, 2025 10:53

jmatejcz requested review from MagdalenaKotynia and rachwalk February 20, 2025 15:00

maciejmajek and others added 16 commits February 20, 2025 18:08

refactor(partial): refactor openset tools to rai2.0 ros2ariconnector api

9b3c256

refactor: finalize with simplifications

b1ba1cf

fix: missing argument, missing client

647b14d

workaround: don't remove subscription due to rclpy issue

47edd91

ros2/rclpy#1142

replace ros2 future waiting with more robust callback based mechanism

6a9431e

remove waiting from launch

7c6e84c

refactor: cleanup manipulation launchfile

3d6ad0c

chore: pre-commit

b8eba34

adjust to new tools

5a4467e

Remove psutil conflicting __del__ method from o3de connector

aca52dd

See this issue: giampaolo/psutil#2437

stream langgraph agent

51cc62a

extent output of tool runner

e69626c

enable all scenarios in the benchmark

5ef20f2

jmatejcz force-pushed the jm/feat/benchmark-metrics branch from 903906b to a33e500 Compare February 20, 2025 18:23

rachwalk requested changes Feb 21, 2025

jmatejcz added 18 commits February 25, 2025 13:39

fix: adjust args passing in calculate_results

ceef23a

style: moved tasks to separate folder, moved o3de benchmark to rai_be…

afe9112

…nch package

feat: add validation if scene is suitable for task

a8eac92

feat: add way to automatically create scenarios

c1b4c75

feat: variables in score calculations rename to better suit purpouse

6017193

task defined more clearly

feat: add storing results

7695a9a

chore: add license

e1d76d1

partially fixed getting position bug

6fe1ab3

fix: move tf_listener to constructor in ROS2ARIConnector

323f3ed

This fixes bug when get_transform returns always the same, first transform Add logs to track retrieved positions

fix: fix calculations with adjacent objects in Tasks

23374f5

feat: add counting number of tool calls

829617a

chore: delete o3de config, remove from gitignore

a4b9760

fix: update pyproject toml to include rai bench

84d922a

formatting changes

abfa97e

feat: added verification if required services , toipcs and actions ar…

09206e5

…e available

refactor: funtions naming, typing and rotation of arm change

4f11d20

style: change of formatting

285817b

docs: extended readme

82ed909

jmatejcz force-pushed the jm/feat/benchmark-metrics branch from e5f93ad to 82ed909 Compare February 25, 2025 12:48

MagdalenaKotynia and others added 5 commits February 25, 2025 14:03

fix: fix of wrong generic classes typing

458f882

style: delete unsed genric

b38d82d

refactor: parametrized number of retries in checking robotic stack

fd19905

style: add logs about how many scenarios left

16ae42a

added log when all scenarios finished

feat: add dumping results to file

2172967

refactored scenario to inlucde config path so it can be logged easily

rachwalk approved these changes Feb 25, 2025

rachwalk merged commit d4ac769 into feat/benchmarking Feb 25, 2025
2 checks passed

rachwalk deleted the jm/feat/benchmark-metrics branch February 25, 2025 14:47

maciejmajek added a commit that referenced this pull request Feb 25, 2025

feat: O3DE test benchmark (#426)

db30465

Co-authored-by: Maciej Majek <[email protected]> Co-authored-by: Bartłomiej Boczek <[email protected]> Co-authored-by: MagdalenaKotynia <[email protected]>

maciejmajek restored the jm/feat/benchmark-metrics branch February 25, 2025 18:56

jmatejcz mentioned this pull request Feb 26, 2025

feat: rai_bench #436

Merged
