Commit c1f3487

Add openmanipulator simulation environment agent (#50)
1 parent 16ae437 commit c1f3487

32 files changed: 1187 additions and 173 deletions

Makefile

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 test:
-	env PYTHONPATH=./scripts pytest --flake8 # --cov=algorithms
+	env PYTHONPATH=./scripts pytest --flake8 --ignore=./scripts/envs # --cov=algorithms
 
 format:
 	isort -y
-	python3.6 -m black -t py27 .
+	python3.6 -m black -t py27 . --fast
 
 dev:
 	pip install -r scripts/requirements-dev.txt

README.md

Lines changed: 4 additions & 4 deletions
@@ -23,19 +23,19 @@ The [scripts](/scripts) folder contains implementations of a curated list of RL
 
 - Twin Delayed Deep Deterministic Policy Gradient (TD3)
     - TD3 (Fujimoto et al., 2018) is an extension of DDPG (Lillicrap et al., 2015), a deterministic policy gradient algorithm that uses deep neural networks for function approximation. Inspired by Deep Q-Networks (Mnih et al., 2015), DDPG uses experience replay and target network to improve stability. TD3 further improves DDPG by adding clipped double Q-learning (Van Hasselt, 2010) to mitigate overestimation bias (Thrun & Schwartz, 1993) and delaying policy updates to address variance.
-    - [Example Script on LunarLander](/scripts/examples/lunarlander_continuous_v2/td3.py)
+    - [Example Script on LunarLander](/scripts/config/agent/lunarlander_continuous_v2/td3.py)
     - [ArXiv Preprint](https://arxiv.org/abs/1802.09477)
 
 - (Twin) Soft Actor Critic (SAC)
     - SAC (Haarnoja et al., 2018a) incorporates maximum entropy reinforcment learning, where the agent's goal is to maximize expected reward and entropy concurrently. Combined with TD3, SAC achieves state of the art performance in various continuous control tasks. SAC has been extended to allow automatically tuning of the temperature parameter (Haarnoja et al., 2018b), which determines the importance of entropy against the expected reward.
-    - [Example Script on LunarLander](/scripts/examples/lunarlander_continuous_v2/sac.py)
+    - [Example Script on LunarLander](/scripts/config/agent/lunarlander_continuous_v2/sac.py)
     - [ArXiv Preprint](https://arxiv.org/abs/1801.01290) (Original SAC)
     - [ArXiv Preprint](https://arxiv.org/abs/1812.05905) (SAC with autotuned temperature)
 
 - TD3 from Demonstrations, SAC from Demonstrations (TD3fD, SACfD)
     - DDPGfD (Vecerik et al., 2017) is an imitation learning algorithm that infuses demonstration data into experience replay. DDPGfD also improved DDPG by (1) using prioritized experience replay (Schaul et al., 2015), (2) adding n-step returns, (3) learning multiple times per environment step, and (4) adding L2 regularizers to actor and critic losses. We incorporated these improvements to TD3 and SAC and found that it dramatically improves their performance.
-    - [Example Script of TD3fD on LunarLander](/scripts/examples/lunarlander_continuous_v2/td3fd.py)
-    - [Example Script of SACfD on LunarLander](/scripts/examples/lunarlander_continuous_v2/sacfd.py)
+    - [Example Script of TD3fD on LunarLander](/scripts/config/agent/lunarlander_continuous_v2/td3fd.py)
+    - [Example Script of SACfD on LunarLander](/scripts/config/agent/lunarlander_continuous_v2/sacfd.py)
     - [ArXiv Preprint](https://arxiv.org/abs/1707.08817)
 
 ## Installation
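
The clipped double Q-learning step named in the TD3 entry of the README hunk above can be sketched in a few lines. This is an illustrative sketch, not the repository's implementation: the function name `td3_target` and its parameters are hypothetical, scalars stand in for batched tensors, and the target-policy smoothing noise that TD3 also uses is left out for brevity.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bootstrap target built from the smaller of two target-critic estimates.

    Hypothetical helper for illustration only.
    """
    # Taking the minimum counteracts the overestimation bias of a single critic.
    min_q_next = min(q1_next, q2_next)
    # Do not bootstrap past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * min_q_next


target = td3_target(reward=1.0, done=False, q1_next=10.0, q2_next=8.0, gamma=0.5)
terminal_target = td3_target(reward=1.0, done=True, q1_next=10.0, q2_next=8.0)
```

Both critics would then be regressed toward the same target, while the actor is updated less often than the critics (the "delayed" part of TD3).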

docker_train.sh

Lines changed: 3 additions & 2 deletions
@@ -6,9 +6,10 @@ KAIR=$CATKIN_WS/src/kair_algorithms_draft
 
 if [ "$1" == "lunarlander" ]; then
     cd $KAIR/scripts; \
-    python run_lunarlander_continuous.py --algo $2 --off-render
+    python run_lunarlander_continuous.py --algo $2 --off-render
 elif [ "$1" == "openmanipulator" ]; then
-    echo "Working"
+    cd $KAIR/scripts; \
+    /opt/ros/$ROS_DISTRO/bin/rosrun kair_algorithms run_open_manipulator_reacher_v0.py --algo $2 --off-render
 else
     echo "Unknown parameter"
 fi

launch/open_manipulator_env.launch

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+<?xml version="1.0"?>
+<launch>
+  <!-- gazebo related args -->
+  <arg name="paused" default="false"/>
+  <arg name="use_sim_time" default="true"/>
+  <arg name="gui" default="true"/>
+  <arg name="headless" default="false"/>
+  <arg name="debug" default="false"/>
+
+  <!-- rviz & tf related args -->
+  <arg name="robot_name" default="open_manipulator"/>
+  <arg name="open_rviz" default="false" />
+  <arg name="use_gui" default="false" />
+
+  <!-- gazebo related -->
+  <rosparam file="$(find open_manipulator_gazebo)/config/gazebo_controller.yaml" command="load" />
+  <include file="$(find gazebo_ros)/launch/empty_world.launch">
+    <arg name="world_name" value="$(find open_manipulator_gazebo)/worlds/empty.world"/>
+    <arg name="debug" value="$(arg debug)" />
+    <arg name="gui" value="$(arg gui)" />
+    <arg name="paused" value="$(arg paused)"/>
+    <arg name="use_sim_time" value="$(arg use_sim_time)"/>
+    <arg name="headless" value="$(arg headless)"/>
+  </include>
+
+  <!-- rviz related -->
+  <!-- Send joint values -->
+  <node pkg="joint_state_publisher" type="joint_state_publisher" name="joint_state_publisher">
+    <param name="/use_gui" value="$(arg use_gui)"/>
+    <rosparam param="source_list" subst_value="true">["$(arg robot_name)/joint_states"]</rosparam>
+  </node>
+  <!-- Combine joint values to TF-->
+  <node name="robot_state_publisher" pkg="robot_state_publisher" type="state_publisher"/>
+
+  <!-- Show in Rviz -->
+  <group if="$(arg open_rviz)">
+    <node name="rviz" pkg="rviz" type="rviz" args="-d $(find open_manipulator_description)/rviz/open_manipulator.rviz"/>
+  </group>
+
+  <!-- Load the URDF into the ROS Parameter Server -->
+  <param name="robot_description"
+         command="$(find xacro)/xacro --inorder '$(find open_manipulator_description)/urdf/open_manipulator.urdf.xacro'"/>
+
+  <!-- Run a python script to the send a service call to gazebo_ros to spawn a URDF robot -->
+  <node name="urdf_spawner" pkg="gazebo_ros" type="spawn_model" respawn="false" output="screen"
+        args="-urdf -model open_manipulator -z 0.0 -param robot_description"/>
+
+  <!-- ros_control robotis manipulator launch file -->
+  <include file="$(find open_manipulator_gazebo)/launch/open_manipulator_controller.launch"/>
+</launch>
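
The launch file above exposes its tunable settings as top-level `<arg>` elements with defaults. As a rough sketch of how those defaults could be listed before deciding which ones to override, the snippet below parses a shortened copy of the file with Python's standard library. The helper name `top_level_arg_defaults` is hypothetical and the embedded XML is an abbreviated excerpt, not the full file.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt of the launch file shown in the diff above.
LAUNCH_SNIPPET = """<launch>
  <arg name="paused" default="false"/>
  <arg name="use_sim_time" default="true"/>
  <arg name="gui" default="true"/>
  <arg name="headless" default="false"/>
  <include file="$(find gazebo_ros)/launch/empty_world.launch">
    <arg name="gui" value="$(arg gui)" />
  </include>
</launch>"""


def top_level_arg_defaults(launch_xml):
    """Return {name: default} for <arg> elements directly under <launch>."""
    root = ET.fromstring(launch_xml)
    # findall("arg") matches direct children only, so the <arg> elements that
    # are forwarded inside <include> (which carry value=, not default=) are skipped.
    return {arg.get("name"): arg.get("default") for arg in root.findall("arg")}


defaults = top_level_arg_defaults(LAUNCH_SNIPPET)
```

Any of these defaults can be overridden on the `roslaunch` command line with the `name:=value` syntax, for example `gui:=false` for a run without the Gazebo window.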

launch/vel_kinematics.launch

Lines changed: 0 additions & 13 deletions
This file was deleted.

launch/yumi_gazebo_vel.launch

Lines changed: 0 additions & 33 deletions
This file was deleted.

scripts/algorithms/common/abstract/agent.py

File mode changed: 100644 → 100755
Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ def __init__(self, env, args):
         self.args.max_episode_steps = env._max_episode_steps
 
         # for logging
-        self.env_name = str(self.env.env).split("<")[2].replace(">>", "")
+        self.env_name = str(self.env.env).split("<")[1].replace(">>", "")
         self.sha = (
             subprocess.check_output(["git", "rev-parse", "--short", "HEAD"])[:-1]
             .decode("ascii")
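
The one-line change to `env_name` above depends on what `str(self.env.env)` looks like, which the diff does not show. The sketch below walks through the same string expression on two assumed representations: the `<ClassName<registered-id>>` form that Gym typically prints for a registered environment, and a `<ClassName instance>` form for an environment without a registered spec. Both example strings and the class name `OpenManipulatorReacherEnv` are assumptions for illustration, and this is only one plausible reading of why the index moved from 2 to 1.

```python
def env_name_from_repr(env_repr, index):
    """Mirror the expression in the diff: split on "<", pick one piece, drop ">>"."""
    return env_repr.split("<")[index].replace(">>", "")


# Assumed repr of a registered Gym env: "<ClassName<registered-id>>".
registered = "<LunarLanderContinuous<LunarLanderContinuous-v2>>"
# Assumed repr of an env without a registered spec (hypothetical class name).
unregistered = "<OpenManipulatorReacherEnv instance>"

old_name = env_name_from_repr(registered, 2)  # the registered id
new_name = env_name_from_repr(registered, 1)  # the class name
new_name_unregistered = env_name_from_repr(unregistered, 1)

# With only one "<" in the string there is no piece at index 2.
try:
    env_name_from_repr(unregistered, 2)
    old_index_fails = False
except IndexError:
    old_index_fails = True
```

Under these assumptions, index 1 works for both forms but yields the class name rather than the registered id, and a single trailing `>` survives in the unregistered case because only `>>` is stripped.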
File renamed without changes.
