
Commit c1f3487
Add openmanipulator simulation environment agent (#50)
1 parent 16ae437 commit c1f3487
32 files changed: +1187 -173 lines

Makefile  (+2 -2)

@@ -1,9 +1,9 @@
 test:
-    env PYTHONPATH=./scripts pytest --flake8 # --cov=algorithms
+    env PYTHONPATH=./scripts pytest --flake8 --ignore=./scripts/envs # --cov=algorithms
 
 format:
     isort -y
-    python3.6 -m black -t py27 .
+    python3.6 -m black -t py27 . --fast
 
 dev:
     pip install -r scripts/requirements-dev.txt
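Note: adding --ignore=./scripts/envs presumably keeps pytest away from the new ROS-dependent environment code, which would not import in a plain Python test run; this reading is an inference, not something stated in the commit.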

README.md  (+4 -4)

@@ -23,19 +23,19 @@ The [scripts](/scripts) folder contains implementations of a curated list of RL
 
 - Twin Delayed Deep Deterministic Policy Gradient (TD3)
   - TD3 (Fujimoto et al., 2018) is an extension of DDPG (Lillicrap et al., 2015), a deterministic policy gradient algorithm that uses deep neural networks for function approximation. Inspired by Deep Q-Networks (Mnih et al., 2015), DDPG uses experience replay and target networks to improve stability. TD3 further improves DDPG by adding clipped double Q-learning (Van Hasselt, 2010) to mitigate overestimation bias (Thrun & Schwartz, 1993) and by delaying policy updates to reduce variance.
-  - [Example Script on LunarLander](/scripts/examples/lunarlander_continuous_v2/td3.py)
+  - [Example Script on LunarLander](/scripts/config/agent/lunarlander_continuous_v2/td3.py)
   - [ArXiv Preprint](https://arxiv.org/abs/1802.09477)
 
 - (Twin) Soft Actor Critic (SAC)
   - SAC (Haarnoja et al., 2018a) incorporates maximum entropy reinforcement learning, where the agent's goal is to maximize expected reward and entropy concurrently. Combined with TD3, SAC achieves state-of-the-art performance on various continuous control tasks. SAC has been extended to allow automatic tuning of the temperature parameter (Haarnoja et al., 2018b), which determines the importance of entropy relative to the expected reward.
-  - [Example Script on LunarLander](/scripts/examples/lunarlander_continuous_v2/sac.py)
+  - [Example Script on LunarLander](/scripts/config/agent/lunarlander_continuous_v2/sac.py)
   - [ArXiv Preprint](https://arxiv.org/abs/1801.01290) (Original SAC)
   - [ArXiv Preprint](https://arxiv.org/abs/1812.05905) (SAC with autotuned temperature)
 
 - TD3 from Demonstrations, SAC from Demonstrations (TD3fD, SACfD)
   - DDPGfD (Vecerik et al., 2017) is an imitation learning algorithm that infuses demonstration data into experience replay. DDPGfD also improves DDPG by (1) using prioritized experience replay (Schaul et al., 2015), (2) adding n-step returns, (3) learning multiple times per environment step, and (4) adding L2 regularizers to the actor and critic losses. We incorporated these improvements into TD3 and SAC and found that they dramatically improve performance.
-  - [Example Script of TD3fD on LunarLander](/scripts/examples/lunarlander_continuous_v2/td3fd.py)
-  - [Example Script of SACfD on LunarLander](/scripts/examples/lunarlander_continuous_v2/sacfd.py)
+  - [Example Script of TD3fD on LunarLander](/scripts/config/agent/lunarlander_continuous_v2/td3fd.py)
+  - [Example Script of SACfD on LunarLander](/scripts/config/agent/lunarlander_continuous_v2/sacfd.py)
   - [ArXiv Preprint](https://arxiv.org/abs/1707.08817)
 
 ## Installation
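To make the TD3 and SAC descriptions above concrete, here is a minimal PyTorch-style sketch of the two critic-target computations they refer to. The function and tensor names are illustrative only and do not come from this repository's code.

import torch

def td3_critic_target(reward, next_state, done, actor_target, q1_target, q2_target,
                      gamma=0.99, noise_std=0.2, noise_clip=0.5):
    # Clipped double Q-learning: smooth the target action with clipped noise,
    # then take the minimum of two target critics to curb overestimation bias.
    next_action = actor_target(next_state)
    noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
    next_action = (next_action + noise).clamp(-1.0, 1.0)
    next_q = torch.min(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * next_q

def sac_critic_target(reward, next_state, done, policy, q1_target, q2_target,
                      alpha=0.2, gamma=0.99):
    # Maximum-entropy target: same clipped double-Q idea, plus an entropy bonus
    # weighted by the temperature alpha (which SAC can also tune automatically).
    next_action, next_log_prob = policy(next_state)  # reparameterized sample
    next_q = torch.min(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_log_prob)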

docker_train.sh  (+3 -2)

@@ -6,9 +6,10 @@ KAIR=$CATKIN_WS/src/kair_algorithms_draft
 
 if [ "$1" == "lunarlander" ]; then
     cd $KAIR/scripts; \
-    python run_lunarlander_continuous.py --algo $2 --off-render
+    python run_lunarlander_continuous.py --algo $2 --off-render
 elif [ "$1" == "openmanipulator" ]; then
-    echo "Working"
+    cd $KAIR/scripts; \
+    /opt/ros/$ROS_DISTRO/bin/rosrun kair_algorithms run_open_manipulator_reacher_v0.py --algo $2 --off-render
 else
     echo "Unknown parameter"
 fi
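With this change the same entry point covers both environments: ./docker_train.sh lunarlander <algo> runs the LunarLander script directly with Python, while ./docker_train.sh openmanipulator <algo> goes through rosrun so the ROS environment is on the path. The change to the lunarlander branch appears to be whitespace-only (its removed and re-added lines are textually identical in the rendered diff), and the valid <algo> values presumably correspond to the config modules under scripts/config/agent/, not to anything defined in this script.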

launch/open_manipulator_env.launch  (+50)

@@ -0,0 +1,50 @@
+<?xml version="1.0"?>
+<launch>
+  <!-- gazebo related args -->
+  <arg name="paused" default="false"/>
+  <arg name="use_sim_time" default="true"/>
+  <arg name="gui" default="true"/>
+  <arg name="headless" default="false"/>
+  <arg name="debug" default="false"/>
+
+  <!-- rviz & tf related args -->
+  <arg name="robot_name" default="open_manipulator"/>
+  <arg name="open_rviz" default="false" />
+  <arg name="use_gui" default="false" />
+
+  <!-- gazebo related -->
+  <rosparam file="$(find open_manipulator_gazebo)/config/gazebo_controller.yaml" command="load" />
+  <include file="$(find gazebo_ros)/launch/empty_world.launch">
+    <arg name="world_name" value="$(find open_manipulator_gazebo)/worlds/empty.world"/>
+    <arg name="debug" value="$(arg debug)" />
+    <arg name="gui" value="$(arg gui)" />
+    <arg name="paused" value="$(arg paused)"/>
+    <arg name="use_sim_time" value="$(arg use_sim_time)"/>
+    <arg name="headless" value="$(arg headless)"/>
+  </include>
+
+  <!-- rviz related -->
+  <!-- Send joint values -->
+  <node pkg="joint_state_publisher" type="joint_state_publisher" name="joint_state_publisher">
+    <param name="/use_gui" value="$(arg use_gui)"/>
+    <rosparam param="source_list" subst_value="true">["$(arg robot_name)/joint_states"]</rosparam>
+  </node>
+  <!-- Combine joint values to TF -->
+  <node name="robot_state_publisher" pkg="robot_state_publisher" type="state_publisher"/>
+
+  <!-- Show in Rviz -->
+  <group if="$(arg open_rviz)">
+    <node name="rviz" pkg="rviz" type="rviz" args="-d $(find open_manipulator_description)/rviz/open_manipulator.rviz"/>
+  </group>
+
+  <!-- Load the URDF into the ROS Parameter Server -->
+  <param name="robot_description"
+         command="$(find xacro)/xacro --inorder '$(find open_manipulator_description)/urdf/open_manipulator.urdf.xacro'"/>
+
+  <!-- Run a python script to send a service call to gazebo_ros to spawn a URDF robot -->
+  <node name="urdf_spawner" pkg="gazebo_ros" type="spawn_model" respawn="false" output="screen"
+        args="-urdf -model open_manipulator -z 0.0 -param robot_description"/>
+
+  <!-- ros_control robotis manipulator launch file -->
+  <include file="$(find open_manipulator_gazebo)/launch/open_manipulator_controller.launch"/>
+</launch>
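This launch file only brings up Gazebo, the robot description, the state publishers, and the ros_control controllers; it does not start any learning code. Assuming the package is built into the catkin workspace under the name used by docker_train.sh (kair_algorithms), it could presumably also be launched on its own, e.g. roslaunch kair_algorithms open_manipulator_env.launch gui:=false open_rviz:=true; the package name is inferred from the rosrun call above, not stated in the launch file itself.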

launch/vel_kinematics.launch  (-13)
This file was deleted.

launch/yumi_gazebo_vel.launch  (-33)
This file was deleted.

scripts/algorithms/common/abstract/agent.py  (mode 100644 -> 100755, +1 -1)

@@ -44,7 +44,7 @@ def __init__(self, env, args):
         self.args.max_episode_steps = env._max_episode_steps
 
         # for logging
-        self.env_name = str(self.env.env).split("<")[2].replace(">>", "")
+        self.env_name = str(self.env.env).split("<")[1].replace(">>", "")
         self.sha = (
             subprocess.check_output(["git", "rev-parse", "--short", "HEAD"])[:-1]
             .decode("ascii")
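The env-name parsing depends entirely on how the wrapped environment prints. A small illustration of what the index change does; the repr strings below are hypothetical examples, not taken from this commit:

# Hypothetical repr strings, for illustration only.
nested = "<LunarLanderContinuous<LunarLanderContinuous-v2>>"
nested.split("<")[1].replace(">>", "")  # -> 'LunarLanderContinuous'
nested.split("<")[2].replace(">>", "")  # -> 'LunarLanderContinuous-v2'

flat = "<OpenManipulatorReacherEnv instance>"
flat.split("<")[1].replace(">>", "")    # -> 'OpenManipulatorReacherEnv instance>'
# flat.split("<")[2] would raise IndexError, presumably why the index moved from 2 to 1.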
File renamed without changes.

scripts/examples/lunarlander_continuous_v2/sac.py renamed to scripts/config/agent/lunarlander_continuous_v2/sac.py  (+13 -13)

@@ -33,22 +33,28 @@
     "AUTO_ENTROPY_TUNING": True,
     "WEIGHT_DECAY": 0.0,
     "INITIAL_RANDOM_ACTION": 5000,
+    "NETWORK": {
+        "ACTOR_HIDDEN_SIZES": [256, 256],
+        "VF_HIDDEN_SIZES": [256, 256],
+        "QF_HIDDEN_SIZES": [256, 256],
+    },
 }
 
 
-def run(env, args, state_dim, action_dim):
+def get(env, args):
     """Run training or test.
 
     Args:
         env (gym.Env): openAI Gym environment with continuous action space
         args (argparse.Namespace): arguments including training settings
-        state_dim (int): dimension of states
-        action_dim (int): dimension of actions
 
     """
-    hidden_sizes_actor = [256, 256]
-    hidden_sizes_vf = [256, 256]
-    hidden_sizes_qf = [256, 256]
+    state_dim = env.observation_space.shape[0]
+    action_dim = env.action_space.shape[0]
+
+    hidden_sizes_actor = hyper_params["NETWORK"]["ACTOR_HIDDEN_SIZES"]
+    hidden_sizes_vf = hyper_params["NETWORK"]["VF_HIDDEN_SIZES"]
+    hidden_sizes_qf = hyper_params["NETWORK"]["QF_HIDDEN_SIZES"]
 
     # target entropy
     target_entropy = -np.prod((action_dim,)).item() # heuristic

@@ -102,10 +108,4 @@ def run(env, args, state_dim, action_dim):
     optims = (actor_optim, vf_optim, qf_1_optim, qf_2_optim)
 
     # create an agent
-    agent = Agent(env, args, hyper_params, models, optims, target_entropy)
-
-    # run
-    if args.test:
-        agent.test()
-    else:
-        agent.train()
+    return Agent(env, args, hyper_params, models, optims, target_entropy)
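The rename from run() to get() turns each example script into a config module that builds and returns an agent instead of training it. A minimal sketch of how a runner script might consume this interface; the module path, the use of importlib, and the argparse flags are assumptions, since the actual run_*.py scripts are not part of this diff:

import argparse
import importlib

import gym

# Parse only the flags needed for this sketch; the real runner takes more settings.
parser = argparse.ArgumentParser()
parser.add_argument("--algo", type=str, default="sac")
parser.add_argument("--test", action="store_true")
args, _ = parser.parse_known_args()

env = gym.make("LunarLanderContinuous-v2")

# Load the per-environment config module and let it construct the agent.
agent_cfg = importlib.import_module(
    "config.agent.lunarlander_continuous_v2." + args.algo
)
agent = agent_cfg.get(env, args)

# The train/test branch now lives in the runner instead of in every config module.
if args.test:
    agent.test()
else:
    agent.train()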

scripts/examples/lunarlander_continuous_v2/sacfd.py renamed to scripts/config/agent/lunarlander_continuous_v2/sacfd.py  (+13 -13)

@@ -42,22 +42,28 @@
     "PER_EPS": 1e-6,
     "PER_EPS_DEMO": 1.0,
     "INITIAL_RANDOM_ACTION": int(5e3),
+    "NETWORK": {
+        "ACTOR_HIDDEN_SIZES": [256, 256],
+        "VF_HIDDEN_SIZES": [256, 256],
+        "QF_HIDDEN_SIZES": [256, 256],
+    },
 }
 
 
-def run(env, args, state_dim, action_dim):
+def get(env, args):
     """Run training or test.
 
     Args:
         env (gym.Env): openAI Gym environment with continuous action space
         args (argparse.Namespace): arguments including training settings
-        state_dim (int): dimension of states
-        action_dim (int): dimension of actions
 
     """
-    hidden_sizes_actor = [256, 256]
-    hidden_sizes_vf = [256, 256]
-    hidden_sizes_qf = [256, 256]
+    state_dim = env.observation_space.shape[0]
+    action_dim = env.action_space.shape[0]
+
+    hidden_sizes_actor = hyper_params["NETWORK"]["ACTOR_HIDDEN_SIZES"]
+    hidden_sizes_vf = hyper_params["NETWORK"]["VF_HIDDEN_SIZES"]
+    hidden_sizes_qf = hyper_params["NETWORK"]["QF_HIDDEN_SIZES"]
 
     # target entropy
     target_entropy = -np.prod((action_dim,)).item() # heuristic

@@ -109,10 +115,4 @@ def run(env, args, state_dim, action_dim):
     optims = (actor_optim, vf_optim, qf_1_optim, qf_2_optim)
 
     # create an agent
-    agent = Agent(env, args, hyper_params, models, optims, target_entropy)
-
-    # run
-    if args.test:
-        agent.test()
-    else:
-        agent.train()
+    return Agent(env, args, hyper_params, models, optims, target_entropy)

scripts/examples/lunarlander_continuous_v2/td3.py renamed to scripts/config/agent/lunarlander_continuous_v2/td3.py  (+8 -12)

@@ -28,21 +28,23 @@
     "TARGET_POLICY_NOISE_CLIP": 0.5,
     "POLICY_UPDATE_FREQ": 2,
     "INITIAL_RANDOM_ACTIONS": 1e4,
+    "NETWORK": {"ACTOR_HIDDEN_SIZES": [400, 300], "CRITIC_HIDDEN_SIZES": [400, 300]},
 }
 
 
-def run(env, args, state_dim, action_dim):
+def get(env, args):
     """Run training or test.
 
     Args:
         env (gym.Env): openAI Gym environment with continuous action space
         args (argparse.Namespace): arguments including training settings
-        state_dim (int): dimension of states
-        action_dim (int): dimension of actions
 
     """
-    hidden_sizes_actor = [400, 300]
-    hidden_sizes_critic = [400, 300]
+    state_dim = env.observation_space.shape[0]
+    action_dim = env.action_space.shape[0]
+
+    hidden_sizes_actor = hyper_params["NETWORK"]["ACTOR_HIDDEN_SIZES"]
+    hidden_sizes_critic = hyper_params["NETWORK"]["CRITIC_HIDDEN_SIZES"]
 
     # create actor
     actor = MLP(

@@ -123,10 +125,4 @@ def run(env, args, state_dim, action_dim):
     noises = (exploration_noise, target_policy_noise)
 
     # create an agent
-    agent = Agent(env, args, hyper_params, models, optims, noises)
-
-    # run
-    if args.test:
-        agent.test()
-    else:
-        agent.train()
+    return Agent(env, args, hyper_params, models, optims, noises)

scripts/examples/lunarlander_continuous_v2/td3fd.py renamed to scripts/config/agent/lunarlander_continuous_v2/td3fd.py  (+8 -12)

@@ -40,21 +40,23 @@
     "PER_BETA": 1.0,
     "PER_EPS": 1e-6,
     "PER_EPS_DEMO": 1.0,
+    "NETWORK": {"ACTOR_HIDDEN_SIZES": [400, 300], "CRITIC_HIDDEN_SIZES": [400, 300]},
 }
 
 
-def run(env, args, state_dim, action_dim):
+def get(env, args):
     """Run training or test.
 
     Args:
         env (gym.Env): openAI Gym environment with continuous action space
         args (argparse.Namespace): arguments including training settings
-        state_dim (int): dimension of states
-        action_dim (int): dimension of actions
 
     """
-    hidden_sizes_actor = [400, 300]
-    hidden_sizes_critic = [400, 300]
+    state_dim = env.observation_space.shape[0]
+    action_dim = env.action_space.shape[0]
+
+    hidden_sizes_actor = hyper_params["NETWORK"]["ACTOR_HIDDEN_SIZES"]
+    hidden_sizes_critic = hyper_params["NETWORK"]["CRITIC_HIDDEN_SIZES"]
 
     # create actor
     actor = MLP(

@@ -135,10 +137,4 @@ def run(env, args, state_dim, action_dim):
     noises = (exploration_noise, target_policy_noise)
 
     # create an agent
-    agent = Agent(env, args, hyper_params, models, optims, noises)
-
-    # run
-    if args.test:
-        agent.test()
-    else:
-        agent.train()
+    return Agent(env, args, hyper_params, models, optims, noises)

scripts/config/agent/open_manipulator_reacher_v0/__init__.py

Whitespace-only changes.

scripts/examples/reacher-v1/td3.py renamed to scripts/config/agent/open_manipulator_reacher_v0/td3.py  (+8 -12)

@@ -28,21 +28,23 @@
     "TARGET_POLICY_NOISE_CLIP": 0.5,
     "POLICY_UPDATE_FREQ": 2,
     "INITIAL_RANDOM_ACTIONS": 1e4,
+    "NETWORK": {"ACTOR_HIDDEN_SIZES": [400, 300], "CRITIC_HIDDEN_SIZES": [400, 300]},
 }
 
 
-def run(env, args, state_dim, action_dim):
+def get(env, args):
     """Run training or test.
 
     Args:
         env (gym.Env): openAI Gym environment with continuous action space
         args (argparse.Namespace): arguments including training settings
-        state_dim (int): dimension of states
-        action_dim (int): dimension of actions
 
     """
-    hidden_sizes_actor = [400, 300]
-    hidden_sizes_critic = [400, 300]
+    state_dim = env.observation_space.shape[0]
+    action_dim = env.action_space.shape[0]
+
+    hidden_sizes_actor = hyper_params["NETWORK"]["ACTOR_HIDDEN_SIZES"]
+    hidden_sizes_critic = hyper_params["NETWORK"]["CRITIC_HIDDEN_SIZES"]
 
     # create actor
     actor = MLP(

@@ -123,10 +125,4 @@ def run(env, args, state_dim, action_dim):
     noises = (exploration_noise, target_policy_noise)
 
     # create an agent
-    agent = Agent(env, args, hyper_params, models, optims, noises)
-
-    # run
-    if args.test:
-        agent.test()
-    else:
-        agent.train()
+    return Agent(env, args, hyper_params, models, optims, noises)
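This new config mirrors the LunarLander TD3 settings; because get() now reads state_dim and action_dim from env.observation_space.shape[0] and env.action_space.shape[0], the same module works for the OpenManipulator reacher as long as that environment exposes flat Box observation and action spaces. The environment's actual dimensions are defined in the env code, presumably under scripts/envs (the directory newly ignored by pytest), which is not shown in this excerpt.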

scripts/config/agent/reacher-v1/__init__.py

Whitespace-only changes.
