- 1. File Structure
- 2. Default hyper-parameters in gym.make()
- 3. The functions of the `env` class
- 4. env.step()
## 1. File Structure

```text
main
├── README.md
├── drp_env              # the directory for the drp challenge environment
│   ├── __init__.py
│   ├── drp_env.py
│   ├── EE_map.py
│   ├── map
│   └── state_repre
├── problem
│   └── problems.py      # 30 problems are fixed for evaluation
├── policy_tester.py     # test your developed policy; feel free to customize this file
├── policy               # your workspace
│   └── policy.py        # your development
└── calculate_cost.py    # outputs the evaluation result in a JSON file
```
## 2. Default hyper-parameters in gym.make()

Although a drp env can easily be constructed with the following code, other hyper-parameters of the env can also be customized.
```python
import gym

# agent_num, map_name, reward_list, goal, and start are defined by your setup
env = gym.make(
    "drp_env:drp-" + str(agent_num) + "agent_" + map_name + "-v2",
    state_repre_flag="onehot_fov",
    reward_list=reward_list,
    goal_array=goal,
    start_ori_array=start,
)
```
You are free to alter the following hyper-parameters during development, but we will keep the default values for evaluation (a construction call that overrides these defaults is sketched after the list):

- `speed`: the distance moved in one step (all drones have the same speed; the default value is 5).
- `start_ori_array`: starting positions. If not specified (`start_ori_array = []`), they are randomly generated.
- `goal_array`: goal positions. If not specified (`goal_array = []`), they are randomly generated.
- `visu_delay`: the waiting time for one step. The default is 0.3 s.
- `reward_list`: the rewards given when an action is taken by a drone. The default values are `{"goal": 100, "collision": -10, "wait": -10, "move": -1}`.
- `collision`: the default is the "terminated" mode, in which the current episode terminates once a collision happens. The other mode is "bounceback", in which drones bounce back when a collision happens.
- `time_limit`: one episode is limited to a maximum of 100 steps.
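For reference, a construction call that overrides these defaults might look like the sketch below. The keyword spellings for `speed`, `visu_delay`, `collision`, and `time_limit`, as well as the example map name, are assumptions based on the list above; check `drp_env/drp_env.py` for the authoritative signature.

```python
import gym

# Hypothetical example values: adjust agent_num and map_name to your setup
# (the available maps live under drp_env/map).
agent_num = 2
map_name = "map_3x3"  # assumed name, for illustration only

env = gym.make(
    "drp_env:drp-" + str(agent_num) + "agent_" + map_name + "-v2",
    state_repre_flag="onehot_fov",
    speed=5,                 # distance moved per step (default 5)
    visu_delay=0.3,          # waiting time per step in seconds (default 0.3)
    reward_list={"goal": 100, "collision": -10, "wait": -10, "move": -1},
    start_ori_array=[],      # [] -> random start positions
    goal_array=[],           # [] -> random goal positions
    collision="terminated",  # or "bounceback"
    time_limit=100,          # maximum steps per episode
)
```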
## 3. The functions of the `env` class

Since the `env` object is also passed to your policy as an input, many of its functions can be used; please refer to this file. A short policy sketch using these helpers follows the list below.
- `env.get_avail_agent_actions()`: searches for the actions available to all drones.
- `env.get_pos_list()`: returns the current positions and states of all agents in a dictionary-list format.
- `env.G`: returns the map information, including nodes and edges. The map is constructed with NetworkX, so you can use methods consistent with NetworkX usage if you want to obtain detailed information about the map (e.g., `env.G.nodes`).
- `step`: please see below.
- `reset`: sets the initial and destination nodes for the agents. If not specified, random nodes are set.
- `render`: visualizes the state of the agents at each step.
- `get_log`: displays the results of each episode.
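As a minimal illustration of how these helpers can be combined, the sketch below implements a toy policy that picks a random available action for each agent. The function name, its signature, and the assumption that `env.get_avail_agent_actions()` returns one list of candidate actions per drone are all hypothetical; check `drp_env/drp_env.py` for the actual return format.

```python
import random

def sample_policy(obs, env):
    """Toy policy (hypothetical name and signature): random available action per agent."""
    # Assumption: one list of available actions (node numbers) per drone.
    avail_actions = env.get_avail_agent_actions()
    # Current positions/states of all agents, e.g. for smarter planning.
    pos_list = env.get_pos_list()
    # The map is a NetworkX graph, so standard NetworkX queries apply,
    # e.g. env.G.nodes or networkx.shortest_path on env.G.
    joint_action = [random.choice(actions) for actions in avail_actions]
    return joint_action
```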
## 4. env.step()

- Input: a joint action, which contains the actions (node numbers) taken by each agent.
- Output:
  - `obs`: each agent's observation.
  - `reward`: the reward received by each individual agent.
  - `done`: returns False; it becomes True when all drones reach their goals or when a collision occurs.
  - `info`: a list containing the following fields:
    - `goal`: True if the agent has reached its goal, otherwise False.
    - `collision`: True if a collision has occurred, otherwise False.
    - `timeup`: True if the number of steps exceeds 100.
    - `distance_from_start`: the distance from the start.
    - `step`: the number of steps since the agent started.
    - `wait`: incremented by one whenever the agent is in the wait state.
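Putting the pieces together, a single episode might be driven as in the following sketch; `sample_policy` is the hypothetical policy from the previous section, and indexing `info` per agent is an assumption based on the field list above.

```python
obs = env.reset()  # set start/goal nodes (random if unspecified)
done = False
while not done:
    joint_action = sample_policy(obs, env)  # hypothetical policy from above
    obs, reward, done, info = env.step(joint_action)
    env.render()  # visualize the agents at each step
    # Per-agent flags, assuming info is indexed by agent:
    # info[0]["goal"], info[0]["collision"], info[0]["timeup"], ...
env.get_log()  # display the results of the episode
```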