- 1. File Structure
- 2. Default hyper-parameters in gym.make()
- 3. The functions of the `env` class
- 4. env.step()
## 1. File Structure

```text
main
├── README.md
├── drp_env              # the directory for the drp challenge environment
│   ├── __init__.py
│   ├── drp_env.py
│   ├── EE_map.py
│   ├── map
│   └── state_repre
├── problem
│   └── problems.py      # 30 problems are fixed for evaluation
├── policy_tester.py     # test your developed policy; feel free to customize this file
├── policy               # your workspace
│   └── policy.py        # your development
└── calculate_cost.py    # outputs the evaluation result in a JSON file
```
## 2. Default hyper-parameters in gym.make()

Although a drp env can easily be constructed with the following code, other hyper-parameters of the env can also be customized.
```python
import gym

# agent_num, map_name, reward_list, goal, and start are defined by your setup
env = gym.make(
    "drp_env:drp-" + str(agent_num) + "agent_" + map_name + "-v2",
    state_repre_flag="onehot_fov",
    reward_list=reward_list,
    goal_array=goal,
    start_ori_array=start,
)
```
You are free to alter the following hyper-parameters during development, but we will keep the default values for evaluation (a construction call that overrides these defaults is sketched after the list):

- `speed`: the distance moved in one step (all drones have the same speed; the default value is 5).
- `start_ori_array`: starting positions. If not specified (`start_ori_array = []`), they are randomly generated.
- `goal_array`: goal positions. If not specified (`goal_array = []`), they are randomly generated.
- `visu_delay`: the waiting time for one step. The default is 0.3 s.
- `reward_list`: the rewards given when an action is taken by a drone. The default values are `{"goal": 100, "collision": -10, "wait": -10, "move": -1}`.
- `collision`: the default is the "terminated" mode, in which the current episode terminates once a collision happens. The other mode is "bounceback", in which drones bounce back when a collision happens.
- `time_limit`: one episode is limited to a maximum of 100 steps.
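For reference, a construction call that overrides these defaults might look like the sketch below. The keyword spellings for `speed`, `visu_delay`, `collision`, and `time_limit`, as well as the example map name, are assumptions based on the list above; check `drp_env/drp_env.py` for the authoritative signature.

```python
import gym

# Hypothetical example values: adjust agent_num and map_name to your setup
# (the available maps live under drp_env/map).
agent_num = 2
map_name = "map_3x3"  # assumed name, for illustration only

env = gym.make(
    "drp_env:drp-" + str(agent_num) + "agent_" + map_name + "-v2",
    state_repre_flag="onehot_fov",
    speed=5,                 # distance moved per step (default 5)
    visu_delay=0.3,          # waiting time per step in seconds (default 0.3)
    reward_list={"goal": 100, "collision": -10, "wait": -10, "move": -1},
    start_ori_array=[],      # [] -> random start positions
    goal_array=[],           # [] -> random goal positions
    collision="terminated",  # or "bounceback"
    time_limit=100,          # maximum steps per episode
)
```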
## 3. The functions of the `env` class

Since the `env` object is also passed to your policy as an input, many of its functions can be used; please refer to this file. A short policy sketch using these helpers follows the list below.
- `env.get_avail_agent_actions()`: searches for the actions available to all drones.
- `env.get_pos_list()`: returns the current positions and states of all agents in a dictionary-list format.
- `env.G`: returns the map information, including nodes and edges. The map is constructed with NetworkX, so you can use methods consistent with NetworkX usage if you want to obtain detailed information about the map (e.g., `env.G.nodes`).
- `step`: please see below.
- `reset`: sets the initial and destination nodes for the agents. If not specified, random nodes are set.
- `render`: visualizes the state of the agents at each step.
- `get_log`: displays the results of each episode.
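As a minimal illustration of how these helpers can be combined, the sketch below implements a toy policy that picks a random available action for each agent. The function name, its signature, and the assumption that `env.get_avail_agent_actions()` returns one list of candidate actions per drone are all hypothetical; check `drp_env/drp_env.py` for the actual return format.

```python
import random

def sample_policy(obs, env):
    """Toy policy (hypothetical name and signature): random available action per agent."""
    # Assumption: one list of available actions (node numbers) per drone.
    avail_actions = env.get_avail_agent_actions()
    # Current positions/states of all agents, e.g. for smarter planning.
    pos_list = env.get_pos_list()
    # The map is a NetworkX graph, so standard NetworkX queries apply,
    # e.g. env.G.nodes or networkx.shortest_path on env.G.
    joint_action = [random.choice(actions) for actions in avail_actions]
    return joint_action
```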
## 4. env.step()

- Input: a joint action, which contains the actions (node numbers) taken by each agent.
- Output:
  - `obs`: each agent's observation.
  - `reward`: the reward received by each individual agent.
  - `done`: returns False; it becomes True when all drones reach their goals or when a collision occurs.
  - `info`: a list containing the following fields:
    - `goal`: True if the agent has reached its goal, otherwise False.
    - `collision`: True if a collision has occurred, otherwise False.
    - `timeup`: True if the number of steps exceeds 100.
    - `distance_from_start`: the distance from the start.
    - `step`: the number of steps since the agent started.
    - `wait`: incremented by one whenever the agent is in the wait state.
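Putting the pieces together, a single episode might be driven as in the following sketch; `sample_policy` is the hypothetical policy from the previous section, and indexing `info` per agent is an assumption based on the field list above.

```python
obs = env.reset()  # set start/goal nodes (random if unspecified)
done = False
while not done:
    joint_action = sample_policy(obs, env)  # hypothetical policy from above
    obs, reward, done, info = env.step(joint_action)
    env.render()  # visualize the agents at each step
    # Per-agent flags, assuming info is indexed by agent:
    # info[0]["goal"], info[0]["collision"], info[0]["timeup"], ...
env.get_log()  # display the results of the episode
```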