[Code Style: Blue](https://github.com/invenia/BlueStyle)
[PkgEval](https://JuliaCI.github.io/NanosoldierReports/pkgeval_badges/report.html)

## Introduction

A space is simply a set of objects. In a reinforcement learning context, spaces define the sets of possible states, actions, and observations.

In Julia, spaces can be represented by a variety of objects. For instance, a small discrete action set might be represented with `["up", "left", "down", "right"]`, or an interval of real numbers might be represented with an object from the [`IntervalSets`](https://github.com/JuliaMath/IntervalSets.jl) package. In general, the space defined by any Julia object is the set of objects `x` for which `x in space` returns `true`.
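
Under this definition, ordinary Julia collections already work as spaces with no extra code. The snippet below uses only Base Julia and does not load CommonRLSpaces. Note that a `Dict` contains key-value `Pair`s, so membership is tested with a pair rather than a bare key.

```julia
# Ordinary Julia objects acting as spaces: membership is just `in`.
actions = ["up", "left", "down", "right"]   # small discrete action set
levels  = 0:10                              # the integers 0 through 10
flags   = (false, true)                     # a Tuple also works
moves   = Dict(:left => -1, :right => 1)    # elements are key-value Pairs

println("up" in actions)         # true
println("jump" in actions)       # false
println(7 in levels)             # true
println(true in flags)           # true
println((:left => -1) in moves)  # true: a Dict's elements are Pairs
```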

In addition to establishing the definition above, this package provides three useful tools:

1. Traits that communicate the properties of spaces, e.g. whether they are continuous or discrete, how many subspaces they have, and how to interact with them.
2. Functions such as `product` for constructing more complex spaces.
3. Constructors for spaces whose elements are arrays, such as `ArraySpace` and `Box`.

## Concepts and Interface

### Interface for all spaces

Since a space is simply a set of objects, a wide variety of common Julia types, including `Vector`, `Set`, `Tuple`, and `Dict`<sup>1</sup>, can represent a space.
Because of this inclusive definition, the interface that all spaces are expected to implement is minimal. It consists of:

- `in(x, space)`, which tests whether `x` is a member of the set `space` (this can also be called with the `x in space` syntax).
- `rand(space)`, which returns a valid member of the set<sup>2</sup>.
- `eltype(space)`, which returns the type of the elements in the space.
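
A user-defined type can join in by implementing these three functions. The sketch below defines a hypothetical `EvenUpTo` space (the even integers from 0 to `n`); the name is illustrative and not part of CommonRLSpaces. Sampling is hooked in through the `Random.SamplerTrivial` pattern described in Julia's `Random` documentation, so `rand(space)` works without the package.

```julia
using Random

# Hypothetical space: the even integers 0, 2, ..., n.
struct EvenUpTo
    n::Int
end

# Membership: an even integer within [0, n].
Base.in(x, s::EvenUpTo) = x isa Integer && iseven(x) && 0 <= x <= s.n

# Sampling: draw uniformly from the even members via Random's sampler hook.
Random.rand(rng::AbstractRNG, sp::Random.SamplerTrivial{EvenUpTo}) =
    2 * rand(rng, 0:(sp[].n ÷ 2))

# Element type: every member is an Int.
Base.eltype(::Type{EvenUpTo}) = Int

s = EvenUpTo(10)
x = rand(s)
println(x in s)     # true
println(3 in s)     # false
println(eltype(s))  # Int64 (on a 64-bit machine)
```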

In addition, the `SpaceStyle` trait is always defined. Calling `SpaceStyle(space)` returns a `FiniteSpaceStyle`, `ContinuousSpaceStyle`, `HybridSpaceStyle`, or `UnknownSpaceStyle` object.
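
`SpaceStyle` is a trait in the usual Julia sense: a function maps a space to a singleton style object, and generic code dispatches on that object instead of on the space's concrete type. The sketch below reproduces the pattern with hypothetical names (`Style`, `Finite`, `Continuous`, `Unknown`, `style`, `describe`, `UnitInterval`) so that it runs on its own; it illustrates the idea and is not CommonRLSpaces' implementation.

```julia
# Hypothetical trait types mirroring the idea behind SpaceStyle.
abstract type Style end
struct Finite     <: Style end
struct Continuous <: Style end
struct Unknown    <: Style end

# A hypothetical continuous space: the closed interval [0, 1].
struct UnitInterval end
Base.in(x::Real, ::UnitInterval) = 0 <= x <= 1

# The trait function: Unknown unless a more specific method applies.
style(::Any) = Unknown()
style(::Union{Tuple, AbstractVector, AbstractSet, AbstractDict}) = Finite()
style(::UnitInterval) = Continuous()

# Generic code dispatches on the trait, not on the concrete space type.
describe(space) = describe(style(space), space)
describe(::Finite, space) = "finite space with $(length(space)) elements"
describe(::Continuous, space) = "continuous space"
describe(::Unknown, space) = "space of unknown style"

println(describe((:cat, :dog)))             # finite space with 2 elements
println(describe(1:5))                      # finite space with 5 elements
println(describe(UnitInterval()))           # continuous space
println(describe(Iterators.countfrom(1)))   # space of unknown style
```

Falling back to `Unknown()` for any type without a specific method plays the same role as `UnknownSpaceStyle` above.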

### Finite discrete spaces

Spaces with a finite number of elements have `FiniteSpaceStyle`. These spaces are guaranteed to be iterable, implementing Julia's [iteration interface](https://docs.julialang.org/en/v1/manual/interfaces/). In particular, `collect(space)` returns all elements in an array.
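
Because finite spaces are iterable, an agent can enumerate every element, for example to pick the greedy action under a value estimate. The sketch uses a Base `Tuple` as the space and a made-up table `Q` of action values for illustration; `argmax(f, itr)` requires Julia 1.7 or later.

```julia
actions = (:up, :left, :down, :right)   # a finite space represented by a Tuple

# Hypothetical action-value estimates for the current state.
const Q = Dict(:up => 0.1, :left => 0.7, :down => -0.2, :right => 0.3)

all_actions = collect(actions)       # every element, as a Vector{Symbol}
best = argmax(a -> Q[a], actions)    # iterate the space, keep the highest-valued action

println(all_actions)  # [:up, :left, :down, :right]
println(best)         # left
```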

### Continuous spaces

Continuous spaces represent sets that have an uncountable number of elements; they have a `SpaceStyle` of type `ContinuousSpaceStyle`. CommonRLSpaces does not adopt a rigorous mathematical definition of a continuous set, but, roughly, elements in the interior of a continuous space have other elements very close to them.

Continuous spaces have some additional interface functions:

- `bounds(space)` returns the lower and upper bounds in a tuple. For example, if `space` is the unit disk, `bounds(space)` returns `([-1.0, -1.0], [1.0, 1.0])`. This allows agents to choose policies that appropriately cover the space, e.g. a normal distribution with a mean of `mean(bounds(space))` and a standard deviation of half the distance between the bounds.
- `clamp(x, space)` returns an element of `space` that is near `x`. For example, if `space` is the unit disk, `clamp([2.0, 0.0], space)` might return `[1.0, 0.0]`. This gives an agent a convenient way to find a valid action when it samples actions from a distribution that does not match the space exactly (e.g. a normal distribution).
- `clamp!(x, space)` is similar to `clamp`, but clamps `x` in place.
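
The sketch below implements `bounds`, `clamp`, and `clamp!` for the unit-circle example (taken here as the closed unit disk) so that it runs without CommonRLSpaces; the `UnitDisk` type and this standalone `bounds` function are hypothetical stand-ins, and radial projection is just one reasonable choice of "nearby" point. It also follows the suggested workflow: sample from a normal distribution centred on `mean(bounds(space))` with a standard deviation of half the distance between the bounds, then clamp the sample back into the space.

```julia
using Statistics: mean

# Hypothetical continuous space: the closed unit disk in 2D.
struct UnitDisk end

# Membership, with a tiny tolerance so radially clamped points still count
# as members despite floating-point rounding.
Base.in(x::AbstractVector{<:Real}, ::UnitDisk) =
    length(x) == 2 && sum(abs2, x) <= 1 + 1e-12

# Axis-aligned lower and upper bounds, as a tuple.
bounds(::UnitDisk) = ([-1.0, -1.0], [1.0, 1.0])

# In place: project points outside the disk radially onto its boundary.
function Base.clamp!(x::AbstractVector{<:Real}, ::UnitDisk)
    r = sqrt(sum(abs2, x))
    if r > 1
        x ./= r
    end
    return x
end

# Out of place: clamp a floating-point copy, leaving `x` untouched.
Base.clamp(x::AbstractVector{<:Real}, d::UnitDisk) = clamp!(float.(x), d)

space = UnitDisk()
println(clamp([2.0, 0.0], space))   # [1.0, 0.0]

# Sample around the centre of the bounds, then clamp into the space.
lo, hi = bounds(space)
μ = mean(bounds(space))             # midpoint of the bounds: [0.0, 0.0]
σ = (hi .- lo) ./ 2                 # half the distance between the bounds
a = clamp(μ .+ σ .* randn(2), space)
println(a in space)                 # true
```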

### Hybrid spaces

The interface for hybrid continuous-discrete spaces is planned but not yet defined. If the space style is not `FiniteSpaceStyle`, `ContinuousSpaceStyle`, or `HybridSpaceStyle`, it is `UnknownSpaceStyle`.

### Spaces of arrays

For spaces whose elements are arrays, `elsize(space)` returns the size of those arrays. For example, `elsize(ArraySpace(1:5, 2, 3))` returns `(2, 3)`.
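
A minimal sketch of a space of same-sized arrays, assuming `elsize` simply reports the common size of the member arrays. `ArraySpaceSketch` and this standalone `elsize` are hypothetical names, loosely modelled on the `ArraySpace(1:5, 2, 3)` usage example; they are not the package's types.

```julia
using Random

# Hypothetical space: arrays of size `dims` whose entries all lie in `base`.
struct ArraySpaceSketch{S, N}
    base::S
    dims::NTuple{N, Int}
end

# Size shared by every array in the space.
elsize(s::ArraySpaceSketch) = s.dims

# Member: an array of the right size with every entry in `base`.
Base.in(x, s::ArraySpaceSketch) =
    x isa AbstractArray && size(x) == s.dims && all(in(s.base), x)

# Sample each entry independently from `base`.
Random.rand(rng::AbstractRNG, sp::Random.SamplerTrivial{<:ArraySpaceSketch}) =
    rand(rng, sp[].base, sp[].dims)

Base.eltype(::Type{ArraySpaceSketch{S, N}}) where {S, N} = Array{eltype(S), N}

s = ArraySpaceSketch(1:5, (2, 3))
x = rand(s)
println(size(x))     # (2, 3)
println(x in s)      # true
println(elsize(s))   # (2, 3)
```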

### Cartesian products of spaces

The Cartesian product of two spaces `a` and `b` can be constructed with `c = product(a, b)`.

The exact form of the resulting space is unspecified and should be considered an implementation detail. The only guarantees are (1) that `c` contains exactly one element for every combination of one element from `a` and one element from `b`, and (2) that the resulting space conforms to the interface above.
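
For finite spaces, guarantee (1) can be checked by brute force. The sketch deliberately uses Base's `Iterators.product` rather than CommonRLSpaces' `product`, whose result type is unspecified; it enumerates every combination exactly once, so the product has `length(a) * length(b)` distinct elements.

```julia
a = (:cat, :dog)
b = 1:3

# Base's lazy Cartesian product yields tuples (x, y); flatten it into a vector.
combos = vec(collect(Iterators.product(a, b)))

println(length(combos) == length(a) * length(b))  # true: one element per combination
println(allunique(combos))                        # true: no combination repeated
println((:dog, 2) in combos)                      # true
```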

The `TupleSpaceProduct` constructor provides a specialized Cartesian product where each element is a tuple, i.e. `TupleSpaceProduct(a, b)` has elements of type `Tuple{eltype(a), eltype(b)}`.
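
A from-scratch sketch of what a tuple-valued product involves, written so it runs without the package. `TupleProductSketch` and `tuple_product` are hypothetical stand-ins, not the `TupleSpaceProduct` implementation: membership checks each component against its factor, sampling draws each component independently, and the element type is the tuple of the factors' element types.

```julia
using Random

# Hypothetical tuple-valued Cartesian product of arbitrary spaces.
struct TupleProductSketch{S<:Tuple}
    spaces::S
end
tuple_product(spaces...) = TupleProductSketch(spaces)

# Member: a tuple of matching length whose i-th entry lies in the i-th space.
Base.in(x, p::TupleProductSketch) =
    x isa Tuple && length(x) == length(p.spaces) && all(map(in, x, p.spaces))

# Sample every component independently.
Random.rand(rng::AbstractRNG, sp::Random.SamplerTrivial{<:TupleProductSketch}) =
    map(s -> rand(rng, s), sp[].spaces)

# Element type: Tuple{eltype(a), eltype(b), ...}.
Base.eltype(p::TupleProductSketch) = Tuple{map(eltype, p.spaces)...}

p = tuple_product((:cat, :dog), 1:3)
x = rand(p)
println(x in p)          # true
println((:cat, 4) in p)  # false: 4 is not in 1:3
println(eltype(p))       # Tuple{Symbol, Int64}
```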

---

<sup>1</sup>Note: the elements of a space represented by a `Dict` are key-value `Pair`s.

<sup>2</sup>Whether `rand(space)` must sample from a uniform distribution is not yet specified.

## Usage

### Construction

|Category|Style|Example|
|:---|:----|:-----|
|Enumerable discrete space| `FiniteSpaceStyle{()}()` | `(:cat, :dog)`, `0:1`, `["a","b","c"]` |
|One-dimensional continuous space| `ContinuousSpaceStyle{()}()` | `-1.2..3.3`, `Interval(1.0, 2.0)` |
|Multi-dimensional discrete space| `FiniteSpaceStyle{(3,4)}()` | `ArraySpace((:cat, :dog), 3, 4)`, `ArraySpace(0:1, 3, 4)`, `ArraySpace(1:2, 3, 4)`, `ArraySpace(Bool, 3, 4)`|
|Multi-dimensional variable discrete space| `FiniteSpaceStyle{(2,)}()` | `product((:cat, :dog), (:litchi, :longan, :mango))`, `product(-1:1, (false, true))`|
|Multi-dimensional continuous space| `ContinuousSpaceStyle{(2,)}()` or `ContinuousSpaceStyle{(3,4)}()` | `Box([-1.0, -2.0], [2.0, 4.0])`, `product(-1.2..3.3, -4.6..5.0)`, `ArraySpace(-1.2..3.3, 3, 4)`, `ArraySpace(Float32, 3, 4)` |
|Multi-dimensional hybrid space| `HybridSpaceStyle{(2,),()}()` | `product(-1.2..3.3, -4.6..5.0, [:cat, :dog])`, `product(Box([-1.0, -2.0], [2.0, 4.0]), [1,2,3])`|

### API

```julia
julia> using CommonRLSpaces

julia> s = (:litchi, :longan, :mango)
(:litchi, :longan, :mango)

julia> rand(s)
:litchi

julia> rand(s) in s
true

julia> length(s)
3
```

```julia
julia> s = ArraySpace(1:5, 2, 3)
CommonRLSpaces.RepeatedSpace{UnitRange{Int64}, Tuple{Int64, Int64}}(1:5, (2, 3))

julia> rand(s)
2×3 Matrix{Int64}:
 4  1  1
 3  2  2

julia> rand(s) in s
true

julia> SpaceStyle(s)
FiniteSpaceStyle()

julia> elsize(s)
(2, 3)
```

```julia
julia> s = product(-1..1, 0..1)
Box{StaticArraysCore.SVector{2, Float64}}([-1.0, 0.0], [1.0, 1.0])

julia> rand(s)
2-element StaticArraysCore.SVector{2, Float64} with indices SOneTo(2):
 0.03049072910834738
 0.6295234114874269

julia> rand(s) in s
true

julia> SpaceStyle(s)
ContinuousSpaceStyle()

julia> elsize(s)
(2,)

julia> bounds(s)
([-1.0, 0.0], [1.0, 1.0])

julia> clamp([5, 5], s)
2-element StaticArraysCore.SizedVector{2, Float64, Vector{Float64}} with indices SOneTo(2):
 1.0
 1.0
```