-
-
Notifications
You must be signed in to change notification settings - Fork 922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add probability masking to space.sample
#1296
base: main
Are you sure you want to change the base?
Conversation
…rove efficiency and readability
space.sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on the PR, overall looks good.
I will look at it in more detail this evening but the biggest thing that I notice is checking if the probabilities don't sum of 1. I'm not sure if numpy will do that for us, or if we'll need to add that check.
FYI, if testing with randomisation is a pain, then fix the seeds to get reliable, consistent results
Yeah I don't think we need to require their probability to sum to 1 since we normalize the probability mask anyways so that it adds to 1. I could remove this requirement if you'd like. This would also allow the user to input a zeros array, and have it behave the same way that a Good to know about fixing seeds to help with testing. Those two failed tests didn't come up for me locally when I ran pytest. I think it might be because I used a different version of numpy, since the string representation of numpy attributes are different on my machine. |
Yeah, we do testing with NumPy 2.0 and <2.0 I suspect that you might need to ignore the error if |
Ok. Do you want me to go ahead and make those changes so that the probability sum doesn't have to equal 1? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive PR @mariojerez
I think this is a question of programming styles, imho, I prefer simplity over minimal lines of code.
Therefore, personally I think that we should implement the feature like
if mask is not None and probability is not None:
raise ValueError
elif mask is not None:
# logical mask sampling logic
elif probability is not None:
# probabilistic mask sampling logic
else:
# uniform sampling logic
This should make it easier to debug and understand for new users what is happening. I understand that this will increase the number of lines of code with duplicate error checking. Thoughts?
There seem to be two other technical questions to ask
- Should we enforce the sum of probabilities == 1 or do we normalise before applying the probabilities? Personally, I'm in favour of the first as it makes the requirements for users easy to understand. (If we do this, then use
numpy.isclose
with a small error, as floating point summation). - For composite spaces, i.e., Tuple / Dict, do we allow users to mix the logical and probabilistic masks? From implementation simplity, I would say no. Plus logical is a subset of probabilistic, therefore, users can convert all logical cases to probabilistic to solve this.
Do you agree with my thoughts?
@@ -91,6 +98,10 @@ def sample(self, mask: MaskNDArray | None = None) -> NDArray[np.int8]: | |||
self.np_random.integers(low=0, high=2, size=self.n, dtype=self.dtype), | |||
mask.astype(self.dtype), | |||
) | |||
elif probability is not None: | |||
raise gym.error.Error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this unsupported currently? I'm happy to add this if you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't obvious to me how to do it at the time, and I decided to move on. It honestly would be a huge help if you did! I'm pretty overwhelmed with classes and other commitments I have this semester.
Thanks @pseudo-rnd-thoughts ! I agree that that template you provided is simpler and easier to understand. In discrete.py for example, here's what I'm thinking it could be changed to:
Referring to your comment about doing the isclose check, In
I think you're telling me to use
|
@mariojerez Hey, sorry I thought I had replied My preferred if mask is not None and probability is not None:
raise ValueError(
f"For `Discrete.sample`, only one of `mask` or `probability` can be provided, actual values: mask={mask}, probability={probability}. "
)
elif mask is not None:
assert isinstance(
mask, np.ndarray
), f"The expected type of the sample mask is np.ndarray, actual type: {type(mask)}"
assert (
mask.dtype == np.int8
), f"The expected dtype of the sample mask is np.int8, actual dtype: {mask.dtype}"
assert mask.shape == (
self.n,
), f"The expected shape of the sample mask is {(self.n,)}, actual shape: {mask.shape}"
valid_action_mask = mask == 1
assert np.all(
np.logical_or(mask == 0, valid_action_mask)
), f"All values of the sample mask should be 0 or 1, actual values: {mask}"
if np.any(valid_action_mask):
return self.start + self.np_random.choice(
np.where(valid_action_mask)[0]
)
else:
return self.start
elif probability is not None:
assert isinstance(
probability, np.ndarray
), f"The expected type of the sample probability is np.ndarray, actual type: {type(probability)}"
assert (
probability.dtype == np.float32
), f"The expected dtype of the sample probability is np.int8, actual dtype: {probability.dtype}"
assert probability.shape == (
self.n,
), f"The expected shape of the sample probability is {(self.n,)}, actual shape: {probability.shape}"
assert np.all(
np.logical_and(probability >= 0, probability <= 1)
), f"All values of the sample probability should be between 0 and 1, actual values: {probability}"
assert np.isclose(
np.sum(probability), 1
), f"The sum of the sample probability should be equal to 1, actual sum: {np.sum(probability)}"
return self.start + self.np_random.choice(np.arange(self.n), p=probability)
else:
return self.start + self.np_random.integers(self.n) I know this has repeated code in it but for users and maintainers, it is easier to see what is happening, and change / fix in the future. |
Sounds good. I'll make those changes when I get the chance. It may be a couple weeks, I'm really swamped with other work right now. If that timescale is longer than ideal, I don't mind if you make those changes. Otherwise, I'm happy to do it. |
Cool, I had some time today and looked over the whole PR and make some changes Not sure what the easier way of making all the changes Also I fixed the test issues |
Description
Adds a probability mask feature (
probability
) to thesample()
method of all spaces. This allows you to specify the probability of choosing each action. Similarly to themask
parameter, theprobability
parameter is a numpy array with the same shape asn
, the number of elements in the space. Each value in the array describes the probability of the corresponding value being chosen, with 0 meaning it will not be chosen and 1 meaning that it will be chosen. All of the values in the array must sum to 1.probability
is unsupported for the Box and MultiBinary space.Motivation
This is helpful in instances where values need to be chosen at random, such that some values are given higher priority (a higher likelihood) over others. For example, when implementing an Ant Colony Optimization algorithm, each action needs to be assigned a different probability of being chosen.
Fixes #1255
Type of change
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)