Add probability masking to `space.sample` #1296

mariojerez · 2025-01-22T23:32:16Z

Description

Adds a probability mask feature (the `probability` parameter) to the `sample()` method of all spaces, which lets you specify the probability of each value being chosen. Like the `mask` parameter, `probability` is a NumPy array with shape `(n,)`, where `n` is the number of values in the space. Each entry gives the probability of the corresponding value being chosen: 0 means it will never be chosen and 1 means it will always be chosen. All values in the array must sum to 1. `probability` is not supported for the `Box` and `MultiBinary` spaces.
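To make the described semantics concrete, here is a minimal sketch built directly on NumPy's `Generator.choice`. The helper name `sample_discrete_with_probability` is hypothetical and is not Gymnasium's API; it only illustrates how a shape-`(n,)` probability vector that sums to 1 maps onto the values `start, ..., start + n - 1`.

```python
import numpy as np


def sample_discrete_with_probability(n, probability, rng, start=0):
    """Draw one value from {start, ..., start + n - 1} using a probability mask.

    Hypothetical helper for illustration only; not Gymnasium's API.
    """
    probability = np.asarray(probability, dtype=np.float64)
    if probability.shape != (n,):
        raise ValueError(f"expected shape {(n,)}, got {probability.shape}")
    if np.any(probability < 0) or not np.isclose(probability.sum(), 1.0):
        raise ValueError("probability must be non-negative and sum to 1")
    # Generator.choice(n, p=...) draws index i with probability probability[i]
    return start + int(rng.choice(n, p=probability))


rng = np.random.default_rng(42)
# Value 2 has probability 1, so it is always chosen
print(sample_discrete_with_probability(4, [0.0, 0.0, 1.0, 0.0], rng))  # 2
```

An entry of 0 can never be drawn, so a one-hot vector behaves like a logical mask with a single allowed value.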

Motivation

This is helpful when values need to be chosen at random but some values should be given higher priority (a higher likelihood) than others. For example, when implementing an Ant Colony Optimization algorithm, each action needs to be assigned a different probability of being chosen.
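As an illustration of the Ant Colony Optimization use case, the sketch below uses the classic Ant System transition rule, where the probability of moving to node j is proportional to tau_j^alpha * eta_j^beta (pheromone times heuristic desirability). The function name and the alpha/beta defaults are assumptions for this example; the resulting vector is exactly the kind of array that would be passed as `probability` under this PR.

```python
import numpy as np


def aco_transition_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Ant System rule: p_j is proportional to pheromone_j**alpha * heuristic_j**beta."""
    weights = np.power(pheromone, alpha) * np.power(heuristic, beta)
    # Normalise so the weights form a valid probability vector summing to 1
    return weights / weights.sum()


pheromone = np.array([1.0, 2.0, 1.0])
heuristic = np.array([1.0, 1.0, 0.5])  # e.g. 1 / distance to each candidate node
p = aco_transition_probabilities(pheromone, heuristic)

rng = np.random.default_rng(0)
# Weighted choice of the next node; with this PR, p would instead be given
# to a Discrete space as sample(probability=p)
next_node = rng.choice(len(p), p=p)
```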

Fixes #1255

Type of change

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes


pseudo-rnd-thoughts

Great job on the PR, overall looks good.
I will look at it in more detail this evening, but the biggest thing I notice is the check for probabilities that don't sum to 1. I'm not sure if NumPy will do that for us, or if we'll need to add that check.
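A quick way to see what NumPy already does: `Generator.choice` raises `ValueError` when `p` contains negative entries or does not sum to 1 (within a small tolerance). The helper `numpy_rejects` below is hypothetical and only probes that behaviour; a Gymnasium-side check would still give users a clearer, space-specific error message.

```python
import numpy as np

rng = np.random.default_rng(0)


def numpy_rejects(p):
    """Return True if Generator.choice refuses the given probability vector."""
    try:
        rng.choice(len(p), p=p)
    except ValueError:
        return True
    return False


print(numpy_rejects([0.5, 0.4]))   # True: sums to 0.9
print(numpy_rejects([0.5, 0.5]))   # False
print(numpy_rejects([1.5, -0.5]))  # True: sums to 1 but has a negative entry
```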

FYI, if testing with randomisation is a pain, then fix the seeds to get reliable, consistent results
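A minimal sketch of seed-fixed testing with plain NumPy: two generators created with the same seed produce identical draws, so assertions on sampled values become deterministic. In Gymnasium, reseeding a space (for example `space.seed(123)`) plays the same role for `space.np_random`; the seed value here is arbitrary.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.7])

# Same seed, same sequence of weighted draws
first = np.random.default_rng(seed=123).choice(3, size=10, p=p)
second = np.random.default_rng(seed=123).choice(3, size=10, p=p)
assert np.array_equal(first, second)

# With a fixed seed, an empirical-frequency check is also fully reproducible
many = np.random.default_rng(seed=123).choice(3, size=100_000, p=p)
freq = np.bincount(many, minlength=3) / many.size
assert np.allclose(freq, p, atol=0.01)
```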

mariojerez · 2025-01-23T20:48:27Z

Yeah, I don't think we need to require the probabilities to sum to 1, since we normalize the probability mask anyway so that it sums to 1. I could remove this requirement if you'd like. This would also allow the user to pass an array of zeros and have it behave the same way a mask of zeros does: return space.start.
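For comparison, the "normalise instead of enforce" option could look roughly like the sketch below. `sample_with_weights` is a hypothetical helper, not code from the PR: it accepts any non-negative weights, rescales them to sum to 1, and returns `start` for an all-zero vector, mirroring the all-zero logical mask.

```python
import numpy as np


def sample_with_weights(weights, rng, start=0):
    """Normalise non-negative weights and sample; an all-zero vector returns `start`.

    Hypothetical helper illustrating the normalisation alternative.
    """
    weights = np.asarray(weights, dtype=np.float64)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    total = weights.sum()
    if total == 0:
        # Mirror the behaviour of an all-zero logical mask
        return start
    return start + int(rng.choice(len(weights), p=weights / total))
```

The trade-off discussed below is that this is more permissive, but it hides mistakes (such as accidentally unnormalised inputs) that an explicit sum check would catch.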

Good to know about fixing seeds to help with testing.

Those two failing tests didn't come up for me locally when I ran pytest. I think it might be because I used a different version of NumPy, since the string representations of NumPy attributes are different on my machine.

pseudo-rnd-thoughts · 2025-01-24T15:24:35Z

Yeah, we test with both NumPy 2.0 and NumPy <2.0.

I suspect that you might need to ignore the error if numpy.__version__ < "2.0".
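One caveat with comparing `numpy.__version__` as a string: the comparison is lexicographic, so `"10.0" < "2.0"` is `True`. A safer sketch parses the major version as an integer. The helper names are hypothetical; the repr difference it handles comes from NumPy 2's change to scalar representations (`np.float64(0.5)` instead of `0.5`), which is a likely cause of the test mismatch described above.

```python
import numpy as np


def numpy_major_version() -> int:
    """Return NumPy's major version as an int (avoids lexicographic string comparison)."""
    return int(np.__version__.split(".")[0])


NUMPY_2 = numpy_major_version() >= 2


def expected_scalar_repr(value: float) -> str:
    # NumPy 2 prints scalars as e.g. "np.float64(0.5)"; NumPy 1.x prints "0.5"
    return f"np.float64({value})" if NUMPY_2 else repr(value)


# String comparison of versions is misleading:
assert "10.0" < "2.0"
assert repr(np.float64(0.5)) == expected_scalar_repr(0.5)
```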

mariojerez · 2025-01-24T17:32:02Z

Ok. Do you want me to go ahead and make those changes so that the probability sum doesn't have to equal 1?

pseudo-rnd-thoughts

Impressive PR @mariojerez

I think this is a question of programming style. IMHO, I prefer simplicity over fewer lines of code.
Therefore, personally, I think we should implement the feature like this:

```python
if mask is not None and probability is not None:
    raise ValueError("Only one of `mask` or `probability` can be provided")
elif mask is not None:
    ...  # logical mask sampling logic
elif probability is not None:
    ...  # probabilistic mask sampling logic
else:
    ...  # uniform sampling logic
```

This should make it easier for new users to debug and understand what is happening. I understand that this will increase the number of lines of code, with duplicated error checking. Thoughts?

There seem to be two other technical questions:

Should we require the probabilities to sum to 1, or should we normalise them before sampling? Personally, I'm in favour of the first, as it makes the requirements easy for users to understand. (If we do this, we should use numpy.isclose with a small tolerance, since floating-point summation is inexact.)
For composite spaces, i.e., Tuple / Dict, do we allow users to mix logical and probabilistic masks? For implementation simplicity, I would say no. Plus, logical masks are a subset of probabilistic masks, so users can convert any logical mask into a probabilistic one.
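The conversion mentioned above is straightforward: a 0/1 logical mask becomes a uniform probability mask over the allowed values. The helper `logical_to_probability` is hypothetical; note that an all-zero mask has no probabilistic equivalent, because it cannot sum to 1.

```python
import numpy as np


def logical_to_probability(mask):
    """Convert a 0/1 int8 mask into a uniform probability mask over the allowed values.

    Hypothetical helper; raises if no value is allowed, since an all-zero
    vector cannot sum to 1.
    """
    allowed = np.asarray(mask) == 1
    count = int(allowed.sum())
    if count == 0:
        raise ValueError("at least one value must be allowed")
    return allowed.astype(np.float64) / count


probability = logical_to_probability(np.array([1, 0, 1, 0], dtype=np.int8))
# probability is [0.5, 0.0, 0.5, 0.0], which sums to 1
assert np.isclose(probability.sum(), 1.0)
```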

Do you agree with my thoughts?

pseudo-rnd-thoughts · 2025-01-25T12:07:18Z

gymnasium/spaces/multi_binary.py

@@ -91,6 +98,10 @@ def sample(self, mask: MaskNDArray | None = None) -> NDArray[np.int8]:
                self.np_random.integers(low=0, high=2, size=self.n, dtype=self.dtype),
                mask.astype(self.dtype),
            )
+        elif probability is not None:
+            raise gym.error.Error(


Why is this unsupported currently? I'm happy to add this if you wish.

mariojerez

It wasn't obvious to me how to do it at the time, so I decided to move on. It would honestly be a huge help if you did! I'm pretty overwhelmed with classes and other commitments this semester.
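One possible way to support `probability` for `MultiBinary`, sketched under an assumption not settled in this thread: treat `probability[i]` as the chance that element i is 1, with each element drawn independently (a Bernoulli draw), so the values do not need to sum to 1. `sample_multi_binary` is a hypothetical helper, not the PR's implementation.

```python
import numpy as np


def sample_multi_binary(probability, rng):
    """Sample a MultiBinary-style array where probability[i] is P(element i == 1).

    Assumed semantics for illustration: independent Bernoulli draws, so the
    values only need to lie in [0, 1], not sum to 1.
    """
    probability = np.asarray(probability, dtype=np.float64)
    if np.any((probability < 0) | (probability > 1)):
        raise ValueError("each probability must lie in [0, 1]")
    # rng.random draws uniformly from [0, 1): p=0 never yields 1, p=1 always does
    return (rng.random(probability.shape) < probability).astype(np.int8)
```

Because it works elementwise on `probability.shape`, the same sketch also covers multi-dimensional `MultiBinary` shapes.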

mariojerez · 2025-01-29T02:04:39Z

Thanks @pseudo-rnd-thoughts !

I agree that the template you provided is simpler and easier to understand. In discrete.py, for example, here's what I think it could be changed to:

```python
if mask is not None and probability is not None:
    raise ValueError("Only one of `mask` or `probability` can be provided")
elif mask is not None:
    self._validate_mask(mask, (self.n,), np.int8, "mask")
    valid_action_mask = self._get_valid_action_mask(mask, "mask")
    # continue sampling logic for mask
    ...
elif probability is not None:
    self._validate_mask(probability, (self.n,), np.float64, "probability")
    valid_action_mask = self._get_valid_action_mask(probability, "probability")
    # continue sampling logic for probability
    ...
else:
    # uniform sampling logic
    ...
```

I agree that requiring the probabilities to sum to 1 makes it straightforward to understand. In any case, users don't lose much if we enforce it, because they can easily normalize the probabilities themselves.

Regarding your comment about the isclose check: in _get_valid_action_mask() (in discrete.py), I have this check:

```python
assert np.isclose(
    np.sum(mask), 1
), f"The sum of all values of `probability mask` should be 1, actual sum: {np.sum(mask)}"
```

I think you're suggesting a more accurate summation, such as Python's math.fsum (NumPy itself has no np.fsum), instead of np.sum, which makes sense. By default, np.isclose uses a relative tolerance of rtol=1e-05 and an absolute tolerance of atol=1e-08. I don't have a problem with these default values, but do you think I should hard-code them?
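Two details are relevant to the tolerance question. First, naive summation can drift (the classic `0.1` ten times), while `math.fsum` returns the correctly rounded sum. Second, `np.isclose` with its defaults is much looser than the tolerance `Generator.choice` itself applies to `p` (roughly the square root of float64 machine epsilon, about 1.5e-8), so a vector can pass the default `isclose` check and still be rejected by NumPy. The sketch shows both; the specific perturbation of 1e-6 is just an example value.

```python
import math

import numpy as np

values = [0.1] * 10
print(sum(values))        # 0.9999999999999999 (naive left-to-right summation)
print(math.fsum(values))  # 1.0 (correctly rounded sum)

# Off by 1e-6: within np.isclose defaults (atol=1e-08 + rtol=1e-05 * 1)...
p = np.array([0.5, 0.5 + 1e-6])
print(np.isclose(p.sum(), 1.0))  # True

# ...but Generator.choice uses a much tighter tolerance and rejects it
try:
    np.random.default_rng(0).choice(2, p=p)
    choice_rejected = False
except ValueError:
    choice_rejected = True
print(choice_rejected)  # True
```

This suggests that passing explicit, tighter tolerances to `np.isclose` (rather than relying on the defaults) would keep Gymnasium's own check consistent with what NumPy accepts.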

I agree; I think we can restrict composite spaces to one type of mask without serious disadvantages. One thing to consider, though: according to NumPy's documentation, passing probabilities through the p parameter of numpy.random.Generator.choice uses a more general but less efficient sampler. So users could still benefit from sampling with mask instead of probability whenever they can.
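The two paths can be compared side by side. Both snippets below draw uniformly from the allowed indices of the same logical mask, but the first uses the default sampler (no `p`), while the second routes through NumPy's general weighted sampler. Per NumPy's documentation, the two samplers can also produce different draws for the same seed, even when the weights are uniform. The variable names are illustrative only.

```python
import numpy as np

mask = np.array([1, 0, 1, 1, 0], dtype=np.int8)
rng = np.random.default_rng(7)

# Logical-mask path: uniform choice over the allowed indices, no `p` needed
allowed = np.flatnonzero(mask == 1)
masked_draw = rng.choice(allowed)

# Equivalent probability-mask path: same distribution, but it uses
# NumPy's more general (and less efficient) weighted sampler
probability = (mask == 1) / np.count_nonzero(mask == 1)
weighted_draw = rng.choice(len(mask), p=probability)
```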

pseudo-rnd-thoughts · 2025-02-11T17:08:53Z

@mariojerez Hey, sorry, I thought I had replied.

My preferred Discrete.sample would be:

```python
def sample(self, mask=None, probability=None):
    if mask is not None and probability is not None:
        raise ValueError(
            f"For `Discrete.sample`, only one of `mask` or `probability` can be provided, actual values: mask={mask}, probability={probability}."
        )

    elif mask is not None:
        assert isinstance(
            mask, np.ndarray
        ), f"The expected type of the sample mask is np.ndarray, actual type: {type(mask)}"
        assert (
            mask.dtype == np.int8
        ), f"The expected dtype of the sample mask is np.int8, actual dtype: {mask.dtype}"
        assert mask.shape == (
            self.n,
        ), f"The expected shape of the sample mask is {(self.n,)}, actual shape: {mask.shape}"

        valid_action_mask = mask == 1
        assert np.all(
            np.logical_or(mask == 0, valid_action_mask)
        ), f"All values of the sample mask should be 0 or 1, actual values: {mask}"

        if np.any(valid_action_mask):
            return self.start + self.np_random.choice(
                np.where(valid_action_mask)[0]
            )
        else:
            return self.start

    elif probability is not None:
        assert isinstance(
            probability, np.ndarray
        ), f"The expected type of the sample probability is np.ndarray, actual type: {type(probability)}"
        assert (
            probability.dtype == np.float32
        ), f"The expected dtype of the sample probability is np.float32, actual dtype: {probability.dtype}"
        assert probability.shape == (
            self.n,
        ), f"The expected shape of the sample probability is {(self.n,)}, actual shape: {probability.shape}"

        assert np.all(
            np.logical_and(probability >= 0, probability <= 1)
        ), f"All values of the sample probability should be between 0 and 1, actual values: {probability}"
        assert np.isclose(
            np.sum(probability), 1
        ), f"The sum of the sample probability should be equal to 1, actual sum: {np.sum(probability)}"

        return self.start + self.np_random.choice(np.arange(self.n), p=probability)

    else:
        return self.start + self.np_random.integers(self.n)
```

I know this has repeated code, but it makes it easier for users and maintainers to see what is happening and to change or fix it in the future.
I'm happy to update some of the current code if you want me to.
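To exercise the proposed branches without a Gymnasium `Discrete` instance, they can be pulled out into a free function that takes `n`, `start`, and a NumPy generator explicitly. `discrete_sample` is a hypothetical name, and the assertions are condensed versions of the ones above (same dtypes, including `np.float32` for `probability`).

```python
from __future__ import annotations

import numpy as np


def discrete_sample(
    n: int,
    rng: np.random.Generator,
    start: int = 0,
    mask: np.ndarray | None = None,
    probability: np.ndarray | None = None,
) -> int:
    """Free-function sketch of the proposed Discrete.sample branches (hypothetical)."""
    if mask is not None and probability is not None:
        raise ValueError("only one of `mask` or `probability` can be provided")
    elif mask is not None:
        assert mask.dtype == np.int8 and mask.shape == (n,)
        valid = mask == 1
        assert np.all(np.logical_or(mask == 0, valid)), "mask values must be 0 or 1"
        if np.any(valid):
            # Uniform choice among the allowed indices
            return start + int(rng.choice(np.where(valid)[0]))
        return start  # all-zero mask falls back to `start`
    elif probability is not None:
        assert probability.dtype == np.float32 and probability.shape == (n,)
        assert np.all((probability >= 0) & (probability <= 1))
        assert np.isclose(np.sum(probability), 1)
        return start + int(rng.choice(np.arange(n), p=probability))
    else:
        # No mask: uniform over all n values
        return start + int(rng.integers(n))
```

Usage: `discrete_sample(5, np.random.default_rng(0), start=10, mask=np.array([0, 0, 0, 1, 0], dtype=np.int8))` always returns 13, since only index 3 is allowed.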

mariojerez · 2025-02-17T05:28:59Z

Sounds good. I'll make those changes when I get the chance. It may be a couple of weeks; I'm really swamped with other work right now. If that timescale is longer than ideal, I don't mind if you make those changes. Otherwise, I'm happy to do it.

pseudo-rnd-thoughts · 2025-02-18T00:09:25Z

Cool, I had some time today, looked over the whole PR and made some changes:
https://github.com/pseudo-rnd-thoughts/Gymnasium/tree/probability_mask

I'm not sure what the easiest way of merging all the changes is.

I also fixed the test issues.

mariojerez added 13 commits January 14, 2025 18:54

hook fixed styling · 713aa42
Updated invalid probability tests so that they catch the assertion errors · 94fbd34
Corrected test_invalid_probability_mask tests and corrected issue with sample code · 967a061
reformatted comment · 8839e03
Added probability to sample method of box, dict, sequence, and tuple · af94493
Fixed error in message that would have shown in documentation · d29f055
Improved documentation for discrete and space · 0f6f7b2
Added probability mask to graph space · 3005c92
Added probability mask to remaining spaces and refactored code to improve efficiency and readability · c876115
Finished up editing sample methods. Added tests. · dc64293
Added and improved tests for box, discrete, graph, multi-discrete, oneof · 765442a
Wrote sample method tests for Sequence space · 5fcabe4
finalized tests and made a small correction in documentation · 0aae7ac

pseudo-rnd-thoughts changed the title ~~Probability mask~~ Add probability masking to space.sample Jan 23, 2025

pseudo-rnd-thoughts reviewed Jan 23, 2025


pseudo-rnd-thoughts requested changes Jan 25, 2025


