
[BUG] RuntimeError: index -9223372036854775808 is out of bounds for dimension 1 with size 1. ProbabilisticActor cannot be configured with return_log_prob=True in version 0.5.0; switching back to version 0.4.0 resolves the issue. #3011

Closed
Sui-Xing opened this issue Aug 26, 2024 · 2 comments
Labels: bug, rl

Comments

Sui-Xing commented Aug 26, 2024

Add Link

https://pytorch.org/rl/stable/reference/generated/torchrl.modules.tensordict_module.ProbabilisticActor.html?highlight=probabilisticactor#torchrl.modules.tensordict_module.ProbabilisticActor

Describe the bug

CODE

# Imports implied by the snippet (not shown in the original); "d" aliases
# torch.distributions. net_policy, Net_Value, shared_module, num_cells,
# init_weights, hidden, td and device are defined elsewhere in the script.
import torch.distributions as d
from tensordict.nn import CompositeDistribution, TensorDictModule
from torchrl.modules import ActorCriticOperator, ProbabilisticActor, ValueOperator

# Policy head: writes one logits tensor per action under the nested "params" key.
policy_module = TensorDictModule(
    net_policy,
    in_keys=["hidden"],
    out_keys=[
        ("params", "action1", "logits"),
        ("params", "action2", "logits"),
        ("params", "action3", "logits"),
        ("params", "action4", "logits"),
        ("params", "action5", "logits"),
    ],
)

# One Categorical per action head, combined in a CompositeDistribution.
actor = ProbabilisticActor(
    module=policy_module,
    in_keys=["params"],
    distribution_class=CompositeDistribution,
    distribution_kwargs={
        "distribution_map": {
            "action1": d.Categorical,
            "action2": d.Categorical,
            "action3": d.Categorical,
            "action4": d.Categorical,
            "action5": d.Categorical,
        },
    },
    return_log_prob=True,  # works on torchrl 0.4.0, fails on 0.5.0
)

net_value = Net_Value(num_cells, device=device)
net_value.apply(init_weights)
net_value(hidden)

value_module = ValueOperator(
    module=net_value,
    in_keys=["hidden"],
    out_keys=["state_action_value"],
)

a_c_model = ActorCriticOperator(shared_module, actor, value_module)

test_td = a_c_model.get_policy_operator()(td)  # raises the RuntimeError below

ERROR MESSAGE

Traceback (most recent call last):
  File "********", line 159, in <module>
    test_td = a_c_model.get_policy_operator()(td)
  File "E:\tools\miniconda\envs\***\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\nn\common.py", line 297, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\_contextlib.py", line 127, in decorate_context
    return func(*args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\nn\utils.py", line 293, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\nn\probabilistic.py", line 655, in forward
    return self.module[-1](tensordict_out, _requires_sample=self._requires_sample)
  File "E:\tools\miniconda\envs\***\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\nn\common.py", line 297, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\_contextlib.py", line 127, in decorate_context
    return func(*args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\nn\utils.py", line 293, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\nn\probabilistic.py", line 439, in forward
    tensordict_out = dist.log_prob(tensordict_out)
  File "E:\tools\miniconda\envs\***\lib\site-packages\tensordict\nn\distributions\composite.py", line 150, in log_prob
    d[_add_suffix(name, "_log_prob")] = lp = dist.log_prob(sample.get(name))
  File "E:\tools\miniconda\envs\***\lib\site-packages\torch\distributions\categorical.py", line 142, in log_prob
    return log_pmf.gather(-1, value).squeeze(-1)
RuntimeError: index -9223372036854775808 is out of bounds for dimension 1 with size 1

With return_log_prob=True, ProbabilisticActor raises this error on torchrl 0.5.0; switching back to version 0.4.0 resolves the issue. (Note that -9223372036854775808 is the minimum int64 value, which suggests Categorical.log_prob is being evaluated against an uninitialized index tensor.)
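
For reference, here is a condensed single-head variant of the snippet above that should hit the same code path (a sketch only: the toy Linear network, key names, and tensor shapes are placeholders, not the actual modules from the script above):

import torch
import torch.distributions as d
from tensordict import TensorDict
from tensordict.nn import CompositeDistribution, TensorDictModule
from torchrl.modules import ProbabilisticActor

# Toy logits head standing in for net_policy.
module = TensorDictModule(
    torch.nn.Linear(4, 3),
    in_keys=["hidden"],
    out_keys=[("params", "action1", "logits")],
)
actor = ProbabilisticActor(
    module=module,
    in_keys=["params"],
    distribution_class=CompositeDistribution,
    distribution_kwargs={"distribution_map": {"action1": d.Categorical}},
    return_log_prob=True,  # the flag that triggers the failure
)
td = TensorDict({"hidden": torch.randn(2, 4)}, batch_size=[2])
actor(td)  # RuntimeError on torchrl/tensordict 0.5.0; succeeds on 0.4.0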

Describe your environment

Windows 11 / Windows Server 2022
Python 3.10
CPU or CUDA 11.8
torch==2.4.0
torchrl==0.5.0
tensordict==0.5.0
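
As a temporary workaround, pinning both packages back to the previous release avoids the error (assuming the matching tensordict release is installed alongside torchrl):

pip install torchrl==0.4.0 tensordict==0.4.0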

cc @vmoens @nairbv

Sui-Xing added the bug label on Aug 26, 2024
svekars added the rl label on Aug 26, 2024
svekars (Contributor) commented Aug 26, 2024

This should go to the RL repository - https://github.com/pytorch/rl

svekars closed this as completed on Aug 26, 2024
svekars reopened this on Aug 26, 2024
vmoens (Contributor) commented Aug 27, 2024

Closing in favour of pytorch/rl#2402

vmoens closed this as completed on Aug 27, 2024