@@ -731,29 +731,80 @@ pixels or states etc).
731
731
Forward and inverse transforms
732
732
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
733
733
734
- Transforms also have an ``inv `` method that is called before
735
- the action is applied in reverse order over the composed transform chain:
736
- this allows to apply transforms to data in the environment before the action is taken
737
- in the environment. The keys to be included in this inverse transform are passed through the
738
- ``"in_keys_inv" `` keyword argument:
734
+ Transforms also have an :meth: `~torchrl.envs.Transform.inv ` method that is called before the action is applied in reverse
735
+ order over the composed transform chain. This allows applying transforms to data in the environment before the action is
736
+ taken in the environment. The keys to be included in this inverse transform are passed through the `"in_keys_inv" `
737
+ keyword argument, and the out-keys default to these values in most cases:
739
738
740
739
.. code-block ::
741
740
:caption: Inverse transform
742
741
743
742
>>> env.append_transform(DoubleToFloat(in_keys_inv=["action"])) # will map the action from float32 to float64 before calling the base_env.step
744
743
745
- The way ``in_keys `` relates to ``in_keys_inv `` can be understood by considering the base environment as the "inner" part
746
- of the transform. In constrast, the user inputs and outputs to and from the transform are to be considered as the
747
- outside world. The following figure shows what this means in practice for the :class: `~torchrl.envs.RenameTransform `
748
- class: the input ``TensorDict `` of the ``step `` function must have the ``out_keys_inv `` listed in its entries as they
749
- are part of the outside world. The transform changes these names to make them match the names of the inner, base
750
- environment using the ``in_keys_inv ``. The inverse process is executed with the output tensordict, where the ``in_keys ``
751
- are mapped to the corresponding ``out_keys ``.
744
+ The following paragraphs detail how one can think about what is to be considered `in_ ` or `out_ ` features.
745
+
746
+ Understanding Transform Keys
747
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
748
+
749
+ In transforms, `in_keys ` and `out_keys ` define the interaction between the base environment and the outside world
750
+ (e.g., your policy):
751
+
752
+ - `in_keys ` refers to the base environment's perspective (inner = `base_env ` of the
753
+ :class: `~torchrl.envs.TransformedEnv `).
754
+ - `out_keys ` refers to the outside world (outer = `policy `, `agent `, etc.).
755
+
756
+ For example, with `in_keys=["obs"] ` and `out_keys=["obs_standardized"] `, the policy will "see" a standardized
757
+ observation, while the base environment outputs a regular observation.
758
+
759
+ Similarly, for inverse keys:
760
+
761
+ - `in_keys_inv ` refers to entries as seen by the base environment.
762
+ - `out_keys_inv ` refers to entries as seen or produced by the policy.
763
+
764
+ The following figure illustrates this concept for the :class: `~torchrl.envs.RenameTransform ` class: the input
765
+ `TensorDict ` of the `step ` function must include the `out_keys_inv ` as they are part of the outside world. The
766
+ transform changes these names to match the names of the inner, base environment using the `in_keys_inv `.
767
+ The inverse process is executed with the output tensordict, where the `in_keys ` are mapped to the corresponding
768
+ `out_keys `.
752
769
753
770
.. figure :: /_static/img/rename_transform.png
754
771
755
772
Rename transform logic
756
773
774
+ Transforming Tensors and Specs
775
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
776
+
777
+ When transforming actual tensors (coming from the policy), the process is schematically represented as:
778
+
779
+ >>> for t in reversed (self .transform):
780
+ ... td = t.inv(td)
781
+
782
+ This starts with the outermost transform to the innermost transform, ensuring the action value exposed to the policy
783
+ is properly transformed.
784
+
785
+ For transforming the action spec, the process should go from innermost to outermost (similar to observation specs):
786
+
787
+ >>> def transform_action_spec (self , action_spec ):
788
+ ... for t in self .transform:
789
+ ... action_spec = t.transform_action_spec(action_spec)
790
+ ... return action_spec
791
+
792
+ A pseudocode for a single transform_action_spec could be:
793
+
794
+ >>> def transform_action_spec (self , action_spec ):
795
+ ... return spec_from_random_values(self ._apply_transform(action_spec.rand()))
796
+
797
+ This approach ensures that the "outside" spec is inferred from the "inside" spec. Note that we did not call
798
+ `_inv_apply_transform ` but `_apply_transform ` on purpose!
799
+
800
+ Exposing Specs to the Outside World
801
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
802
+
803
+ `TransformedEnv ` will expose the specs corresponding to the `out_keys_inv ` for actions and states.
804
+ For example, with :class: `~torchrl.envs.ActionDiscretizer `, the environment's action (e.g., `"action" `) is a float-valued
805
+ tensor that should not be generated when using :meth: `~torchrl.envs.EnvBase.rand_action ` with the transformed
806
+ environment. Instead, `"action_discrete" ` should be generated, and its continuous counterpart obtained from the
807
+ transform. Therefore, the user should see the `"action_discrete" ` entry being exposed, but not `"action" `.
757
808
758
809
759
810
Cloning transforms
0 commit comments