# Usage Guide
The code examples below assume the following import:
```py
import fog_x as fox
```

## Definitions
- **episode**: one robot trajectory or action, consisting of multiple step data entries.
- **step data**: data representing a snapshot of the robot action at a certain point in time.
- **metadata**: information that is consistent across an episode, e.g. the language instruction associated with the robot action, the name of the person collecting the data, or any other tags/labels.
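
To make these definitions concrete, here is a purely illustrative sketch of how the pieces relate. The dictionary below is not the Fog-X API or storage format, and the field names are hypothetical.

```py
# Conceptual layout of a single episode (illustrative only, not Fog-X internals)
episode_example = {
    "metadata": {  # constant across the whole episode
        "natural_language_instruction": "open door",
        "collector": "alice",
    },
    "steps": [  # one entry of step data per timestep
        {"timestamp": 0.0, "arm_view": "frame0.jpg", "gripper_open": True},
        {"timestamp": 0.1, "arm_view": "frame1.jpg", "gripper_open": False},
    ],
}
```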

## The Fog-X Dataset
To start, create a `Dataset` object. Any data that is collected, loaded, or exported
will be saved to the provided path.
Existing Fog-X data may already be present at that path, letting you continue
right where you left off.
```py
dataset = fox.Dataset(name="my_fogx_dataset", path="/local/path/my_fogx_dataset")
```

## Collecting Robot Data

```py
# create a new trajectory
episode = dataset.new_episode()

# run the robot and collect data
while robot_is_running:
    # at each step, add data to the episode
    episode.add(feature="arm_view", value="image1.jpg")

# automatically time-aligns and saves the trajectory
episode.close()
```
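
A step is not limited to a single feature. The sketch below records two features per step with the same `episode.add` call shown above; the feature names and the `camera_frame`/`gripper_open` variables are hypothetical stand-ins for your robot's sensor values.

```py
# Sketch: record multiple features at each step (feature names are hypothetical)
episode = dataset.new_episode()
while robot_is_running:
    episode.add(feature="arm_view", value=camera_frame)      # e.g. a camera image
    episode.add(feature="gripper_open", value=gripper_open)  # e.g. a boolean state
episode.close()  # time-aligns all features and saves the trajectory
```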

## Exporting Data
By default, exported data is located under the `/export` directory within
the initialized `dataset.path`.
Currently, the supported export formats are `rtx`, `open-x`, and `rlds`.

```py
# Export and share the dataset in the standard RT-X (Open-X-Embodiment) format
dataset.export(desired_episodes, format="rtx")
```
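
As a sketch of a full export flow, the example below selects episodes by metadata and exports them as `rlds` (one of the formats listed above). It assumes `dataset.export` accepts the same metadata selection that `dataset.read_by` uses in the Data Management section; treat that as an assumption, not documented behavior.

```py
# Assumption: export() takes a metadata selection, like read_by() below
episode_info = dataset.get_episode_info()
desired_episodes = episode_info[episode_info["natural_language_instruction"] == "open door"]
dataset.export(desired_episodes, format="rlds")
```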

### PyTorch
```py
import torch

# select episodes by metadata using pandas boolean indexing
metadata = dataset.get_episode_info()
metadata = metadata[metadata["feature1"] == "value1"]
pytorch_ds = dataset.pytorch_dataset_builder(metadata=metadata)

# get samples from the dataset; the identity collate_fn keeps each
# batch as a plain list of samples
for data in torch.utils.data.DataLoader(
    pytorch_ds,
    batch_size=2,
    collate_fn=lambda x: x,
    sampler=torch.utils.data.RandomSampler(pytorch_ds),
):
    print(data)
```

### HuggingFace
WIP: HuggingFace support is currently limited.

```py
huggingface_ds = dataset.get_as_huggingface_dataset()
```


## Loading Data from Existing Datasets

### RT-X / Tensorflow Datasets
Load any RT-X robotics data available at [Tensorflow Datasets](https://www.tensorflow.org/datasets/catalog/).
You can also find a preview of all the RT-X datasets [here](https://dibyaghosh.com/rtx_viz/).

When loading the episodes, you can optionally specify `additional_metadata` to be associated with them.
You can also load a specific portion of the train or test data with the `split` parameter. See the [Tensorflow Split API](https://www.tensorflow.org/datasets/splits) for specifics.

```py
# load all berkeley_autolab_ur5 data
dataset.load_rtx_episodes(name="berkeley_autolab_ur5")

# load 75% of the berkeley_autolab_ur5 train split, tagging each episode
# with my_label = "train1"
dataset.load_rtx_episodes(
    name="berkeley_autolab_ur5",
    split="train[:75%]",
    additional_metadata={"my_label": "train1"},
)
```
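
Once loaded, the extra metadata behaves like any other episode-level field. This sketch reuses `get_episode_info()` from the Data Management section below to check the `my_label` tag applied above.

```py
# Verify the label applied during loading via pandas boolean indexing
episode_info = dataset.get_episode_info()
labeled = episode_info[episode_info["my_label"] == "train1"]
print(len(labeled), "episodes tagged with my_label = 'train1'")
```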

## Data Management

### Episode Metadata
You can retrieve episode-level information (metadata) using `dataset.get_episode_info()`.
The result is a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html),
so you have access to pandas data management methods, including boolean indexing, `map`, `aggregate`, `groupby`, etc.
After processing the metadata, you can then use it to obtain your
desired episodes with `dataset.read_by(desired_metadata)`.

```py
# Retrieve episode-level data as a pandas DataFrame
episode_info = dataset.get_episode_info()

# Use pandas boolean indexing to select episodes
desired_episode_metadata = episode_info[episode_info["natural_language_instruction"] == "open door"]

# Obtain the actual episodes, including their step data
episodes = dataset.read_by(desired_episode_metadata)
```

### Step Data
Step data is stored as a [Polars LazyFrame](https://docs.pola.rs/py-polars/html/reference/lazyframe/index.html).
Lazy evaluation with Polars can yield speedups of 10 to 100 times over pandas.
```py
# Retrieve the Fog-X dataset as a Polars LazyFrame
step_data = dataset.get_step_data()

# select only the episode_id and natural_language_instruction
lazy_id_to_language = step_data.select("episode_id", "natural_language_instruction")

# the query is evaluated and materialized in memory when we call collect(),
# returning a Polars DataFrame
id_to_language = lazy_id_to_language.collect()

# drop rows with duplicate natural_language_instruction to see unique
# instructions (unique() returns a new DataFrame)
unique_instructions = id_to_language.unique(subset=["natural_language_instruction"], maintain_order=True)
```

Polars also allows chaining methods:
```py
# Same as the example above, but chained
unique_instructions = (
    dataset.get_step_data()
    .select("episode_id", "natural_language_instruction")
    .collect()
    .unique(subset=["natural_language_instruction"], maintain_order=True)
)
```
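
Because the frame is lazy, aggregations can also be pushed into the same query before `collect()`. The sketch below counts steps per instruction; it uses the modern Polars `group_by`/`pl.len` API and is an illustrative query, not a documented Fog-X recipe.

```py
import polars as pl

# Count how many steps exist per instruction, evaluated lazily
steps_per_instruction = (
    dataset.get_step_data()
    .group_by("natural_language_instruction")
    .agg(pl.len().alias("num_steps"))
    .collect()
)
```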