Commit c5000d5

Merge pull request #3 from KeplerC/docs
Add documentation
2 parents 3262ee4 + e585698 commit c5000d5

7 files changed: +316 -54 lines changed


README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -20,7 +20,7 @@ import fogx as fox
 
 # 🦊 Dataset Creation
 # from distributed dataset storage
-dataset = fox.Dataset(load_from = ["/tmp/rtx", "s3://fox_stroage/"])
+dataset = fox.Dataset(load_from = ["/tmp/rtx", "s3://fox_storage/"])
 
 # 🦊 Data collection:
 # create a new trajectory
```

docs/Reference.md

Lines changed: 9 additions & 0 deletions

# API Reference

## Dataset
::: fog_x.dataset.Dataset

-------

## Episode
::: fog_x.episode.Episode

docs/index.md

Lines changed: 12 additions & 12 deletions

````diff
@@ -1,17 +1,17 @@
-# Welcome to MkDocs
+# 🦊 Fog-X Documentation
 
-For full documentation visit [mkdocs.org](https://www.mkdocs.org).
+**Fog-X is an efficient and scalable data collection and management framework for robotics learning.**
+Supports datasets from [Open-X-Embodiment](https://robotics-transformer-x.github.io/) and 🤗[HuggingFace](https://huggingface.co/).
+Fog-X considers both speed 🚀 and memory efficiency 📈 with active metadata and lazily-loaded trajectory data. It supports flexible and distributed dataset partitioning.
 
-## Commands
+## Installation
 
-* `mkdocs new [dir-name]` - Create a new project.
-* `mkdocs serve` - Start the live-reloading docs server.
-* `mkdocs build` - Build the documentation site.
-* `mkdocs -h` - Print help message and exit.
+```bash
+pip install fogx
+```
 
-## Project layout
+## Usage
 
-    mkdocs.yml    # The configuration file.
-    docs/
-        index.md  # The documentation homepage.
-        ...       # Other markdown pages, images and other files.
+See [Usage Guide](./usage.md) for an overview of how to use Fog-X.
+
+You can also view [working examples on GitHub](https://github.com/KeplerC/fog_x/tree/main/examples).
````

docs/usage.md

Lines changed: 140 additions & 0 deletions

# Usage Guide

The code examples below assume the following import:

```py
import fog_x as fox
```

## Definitions

- **episode**: one robot trajectory or action, consisting of multiple step data entries.
- **step data**: data representing a snapshot of the robot action at a given point in time.
- **metadata**: information that is consistent across a given episode, e.g. the language instruction associated with the robot action, the name of the person collecting the data, or any other tags/labels.

## The Fog-X Dataset

To start, create a `Dataset` object. Any data that is collected, loaded, or exported will be saved to the provided path. There can also be existing Fog-X data at that path, so you can continue right where you left off.

```py
dataset = fox.Dataset(name="my_fogx_dataset", path="/local/path/my_fogx_dataset")
```
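
The README change in this commit also shows a `Dataset` created directly from distributed storage locations via the `load_from` argument; a minimal sketch mirroring that snippet (the paths are illustrative):

```py
# gather existing Fog-X data from local and S3 storage locations
distributed_dataset = fox.Dataset(load_from=["/tmp/rtx", "s3://fox_storage/"])
```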

## Collecting Robot Data

```py
# create a new trajectory
episode = dataset.new_episode()

# run robot and collect data
while robot_is_running:
    # at each step, add data to the episode
    episode.add(feature="arm_view", value="image1.jpg")

# automatically time-aligns and saves the trajectory
episode.close()
```
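
Several features can be logged in the same episode; each `add` call records one value for one feature, and the streams are time-aligned when the episode is closed. A minimal sketch (the feature names and file names here are illustrative, not part of the API):

```py
episode = dataset.new_episode()

for step in range(10):  # stand-in for the real robot control loop
    episode.add(feature="arm_view", value=f"arm_{step}.jpg")      # arm camera frame
    episode.add(feature="wrist_view", value=f"wrist_{step}.jpg")  # wrist camera frame

# time-aligns both features and saves the trajectory
episode.close()
```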

## Exporting Data

By default, the exported data will be located under the `/export` directory within the initialized `dataset.path`. Currently, the supported data formats are `rtx`, `open-x`, and `rlds`.

```py
# export and share the dataset as standard Open-X-Embodiment format
dataset.export(desired_episodes, format="rtx")
```

### PyTorch

```py
import torch

metadata = dataset.get_episode_info()
metadata = metadata[metadata["feature1"] == "value1"]
pytorch_ds = dataset.pytorch_dataset_builder(metadata=metadata)

# get samples from the dataset
for data in torch.utils.data.DataLoader(
    pytorch_ds,
    batch_size=2,
    collate_fn=lambda x: x,
    sampler=torch.utils.data.RandomSampler(pytorch_ds),
):
    print(data)
```

### HuggingFace

WIP: Currently there is limited support for HuggingFace.

```py
huggingface_ds = dataset.get_as_huggingface_dataset()
```


## Loading Data from Existing Datasets

### RT-X / TensorFlow Datasets

Load any RT-X robotics data available at [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/). You can also find a preview of all the RT-X datasets [here](https://dibyaghosh.com/rtx_viz/).

When loading episodes, you can optionally specify `additional_metadata` to be associated with them. You can also load a specific portion of the train or test data with the `split` parameter; see the [TensorFlow Split API](https://www.tensorflow.org/datasets/splits) for specifics.

```py
# load all berkeley_autolab_ur5 data
dataset.load_rtx_episodes(name="berkeley_autolab_ur5")

# load 75% of the berkeley_autolab_ur5 train data, labeled with my_label = "train1"
dataset.load_rtx_episodes(
    name="berkeley_autolab_ur5",
    split="train[:75%]",
    additional_metadata={"my_label": "train1"}
)
```

## Data Management

### Episode Metadata

You can retrieve episode-level information (metadata) using `dataset.get_episode_info()`. This is a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), meaning you have access to pandas data management methods including `filter`, `map`, `aggregate`, `groupby`, etc. After processing the metadata, you can then use it to obtain the desired episodes with `dataset.read_by(desired_metadata)`.

```py
# retrieve episode-level data as a pandas DataFrame
episode_info = dataset.get_episode_info()

# use pandas boolean indexing to select episodes
desired_episode_metadata = episode_info[episode_info["natural_language_instruction"] == "open door"]

# obtain the actual episodes containing their step data
episodes = dataset.read_by(desired_episode_metadata)
```
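
The selected episodes can then be passed to `dataset.export` from the Exporting Data section; a sketch, assuming `export` accepts the episodes returned by `read_by` (as the `desired_episodes` variable in that section suggests) and using `rlds`, one of the listed formats:

```py
# export only the filtered episodes, here in the RLDS format
desired_episodes = dataset.read_by(desired_episode_metadata)
dataset.export(desired_episodes, format="rlds")
```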

### Step Data

Step data is stored as a [Polars LazyFrame](https://docs.pola.rs/py-polars/html/reference/lazyframe/index.html). Lazy loading with Polars results in speedups of 10 to 100 times compared to pandas.

```py
# retrieve the Fog-X dataset as a Polars LazyFrame
step_data = dataset.get_step_data()

# select only the episode_id and natural_language_instruction columns
lazy_id_to_language = step_data.select("episode_id", "natural_language_instruction")

# the frame is lazily evaluated in memory when we call collect(); this returns a Polars DataFrame
id_to_language = lazy_id_to_language.collect()

# drop rows with duplicate natural_language_instruction to see the unique instructions
id_to_language.unique(subset=["natural_language_instruction"], maintain_order=True)
```

Polars also allows chaining methods:

```py
# same as the example above, but chained
id_to_language = (
    dataset.get_step_data()
    .select("episode_id", "natural_language_instruction")
    .collect()
    .unique(subset=["natural_language_instruction"], maintain_order=True)
)
```
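
Because the step data is a regular Polars LazyFrame, Polars expressions can also be pushed into the lazy query before `collect()`; a small sketch using `pl.col` (the column names follow the examples above):

```py
import polars as pl

# build the whole query lazily; only matching rows are materialized by collect()
door_steps = (
    dataset.get_step_data()
    .filter(pl.col("natural_language_instruction") == "open door")
    .select("episode_id", "natural_language_instruction")
    .collect()
)
```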
