Commit c5000d5

Merge pull request #3 from KeplerC/docs
Add documentation
2 parents 3262ee4 + e585698 commit c5000d5

7 files changed: +316 -54 lines changed


README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -20,7 +20,7 @@ import fogx as fox
 
 # 🦊 Dataset Creation
 # from distributed dataset storage
-dataset = fox.Dataset(load_from = ["/tmp/rtx", "s3://fox_stroage/"])
+dataset = fox.Dataset(load_from = ["/tmp/rtx", "s3://fox_storage/"])
 
 # 🦊 Data collection:
 # create a new trajectory
```

docs/Reference.md

Lines changed: 9 additions & 0 deletions

# API Reference

## Dataset
::: fog_x.dataset.Dataset

-------

## Episode
::: fog_x.episode.Episode

docs/index.md

Lines changed: 12 additions & 12 deletions

````diff
@@ -1,17 +1,17 @@
-# Welcome to MkDocs
+# 🦊 Fog-X Documentation
 
-For full documentation visit [mkdocs.org](https://www.mkdocs.org).
+**Fog-X is an efficient and scalable data collection and management framework for robotics learning.**
+Supports datasets from [Open-X-Embodiment](https://robotics-transformer-x.github.io/) and 🤗[HuggingFace](https://huggingface.co/).
+Fog-X considers both speed 🚀 and memory efficiency 📈 with active metadata and lazily-loaded trajectory data. It supports flexible and distributed dataset partitioning.
 
-## Commands
+## Installation
 
-* `mkdocs new [dir-name]` - Create a new project.
-* `mkdocs serve` - Start the live-reloading docs server.
-* `mkdocs build` - Build the documentation site.
-* `mkdocs -h` - Print help message and exit.
+```bash
+pip install fogx
+```
 
-## Project layout
+## Usage
 
-    mkdocs.yml    # The configuration file.
-    docs/
-        index.md  # The documentation homepage.
-        ...       # Other markdown pages, images and other files.
+See [Usage Guide](./usage.md) for an overview of how to use Fog-X.
+
+You can also view [working examples on GitHub](https://github.com/KeplerC/fog_x/tree/main/examples).
````

docs/usage.md

Lines changed: 140 additions & 0 deletions

# Usage Guide

The code examples below assume the following import:

```py
import fog_x as fox
```

## Definitions

- **episode**: one robot trajectory or action, consisting of multiple step data entries.
- **step data**: data representing a snapshot of the robot action at a given point in time.
- **metadata**: information that is consistent across a given episode, e.g. the language instruction associated with the robot action, the name of the person collecting the data, or any other tags/labels.

## The Fog-X Dataset

To start, create a `Dataset` object. Any data that is collected, loaded, or exported will be saved to the provided path. There can also be existing Fog-X data at that path, so you can continue right where you left off.

```py
dataset = fox.Dataset(name="my_fogx_dataset", path="/local/path/my_fogx_dataset")
```
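
The README change in this commit also shows a `Dataset` created directly from distributed storage locations via the `load_from` argument; a minimal sketch mirroring that snippet (the paths are illustrative):

```py
# gather existing Fog-X data from local and S3 storage locations
distributed_dataset = fox.Dataset(load_from=["/tmp/rtx", "s3://fox_storage/"])
```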

## Collecting Robot Data

```py
# create a new trajectory
episode = dataset.new_episode()

# run robot and collect data
while robot_is_running:
    # at each step, add data to the episode
    episode.add(feature="arm_view", value="image1.jpg")

# automatically time-aligns and saves the trajectory
episode.close()
```
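
Several features can be logged in the same episode; each `add` call records one value for one feature, and the streams are time-aligned when the episode is closed. A minimal sketch (the feature names and file names here are illustrative, not part of the API):

```py
episode = dataset.new_episode()

for step in range(10):  # stand-in for the real robot control loop
    episode.add(feature="arm_view", value=f"arm_{step}.jpg")      # arm camera frame
    episode.add(feature="wrist_view", value=f"wrist_{step}.jpg")  # wrist camera frame

# time-aligns both features and saves the trajectory
episode.close()
```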

## Exporting Data

By default, the exported data will be located under the `/export` directory within the initialized `dataset.path`. Currently, the supported data formats are `rtx`, `open-x`, and `rlds`.

```py
# export and share the dataset as standard Open-X-Embodiment format
dataset.export(desired_episodes, format="rtx")
```

### PyTorch

```py
import torch

metadata = dataset.get_episode_info()
metadata = metadata[metadata["feature1"] == "value1"]
pytorch_ds = dataset.pytorch_dataset_builder(metadata=metadata)

# get samples from the dataset
for data in torch.utils.data.DataLoader(
    pytorch_ds,
    batch_size=2,
    collate_fn=lambda x: x,
    sampler=torch.utils.data.RandomSampler(pytorch_ds),
):
    print(data)
```

### HuggingFace

WIP: Currently there is limited support for HuggingFace.

```py
huggingface_ds = dataset.get_as_huggingface_dataset()
```


## Loading Data from Existing Datasets

### RT-X / TensorFlow Datasets

Load any RT-X robotics data available at [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/). You can also find a preview of all the RT-X datasets [here](https://dibyaghosh.com/rtx_viz/).

When loading episodes, you can optionally specify `additional_metadata` to be associated with them. You can also load a specific portion of the train or test data with the `split` parameter; see the [TensorFlow Split API](https://www.tensorflow.org/datasets/splits) for specifics.

```py
# load all berkeley_autolab_ur5 data
dataset.load_rtx_episodes(name="berkeley_autolab_ur5")

# load 75% of the berkeley_autolab_ur5 train data, labeled with my_label = "train1"
dataset.load_rtx_episodes(
    name="berkeley_autolab_ur5",
    split="train[:75%]",
    additional_metadata={"my_label": "train1"}
)
```

## Data Management

### Episode Metadata

You can retrieve episode-level information (metadata) using `dataset.get_episode_info()`. This is a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), meaning you have access to pandas data management methods including `filter`, `map`, `aggregate`, `groupby`, etc. After processing the metadata, you can then use it to obtain the desired episodes with `dataset.read_by(desired_metadata)`.

```py
# retrieve episode-level data as a pandas DataFrame
episode_info = dataset.get_episode_info()

# use pandas boolean indexing to select episodes
desired_episode_metadata = episode_info[episode_info["natural_language_instruction"] == "open door"]

# obtain the actual episodes containing their step data
episodes = dataset.read_by(desired_episode_metadata)
```
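
The selected episodes can then be passed to `dataset.export` from the Exporting Data section; a sketch, assuming `export` accepts the episodes returned by `read_by` (as the `desired_episodes` variable in that section suggests) and using `rlds`, one of the listed formats:

```py
# export only the filtered episodes, here in the RLDS format
desired_episodes = dataset.read_by(desired_episode_metadata)
dataset.export(desired_episodes, format="rlds")
```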

### Step Data

Step data is stored as a [Polars LazyFrame](https://docs.pola.rs/py-polars/html/reference/lazyframe/index.html). Lazy loading with Polars results in speedups of 10 to 100 times compared to pandas.

```py
# retrieve the Fog-X dataset as a Polars LazyFrame
step_data = dataset.get_step_data()

# select only the episode_id and natural_language_instruction columns
lazy_id_to_language = step_data.select("episode_id", "natural_language_instruction")

# the frame is lazily evaluated in memory when we call collect(); this returns a Polars DataFrame
id_to_language = lazy_id_to_language.collect()

# drop rows with duplicate natural_language_instruction to see the unique instructions
id_to_language.unique(subset=["natural_language_instruction"], maintain_order=True)
```

Polars also allows chaining methods:

```py
# same as the example above, but chained
id_to_language = (
    dataset.get_step_data()
    .select("episode_id", "natural_language_instruction")
    .collect()
    .unique(subset=["natural_language_instruction"], maintain_order=True)
)
```
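
Because the step data is a regular Polars LazyFrame, Polars expressions can also be pushed into the lazy query before `collect()`; a small sketch using `pl.col` (the column names follow the examples above):

```py
import polars as pl

# build the whole query lazily; only matching rows are materialized by collect()
door_steps = (
    dataset.get_step_data()
    .filter(pl.col("natural_language_instruction") == "open door")
    .select("episode_id", "natural_language_instruction")
    .collect()
)
```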
