diff --git a/logging/contexts.mdx b/logging/contexts.mdx
index 0e058c07c..8e74e4fc7 100644
--- a/logging/contexts.mdx
+++ b/logging/contexts.mdx
@@ -11,7 +11,7 @@ unify.log(x=0, y=1, z=2)
When no context is specified in the interface or the table, then the default context is used.
However, you can group your logs into *non-overlapping* contexts, which can then be displayed in separate tables.
-```python
+```python [expandable]
first_names = ["Zoe", "John", "Jane", "Jim", "Jill"]
last_names = ["Smith", "Johnson", "Williams", "Jones", "Brown"]
qnas = {
@@ -61,6 +61,7 @@ for _ in range(15):
+click image to maximize
## Context Handling
@@ -81,6 +82,8 @@ with unify.Context("Traffic"):
)
```
+## Nested Contexts
+
These can also be arbitrarily nested, where each context string is joined via `/` to define the full context path.
```python
@@ -91,11 +94,18 @@ with unify.Context("Sciences"):
question="what is 1 + 1?",
region="US"
)
- with unify.Context("Physics"):
+ with unify.Context("Physics"):
+ unify.log(
+ name="John",
+ question="what is the speed of light?",
+ region="EU"
+ )
+with unify.Context("Arts"):
+ with unify.Context("Literature"):
unify.log(
- name="John",
- question="what is the speed of light?",
- region="EU"
+ name="Jane",
+ question="What does this sentence convey?",
+ region="UK"
)
```
@@ -104,7 +114,12 @@ but it forms part of the directory structure for selecting the available context
However, "Sciences" **can** be selected as the context of the tab in the interface,
which limits the search space for each table within the tab.
-GIF
+
+click image to maximize
+
+## Read and Write Modes
By default, `unify.Context("...")` also changes the behaviour of `get_logs`, returning only the logs relevant within the context.
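+
+For instance (a minimal sketch, reusing the "Sciences" logs created above):
+
+```python
+with unify.Context("Sciences"):
+    # only logs whose context sits under "Sciences/..." are returned
+    sciences_logs = unify.get_logs()
+```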
@@ -143,7 +158,7 @@ exam_results = [
]
with unify.Context("Results"):
logs = unify.create_logs(entries=exam_results)
-with unify.Context("Failed"):
+with unify.Context("ExtraSupport"):
unify.add_logs_to_context(
[l.id for l in logs if not l.entries["passed"]]
)
@@ -157,6 +172,7 @@ If any logs are then updated inplace, they will be reflected in all contexts.
GIF
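+
+For instance (a hypothetical sketch; the exact in-place update API may differ):
+
+```python
+# hypothetical update method: changing a log in place is reflected
+# in both the "Results" and "ExtraSupport" tables
+logs[0].update(passed=True)
+```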
-Note that in this case, `Results` and `Failed` are **disconnected**, and both sets of logs **cannot** be viewed in a single table.
+Note that in this case, the `Results` and `ExtraSupport` contexts are **disconnected**,
+and therefore both sets of logs **cannot** be viewed in a single table.
-To organize data in a single table, you can use **column contexts**, more on those soon!
+To organize data hierarchically in a single table, you can use **column contexts**; more on those soon!
diff --git a/logging/datasets.mdx b/logging/datasets.mdx
index 67e0fefbb..ce6bc78fc 100644
--- a/logging/datasets.mdx
+++ b/logging/datasets.mdx
@@ -2,40 +2,98 @@
title: 'Datasets'
---
-One common use case of contexts is the creation of *datasets*.
-Each dataset has it's own context, such that it can easily be viewed in it's own table.
-
-The `unify.Dataset` class is the best way to interact with your datasets from the Python client.
+The `unify.Dataset` class is the best way to create and manage datasets.
+Datasets support indexing as well as value-based addition and removal.
```python
my_dataset = unify.Dataset([0, 1, 2], name="my_dataset")
+my_dataset[0] = -1
+my_dataset += [3, 4, 5]
+my_dataset -= [1, 2]
+print(my_dataset)
+```
```
+unify.Dataset([-1, 3, 4, 5], name="my_dataset")
+```
+
+Under the hood, datasets make use of the same logging primitives as `unify.log`.
+The only difference is that each dataset has its own context,
+which is automatically managed by the `unify.Dataset` class.
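+
+Conceptually, adding an item to a dataset behaves like logging inside the dataset's own context
+(an illustrative sketch of the idea, not the exact internals):
+
+```python
+# roughly what `my_dataset += [6]` does behind the scenes;
+# the field name `value` is illustrative
+with unify.Context("Datasets/my_dataset"):
+    unify.log(value=6)
+```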
-You can then upload this to your interface like so.
+## Uploading
+
+You can upload any dataset to your interface like so.
```python
my_dataset.upload()
```
-`unify.Dataset` sets the context as `f"Datasets/{name}"`, so your dataset can be viewed at `Datasets/my_dataset`.
+The `unify.Dataset` class automatically sets the context to `f"Datasets/{name}"`, so your dataset can be viewed at `Datasets/my_dataset`.
-IMG
+GIF
+
+Setting `overwrite=True` will overwrite any existing dataset with the same name
+(even if the upstream dataset contains data not included in the upload).
+
+```python
+my_dataset.upload(overwrite=True)
+```
+
+## Downloading
If your dataset already exists upstream, you can download it like so.
```python
-my_dataset = unify.Dataset.from_upstream("my_dataset")
+my_dataset = unify.Dataset.download("my_dataset")
```
-You can also add and remove content by value using standard Python operators.
+Setting `overwrite=True` will overwrite your local dataset
+(even if your local dataset contains data not included in the download).
```python
-my_dataset += [3, 4, 5]
-my_dataset -= [0, 1, 2]
-my_dataset.upload(overwrite=True)
+my_dataset.download(overwrite=True)
```
-Given that logs can be shared across multiple contexts, it's easy to create sub-datasets withou duplication.
+## Syncing
+
+Finally, if you want to sync your local dataset with the upstream version,
+such that both contain the superset of the upstream and local data, you can do so as follows.
+
+```python
+my_dataset.sync()
+```
+
+UNION
+
+SUPERSET
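+
+For example, continuing from above (a sketch of the expected behaviour):
+if the upstream copy holds `[-1, 3, 4, 5]` and your local copy holds `[4, 5, 6]`,
+then after syncing both hold the superset.
+
+```python
+my_dataset = unify.Dataset([4, 5, 6], name="my_dataset")
+my_dataset.sync()
+print(my_dataset)
+```
+```
+unify.Dataset([-1, 3, 4, 5, 6], name="my_dataset")
+```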
+
+## Duplicates
+
+By default, the `unify.Dataset` class will **not** allow duplicate values.
+
+```python
+unify.Dataset([0, 1, 2, 2])
+```
+```
+EXCEPTION
+```
+
+If you would like to allow duplicates, you can override the default behavior by setting the `allow_duplicates` flag when creating the dataset.
+
+```python
+unify.Dataset([0, 1, 2, 2], allow_duplicates=True)
+```
+
+When `allow_duplicates` is set to `False`, all upstream entries with values identical to local entries are assumed to represent the **same entry**,
+and any unset log ids of the local entries are updated to match the upstream ids.
+
+If `allow_duplicates` is set to `True`, any upstream entries with values identical to local entries are simply added alongside the local entries,
+and the log ids of the local entries are left unchanged.
+
+If duplicates are not explicitly required for a dataset,
+it's best to leave `allow_duplicates` set to `False`.
+Even if duplicates are needed, it can be worthwhile to add an extra `example_id` column and keep `allow_duplicates` set to `False`,
+especially if you're regularly syncing datasets between local and upstream sources.
```python
import random
diff --git a/logging/overview.mdx b/logging/overview.mdx
index 86b4e9315..ebb01e151 100644
--- a/logging/overview.mdx
+++ b/logging/overview.mdx
@@ -7,7 +7,8 @@ Logs enable you to store **any** kind of data to use in your [interfaces](https:
All logging features are supported by the [REST API](https://docs.unify.ai/api-reference),
-but our Python client provides more convenient abstractions on top of the basic logging primitives.
+but our [Python client](https://docs.unify.ai/python)
+provides more convenient abstractions on top of the basic logging primitives.
Just `unify.log` your data from the Python client...