updates to logging section.
djl11 committed Feb 26, 2025
1 parent fa699cd commit 190111e
Showing 3 changed files with 98 additions and 23 deletions.
34 changes: 25 additions & 9 deletions logging/contexts.mdx
@@ -11,7 +11,7 @@ unify.log(x=0, y=1, z=2)
When no context is specified in the interface or the table, then the default context is used.
However, you can group your logs into *non-overlapping* contexts, which can then be displayed in separate tables.

```python [expandable]
first_names = ["Zoe", "John", "Jane", "Jim", "Jill"]
last_names = ["Smith", "Johnson", "Williams", "Jones", "Brown"]
qnas = {
@@ -61,6 +61,7 @@ for _ in range(15):

<Accordion title="In the Interface">
<img class="dark-light" width="100%" src="https://raw.githubusercontent.com/unifyai/unifyai.github.io/refs/heads/main/img/externally_linked/two_contexts_dark.gif"/>
click image to maximize
</Accordion>

## Context Handling
@@ -81,6 +82,8 @@ with unify.Context("Traffic"):
)
```

## Nested Contexts

These can also be arbitrarily nested, where each context string is joined via `/` to define the full context path.

```python
@@ -91,11 +94,18 @@ with unify.Context("Sciences"):
            question="what is 1 + 1?",
            region="US"
        )
    with unify.Context("Physics"):
        unify.log(
            name="John",
            question="what is the speed of light?",
            region="EU"
        )
with unify.Context("Arts"):
    with unify.Context("Literature"):
        unify.log(
            name="Jane",
            question="What does this sentence convey?",
            region="UK"
        )
```

@@ -104,7 +114,12 @@ but it forms part of the directory structure for selecting the available context
However, "Sciences" **can** be selected as the context of the tab in the interface,
which limits the search space for each table within the tab.

<Accordion title="In the Interface">
<img class="dark-light" width="100%" src="https://raw.githubusercontent.com/unifyai/unifyai.github.io/refs/heads/main/img/externally_linked/table_nested_contexts_dark.gif"/>
click image to maximize
</Accordion>

## Read and Write Modes

By default, `unify.Context("...")` also changes the behaviour of `get_logs`, returning only the logs relevant within the context.
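For example, a minimal sketch of this read behaviour (the context name and fields here are illustrative):

```python
with unify.Context("Traffic"):
    unify.log(query="How do I renew my license?", resolved=True)
    # inside the context, get_logs only returns "Traffic" logs
    traffic_logs = unify.get_logs()
```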

@@ -143,7 +158,7 @@ exam_results = [
]
with unify.Context("Results"):
logs = unify.create_logs(entries=exam_results)
with unify.Context("Failed"):
with unify.Context("ExtraSupport"):
unify.add_logs_to_context(
[l.id for l in logs if not l.entries["passed"]]
)
@@ -157,6 +172,7 @@ If any logs are then updated in place, they will be reflected in all contexts.
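For example, a rough sketch; this assumes the returned log objects expose an update-style method (an assumed API, not a confirmed signature):

```python
# hypothetical sketch: amend one of the failed logs in place
failed = [l for l in logs if not l.entries["passed"]][0]
failed.update(passed=True)  # assumed method; the change appears in both Results and ExtraSupport
```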

GIF

Note that in this case, the `Results` and `ExtraSupport` contexts are "disconnected",
and therefore both sets of logs **cannot** be viewed in a single table.

To organize data hierarchically in a single table, you can use **column contexts**, more on those soon!
84 changes: 71 additions & 13 deletions logging/datasets.mdx
@@ -2,40 +2,98 @@
title: 'Datasets'
---

One common use case of contexts is the creation of *datasets*.
Each dataset has its own context, such that it can easily be viewed in its own table.

The `unify.Dataset` class is the best way to create and manage datasets.
Datasets support indexing as well as value-based addition and removal.

```python
my_dataset = unify.Dataset([0, 1, 2], name="my_dataset")
my_dataset[0] = -1
my_dataset += [3, 4, 5]
my_dataset -= [1, 2]
print(my_dataset)
```
```
unify.Dataset([-1, 3, 4, 5], name="my_dataset")
```

Under the hood, datasets make use of the same logging primitives as `unify.log`.
The only difference is that each dataset has its own context,
which is automatically managed by the `unify.Dataset` class.
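Conceptually, creating a dataset behaves something like this sketch (an illustration of the idea, not the actual internals):

```python
# conceptual sketch only: each dataset entry is just a regular log
# written under the dataset's dedicated context
with unify.Context("Datasets/my_dataset"):
    for value in [-1, 3, 4, 5]:
        unify.log(value=value)
```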

## Uploading

You can upload any dataset to your interface like so.

```python
my_dataset.upload()
```

The `unify.Dataset` class automatically sets the context as `f"Datasets/{name}"`, so your dataset can be viewed at `Datasets/my_dataset`.

GIF
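Since the dataset is backed by ordinary logs, you can also read them back with the standard logging primitives; a small sketch:

```python
# fetch the dataset's underlying logs via its context
with unify.Context("Datasets/my_dataset"):
    raw_logs = unify.get_logs()
```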

Setting `overwrite=True` will overwrite any existing dataset with the same name
(even if the upstream dataset contained more data not included in the upload).

```python
my_dataset.upload(overwrite=True)
```

## Downloading

If your dataset already exists upstream, you can download it like so.

```python
my_dataset = unify.Dataset.download("my_dataset")
```

Setting `overwrite=True` will overwrite your local dataset
(even if your local dataset contained more data than is in the download).

```python
my_dataset.download(overwrite=True)
```

Given that logs can be shared across multiple contexts, it's easy to create sub-datasets without duplication.
## Syncing

Finally, if you want to sync your local dataset with the upstream version,
to achieve the superset of the upstream and local data, you can do so as follows.

```python
my_dataset.sync()
```
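For example, with hypothetical local and upstream states:

```python
# hypothetical illustration of the resulting superset:
# local:    [-1, 3, 4, 5]
# upstream: [3, 4, 5, 6]
my_dataset.sync()
# local and upstream both now contain [-1, 3, 4, 5, 6]
```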

UNION

SUPERSET

## Duplicates

By default, the `unify.Dataset` class will **not** allow duplicate values.

```python
unify.Dataset([0, 1, 2, 2])
```
```
EXCEPTION
```

If you would like to allow duplicates, you can override the default behavior by setting the `allow_duplicates` flag when creating the dataset.

```python
unify.Dataset([0, 1, 2, 2], allow_duplicates=True)
```

When `allow_duplicates` is set to `False`, all upstream entries with values identical to local entries are assumed to represent the **same entry**,
and any unset log ids of local values will be updated to match the upstream ids.

If `allow_duplicates` is set to `True`, any upstream entries with values identical to local entries will simply be added alongside them,
and the log ids of the local entries will be left unchanged.
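For instance, a sketch with hypothetical upstream state:

```python
# hypothetical sketch: suppose values 0 and 1 already exist upstream
local = unify.Dataset([0, 1, 2], name="my_dataset")  # local log ids start unset

# with allow_duplicates=False (the default), 0 and 1 are matched to the
# upstream entries by value and adopt their log ids; only 2 is treated as new
local.sync()
```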

If duplicates are not explicitly required for a dataset,
then it's best to leave `allow_duplicates` set to `False`.
Even if duplicates are needed, adding an extra `example_id` column with `allow_duplicates` kept as `False` can be worthwhile,
especially if you're regularly syncing datasets between local and upstream sources.
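A sketch of that pattern, assuming dict-style entries are supported (field names are illustrative):

```python
# hypothetical sketch: an explicit example_id keeps repeated values distinct
entries = [
    {"value": 2, "example_id": 0},
    {"value": 2, "example_id": 1},  # same value, distinct entry
]
dataset = unify.Dataset(entries, name="repeated_values")  # allow_duplicates left as False
```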

```python
import random
# ...
```
3 changes: 2 additions & 1 deletion logging/overview.mdx
@@ -7,7 +7,8 @@ Logs enable you to store **any** kind of data to use in your [interfaces](https:
<img class="dark-light" width="100%" src="https://raw.githubusercontent.com/unifyai/unifyai.github.io/refs/heads/main/img/externally_linked/logging_animation_dark.gif"/>

All logging features are supported by the [REST API](https://docs.unify.ai/api-reference),
but our [Python client](https://docs.unify.ai/python)
provides more convenient abstractions on top of the basic logging primitives.

Just `unify.log` your data from the Python client...
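For instance (field names are illustrative, and this assumes a project is already configured):

```python
import unify

unify.log(question="what is 1 + 1?", answer="2")
```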

