updates to logging section.
djl11 committed Feb 26, 2025
1 parent fa699cd commit 190111e
Showing 3 changed files with 98 additions and 23 deletions.
34 changes: 25 additions & 9 deletions logging/contexts.mdx
@@ -11,7 +11,7 @@ unify.log(x=0, y=1, z=2)
When no context is specified in the interface or the table, then the default context is used.
However, you can group your logs into *non-overlapping* contexts, which can then be displayed in separate tables.

```python [expandable]
first_names = ["Zoe", "John", "Jane", "Jim", "Jill"]
last_names = ["Smith", "Johnson", "Williams", "Jones", "Brown"]
qnas = {
@@ -61,6 +61,7 @@ for _ in range(15):

<Accordion title="In the Interface">
<img class="dark-light" width="100%" src="https://raw.githubusercontent.com/unifyai/unifyai.github.io/refs/heads/main/img/externally_linked/two_contexts_dark.gif"/>
click image to maximize
</Accordion>

## Context Handling
@@ -81,6 +82,8 @@ with unify.Context("Traffic"):
)
```

## Nested Contexts

These can also be arbitrarily nested, where each context string is joined via `/` to define the full context path.

```python
@@ -91,11 +94,18 @@ with unify.Context("Sciences"):
            question="what is 1 + 1?",
            region="US"
        )
    with unify.Context("Physics"):
        unify.log(
            name="John",
            question="what is the speed of light?",
            region="EU"
        )
with unify.Context("Arts"):
    with unify.Context("Literature"):
        unify.log(
            name="Jane",
            question="What does this sentence convey?",
            region="UK"
        )
```

@@ -104,7 +114,12 @@ but it forms part of the directory structure for selecting the available context
However, "Sciences" **can** be selected as the context of the tab in the interface,
which limits the search space for each table within the tab.

<Accordion title="In the Interface">
<img class="dark-light" width="100%" src="https://raw.githubusercontent.com/unifyai/unifyai.github.io/refs/heads/main/img/externally_linked/table_nested_contexts_dark.gif"/>
click image to maximize
</Accordion>

## Read and Write Modes

By default, `unify.Context("...")` also changes the behaviour of `get_logs`, returning only the logs relevant within the context.
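For example, a minimal sketch of this read behaviour (the context name and fields here are illustrative):

```python
with unify.Context("Traffic"):
    unify.log(query="How do I renew my license?", resolved=True)
    # inside the context, get_logs only returns "Traffic" logs
    traffic_logs = unify.get_logs()
```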

@@ -143,7 +158,7 @@ exam_results = [
]
with unify.Context("Results"):
logs = unify.create_logs(entries=exam_results)
with unify.Context("Failed"):
with unify.Context("ExtraSupport"):
unify.add_logs_to_context(
[l.id for l in logs if not l.entries["passed"]]
)
@@ -157,6 +172,7 @@ If any logs are then updated in place, they will be reflected in all contexts.
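For example, a rough sketch; this assumes the returned log objects expose an update-style method (an assumed API, not a confirmed signature):

```python
# hypothetical sketch: amend one of the failed logs in place
failed = [l for l in logs if not l.entries["passed"]][0]
failed.update(passed=True)  # assumed method; the change appears in both Results and ExtraSupport
```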

GIF

Note that in this case, the `Results` and `ExtraSupport` contexts are "disconnected",
and therefore both sets of logs **cannot** be viewed in a single table.

To organize data hierarchically in a single table, you can use **column contexts**, more on those soon!
84 changes: 71 additions & 13 deletions logging/datasets.mdx
@@ -2,40 +2,98 @@
title: 'Datasets'
---

One common use case of contexts is the creation of *datasets*.
Each dataset has its own context, such that it can easily be viewed in its own table.

The `unify.Dataset` class is the best way to create and manage datasets.
Datasets support indexing as well as value-based addition and removal.

```python
my_dataset = unify.Dataset([0, 1, 2], name="my_dataset")
my_dataset[0] = -1
my_dataset += [3, 4, 5]
my_dataset -= [1, 2]
print(my_dataset)
```
```
unify.Dataset([-1, 3, 4, 5], name="my_dataset")
```

Under the hood, datasets make use of the same logging primitives as `unify.log`.
The only difference is that each dataset has its own context,
which is automatically managed by the `unify.Dataset` class.
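Conceptually, creating a dataset behaves something like this sketch (an illustration of the idea, not the actual internals):

```python
# conceptual sketch only: each dataset entry is just a regular log
# written under the dataset's dedicated context
with unify.Context("Datasets/my_dataset"):
    for value in [-1, 3, 4, 5]:
        unify.log(value=value)
```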

## Uploading

You can upload any dataset to your interface like so.

```python
my_dataset.upload()
```

The `unify.Dataset` class automatically sets the context as `f"Datasets/{name}"`, so your dataset can be viewed at `Datasets/my_dataset`.

GIF
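Since the dataset is backed by ordinary logs, you can also read them back with the standard logging primitives; a small sketch:

```python
# fetch the dataset's underlying logs via its context
with unify.Context("Datasets/my_dataset"):
    raw_logs = unify.get_logs()
```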

Setting `overwrite=True` will overwrite any existing dataset with the same name
(even if the upstream dataset contained more data not included in the upload).

```python
my_dataset.upload(overwrite=True)
```

## Downloading

If your dataset already exists upstream, you can download it like so.

```python
my_dataset = unify.Dataset.download("my_dataset")
```

Setting `overwrite=True` will overwrite your local dataset
(even if your local dataset contained more data than is in the download).

```python
my_dataset.download(overwrite=True)
```

Given that logs can be shared across multiple contexts, it's easy to create sub-datasets without duplication.
## Syncing

Finally, if you want to sync your local dataset with the upstream version,
to achieve the superset of the upstream and local data, you can do so as follows.

```python
my_dataset.sync()
```
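For example, with hypothetical local and upstream states:

```python
# hypothetical illustration of the resulting superset:
# local:    [-1, 3, 4, 5]
# upstream: [3, 4, 5, 6]
my_dataset.sync()
# local and upstream both now contain [-1, 3, 4, 5, 6]
```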

UNION

SUPERSET

## Duplicates

By default, the `unify.Dataset` class will **not** allow duplicate values.

```python
unify.Dataset([0, 1, 2, 2])
```
```
EXCEPTION
```

If you would like to allow duplicates, you can override the default behavior by setting the `allow_duplicates` flag when creating the dataset.

```python
unify.Dataset([0, 1, 2, 2], allow_duplicates=True)
```

When `allow_duplicates` is set to `False`, all upstream entries with values identical to local entries are assumed to represent the **same entry**,
and any unset log ids of local values will be updated to match the upstream ids.

If `allow_duplicates` is set to `True`, any upstream entries with values identical to local entries will simply be added alongside them,
and the log ids of the local entries will be left unchanged.
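For instance, a sketch with hypothetical upstream state:

```python
# hypothetical sketch: suppose values 0 and 1 already exist upstream
local = unify.Dataset([0, 1, 2], name="my_dataset")  # local log ids start unset

# with allow_duplicates=False (the default), 0 and 1 are matched to the
# upstream entries by value and adopt their log ids; only 2 is treated as new
local.sync()
```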

If duplicates are not explicitly required for a dataset,
then it's best to leave `allow_duplicates` set to `False`.
Even if duplicates are needed, adding an extra `example_id` column with `allow_duplicates` kept as `False` can be worthwhile,
especially if you're regularly syncing datasets between local and upstream sources.
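A sketch of that pattern, assuming dict-style entries are supported (field names are illustrative):

```python
# hypothetical sketch: an explicit example_id keeps repeated values distinct
entries = [
    {"value": 2, "example_id": 0},
    {"value": 2, "example_id": 1},  # same value, distinct entry
]
dataset = unify.Dataset(entries, name="repeated_values")  # allow_duplicates left as False
```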

```python
import random
# ...
```
3 changes: 2 additions & 1 deletion logging/overview.mdx
@@ -7,7 +7,8 @@ Logs enable you to store **any** kind of data to use in your [interfaces](https:
<img class="dark-light" width="100%" src="https://raw.githubusercontent.com/unifyai/unifyai.github.io/refs/heads/main/img/externally_linked/logging_animation_dark.gif"/>

All logging features are supported by the [REST API](https://docs.unify.ai/api-reference),
but our [Python client](https://docs.unify.ai/python)
provides more convenient abstractions on top of the basic logging primitives.

Just `unify.log` your data from the Python client...
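For instance (field names are illustrative, and this assumes a project is already configured):

```python
import unify

unify.log(question="what is 1 + 1?", answer="2")
```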

