diff --git a/logging/contexts.mdx b/logging/contexts.mdx
index 0e058c07c..8e74e4fc7 100644
--- a/logging/contexts.mdx
+++ b/logging/contexts.mdx
@@ -11,7 +11,7 @@ unify.log(x=0, y=1, z=2)
When no context is specified in the interface or the table, then the default context is used.
However, you can group your logs into *non-overlapping* contexts, which can then be displayed in separate tables.

-```python
+```python [expandable]
first_names = ["Zoe", "John", "Jane", "Jim", "Jill"]
last_names = ["Smith", "Johnson", "Williams", "Jones", "Brown"]
qnas = {
@@ -61,6 +61,7 @@ for _ in range(15):

+click image to maximize

## Context Handling

@@ -81,6 +82,8 @@ with unify.Context("Traffic"):
    )
```

+## Nested Contexts
+
These can also be arbitrarily nested, where each context string is joined via `/` to define the full context path.

```python
@@ -91,11 +94,18 @@ with unify.Context("Sciences"):
        question="what is 1 + 1?",
        region="US"
    )
-    with unify.Context("Physics"):
+    with unify.Context("Physics"):
+        unify.log(
+            name="John",
+            question="what is the speed of light?",
+            region="EU"
+        )
+with unify.Context("Arts"):
+    with unify.Context("Literature"):
        unify.log(
-            name="John",
-            question="what is the speed of light?",
-            region="EU"
+            name="Jane",
+            question="What does this sentence convey?",
+            region="UK"
        )
```

@@ -104,7 +114,12 @@ but it forms part of the directory structure for selecting the available context
However, "Sciences" **can** be selected as the context of the tab in the interface,
which limits the search space for each table within the tab.

-GIF
+
+
+click image to maximize
+
+
+## Read and Write Modes

By default, `unify.Context("...")` also changes the behaviour of `get_logs`,
returning only the logs relevant within the context.

@@ -143,7 +158,7 @@ exam_results = [
]
with unify.Context("Results"):
    logs = unify.create_logs(entries=exam_results)
-with unify.Context("Failed"):
+with unify.Context("ExtraSupport"):
    unify.add_logs_to_context(
        [l.id for l in logs if not l.entries["passed"]]
    )

@@ -157,6 +172,7 @@ If any logs are then updated inplace, they will be reflected in all contexts.

GIF

-Note that in this case, `Results` and `Failed` are **disconnected**, and both sets of logs **cannot** be viewed in a single table.
+Note that in this case, the `Results` and `ExtraSupport` contexts are "disconnected",
+and therefore both sets of logs **cannot** be viewed in a single table.

-To organize data in a single table, you can use **column contexts**, more on those soon!
+To organize data hierarchically in a single table, you can use **column contexts**, more on those soon!
diff --git a/logging/datasets.mdx b/logging/datasets.mdx
index 67e0fefbb..ce6bc78fc 100644
--- a/logging/datasets.mdx
+++ b/logging/datasets.mdx
@@ -2,40 +2,98 @@ title: 'Datasets'
---

-One common use case of contexts is the creation of *datasets*.
-Each dataset has it's own context, such that it can easily be viewed in it's own table.
-
-The `unify.Dataset` class is the best way to interact with your datasets from the Python client.
+The `unify.Dataset` class is the best way to create and manage datasets.
+Datasets support indexing as well as value-based addition and removal.

```python
my_dataset = unify.Dataset([0, 1, 2], name="my_dataset")
+my_dataset[0] = -1
+my_dataset += [3, 4, 5]
+my_dataset -= [1, 2]
+print(my_dataset)
+```
```
+unify.Dataset([-1, 3, 4, 5], name="my_dataset")
+```
+
+Under the hood, datasets make use of the same logging primitives as `unify.log`.
+The only difference is that datasets each have their own context,
+which is automatically managed by the `unify.Dataset` class.
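+
+As a quick, illustrative sketch of this equivalence, the entries of an uploaded
+dataset should be retrievable as ordinary logs from within the dataset's context
+(the `Datasets/{name}` convention is explained below), assuming `get_logs` behaves
+as described on the contexts page:
+
+```python
+import unify
+
+letters = unify.Dataset(["a", "b", "c"], name="letters")
+letters.upload()
+
+# each dataset has its own context, so the same three entries
+# should come back as plain logs from inside that context
+with unify.Context("Datasets/letters"):
+    logs = unify.get_logs()
+
+print(len(logs))  # 3
+```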
+
-You can then upload this to your interface like so.
+## Uploading
+
+You can upload any dataset to your interface like so.

```python
my_dataset.upload()
```

-`unify.Dataset` sets the context as `f"Datasets/{name}"`, so your dataset can be viewed at `Datasets/my_dataset`.
+The `unify.Dataset` class automatically sets the context as `f"Datasets/{name}"`, so your dataset can be viewed at `Datasets/my_dataset`.

-IMG
+GIF
+
+Setting `overwrite=True` will overwrite any existing dataset with the same name
+(even if the upstream dataset contains data not included in the upload).
+
+```python
+my_dataset.upload(overwrite=True)
+```
+
+## Downloading

If your dataset already exists upstream, you can download it like so.

```python
-my_dataset = unify.Dataset.from_upstream("my_dataset")
+my_dataset = unify.Dataset.download("my_dataset")
```

-You can also add and remove content by value using standard Python operators.
+Setting `overwrite=True` will overwrite your local dataset
+(even if your local dataset contains more data than is in the download).

```python
-my_dataset += [3, 4, 5]
-my_dataset -= [0, 1, 2]
-my_dataset.upload(overwrite=True)
+my_dataset.download(overwrite=True)
```

-Given that logs can be shared across multiple contexts, it's easy to create sub-datasets withou duplication.
+## Syncing
+
+Finally, if you want to sync your local dataset with the upstream version,
+such that both contain the superset of the upstream and local data, you can do so as follows.
+
+```python
+my_dataset.sync()
+```
+
+UNION
+
+SUPERSET
+
+## Duplicates
+
+By default, the `unify.Dataset` class will **not** allow duplicate values.
+
+```python
+unify.Dataset([0, 1, 2, 2])
+```
+```
+EXCEPTION
+```
+
+If you would like to allow duplicates, you can override the default behavior by setting the `allow_duplicates` flag when creating the dataset.
+
+```python
+unify.Dataset([0, 1, 2, 2], allow_duplicates=True)
+```
+
+When `allow_duplicates` is set to `False`, all upstream entries with identical values to local entries will be assumed to represent the **same entry**,
+and any unset log ids of local values will be updated to match the upstream ids.
+
+If `allow_duplicates` is set to `True`, then any upstream entries with identical values to local entries will simply be added alongside the local entries,
+and the log ids of the local entries will be left unchanged.
+
+If duplicates are not explicitly required for a dataset,
+then it's best to leave `allow_duplicates` set to `False`.
+Even if duplicates are needed, adding an extra `example_id` column with `allow_duplicates` kept as `False` can be worthwhile,
+especially if you're regularly syncing datasets between local and upstream sources.

```python
import random
diff --git a/logging/overview.mdx b/logging/overview.mdx
index 86b4e9315..ebb01e151 100644
--- a/logging/overview.mdx
+++ b/logging/overview.mdx
@@ -7,7 +7,8 @@ Logs enable you to store **any** kind of data to use in your [interfaces](https:

All logging features are supported by the [REST API](https://docs.unify.ai/api-reference),
-but our Python client provides more convenient abstractions on top of the basic logging primitives.
+but our [Python client](https://docs.unify.ai/python)
+provides more convenient abstractions on top of the basic logging primitives.

Just `unify.log` your data from the Python client...
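+
+For example, a minimal sketch (the field names below are arbitrary; any keyword
+arguments will do, as in the `unify.log(x=0, y=1, z=2)` call shown on the contexts page):
+
+```python
+import unify
+
+# one call creates one log; fields are free-form key/value pairs
+unify.log(name="Zoe", question="what is 1 + 1?", region="US")
+```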