From 3212e99cc8655d471a5a9dd1cffe204cb33233fa Mon Sep 17 00:00:00 2001
From: tje
Date: Wed, 28 Aug 2024 11:38:21 +0100
Subject: [PATCH] datasets examples in shell/python

---
 benchmarking/datasets.mdx | 72 +++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 34 deletions(-)

diff --git a/benchmarking/datasets.mdx b/benchmarking/datasets.mdx
index 583d2e202..7247f9dcd 100644
--- a/benchmarking/datasets.mdx
+++ b/benchmarking/datasets.mdx
@@ -52,10 +52,12 @@ This data is especially important, as it represents the *true distribution* obse
 before deployment.
 
 It's easy to extract any prompt queries previously made to the API,
-via the [X]() endpoint, as explained [here]().
-For example, the last 100 prompts for subject `Y` can be extracted as follows:
+via the [`prompt_history`](benchmarks/get_prompt_history) endpoint, as explained [here]().
+For example, the last 100 prompts with the tag `physics` can be extracted as follows:
 
-CODE
+```python
+physics_prompts = client.prompt_history(tag="physics", limit=100)
+```
 
 We can then add this to the local `.jsonl` file as follows:
 
@@ -64,19 +66,19 @@ CODE
 
 ## Uploading Datasets
 
 As shown above, the representation for prompt datasets is `.jsonl`,
-which is a effectively a list of json structures (or in Python, a list of dicts).
+which is a file format where each line is a json object (so, in Python, the file as a whole maps to a list of dicts).
 
 Lets upload our `english_language.jsonl` dataset.
 We can do this via the REST API as follows:
 
-```
-import requests
-url = "https://api.unify.ai/v0/dataset"
-headers = {"Authorization": "Bearer $UNIFY_API_KEY",}
-data = {"name": "english_language"}
-files = {"file": open('/path/to/english_language.jsonl' ,'rb')}
-response = requests.post(url, data=data, files=files, headers=headers)
+```shell
+curl --request POST \
+  --url 'https://api.unify.ai/v0/dataset' \
+  --header 'Authorization: Bearer $UNIFY_KEY' \
+  --header 'Content-Type: multipart/form-data' \
+  --form 'file=@english_language.jsonl' \
+  --form 'name=english_language'
 ```
 
 Or we can create a `Dataset` instance in Python,
@@ -90,31 +92,32 @@ We can delete the dataset just as easily as we created it.
 First, using the REST API:
 
-```
-import requests
-url = "https://api.unify.ai/v0/dataset"
-headers = {"Authorization": "Bearer $UNIFY_API_KEY"}
-data = {"name": "english_language"}
-response = requests.delete(url, params=data, headers=headers)
+```shell
+curl --request DELETE \
+  --url 'https://api.unify.ai/v0/dataset?name=english_language' \
+  --header 'Authorization: Bearer $UNIFY_KEY'
+
 ```
 
 Or via Python:
 
-CODE
-
+```python
+client.datasets.delete(name="english_language")
+```
 
 ## Listing Datasets
 
 We can retrieve a list of our uploaded datasets using the `/dataset/list` endpoint.
 
+```shell
+curl --request GET \
+  --url 'https://api.unify.ai/v0/dataset/list' \
+  --header 'Authorization: Bearer $UNIFY_KEY'
 ```
-import requests
-url = "https://api.unify.ai/v0/dataset/list"
-headers = {"Authorization": "Bearer $UNIFY_API_KEY"}
-response = requests.get(url, headers=headers)
-print(response.text)
-```
-
+```python
+datasets = client.datasets.list()
+print(datasets)
+```
 
 ## Renaming Datasets
 
@@ -126,23 +129,24 @@
 and `english language`.
 
 We can easily rename the dataset without deleting and re-uploading, via the following REST API command:
 
-```
-import requests
-url = "https://api.unify.ai/v0/dataset/rename"
-headers = {"Authorization": "Bearer $UNIFY_API_KEY"}
-data = {"name": "english", "new_name": "english_literature"}
-response = requests.post(url, params=data, headers=headers)
+```shell
+curl --request POST \
+  --url 'https://api.unify.ai/v0/dataset/rename?name=english&new_name=english_literature' \
+  --header 'Authorization: Bearer $UNIFY_KEY'
+
 ```
 
 Or via Python:
 
-CODE
+```python
+client.datasets.rename(name="english", new_name="english_literature")
+```
 
 ## Appending to Datasets
 
 As explained above, we might want to add to an existing dataset, either because we have [generated some synthetic examples](), or perhaps because we have some relevant
-[production traffic]().
+[production traffic](datasets#production-data).
 
 In the examples above, we simply appended to these datasets locally, before then uploading the full `.jsonl` file. However,