This repository was archived by the owner on Jan 10, 2025. It is now read-only.

Merge pull request #16 from refuel-ai/order_by_docs
Update docs for sorting with SDK
Abhinav-Naikawadi authored Jan 18, 2024
2 parents e2507bc + 14d9c4a commit 5dcbe16
Showing 2 changed files with 22 additions and 9 deletions.
2 changes: 1 addition & 1 deletion autolabel
Submodule autolabel updated 54 files
+1 −1 .pre-commit-config.yaml
+9 −0 README.md
+ docs/assets/refuel_llm_performance.png
+170 −108 docs/guide/llms/llms.md
+21 −13 docs/index.md
+35 −0 examples/figure_extraction/config_figure_extraction.json
+78 −0 examples/figure_extraction/data_cleanup.ipynb
+432 −0 examples/figure_extraction/example_figure_extraction.ipynb
+17 −0 examples/multimodal_science_qa/config_multimodal_sciq.json
+396 −0 examples/multimodal_science_qa/example_multimodal_sciq.ipynb
+479 −0 examples/painting-style-classification/image_classification.ipynb
+21 −0 examples/painting-style-classification/image_classification.json
+7 −6 pyproject.toml
+2 −0 src/autolabel/cache/__init__.py
+4 −4 src/autolabel/cache/redis_cache.py
+56 −0 src/autolabel/cache/redis_transform_cache.py
+49 −0 src/autolabel/cache/sqlalchemy_confidence_cache.py
+5 −3 src/autolabel/cache/sqlalchemy_generation_cache.py
+127 −36 src/autolabel/confidence.py
+33 −0 src/autolabel/configs/config.py
+2 −1 src/autolabel/configs/schema.py
+1 −0 src/autolabel/data_models/__init__.py
+74 −0 src/autolabel/data_models/confidence_cache.py
+5 −2 src/autolabel/data_models/generation_cache.py
+24 −10 src/autolabel/dataset/dataset.py
+149 −12 src/autolabel/labeler.py
+3 −1 src/autolabel/metrics/auroc.py
+4 −0 src/autolabel/models/__init__.py
+9 −1 src/autolabel/models/anthropic.py
+18 −5 src/autolabel/models/base.py
+9 −1 src/autolabel/models/cohere.py
+15 −8 src/autolabel/models/hf_pipeline.py
+155 −0 src/autolabel/models/hf_pipeline_vision.py
+60 −12 src/autolabel/models/openai.py
+167 −0 src/autolabel/models/openai_vision.py
+30 −8 src/autolabel/models/palm.py
+34 −6 src/autolabel/models/refuel.py
+61 −5 src/autolabel/schema.py
+64 −18 src/autolabel/tasks/attribute_extraction.py
+76 −28 src/autolabel/tasks/base.py
+29 −10 src/autolabel/tasks/classification.py
+30 −6 src/autolabel/tasks/entity_matching.py
+29 −6 src/autolabel/tasks/multilabel_classification.py
+37 −10 src/autolabel/tasks/named_entity_recognition.py
+45 −9 src/autolabel/tasks/question_answering.py
+3 −1 src/autolabel/transforms/__init__.py
+3 −1 src/autolabel/transforms/schema.py
+63 −12 src/autolabel/transforms/serp_api.py
+131 −0 src/autolabel/transforms/serper_api.py
+3 −0 src/autolabel/utils.py
+100 −0 tests/assets/banking/config_banking_gpt4V.json
+104 −42 tests/unit/llm_test.py
+54 −18 tests/unit/transforms/test_serp_api.py
+84 −0 tests/unit/transforms/test_serper_api.py
29 changes: 21 additions & 8 deletions docs/python-sdk.md
@@ -182,23 +182,36 @@ items = refuel_client.get_items(

#### Applying sort ordering when querying items

By default, the API will use Refuel’s sort order (by decreasing order of diversity). But you can specify any other column in the dataset that you would lille to sort by, when querying for items in the dataset:
By default, the API will use Refuel’s sort order (by decreasing order of diversity). You can use the `order_by` param to sort by any other columns in the dataset or by the label or confidence score from a labeling task.

1) Sort by dataset column

```python
items = refuel_client.get_items(
dataset='<DATASET NAME>',
max_items=100,
order_by=[{'field': '<COLUMN NAME TO SORT BY>', 'direction': '<ASC or DESC>'}],
)
```
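
Each `order_by` entry is a plain dict, so it can be built programmatically. The sketch below is a minimal illustration and is not part of the Refuel SDK: `sort_by_column` is a hypothetical helper, and `created_at` is a hypothetical column name. It upper-cases the direction, rejects anything other than `ASC` or `DESC`, and falls back to `ASC`, which this section's key table lists as the default.

```python
from typing import Dict


def sort_by_column(column: str, direction: str = "ASC") -> Dict[str, str]:
    """Build one order_by entry for a dataset column (hypothetical helper, not an SDK function)."""
    normalized = direction.upper()
    if normalized not in ("ASC", "DESC"):
        raise ValueError(f"direction must be 'ASC' or 'DESC', got {direction!r}")
    return {"field": column, "direction": normalized}


# 'created_at' is a hypothetical column name used only for illustration.
order_by = [sort_by_column("created_at", "desc")]
print(order_by)  # [{'field': 'created_at', 'direction': 'DESC'}]

# The list would then be passed to get_items as in the example above:
# items = refuel_client.get_items(dataset='<DATASET NAME>', max_items=100, order_by=order_by)
```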

2) Sort by label or confidence score from a labeling task. Note that this requires a task name and a subtask name to be specified. `field` can be either 'label' or 'confidence'.

```python
items = refuel_client.get_items(
dataset='<DATASET NAME>',
task='<LABELING TASK NAME>',
max_items=100,
order_by='<COLUMN NAME TO SORT BY>',
order_direction='ASC'
order_by=[{'field': 'confidence', 'direction': '<ASC or DESC>', 'subtask': '<SUBTASK NAME>'}],
)
```
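
The constraints on `label` and `confidence` sorting can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: `check_order_by` is a hypothetical helper, and it only encodes the rules stated in this section (a `field` is required, `direction` must be `ASC` or `DESC`, and a `subtask` is only valid for `label` or `confidence` and needs the `task` parameter). It also assumes that `label` or `confidence` without a `subtask` is treated as an ordinary column name; the server may enforce rules that are not documented here.

```python
from typing import Dict, List, Optional

SCORE_FIELDS = ("label", "confidence")


def check_order_by(order_by: List[Dict[str, str]], task: Optional[str] = None) -> None:
    """Raise ValueError if order_by breaks the documented rules (hypothetical client-side check)."""
    for entry in order_by:
        field = entry.get("field")
        if not field:
            raise ValueError(f"order_by entry is missing 'field': {entry!r}")
        direction = entry.get("direction", "ASC")  # ASC is the documented default
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"direction must be 'ASC' or 'DESC', got {direction!r}")
        # Assumption: without a subtask, 'label'/'confidence' is just a column name,
        # so only entries that carry a subtask get the task-specific checks.
        if "subtask" in entry:
            if field not in SCORE_FIELDS:
                raise ValueError(f"'subtask' is only valid for 'label' or 'confidence', got {field!r}")
            if task is None:
                raise ValueError("sorting by label or confidence requires the task parameter")


# Passes silently: the same shape as the get_items example above.
check_order_by(
    [{"field": "confidence", "direction": "DESC", "subtask": "<SUBTASK NAME>"}],
    task="<LABELING TASK NAME>",
)
```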

Some details about sorting related function parameters:
You may have multiple dicts in the `order_by` list if you would like to sort by multiple columns (used in the case of ties). Some details about the keys for each dict in the `order_by` list:

| Option | Is Required | Default Value | Comments |
| :--------------- | :-----------| :-----------------------| :------- |
| `order_by` | No | Refuel’s default sort (by diversity) | Name of the dataset you want to query and retrieve items (rows) from |
| `order_direction` | No | 100 | Valid values: ASC or DESC |
| Key | Is Required | Default Value | Description | Comments |
| :--------------- | :-----------| :-----------------------|:-----------------------| :------- |
| `field` | Yes | | The name of the column in the dataset to sort by | In addition to the columns in the dataset, the field can also be 'label' or 'confidence', if the task and subtask names are specified. |
| `direction` | No | `ASC` | The direction that you would like to sort the specified column by | Should be `ASC` or `DESC` |
| `subtask` | No | null | The name of the subtask for which you would like to sort by label or confidence | This should only be provided if the field is 'label' or 'confidence' and requires a task name to be specified in the function params. |
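
When `order_by` holds several dicts, later entries only matter for rows that tie on the earlier ones. The sketch below reproduces that behaviour locally on a list of dicts so the effect of key order can be checked by hand. It assumes the server compares keys in list order, as described above; `apply_order_by` and the sample rows are hypothetical, and the server's handling of nulls or string collation may differ. For brevity, `confidence` is treated as a plain column here, whereas the SDK needs a task and subtask to sort by it.

```python
from typing import Any, Dict, List


def apply_order_by(rows: List[Dict[str, Any]], order_by: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Sort rows by each order_by entry in turn; later entries only break ties (local illustration)."""
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives a multi-key order.
    for entry in reversed(order_by):
        descending = entry.get("direction", "ASC") == "DESC"
        result.sort(key=lambda row: row[entry["field"]], reverse=descending)
    return result


rows = [
    {"id": 1, "category": "b", "confidence": 0.9},
    {"id": 2, "category": "a", "confidence": 0.4},
    {"id": 3, "category": "b", "confidence": 0.7},
    {"id": 4, "category": "a", "confidence": 0.8},
]
order_by = [
    {"field": "category", "direction": "ASC"},
    {"field": "confidence", "direction": "DESC"},
]
print([row["id"] for row in apply_order_by(rows, order_by)])  # [4, 2, 1, 3]
```

Category `a` sorts before `b`, and within each category the higher confidence comes first. Swapping the two dicts would order all rows by confidence alone, because no two sample rows share a confidence value.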

#### Applying filters when querying items
