13 changes: 13 additions & 0 deletions docs/release-notes/v2.2.1.md
@@ -0,0 +1,13 @@
## Release v2.2.1
### What's changed

#### Added features:
* add a multi-objective task
* more seamless and robust interfaces between the components

#### Further changes:
* refactor the tutorial
* improve robustness in handling strings & prompt objects
* fixes in block tracking and idx subsampling in CAPO

**Full Changelog**: [here](https://github.com/finitearth/promptolution/compare/2.2.0...v2.2.1)
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -47,6 +47,7 @@ nav:
- Home: index.md
- Release Notes:
- Overview: release-notes.md
- v2.2.1: release-notes/v2.2.1.md
- v2.2.0: release-notes/v2.2.0.md
- v2.1.0: release-notes/v2.1.0.md
- v2.0.1: release-notes/v2.0.1.md
@@ -74,7 +75,7 @@ nav:
- Exemplar Selectors: api/exemplar_selectors.md
- Tutorials:
- Getting Started: examples/getting_started.md
- LLM as Judge Tutorial: examples/llm_as_judge_tutorial.md
- LLM-as-a-Judge Tutorial: examples/llm_as_judge_tutorial.md
- Reward Task Tutorial: examples/reward_task_tutorial.md

markdown_extensions:
2 changes: 1 addition & 1 deletion promptolution/tasks/judge_tasks.py
@@ -60,7 +60,7 @@


class JudgeTask(BaseTask):
"""Task that evaluates a predictor using an LLM as a judge, optionally accepting a ground truth."""
"""Task that evaluates a predictor using an LLM-as-a-judge, optionally accepting a ground truth."""

def __init__(
self,
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "promptolution"
version = "2.2.0"
version = "2.2.1"
description = "A framework for prompt optimization and a zoo of prompt optimization algorithms."
authors = ["Tom Zehle, Moritz Schlager, Timo Heiß"]
readme = "README.md"
16 changes: 7 additions & 9 deletions tests/optimizers/test_capo.py
@@ -6,7 +6,6 @@

from promptolution.optimizers.capo import CAPO
from promptolution.utils.prompt import Prompt
from promptolution.utils.templates import CAPO_CROSSOVER_TEMPLATE, CAPO_MUTATION_TEMPLATE


def test_capo_initialization(mock_meta_llm, mock_predictor, initial_prompts, mock_task, mock_df):
@@ -195,18 +194,20 @@ def test_capo_crossover_prompt(mock_meta_llm, mock_predictor, initial_prompts, m
meta_llm=mock_meta_llm,
initial_prompts=initial_prompts,
df_few_shots=mock_df,
crossovers_per_iter=1, # Only perform one crossover so we can test the exact prompt
)

import random

random.seed(42)
mother = Prompt("Classify the sentiment of the text.", ["Input: I love this! Output: Positive"])
father = Prompt("Determine if the review is positive or negative.", ["Input: This is terrible. Output: Negative"])
optimizer._crossover([mother, father])

full_task_desc = mock_task.task_description + "\n" + optimizer.predictor.extraction_description

expected_meta_prompt = (
CAPO_CROSSOVER_TEMPLATE.replace("<mother>", mother.instruction)
optimizer.crossover_template.replace("<mother>", mother.instruction)
.replace("<father>", father.instruction)
.replace("<task_desc>", full_task_desc)
.strip()
)

assert str(mock_meta_llm.call_history[0]["prompts"][0]) == expected_meta_prompt
@@ -221,13 +222,10 @@ def test_capo_mutate_prompt(mock_meta_llm, mock_predictor, initial_prompts, mock
initial_prompts=initial_prompts,
df_few_shots=mock_df,
)
full_task_desc = mock_task.task_description + "\n" + optimizer.predictor.extraction_description

parent = Prompt("Classify the sentiment of the text.", ["Input: I love this! Output: Positive"])
optimizer._mutate([parent])

expected_meta_prompt = CAPO_MUTATION_TEMPLATE.replace("<instruction>", parent.instruction).replace(
"<task_desc>", full_task_desc
)
expected_meta_prompt = optimizer.mutation_template.replace("<instruction>", parent.instruction)

assert mock_meta_llm.call_history[0]["prompts"][0] == expected_meta_prompt
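The updated tests build the expected meta prompt from the optimizer's own `crossover_template` attribute instead of the module-level `CAPO_CROSSOVER_TEMPLATE` constant. A minimal sketch of the placeholder substitution these tests verify — the template string and helper below are illustrative stand-ins, not promptolution's actual template or API:

```python
# Illustrative template; the real CAPO template text differs, but the
# placeholder names (<mother>, <father>, <task_desc>) mirror the tests above.
crossover_template = (
    "Combine the following two instructions into one:\n"
    "<mother>\n<father>\nTask description: <task_desc>"
)


def build_crossover_prompt(template: str, mother: str, father: str, task_desc: str) -> str:
    """Fill the crossover template the same way the test does."""
    return (
        template.replace("<mother>", mother)
        .replace("<father>", father)
        .replace("<task_desc>", task_desc)
        .strip()
    )


prompt = build_crossover_prompt(
    crossover_template,
    "Classify the sentiment of the text.",
    "Determine if the review is positive or negative.",
    "Sentiment classification.",
)
```

Reading the template off the optimizer instance keeps the test valid even if a user passes a custom template at construction time.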
22 changes: 11 additions & 11 deletions tutorials/getting_started.ipynb
@@ -8,7 +8,7 @@
"\n",
"## Welcome to Promptolution! \n",
"\n",
"Discover a powerful tool for evolving and optimizing your LLM prompts. This notebook provides a friendly introduction to Promptolution's core functionality.\n",
"Discover a powerful tool for evolving and optimizing your LLM prompts. This notebook provides a friendly introduction to Promptolution's core functionality by showcasing how you can easily find the best prompt to solve a classification problem.\n",
"\n",
"We're excited to have you try Promptolution - let's get started!"
]
@@ -73,7 +73,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Below, we're using a subsample of the subjectivity dataset from Hugging Face as an example. When using your own dataset, simply ensure you name the input column \"x\" and the target column \"y\", and provide a brief description of your task, that will parsed to the meta-llm during optimization."
"Below, we're using a subsample of the subjectivity dataset from Hugging Face as an example. When using your own dataset, simply ensure you name the input column \"x\" and the target column \"y\", and provide a brief description of your task, which will be passed to the meta-llm during optimization."
]
},
{
@@ -104,7 +104,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We've defined some starter prompts below, but feel free to experiment! You might also want to explore create_prompts_from_samples to automatically generate initial prompts based on your data."
"We've defined some starter prompts below, but this isn't strictly necessary, since Promptolution can also automatically generate initial prompts based on your data or the provided task description."
]
},
{
@@ -146,28 +146,28 @@
"1. vLLM backend (for efficient serving of large language models)\n",
"1. API-based LLMs (compatible with any provider following the OpenAI standard)\n",
"\n",
"For this demonstration, we'll use the DeepInfra API, but you can easily switch to other providers like Anthropic or OpenAI by simply changing the base_url and llm string in the configuration."
"For this demonstration, we'll use the DeepInfra API, but you can easily switch to other providers like Anthropic or OpenAI."
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"api_key = \"YOUR_API_KEY\" # Replace with your Promptolution API key"
"api_key = \"YOUR_API_KEY\" # Replace with your API key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's an explanation of each configuration parameter in the ExperimentConfig:\n",
"- `optimizer`: The algorithm used for prompt optimization. Currently we support \"capo\", \"evopromptga\", \"evopromptde\", and \"opro\". For this example, we use \"capo\" as it is capable of leveraging few-shot examples.\n",
"- `task_description`: A string describing the task you're optimizing prompts for. This is used to provide the meta-llm with context about your task.\n",
"Here's an explanation of the most important configuration parameters in the ExperimentConfig:\n",
"- `optimizer`: The algorithm used for prompt optimization. For this example, we use \"capo\" as it is capable of leveraging few-shot examples.\n",
"- `task_description`: A string describing the task you're optimizing prompts for.\n",
"- `prompts`: A list of initial prompt strings that will be used as the starting point for optimization.\n",
"- `n_steps`: The number of optimization steps to run. Higher values allow more exploration and refinement but require more API calls and computational resources.\n",
"- `api_url`: The API endpoint URL used to access the language model. This example uses DeepInfra's API which follows the OpenAI standard.\n",
"- `n_steps`: The number of optimization steps to run.\n",
"- `api_url`: The API endpoint URL used to access the language model.\n",
"- `llm`: The LLM to use for the experiment, as both downstream and meta LLM.\n",
"- `token`: Your API authentication token required to access the language model service."
]
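The trimmed parameter list above still maps one-to-one onto the notebook's configuration object. A minimal sketch with placeholder values — the endpoint URL, model name, and prompts below are illustrative assumptions, and in the notebook these fields are passed to promptolution's `ExperimentConfig`:

```python
# Illustrative values only: swap in your own provider URL, model, and token.
# In the notebook these fields are handed to promptolution's ExperimentConfig.
config = {
    "optimizer": "capo",  # the optimizer used in this tutorial
    "task_description": "Classify whether a sentence is subjective or objective.",
    "prompts": ["Classify the sentiment of the text."],  # starting population
    "n_steps": 10,  # more steps refine further but cost more API calls
    "api_url": "https://api.deepinfra.com/v1/openai",  # assumed OpenAI-style endpoint
    "llm": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model identifier
    "token": "YOUR_API_KEY",  # replace with your API key
}
```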
26 changes: 13 additions & 13 deletions tutorials/llm_as_judge_tutorial.ipynb
@@ -4,11 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started: LLM as a Judge with Promptolution\n",
"# Getting Started: LLM-as-a-Judge with Promptolution\n",
"\n",
"## Welcome to Promptolution! \n",
"\n",
"Discover a powerful tool for evolving and optimizing your LLM prompts. This notebook provides a friendly introduction to one of Promptolution's most advanced features: LLM as a Judge.\n",
"Discover a powerful tool for evolving and optimizing your LLM prompts. This notebook provides a friendly introduction to one of Promptolution's most advanced features: LLM-as-a-Judge.\n",
"\n",
"While the standard getting_started notebook shows how to optimize for classification tasks, this guide will focus on something different. We'll optimize prompts for a creative task where there's no single \"correct\" answer: *Finding an optimal argument for a statement*!"
]
@@ -26,7 +26,7 @@
"- The helpfulness of a summary?\n",
"- The persuasiveness of an essay?\n",
"\n",
"This is where LLM as a Judge comes in. Instead of relying on a pre-defined dataset of labels, we use another powerful Language Model (the \"judge\") to score the output of our prompts. The process looks like this:\n",
"This is where LLM-as-a-Judge comes in. Instead of relying on a pre-defined dataset of labels, we use another powerful Language Model (the \"judge\") to score the output of our prompts. The process looks like this:\n",
"\n",
"A candidate prompt is used to generate a response (e.g., an argument).\n",
"A \"judge\" LLM then evaluates this response based on the task provided and assigns a score.\n",
@@ -111,6 +111,13 @@
"df = pd.read_csv(\"hf://datasets/ibm-research/argument_quality_ranking_30k/dev.csv\").sample(300)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at what we're working with:"
]
},
{
"cell_type": "code",
"execution_count": 18,
@@ -141,13 +148,6 @@
"Our task: **Given a controversial statement, generate the strongest possible argument supporting that position.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at what we're working with:"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -202,7 +202,7 @@
"metadata": {},
"outputs": [],
"source": [
"api_key = \"YOUR_API_KEY\" # Replace with your Promptolution API key"
"api_key = \"YOUR_API_KEY\" # Replace with your API key"
]
},
{
@@ -257,9 +257,9 @@
"With everything configured, you're ready to optimize your prompts! The run_experiment function will:\n",
"\n",
"1. Evaluate your initial prompts by generating arguments and having the judge LLM score them\n",
"1. Use evolutionary operators (mutation, crossover) to create new prompt variations from the 1. best-performing ones\n",
"1. Use evolutionary operators (mutation, crossover) to create new prompt variations from the best-performing ones\n",
"1. Test these new prompt candidates and select the fittest ones for the next generation\n",
"1. Repeat this evolutionary process for the specified number of steps, gradually improving prompt 1. quality"
"1. Repeat this evolutionary process for the specified number of steps, gradually improving prompt quality"
]
},
{
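The judge workflow this tutorial describes — generate a response with a candidate prompt, have a judge LLM score it, keep the best-scoring prompts — can be sketched with stand-in functions. Neither `generate` nor `judge_score` is promptolution's actual API; both are toy stubs for illustration:

```python
def generate(prompt: str, statement: str) -> str:
    # Stand-in for the downstream LLM producing an argument.
    return f"{prompt} {statement}"


def judge_score(response: str) -> float:
    # Stand-in for the judge LLM; a toy score based on response length.
    return min(len(response) / 100.0, 1.0)


def evaluate_prompt(prompt: str, statements: list[str]) -> float:
    """Mean judge score of a prompt's responses over the statements."""
    scores = [judge_score(generate(prompt, s)) for s in statements]
    return sum(scores) / len(scores)


candidates = ["Argue persuasively for:", "Give the strongest possible case for:"]
statements = ["We should subsidize public transport."]
best = max(candidates, key=lambda p: evaluate_prompt(p, statements))
```

The evolutionary loop then mutates and recombines the surviving prompts and repeats this evaluation for the configured number of steps.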
37 changes: 7 additions & 30 deletions tutorials/reward_task_tutorial.ipynb
@@ -50,18 +50,9 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\tzehl\\anaconda3\\envs\\d\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"outputs": [],
"source": [
"import pandas as pd\n",
"from promptolution.utils import ExperimentConfig\n",
@@ -147,7 +138,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are some starter prompts for JSON extraction. Feel free to experiment with your own approaches!"
"Here is a starter prompt for JSON extraction. Feel free to experiment with your own approaches!"
]
},
{
@@ -184,7 +175,7 @@
"1. vLLM backend (for efficient serving of large language models)\n",
"1. API-based LLMs (compatible with any provider following the OpenAI standard)\n",
"\n",
"For this demonstration, we'll use the DeepInfra API, but you can easily switch to other providers like Anthropic or OpenAI by simply changing the base_url and llm string in the configuration."
"For this demonstration, we'll use the DeepInfra API, but you can easily switch to other providers like Anthropic or OpenAI."
]
},
{
@@ -193,21 +184,7 @@
"metadata": {},
"outputs": [],
"source": [
"api_key = \"YOUR_API_KEY\" # Replace with your Promptolution API key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's an explanation of each configuration parameter in the ExperimentConfig:\n",
"- `optimizer`: The algorithm used for prompt optimization. Currently we support \"capo\", \"evopromptga\", \"evopromptde\", and \"opro\". For this example, we use \"capo\" as it is capable of leveraging few-shot examples.\n",
"- `task_description`: A string describing the task you're optimizing prompts for. This is used to provide the meta-llm with context about your task.\n",
"- `prompts`: A list of initial prompt strings that will be used as the starting point for optimization.\n",
"- `n_steps`: The number of optimization steps to run. Higher values allow more exploration and refinement but require more API calls and computational resources.\n",
"- `api_url`: The API endpoint URL used to access the language model. This example uses DeepInfra's API which follows the OpenAI standard.\n",
"- `llm`: The LLM to use for the experiment, as both downstream and meta LLM.\n",
"- `token`: Your API authentication token required to access the language model service."
"api_key = \"YOUR_API_KEY\" # Replace with your API key"
]
},
{
@@ -447,7 +424,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "d",
"display_name": "promptolution-t4XIP6Xc-py3.12",
"language": "python",
"name": "python3"
},
@@ -461,7 +438,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
"version": "3.12.3"
}
},
"nbformat": 4,
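The reward-task tutorial optimizes prompts for JSON extraction, where fitness comes from a programmatic reward rather than labels or a judge. A hedged sketch of such a reward function — this mirrors the tutorial's idea, not promptolution's actual reward-task API:

```python
import json


def json_validity_reward(output: str) -> float:
    """Toy reward: 1.0 if the model output parses as JSON, else 0.0.

    A real reward could additionally check for required keys or value types.
    """
    try:
        json.loads(output)
        return 1.0
    except json.JSONDecodeError:
        return 0.0


json_validity_reward('{"name": "Ada", "age": 36}')  # valid JSON -> 1.0
json_validity_reward("not json")                    # -> 0.0
```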