---
toc: true
layout: post
categories: [katib]
description: "Leveraging Katib for efficient RAG optimization."
comments: true
title: "Optimizing RAG Pipelines with Katib: Hyperparameter Tuning for Better Retrieval & Generation"
author: "Varsha Prasad Narsing" (@varshaprasad96)
---

# Introduction

As machine learning models become more sophisticated, optimizing their performance remains a critical challenge. Kubeflow provides a robust component, [Katib][Katib], designed for hyperparameter optimization and neural architecture search. As part of the Kubeflow ecosystem, Katib enables scalable, automated tuning of machine learning models, reducing the manual effort required for parameter selection while improving model performance across diverse ML workflows.

With Retrieval-Augmented Generation ([RAG][rag]) becoming an increasingly popular approach for improving search and retrieval quality, optimizing its parameters is essential to achieving high-quality results. RAG pipelines involve multiple hyperparameters that influence retrieval accuracy, hallucination reduction, and language generation quality. In this blog, we will explore how Katib can be used to tune a RAG pipeline, systematically adjusting key hyperparameters to ensure optimal performance.

# Let's Get Started!

## STEP 1: Setup

Since compute resources are scarcer than a perfectly labeled dataset :), we’ll use a lightweight KinD cluster to run this example locally. Rest assured, this setup can seamlessly scale to larger clusters by increasing the dataset size and the number of hyperparameters to tune.

To get started, we'll first install the Katib controller in our cluster by following the steps outlined [here][katib_installation].
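
For a quick local run, the setup can look roughly like the following. This is only a sketch: the cluster name is arbitrary, and the exact Katib manifest reference should be taken from the installation guide linked above.

```commandline
# Create a local kind cluster for the experiment (name is arbitrary)
kind create cluster --name katib-rag

# Install the standalone Katib components (see the installation guide for the current command)
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"

# Wait until the Katib pods in the kubeflow namespace are ready
kubectl get pods -n kubeflow
```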

## STEP 2: Implementing the RAG Pipeline

In this implementation, we use a retriever model to fetch relevant documents based on a query and a generator model to produce coherent text responses.

### Implementation Details:

1. Retriever: Sentence Transformer & FAISS Index
   * A SentenceTransformer model (`paraphrase-MiniLM-L6-v2`) encodes predefined documents into vector representations.
   * FAISS is used to index these document embeddings and perform efficient similarity searches to retrieve the most relevant documents.
2. Generator: Pre-trained GPT-2 Model
   * A Hugging Face GPT-2 text-generation pipeline (which can be replaced with any other model) is used to generate responses based on the retrieved documents. I chose GPT-2 for this example as it is lightweight enough to run on my local machine while still generating coherent responses.
3. Query Processing & Response Generation
   * When a query is submitted, the retriever encodes it and searches the FAISS index for the top-k most similar documents.
   * These retrieved documents are concatenated to form a context, which is then passed to the GPT-2 model to generate a response.
4. Evaluation: [BLEU][bleu] (Bilingual Evaluation Understudy) Score Calculation
   * To assess the quality of generated responses, we use the BLEU score, a popular metric for evaluating text generation.
   * The `evaluate` function takes a query, retrieves documents, generates a response, and compares it against a ground-truth reference to compute a BLEU score with smoothing functions from the `nltk` library.

Below is the script implementing this RAG pipeline:

```python
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import numpy as np
import faiss

def fetch_documents():
    return [
        # Return the list of documents...
    ]

# Retriever: Pre-trained SentenceTransformer
retriever_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Encode the document corpus
documents = fetch_documents()
doc_embeddings = retriever_model.encode(documents)

# Build FAISS index
index = faiss.IndexFlatL2(doc_embeddings.shape[1])
index.add(np.array(doc_embeddings))

# Generator: Pre-trained GPT-2
generator = pipeline("text-generation", model="gpt2", tokenizer="gpt2")

def rag_pipeline_execute(query, top_k=1, temperature=1.0):
    # Encode the query and retrieve the top-k most similar documents
    query_embedding = retriever_model.encode([query])
    distances, indices = index.search(query_embedding, top_k)
    retrieved_docs = [documents[i] for i in indices[0]]

    # Concatenate the retrieved documents into a context and generate a response
    context = " ".join(retrieved_docs)
    generated = generator(context, max_length=50, temperature=temperature, num_return_sequences=1)
    return generated[0]['generated_text']
```
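
As a quick local sanity check (assuming `fetch_documents()` has been filled in with a real corpus), the pipeline can be exercised directly; the query below is just an illustration:

```python
# Hypothetical smoke test for the pipeline defined above
answer = rag_pipeline_execute("What is Kubeflow Katib used for?", top_k=2, temperature=0.7)
print(answer)
```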

#### Evaluating the Pipeline - BLEU Score Calculation

The `evaluate` function measures the quality of generated text using the BLEU score. This function runs the RAG pipeline for a given query, compares the output against a reference answer, and computes the BLEU score with smoothing techniques.

```python
import argparse

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# The retrieval/generation pipeline defined above, saved as rag.py
from rag import rag_pipeline_execute

# Thin wrapper around the RAG pipeline (simplified for this example)
def rag_pipeline(query, top_k, temperature):
    return rag_pipeline_execute(query, top_k=top_k, temperature=temperature)

# Evaluate BLEU score
def evaluate(query, ground_truth, top_k, temperature):
    # Get the RAG pipeline response
    response = rag_pipeline(query, top_k, temperature)

    # Tokenize the response and ground truth
    reference = [ground_truth.split()]  # Reference should be a list of token lists
    candidate = response.split()  # Candidate is the generated response tokens

    # Apply smoothing to the BLEU score
    smoothie = SmoothingFunction().method1  # Use method1 for smoothing
    bleu_score = sentence_bleu(reference, candidate, smoothing_function=smoothie)

    return bleu_score


if __name__ == "__main__":
    # Parse command-line arguments
    parser = argparse.ArgumentParser(description="Evaluate BLEU score for a query using the RAG pipeline")
    parser.add_argument("--top_k", type=int, required=True, help="Number of top documents to retrieve")
    parser.add_argument("--temperature", type=float, required=True, help="Temperature for the generator")
    args = parser.parse_args()

    # TODO: The queries and ground truth against which the BLEU score is evaluated.
    # They can be provided directly in the script or loaded from an external volume.
    query = ""
    ground_truth = " "

    # Call evaluate with arguments from the command line
    bleu_score = evaluate(query, ground_truth, args.top_k, args.temperature)
    print(f"BLEU={bleu_score}")
```

*Note*: Make sure the result is printed in the `<metric-name>=<value>` format (here, `BLEU=<score>`) so that Katib's metrics collector can parse it. More ways to configure the output are described in the [Katib Metrics Collector][Katib_metrics_collector] guide.

The above scripts are used to build our `rag-pipeline` image, which will be referenced in the `Experiment`.
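
One way to do this locally is to build the image from the two scripts (the Dockerfile and dependency list, e.g. `sentence-transformers`, `transformers`, `faiss-cpu`, and `nltk`, are left as an exercise) and side-load it into the kind cluster so the trial pods can use it. The cluster name below should match whatever was used when creating the cluster:

```commandline
docker build -t rag-pipeline:latest .
kind load docker-image rag-pipeline:latest --name katib-rag
```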

## STEP 3: Run a Katib Experiment

To optimize the RAG pipeline’s hyperparameters, let's define an `Experiment` CR. An `Experiment` defines a single optimization run. More details on this API are available in the [documentation][katib_api].

```yaml
apiVersion: "kubeflow.org/v1beta1"
kind: Experiment
metadata:
  name: rag-tuning-experiment
  namespace: kubeflow
spec:
  objective:
    type: maximize # Ensures that Katib tries to maximize the BLEU score.
    goal: 0.8 # Sets the desired BLEU score threshold.
    objectiveMetricName: BLEU # This should match what we print in the script.
  algorithm:
    algorithmName: grid
  parameters:
    - name: top_k # Controls the number of retrieved documents.
      parameterType: int
      feasibleSpace:
        min: "1"
        max: "5"
        step: "1" # Adding a step for discrete search
    - name: temperature # Influences randomness in text generation.
      parameterType: double
      feasibleSpace:
        min: "0.5"
        max: "1.0"
        step: "0.1" # Adding a step for temperature
  metricsCollectorSpec: # Specifies how Katib collects experiment results.
    collector:
      kind: StdOut # Tells Katib to extract metrics from standard output logs.
  trialTemplate:
    primaryContainerName: training-container
    trialParameters: # Map hyperparameters (top_k and temperature) to their respective references in the job spec.
      - name: top_k
        description: Number of top documents to retrieve
        reference: top_k
      - name: temperature
        description: Temperature for text generation
        reference: temperature
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training-container
                image: rag-pipeline:latest
                command:
                  - "python"
                  - "/app/optimization-script.py"
                  - "--top_k=${trialParameters.top_k}"
                  - "--temperature=${trialParameters.temperature}"
                resources:
                  limits:
                    cpu: "1"
                    memory: "2Gi"
            restartPolicy: Never
```
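
To submit the experiment, apply the manifest to the cluster (the file name here is just what I saved the CR as):

```commandline
kubectl apply -f rag-tuning-experiment.yaml
```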

Once the `Experiment` is applied, we can see our optimization script in action:

```commandline
kubectl get experiments.kubeflow.org -n kubeflow
NAME                    TYPE      STATUS   AGE
rag-tuning-experiment   Running   True     10m
```

We can also see the trials being run to search for the optimal parameters:

```commandline
kubectl get trials --all-namespaces
NAMESPACE   NAME                             TYPE      STATUS   AGE
kubeflow    rag-tuning-experiment-7wskq9b9   Running   True     10m
kubeflow    rag-tuning-experiment-cll6bt4z   Running   True     10m
kubeflow    rag-tuning-experiment-hzxrzq2t   Running   True     10m
```
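
Once the experiment completes, the best parameter combination found so far is recorded in the `Experiment` status under `currentOptimalTrial`; one way to pull it out is:

```commandline
kubectl get experiments.kubeflow.org rag-tuning-experiment -n kubeflow \
  -o=jsonpath='{.status.currentOptimalTrial.parameterAssignments}'
```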

The completed trials and their results are shown in the Katib UI as below. Steps to access the Katib UI are available [here][katib_ui]:

![completed_runs](../images/2025-02-21-katib-rag-optimization/katib_experiment_run.jpeg)
![trial details](../images/2025-02-21-katib-rag-optimization/katib_ui.jpeg)

# Conclusion

In this experiment, we leveraged Kubeflow Katib to optimize a Retrieval-Augmented Generation (RAG) pipeline, systematically tuning key hyperparameters like `top_k` and `temperature` to enhance retrieval precision and generative response quality.

For anyone working with RAG systems or hyperparameter optimization, Katib is a game-changer, enabling scalable, efficient, and intelligent tuning of machine learning models. Hope this helps you streamline hyperparameter tuning and unlock new efficiencies in your ML workflows!

[Katib]: https://www.kubeflow.org/docs/components/katib/
[rag]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
[katib_installation]: https://www.kubeflow.org/docs/components/katib/installation/
[bleu]: https://huggingface.co/spaces/evaluate-metric/bleu
[Katib_metrics_collector]: https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/#pull-based-metrics-collector
[katib_api]: https://www.kubeflow.org/docs/components/katib/reference/architecture/#experiment
[katib_ui]: https://www.kubeflow.org/docs/components/katib/user-guides/katib-ui/