While it's far less common, you can also apply this approach of fine-tuning on RAG-generated synthetic data [to embedding models](https://huggingface.co/blog/davanstrien/synthetic-similarity-datasets). Synthetic data makes it considerably easier to build a training set that matches the expected format of a similarity dataset (including queries and “hard negatives”). Fine-tuning your embedding model with synthetic data confers the same benefits as LLM fine-tuning: cost savings (a much smaller model that demonstrates the same level of performance as a big one) and appropriateness, by bringing the “similarity” scores closer to the expectations of your retrieval system.
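As a rough sketch of what this looks like in practice, a synthetic dataset of (query, relevant passage, hard negative) triplets could be used to fine-tune an off-the-shelf embedding model with the `sentence-transformers` library. The checkpoint name and triplets below are placeholders, not the exact setup from the linked post:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Placeholder synthetic triplets: (query, relevant passage, hard negative).
# In practice these would come from your RAG-generated synthetic dataset.
triplets = [
    (
        "How do I rotate an API key?",
        "API keys can be rotated from the security settings page of your account.",
        "API keys are used to authenticate requests sent to the service.",
    ),
]

# Small base embedding model to fine-tune (example checkpoint).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

train_examples = [InputExample(texts=[q, pos, neg]) for q, pos, neg in triplets]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# MultipleNegativesRankingLoss pulls each query toward its positive passage
# and pushes it away from the hard negative (and other in-batch passages).
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)

model.save("finetuned-embedding-model")
```

The resulting model can then be dropped into your retrieval pipeline in place of the original checkpoint, so that the similarity scores it produces reflect your own queries and documents rather than the generic data the base model was trained on.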