Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
📣 4/2/25: We have updated our repo structure to hopefully be more user friendly!
📣 31/1/25: We have open-sourced the Trust-Aligned models here!
📣 22/1/25: This paper has been accepted to ICLR 2025!
This repository contains the original implementation of Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (accepted at ICLR 2025). There are two parts to this repository:
- Trust-Align: A preference dataset and framework that aligns LLMs to be more trustworthy, as measured by a higher Trust-Score.
- Trust-Eval: A framework to evaluate the trustworthiness of inline-cited outputs generated by large language models (LLMs) within the Retrieval-Augmented Generation (RAG) setting.
Paper abstract:
LLMs are an integral part of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the quality of end-to-end RAG systems, there is a lack of research on understanding the appropriateness of an LLM for the RAG task. Thus, we introduce a new metric, Trust-Score, that provides a holistic evaluation of the trustworthiness of LLMs in an RAG framework. We show that various prompting methods, such as in-context learning, fail to adapt LLMs effectively to the RAG task. Thus, we propose Trust-Align, a framework to align LLMs for higher Trust-Score. LLaMA-3-8b, aligned with our method, significantly outperforms open-source LLMs of comparable sizes on ASQA (↑10.7), QAMPARI (↑29.2), and ELI5 (↑14.9).
The evaluation dataset used in Trust-Eval is available on Trust-Align Huggingface.
The SFT and DPO training datasets used in Trust-Align are also available on Trust-Align Huggingface.
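If you want to pull the released data programmatically, a minimal sketch with the `datasets` library is shown below. The repository id is a placeholder, not the real name; substitute the actual dataset id from the Trust-Align Hugging Face page linked above.

```python
# Example of loading the released data from the Hugging Face Hub.
# The repo id below is a PLACEHOLDER -- replace it with the actual dataset
# name listed on the Trust-Align Hugging Face page.
from datasets import load_dataset

eval_data = load_dataset("<org>/<trust-eval-dataset>")  # placeholder repo id
print(eval_data)
```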
Trust-Eval quantifies trustworthiness along three main axes using Trust-Score:
- Response Correctness: Correctness of the generated claims
- Attribution Quality: Quality of the citations generated. Concerns citation recall (are the generated statements well supported by the set of cited documents?) and citation precision (are the citations relevant to the statements?); a computation sketch follows this list.
- Refusal Groundedness: Ability of the model to discern if the question can be answered given the documents
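For intuition, here is a minimal sketch of how the citation recall and precision components could be computed from per-statement entailment judgments, in the spirit of ALCE-style attribution metrics. The `entails` callable stands in for an NLI model and the data layout is assumed for illustration; this is not the exact Trust-Eval implementation.

```python
# Illustrative sketch only (not the trust_eval API): ALCE-style citation
# recall/precision computed from boolean entailment judgments.
# `entails(premise, claim)` stands in for an NLI model; `docs` maps citation
# ids to passage text.

def citation_recall(statements, citations, docs, entails):
    """Fraction of statements fully supported by the union of their cited passages."""
    if not statements:
        return 0.0
    supported = 0
    for stmt, cited_ids in zip(statements, citations):
        premise = " ".join(docs[i] for i in cited_ids)
        if cited_ids and entails(premise, stmt):
            supported += 1
    return supported / len(statements)


def citation_precision(statements, citations, docs, entails):
    """Fraction of individual citations that actually contribute support."""
    relevant, total = 0, 0
    for stmt, cited_ids in zip(statements, citations):
        full = " ".join(docs[i] for i in cited_ids)
        for cid in cited_ids:
            total += 1
            rest = " ".join(docs[i] for i in cited_ids if i != cid)
            # A citation counts as relevant if it supports the statement on its
            # own, or if removing it breaks the support of the full citation set.
            if entails(docs[cid], stmt) or (entails(full, stmt) and not entails(rest, stmt)):
                relevant += 1
    return relevant / total if total else 0.0
```

See the paper for how the three components are aggregated into the overall Trust-Score.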
We release Trust-Eval as a standalone package. You can install by following the steps below:
- Set up a Python environment

  conda create -n trust_eval python=3.10.13
  conda activate trust_eval

- Install dependencies

  pip install trust_eval

  Note: vLLM will be installed with CUDA 12.1. Please ensure your CUDA setup is compatible. (An optional sanity check follows these steps.)

- Set up NLTK

  import nltk
  nltk.download('punkt_tab')
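Optionally, you can sanity-check the environment before running an evaluation. This snippet is not part of the official setup; it only confirms that the CUDA-enabled PyTorch build pulled in by vLLM can see your GPU and that the NLTK data downloaded correctly.

```python
# Optional sanity check (not part of the official Trust-Eval instructions).
import torch
import nltk

# Torch build and CUDA visibility (vLLM requires a working CUDA setup).
print(torch.__version__, "| CUDA:", torch.version.cuda, "| GPU available:", torch.cuda.is_available())

# Raises LookupError if the punkt_tab download above did not succeed.
nltk.data.find("tokenizers/punkt_tab")
```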
Please refer to the Trust-Eval README for more information.
For Trust-Align, set up a separate environment:

conda create -n cite python=3.10.13
conda activate cite
pip install -r requirements.txt
We use the latest version of alignment-handbook for training (alignment-handbook-0.4.0.dev0) and followed the installation instructions from the alignment-handbook repository:
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
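As a quick check that the handbook installed correctly, you can try importing it. This assumes the repository installs an importable `alignment` package exposing `__version__`; adjust if your checkout differs.

```python
# Assumption for illustration: alignment-handbook installs a package named
# `alignment` with a __version__ attribute; verify against your local checkout.
import alignment
print(alignment.__version__)  # expected: 0.4.0.dev0
```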
Please refer to the Trust-Align README for more information.
If you have any questions about the code or the paper, feel free to email Shang Hong ([email protected]). If you encounter any problems when using the code or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you better and faster!
If you find our code, data, models, or the paper useful, please cite the paper:
@misc{song2024measuringenhancingtrustworthinessllms,
title={Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse},
author={Maojia Song and Shang Hong Sim and Rishabh Bhardwaj and Hai Leong Chieu and Navonil Majumder and Soujanya Poria},
year={2024},
eprint={2409.11242},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.11242},
}