Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

📣 4/2/25: We have updated our repo structure to hopefully be more user-friendly!

📣 31/1/25: We have open-sourced the Trust-Aligned models here!

📣 22/1/25: This paper has been accepted to ICLR 2025!

This repository contains the original implementation of Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (accepted at ICLR 2025). There are two parts to this repository:

  1. Trust-Align: A preference dataset and framework that aligns LLMs to be more trustworthy, as measured by a higher Trust-Score.

  2. Trust-Eval: A framework to evaluate the trustworthiness of inline-cited outputs generated by large language models (LLMs) within the Retrieval-Augmented Generation (RAG) setting.

Paper abstract:

LLMs are an integral part of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the quality of end-to-end RAG systems, there is a lack of research on understanding the appropriateness of an LLM for the RAG task. Thus, we introduce a new metric, Trust-Score, that provides a holistic evaluation of the trustworthiness of LLMs in an RAG framework. We show that various prompting methods, such as in-context learning, fail to adapt LLMs effectively to the RAG task. Thus, we propose Trust-Align, a framework to align LLMs for higher Trust-Score. LLaMA-3-8b, aligned with our method, significantly outperforms open-source LLMs of comparable sizes on ASQA (↑10.7), QAMPARI (↑29.2), and ELI5 (↑14.9).

Data

The evaluation dataset used in Trust-Eval is available on Trust-Align Huggingface.

The SFT and DPO training dataset used in Trust-Align is also available on Trust-Align Huggingface.
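
Both datasets can be pulled with the Hugging Face datasets library. The repo id below is a placeholder, not an actual dataset name; substitute the ids and splits listed on the Trust-Align Huggingface page:

    from datasets import load_dataset

    # Placeholder repo id and split; replace with the dataset ids and splits
    # published on the Trust-Align Huggingface page.
    ds = load_dataset("declare-lab/<dataset-name>", split="train")
    print(ds)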

Trust-Eval

Trust-Eval quantifies trustworthiness along three main axes using Trust-Score:

  1. Response Correctness: Correctness of the generated claims
  2. Attribution Quality: Quality of the generated citations, covering citation recall (are the generated statements well supported by their set of citations?) and citation precision (are the citations relevant to the statements they support?); a minimal sketch of this computation follows this list.
  3. Refusal Groundedness: Ability of the model to discern whether the question can be answered given the retrieved documents
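
To make the attribution axis concrete, here is a minimal, illustrative sketch of citation recall and precision computed from per-statement support judgments. The `entails` callable is a hypothetical placeholder (ALCE-style evaluations typically use an NLI model here); this is not the exact Trust-Score implementation, which lives in Trust-Eval.

    from typing import Callable, List

    # Hypothetical support judge: returns True if `premise` supports `claim`.
    EntailFn = Callable[[str, str], bool]

    def citation_scores(statements: List[str],
                        citations: List[List[str]],
                        entails: EntailFn) -> tuple[float, float]:
        """Illustrative citation recall/precision for one response.

        recall: fraction of statements supported by the concatenation of their citations.
        precision: fraction of citations that actually contribute support to their statement.
        """
        recalled, cited_total, cited_relevant = 0, 0, 0
        for stmt, docs in zip(statements, citations):
            joint = " ".join(docs)
            supported = bool(docs) and entails(joint, stmt)
            recalled += supported
            for doc in docs:
                cited_total += 1
                # Relevant if the citation supports the statement on its own, or if
                # dropping it would break the joint support (simplified criterion).
                others = " ".join(d for d in docs if d is not doc)
                if entails(doc, stmt) or (supported and not entails(others, stmt)):
                    cited_relevant += 1
        recall = recalled / len(statements) if statements else 0.0
        precision = cited_relevant / cited_total if cited_total else 0.0
        return recall, precision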


We release Trust-Eval as a standalone package. You can install it by following the steps below:

  1. Set up a Python environment

    conda create -n trust_eval python=3.10.13
    conda activate trust_eval
  2. Install dependencies

    pip install trust_eval

    Note: vLLM will be installed with CUDA 12.1. Please ensure your CUDA setup is compatible (a quick compatibility check is sketched after these steps).

  3. Set up NLTK

    import nltk
    nltk.download('punkt_tab')
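
Before running evaluations, it can help to confirm that the installed PyTorch/vLLM build matches your local CUDA driver (see the CUDA 12.1 note in step 2). A quick check, assuming PyTorch was pulled in as a vLLM dependency:

    import torch

    # The vLLM wheels above target CUDA 12.1; the driver-reported CUDA version
    # should be at least as new as the build version for the GPU kernels to load.
    print("torch:", torch.__version__)
    print("built with CUDA:", torch.version.cuda)
    print("GPU available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))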

Please refer to the Trust-Eval README for more information.

Trust-Align


Setup

conda create -n cite python=3.10.13
conda activate cite
pip install -r requirements.txt

We use the latest version of alignment-handbook for training (version alignment-handbook-0.4.0.dev0) and followed the installation instructions from the alignment-handbook repository:

git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
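
For orientation, alignment-handbook training runs are accelerate-launched scripts driven by YAML recipe files. The command below shows that general pattern using the handbook's own example recipe paths, not this repository's Trust-Align configs; see the Trust-Align README for the actual training commands:

    # General alignment-handbook launch pattern (handbook example paths shown):
    ACCELERATE_LOG_LEVEL=info accelerate launch \
        --config_file recipes/accelerate_configs/multi_gpu.yaml \
        scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml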

Please refer to the Trust-Align README for more information.

Bugs or Questions?

If you have any questions about the code or the paper, feel free to email Shang Hong ([email protected]). If you encounter any problems when using the code or want to report a bug, you can open an issue. Please describe the problem in detail so that we can help you better and faster!

Citation

If you find our code, data, models, or the paper useful, please cite the paper:

@misc{song2024measuringenhancingtrustworthinessllms,
      title={Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse}, 
      author={Maojia Song and Shang Hong Sim and Rishabh Bhardwaj and Hai Leong Chieu and Navonil Majumder and Soujanya Poria},
      year={2024},
      eprint={2409.11242},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.11242}, 
}
