
Reproducing LLAMA-2 metrics #27

Closed
sidhantls opened this issue May 20, 2024 · 2 comments

Comments

@sidhantls (Contributor) commented May 20, 2024

Hello,

I'm trying to reproduce the metrics in Table 1 for LLAMA-2. I did so for GPT-J and the results are consistent; however, for LLAMA-2 the results do not match for some reason. Any idea why this is happening?

For LLAMA-2 on FEVER, I get:

  1. Baseline (no laser): 54.98% accuracy. The paper shows 59.3%
  2. With LASER: 54.13% accuracy

Logs:

  1. Baseline (no laser): 54.98% accuracy. The paper shows 59.3%
    python intervention_llama2_fever.py --lname dont --rate 8.0 --lnum 30 --home_dir out_data/fever --model_path meta-llama/Llama-2-7b-chat-hf

Main: Msg: Final Performance: Dataset size 13086 0-1 Correctness is 54.98242396454226 percentage, Mean F1 score is None, Mean Log Prob is -1.1887680674259296, top-1 accuracy is 54.82958887360538, top-10 accuracy is 99.99235824545316, top-5 accuracy is 99.92358245453156.

  2. With LASER: 54.13% accuracy
    python intervention_llama2_fever.py --lname fc_in --rate 8.0 --lnum 30 --home_dir out_data/fever --model_path meta-llama/Llama-2-7b-chat-hf

Main: Msg: Final Performance: Dataset size 13086 0-1 Correctness is 54.13418920984258 percentage, Mean F1 score is None, Mean Log Prob is -1.2900288283587429, top-1 accuracy is 54.09598043710836, top-10 accuracy is 100.0, top-5 accuracy is 99.91594069998472.

Specs:
Python == 3.8
Torch == 1.12.1+cu116

@dkmisra (Collaborator) commented May 20, 2024

My guess is that you are using a different Llama-2 7B version. See issue #18.

If you want to use this Llama 2 version, try running a grid search over the LASER hyperparameters, as the optimal values could differ from the ones reported in the paper.
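As a rough sketch (not taken from the paper or repo), such a sweep could be scripted as below using the same intervention_llama2_fever.py command shown earlier; the particular --lname, --lnum, and --rate values here are placeholder assumptions and should be replaced with whatever ranges you actually want to search:

    # Hypothetical grid search over LASER hyperparameters.
    # The value lists below are illustrative, not the ranges from the paper.
    for lname in fc_in fc_out; do
      for lnum in 26 28 30; do
        for rate in 2.0 4.0 8.0; do
          python intervention_llama2_fever.py \
            --lname "$lname" \
            --rate "$rate" \
            --lnum "$lnum" \
            --home_dir out_data/fever_grid \
            --model_path meta-llama/Llama-2-7b-chat-hf
        done
      done
    done

Each run logs the final accuracy, so you can pick the combination with the best 0-1 Correctness afterwards.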

@sidhantls (Contributor, Author) commented

Thank you. Using meta-llama/Llama-2-7b-hf, the results are now consistent for FEVER: 59.131 (baseline) and 65.558 (LASER).
