
Commit 25a1608

Merge branch 'main' into main
2 parents bf3e4f8 + fa11d3b commit 25a1608


11 files changed: +142680 / -5 lines changed


docs/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv

Lines changed: 2 additions & 0 deletions
@@ -36,13 +36,15 @@ Claude 3 Opus (02/29),40.5095080124761,29.10526953334248,1388,,https://github.co
 Infinity-Instruct-7M-Gen-mistral-7B,39.66949964831439,34.347412485016434,1742,https://huggingface.co/BAAI/Infinity-Instruct-7M-Gen-mistral-7B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Infinity-Instruct-7M-Gen-mistral-7B/model_outputs.json,community
 Llama 3.1 405B Instruct,39.25732749961743,39.10666895419877,1988,https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Meta-Llama-3.1-405B-Instruct-Turbo/model_outputs.json,minimal
 SPPO-Llama-3-Instruct-8B-PairRM,38.56280663670214,39.67286090605648,2066,https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/SPPO-Llama-3-Instruct-8B-PairRM/model_outputs.json,community
+GPO-Llama-3-8B-Instruct-GPM-2B,38.4334071653788,48.87200127423316,2613,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/GPO-Llama-3-8B-Instruct-GPM-2B/model_outputs.json,community
 GPT-4,38.12808974440021,23.576789314782605,1365,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4/model_outputs.json,verified
 Qwen2 72B Instruct,38.07461345451606,29.8527557752399,1626,https://huggingface.co/Qwen/Qwen2-72B-Instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Qwen2-72B-Instruct/model_outputs.json,verified
 Llama 3.1 70B Instruct,38.05512453607286,39.12691443804968,2044,https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Meta-Llama-3.1-70B-Instruct-Turbo/model_outputs.json,minimal
 Infinity-Instruct-3M-0625-Llama3-70B,37.97881098506053,24.277231851026183,1294,https://huggingface.co/BAAI/Infinity-Instruct-3M-0625-Llama3-70B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Infinity-Instruct-3M-0625-Llama3-70B/model_outputs.json,community
 Aligner 2B+Qwen1.5 72B Chat,36.725868878524274,31.773037737123104,1812,https://github.com/AlignInc/aligner-replication,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/aligner-2b_qwen1.5-72b-chat/model_outputs.json,community
 Qwen1.5 72B Chat,36.571754111987296,26.49828339562733,1549,https://huggingface.co/Qwen/Qwen1.5-72B-Chat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Qwen1.5-72B-Chat/model_outputs.json,verified
 GPT-4 (03/14),35.30706121640206,22.073258928708075,1371,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_0314/model_outputs.json,verified
+SPPO-Llama-3-8B-Instruct-GPM-2B,35.30471134991328,45.44098127183851,2490,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/SPPO-Llama-3-8B-Instruct-GPM-2B/model_outputs.json,community
 Ein 70B v0.1,35.029054008520646,24.84472049689441,1467,https://huggingface.co/SF-Foundation/EinBase-70B-v0.1-full,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Ein-70B-v0.1/model_outputs.json,community
 Claude 3 Sonnet (02/29),34.87247436243302,25.556325292273296,1420,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-3-sonnet-20240229/model_outputs.json,minimal
 FsfairX-Zephyr-Chat-v0.1,34.78744762311656,35.94648644102434,2275,https://huggingface.co/sfairXC/FsfairX-Zephyr-Chat-v0.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/FsfairX-Zephyr-Chat-v0.1/model_outputs.json,community

results/GPO-Llama-3-8B-Instruct-GPM-2B/model_outputs.json

Lines changed: 4832 additions & 0 deletions
Large diffs are not rendered by default.

results/GPO-Llama-3-8B-Instruct-GPM-2B/weighted_alpaca_eval_gpt4_turbo/annotations.json

Lines changed: 67209 additions & 0 deletions
Large diffs are not rendered by default.

results/SPPO-Llama-3-8B-Instruct-GPM-2B/model_outputs.json

Lines changed: 4832 additions & 0 deletions
Large diffs are not rendered by default.

results/SPPO-Llama-3-8B-Instruct-GPM-2B/weighted_alpaca_eval_gpt4_turbo/annotations.json

Lines changed: 65754 additions & 0 deletions
Large diffs are not rendered by default.

src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv

Lines changed: 7 additions & 5 deletions
@@ -1,10 +1,10 @@
 ,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,discrete_win_rate,mode,avg_length,length_controlled_winrate,lc_standard_error
 NullModel,76.91979180386511,0.9090102449662572,676,129,0,805,83.97515527950311,community,872,86.45780691307947,0.14180005113427935
 SelfMoA_gemma-2-9b-it-WPO-HB,77.58955217385297,1.231940914887347,640,165,0,805,79.5031055900621,community,3261,78.53928111481099,0.3042788133382446
-Shopee-SlimMoA-v1,75.61428659805350,1.2706274059194700,621,184,0,805,77.14285714285720,community,1994,77.4515432873834,0.43017522149239600
-blendaxai-gm-l6-vo31,69.11033492869565,1.3280735654354863,562,242,1,805,69.87577639751554,community,1809,76.91981221023656,0.5725365663132986
+Shopee-SlimMoA-v1,75.6142865980535,1.27062740591947,621,184,0,805,77.1428571428572,community,1994,77.4515432873834,0.430175221492396
+blendaxai-gm-l6-vo31,69.11033492869565,1.3280735654354865,562,242,1,805,69.87577639751554,community,1809,76.91981221023656,0.5725365663132986
 gemma-2-9b-it-WPO-HB,77.82503168985093,1.2355857177790277,640,163,2,805,79.62732919254658,community,2285,76.72506842726064,0.4242603928637889
-SelfMoA_gemma-2-9b-it-SimPO,71.9958856144492,1.3495341826849294,597,208,0,805,74.16149068322981,community,1930,75.04950944068965,0.44287068760098436
+SelfMoA_gemma-2-9b-it-SimPO,71.9958856144492,1.3495341826849294,597,208,0,805,74.16149068322981,community,1930,75.04950944068965,0.4428706876009843
 blendaxai-gm-l3-v35,73.41035740244067,1.254951147343878,607,196,2,805,75.527950310559,community,2186,73.37270365010379,0.6163911450738288
 gemma-2-9b-it-SimPO,65.86422561532919,1.423459922555078,540,264,1,805,67.14285714285714,community,1833,72.3508446939842,0.5167873784867067
 openpipe-moa-gpt-4-turbo-v1,63.15493451236265,1.422980098799326,515,283,7,805,64.40993788819875,community,1856,68.37866250336802,0.7309418614587613
@@ -25,7 +25,7 @@ gpt4_1106_preview_verbose,64.30360147101865,1.3348590089025316,525,268,12,805,65
 gpt-4o-mini-2024-07-18,44.65413862507926,1.4572395578449813,350,451,4,805,43.72670807453416,minimal,1861,50.727144855901976,0.8284734951761676
 Storm-7B,50.26886905528583,1.4728176780737183,397,408,0,805,49.31677018633541,community,2045,50.45110959343775,
 gpt4_1106_preview,50.0,0.0,0,0,805,805,50.0,minimal,2049,50.0,
-REBEL-Llama-3-8B-Instruct-Armo,48.43655307668638,1.480341435123528,394,410,1,805,49.006211180124225,community,1965,49.314293536857114,0.7061879308002301
+REBEL-Llama-3-8B-Instruct-Armo,48.43655307668638,1.480341435123528,394,410,1,805,49.00621118012423,community,1965,49.31429353685712,0.7061879308002301
 Infinity-Instruct-7M-Gen-Llama3_1-70B,37.46327383827497,1.4734130373862548,299,501,5,805,37.453416149068325,community,1654,46.10043331712677,0.822439983375277
 Llama-3-Instruct-8B-SimPO-ExPO,40.63285400856655,1.4439449942168028,325,479,1,805,40.43478260869565,community,1765,45.78021783946177,
 Llama-3-Instruct-8B-SimPO,40.52977498461182,1.422574464675002,319,485,1,805,39.68944099378882,community,1825,44.65131348921881,0.8800655791760451
@@ -39,13 +39,15 @@ claude-3-opus-20240229,29.10526953334248,1.3941539442369442,223,579,3,805,27.888
 Infinity-Instruct-7M-Gen-mistral-7B,34.347412485016434,1.412595625747994,263,541,1,805,32.732919254658384,community,1742,39.66949964831439,0.8048310993594987
 Meta-Llama-3.1-405B-Instruct-Turbo,39.10666895419877,1.4335939943941904,305,497,3,805,38.07453416149068,minimal,1988,39.25732749961743,0.9064666759144326
 SPPO-Llama-3-Instruct-8B-PairRM,39.67286090605648,1.424722356202499,310,494,1,805,38.57142857142858,community,2066,38.56280663670214,0.8694594533275739
+GPO-Llama-3-8B-Instruct-GPM-2B,48.87200127423316,1.4567650924969209,394,411,0,805,48.94409937888199,community,2613,38.4334071653788,0.7965862068931436
 gpt4,23.576789314782605,1.275704201206918,179,618,8,805,22.732919254658384,verified,1365,38.12808974440021,
 Qwen2-72B-Instruct,29.8527557752399,1.3690032071830978,231,569,5,805,29.006211180124225,verified,1626,38.07461345451606,0.8956826164517345
 Meta-Llama-3.1-70B-Instruct-Turbo,39.12691443804968,1.4277422726408466,306,496,3,805,38.19875776397515,minimal,2044,38.05512453607286,0.9009912768416926
 Infinity-Instruct-3M-0625-Llama3-70B,24.277231851026183,1.3152941480778837,188,613,4,805,23.60248447204969,community,1294,37.97881098506053,0.8189316873655579
 aligner-2b_qwen1.5-72b-chat,31.773037737123104,1.2392772646245978,180,473,152,805,31.801242236024844,community,1812,36.725868878524274,
 Qwen1.5-72B-Chat,26.49828339562733,1.304236164893057,201,600,4,805,25.217391304347824,verified,1549,36.571754111987296,
 gpt4_0314,22.073258928708075,1.2466725494608204,172,627,6,805,21.73913043478261,verified,1371,35.30706121640206,
+SPPO-Llama-3-8B-Instruct-GPM-2B,45.44098127183851,1.4552017482034645,362,443,0,805,44.96894409937888,community,2490,35.30471134991328,0.8108797072336295
 Ein-70B-v0.1,24.84472049689441,1.521406431103307,199,604,2,805,24.84472049689441,community,1467,35.029054008520646,
 claude-3-sonnet-20240229,25.556325292273296,1.3419811051815638,193,608,4,805,24.22360248447205,minimal,1420,34.87247436243302,
 FsfairX-Zephyr-Chat-v0.1,35.94648644102434,1.4410058098036145,285,517,3,805,35.59006211180124,community,2275,34.78744762311656,
@@ -212,4 +214,4 @@ oasst-sft-pythia-12b,1.790114083180124,0.3985580883049341,13,790,2,805,1.7391304
 guanaco-13b,3.469596859739131,0.5518606725700214,22,780,3,805,2.919254658385093,verified,1774,3.003787329611614,
 guanaco-7b,2.880002266173913,0.5202924149314048,21,783,1,805,2.670807453416149,verified,1364,2.871116813131697,
 Qwen1.5-1.8B-Chat,3.70555681579365,0.5811750995496215,27,774,3,804,3.544776119402985,verified,2673,2.588498849185137,
-baichuan-13b-chat,1.9921455615279504,0.4176985079331233,14,790,1,805,1.8012422360248446,community,1727,2.062170253598568,
+baichuan-13b-chat,1.9921455615279504,0.4176985079331233,14,790,1,805,1.8012422360248446,community,1727,2.062170253598568,
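
For readers scanning the leaderboard columns above, here is a quick self-contained check (not part of the commit) of how the discrete_win_rate column appears to relate to the raw counts in the same rows. The helper name is ours; the numbers are copied verbatim from the diff above.

# Sanity check, not part of this commit: in the leaderboard CSV above,
# discrete_win_rate appears to equal 100 * (n_wins + 0.5 * n_draws) / n_total.
def discrete_win_rate(n_wins: int, n_draws: int, n_total: int) -> float:
    """Percentage of head-to-head comparisons won, counting draws as half a win."""
    return 100.0 * (n_wins + 0.5 * n_draws) / n_total

# Row added by this commit: GPO-Llama-3-8B-Instruct-GPM-2B (394 wins, 0 draws, 805 total).
assert abs(discrete_win_rate(394, 0, 805) - 48.94409937888199) < 1e-9
# Existing row for comparison: gpt-4o-mini-2024-07-18 (350 wins, 4 draws, 805 total).
assert abs(discrete_win_rate(350, 4, 805) - 43.72670807453416) < 1e-9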

src/alpaca_eval/metrics/weights/weighted_alpaca_eval_gpt4_turbo/length_controlled_v1/baseline_gpt4_1106_preview.csv

Lines changed: 2 additions & 0 deletions
@@ -190,3 +190,5 @@ REBEL-Llama-3-8B-Instruct-Armo,-1.0427168605260002,0.6464073051877255,0.03951910
 SelfMoA_gemma-2-9b-it-SimPO,-0.8425253084188749,0.5482697859900880,1.2874783673834935
 SelfMoA_gemma-2-9b-it-WPO-HB,0.2523363342614252,0.3970191588440620,1.4137351138484051
 NullModel,-1.0518971527519405,0.2538445948493148,1.9057926500734572
+GPO-Llama-3-8B-Instruct-GPM-2B,-1.1688688988236986,0.7678817822697138,-0.4997466376902971
+SPPO-Llama-3-8B-Instruct-GPM-2B,-1.2289746990068291,0.8046474033904255,-0.6767509934260389
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+GPO-Llama-3-8B-Instruct-GPM-2B:
+  completions_kwargs:
+    batch_size: 900
+    max_new_tokens: 4096
+    model_kwargs:
+      dtype: bfloat16
+    model_name: "general-preference/GPO-Llama-3-8B-Instruct-GPM-2B"
+    stop_token_ids:
+      - 128001
+      - 128009
+    temperature: 0.9
+    top_p: 1.0
+    use_beam_search: false
+  fn_completions: vllm_local_completions
+  pretty_name: GPO-Llama-3-8B-Instruct-GPM-2B
+  prompt_template: GPO-Llama-3-8B-Instruct-GPM-2B/prompt.txt
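
The config above registers the model with alpaca_eval's vLLM backend (fn_completions: vllm_local_completions). As a minimal sketch of how such a config is typically consumed, assuming the evaluate_from_model entry point and parameter names shown in the repository README (they may differ across versions):

# Minimal sketch, not part of this commit. Assumes alpaca_eval exposes
# evaluate_from_model with these parameter names, as in the repository README;
# verify against your installed version before relying on it.
from alpaca_eval import main

main.evaluate_from_model(
    model_configs="GPO-Llama-3-8B-Instruct-GPM-2B",        # config added in this commit
    annotators_config="weighted_alpaca_eval_gpt4_turbo",   # annotator behind this leaderboard
)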
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+<|begin_of_text|><|start_header_id|>user<|end_header_id|>
+
+{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+
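
The prompt file above is the standard Llama-3 chat format with a single {instruction} placeholder. A small illustration of what a filled prompt would look like, assuming plain str.format substitution (how alpaca_eval actually applies the template is not shown in this diff):

# Illustration only: filling the {instruction} placeholder of the Llama-3 chat
# template added above. Plain str.format substitution is an assumption here.
template = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n"
    "\n"
    "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    "\n"
)
prompt = template.format(instruction="What are the three primary colors?")
print(prompt)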
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+SPPO-Llama-3-8B-Instruct-GPM-2B:
+  completions_kwargs:
+    batch_size: 900
+    max_new_tokens: 4096
+    model_kwargs:
+      dtype: bfloat16
+    model_name: "general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B"
+    stop_token_ids:
+      - 128001
+      - 128009
+    temperature: 0.9
+    top_p: 1.0
+    use_beam_search: false
+  fn_completions: vllm_local_completions
+  pretty_name: SPPO-Llama-3-8B-Instruct-GPM-2B
+  prompt_template: SPPO-Llama-3-8B-Instruct-GPM-2B/prompt.txt
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+<|begin_of_text|><|start_header_id|>user<|end_header_id|>
+
+{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+
