
Commit e280641

Type59pro and Type59Gold authored
Enzyme evals (#19)
Co-authored-by: Type59Gold <[email protected]>
1 parent 7f01360 commit e280641


9 files changed, +44 -61 lines changed

evals/registry/data/00_scipaper_enzyme_activate_compound/samples.jsonl

Lines changed: 0 additions & 3 deletions
This file was deleted.

evals/registry/data/00_scipaper_enzyme_inhibitor/samples.jsonl

Lines changed: 0 additions & 3 deletions
This file was deleted.
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+{"file_name": "../uni-finder/enzyme/km/paper/10.1007_s00425-014-2102-6.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1007_s00425-014-2102-6.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1007_s00425-014-2102-6.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1007_s00425-014-2102-6.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1007_s10725-019-00528-9.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1007_s10725-019-00528-9.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1007_s10725-019-00528-9.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1007_s10725-019-00528-9.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_j.bbrep.2016.11.003.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_j.bbrep.2016.11.003.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_j.bbrep.2016.11.003.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_j.bbrep.2016.11.003.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_s0005-2728__97__00090-x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0005-2728__97__00090-x.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_s0005-2728__97__00090-x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0005-2728__97__00090-x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_s0021-9258__18__96277-0.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0021-9258__18__96277-0.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_s0021-9258__18__96277-0.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0021-9258__18__96277-0.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_s0021-9258__18__96427-6.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0021-9258__18__96427-6.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_s0021-9258__18__96427-6.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0021-9258__18__96427-6.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_S0076-6879__75__41082-5.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_S0076-6879__75__41082-5.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_S0076-6879__75__41082-5.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_S0076-6879__75__41082-5.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1016_s0141-8130__01__00188-x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0141-8130__01__00188-x.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1016_s0141-8130__01__00188-x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1016_s0141-8130__01__00188-x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1021_acs.biochem.6b00536.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1021_acs.biochem.6b00536.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1021_acs.biochem.6b00536.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1021_acs.biochem.6b00536.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1080_09168451.2020.1751582.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1080_09168451.2020.1751582.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1080_09168451.2020.1751582.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1080_09168451.2020.1751582.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1080_09168451.2020.1799749.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1080_09168451.2020.1799749.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1080_09168451.2020.1799749.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1080_09168451.2020.1799749.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1104_pp.19.01225.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1104_pp.19.01225.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1104_pp.19.01225.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1104_pp.19.01225.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/10.1139_b07-081.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1139_b07-081.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/10.1139_b07-081.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/10.1139_b07-081.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/km/paper/j.1432-1033.1986.tb09548.x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/j.1432-1033.1986.tb09548.x.pdf", "answerfile_name": "../uni-finder/enzyme/km/answer/j.1432-1033.1986.tb09548.x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/j.1432-1033.1986.tb09548.x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
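Each record pairs a paper PDF with an answer CSV, and the `compare_fields` and `index` keys suggest that answer rows are matched on the `Substrate` column and then compared field by field. The sketch below illustrates that idea under stated assumptions: exact string matching after trimming whitespace, and no credit for a substrate the prediction omits. It is not the scoring code of `evals.elsuite.rag_table_extract:TableExtract`, which this commit does not show; `rows_by_index` and `field_accuracy` are hypothetical names, and the two tables are illustrative values borrowed from the prompt examples, not dataset answers.

```python
import csv
import io
from typing import Dict, List


# Hypothetical helper, not part of the evals codebase.
def rows_by_index(csv_text: str, index: str) -> Dict[str, Dict[str, str]]:
    """Parse CSV text and key each data row by the value in its `index` column."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return {row[index].strip(): row for row in reader if row.get(index)}


# Hypothetical scoring rule: fraction of answer cells reproduced exactly.
def field_accuracy(predicted: str, answer: str, compare_fields: List[str], index: str) -> float:
    pred_rows = rows_by_index(predicted, index)
    gold_rows = rows_by_index(answer, index)
    total = hits = 0
    for key, gold_row in gold_rows.items():
        pred_row = pred_rows.get(key)
        for field in compare_fields:
            total += 1
            if pred_row is None:
                continue  # substrate missing from the prediction: no credit for any field
            gold_val = (gold_row.get(field) or "").strip()
            pred_val = (pred_row.get(field) or "").strip()
            hits += int(gold_val == pred_val)
    return hits / total if total else 0.0


fields = ["Substrate", "Comment", "Organism", "Km Value (mM)"]
answer = "Substrate,Comment,Organism,Km Value (mM)\nATP,-,Homo sapiens,3.5 nM\nD-ribose 6-phosphate,-,Homo sapiens,120 nM\n"
predicted = "Substrate,Comment,Organism,Km Value (mM)\nATP,-,Homo sapiens,3.5 nM\nD-ribose 6-phosphate,-,Bos taurus,120 nM\n"
print(field_accuracy(predicted, answer, fields, "Substrate"))  # 0.875 (7 of 8 cells match)
```

Keying on `index` means row order in the model's output does not matter, which fits the prompt's request to concatenate several tables.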

evals/registry/data/00_scipaper_enzyme_localization/samples.jsonl

Lines changed: 0 additions & 3 deletions
This file was deleted.
Lines changed: 14 additions & 3 deletions
@@ -1,3 +1,14 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:6316846852a855013f98ee678e945582013c1269fcad311c8e933859ade77c68
-size 1919
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1007_s00425-014-2102-6.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s00425-014-2102-6.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1007_s00425-014-2102-6.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s00425-014-2102-6.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1007_s10725-019-00528-9.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s10725-019-00528-9.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1007_s10725-019-00528-9.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s10725-019-00528-9.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1007_s11103-006-0040-9.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s11103-006-0040-9.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1007_s11103-006-0040-9.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1007_s11103-006-0040-9.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_j.bbrep.2016.11.003.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_j.bbrep.2016.11.003.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_j.bbrep.2016.11.003.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_j.bbrep.2016.11.003.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_s0005-2728__97__00090-x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0005-2728__97__00090-x.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_s0005-2728__97__00090-x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0005-2728__97__00090-x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_s0021-9258__18__96277-0.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0021-9258__18__96277-0.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_s0021-9258__18__96277-0.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0021-9258__18__96277-0.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_s0021-9258__18__96427-6.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0021-9258__18__96427-6.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_s0021-9258__18__96427-6.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_s0021-9258__18__96427-6.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1016_S0076-6879__75__41082-5.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_S0076-6879__75__41082-5.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1016_S0076-6879__75__41082-5.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1016_S0076-6879__75__41082-5.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1021_acs.biochem.6b00536.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1021_acs.biochem.6b00536.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1021_acs.biochem.6b00536.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1021_acs.biochem.6b00536.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1080_09168451.2020.1751582.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1080_09168451.2020.1751582.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1080_09168451.2020.1751582.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1080_09168451.2020.1751582.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1080_09168451.2020.1799749.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1080_09168451.2020.1799749.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1080_09168451.2020.1799749.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1080_09168451.2020.1799749.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1104_pp.19.01225.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1104_pp.19.01225.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1104_pp.19.01225.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1104_pp.19.01225.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_10.1139_b07-081.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1139_b07-081.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_10.1139_b07-081.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_10.1139_b07-081.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
+{"file_name": "../uni-finder/enzyme/substrate/paper/s_j.1432-1033.1986.tb09548.x.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_j.1432-1033.1986.tb09548.x.pdf", "answerfile_name": "../uni-finder/enzyme/substrate/answer/s_j.1432-1033.1986.tb09548.x.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/s_j.1432-1033.1986.tb09548.x.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
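The three removed lines in this hunk (`version`, `oid sha256:`, `size`) are a Git LFS pointer rather than JSONL data; the commit replaces the pointer with the 14 records stored inline. A loader that reads an un-fetched pointer would fail on the first `json.loads`, so a guard like the one below may help. This is a minimal sketch, assuming that checking the spec-v1 version line plus an `oid sha256:` line is enough to recognise a pointer; `is_lfs_pointer` and `load_samples` are hypothetical helpers, not functions from the evals codebase.

```python
import json
from typing import Any, Dict, List

# First line of a Git LFS pointer file (spec v1), as in the removed lines above.
LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(text: str) -> bool:
    """Heuristic: starts with the LFS version line and declares a sha256 oid."""
    lines = text.splitlines()
    return (
        len(lines) >= 2
        and lines[0].strip() == LFS_VERSION_LINE
        and any(line.startswith("oid sha256:") for line in lines[1:])
    )


def load_samples(text: str) -> List[Dict[str, Any]]:
    """Parse samples.jsonl content (one JSON object per line), refusing LFS pointers."""
    if is_lfs_pointer(text):
        raise ValueError("samples file is a Git LFS pointer; fetch the real content first (e.g. `git lfs pull`)")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# The pointer removed by this commit.
old_content = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:6316846852a855013f98ee678e945582013c1269fcad311c8e933859ade77c68\n"
    "size 1919\n"
)
print(is_lfs_pointer(old_content))  # True
print(load_samples('{"index": "Substrate"}\n'))  # [{'index': 'Substrate'}]
```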

evals/registry/evals/00_scipaper_enzyme_activate_compound.yaml

Lines changed: 0 additions & 18 deletions
This file was deleted.
@@ -1,18 +1,19 @@
-scipaper_enzyme_inhibitor:
-  id: scipaper_enzyme_inhibitor.val.csv
+scipaper_enzyme_km:
+  id: scipaper_enzyme_km.val.csv
   metrics: [accuracy]

-scipaper_enzyme_inhibitor.val.csv:
+scipaper_enzyme_km.val.csv:
   class: evals.elsuite.rag_table_extract:TableExtract
   args:
-    samples_jsonl: 00_scipaper_enzyme_inhibitor/samples.jsonl
+    samples_jsonl: 00_scipaper_enzyme_km/samples.jsonl
     instructions: |
-      Please give a complete list of Inhibitor, Commentand Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
+      Please give a complete list of Substrate, Comment and Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
       1. Output in csv format, write units not in header but in the value like "10.5 µM". Quote the value if it has comma! For example:
       ```csv
-      Inhibitor,Comment,Organism
-      ATP,"competitive inhibition of verapamil-dependent ATPase-activity",Homo sapiens
-      p-xylene,"11.4 mM, slight inhibitor",Bos taurus
-      NH4+, 0.002 mM,Bos taurus
+      Substrate,Comment,Organism,Km Value
+      ATP,"competitive inhibition of verapamil-dependent ATPase-activity",Homo sapiens, 3.5 nM
+      p-xylene,"20 mM Tris-HCl(pH 7.0), 5 mM MgCl2, at 25 ℃",Bos taurus, 12 nM
+      D-ribose 6-phosphate, - , Homo sapiens, 120 nM
       ```
       2. If there are multiple tables, concat them. Don't give me reference or using "...", give me complete table!
+      3. If no relevant information was found in the paper, use '-' to fill in the form in CSV.
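The updated prompt asks the model to keep units inside each value (for example `3.5 nM` or `10.5 µM`), while the answer column in the km samples is named `Km Value (mM)`. If a grader compared numbers instead of raw strings, it would first need to convert each cell to millimolar. A minimal sketch, assuming only molar concentration units appear; `km_to_mm`, `TO_MM`, and `VALUE_RE` are hypothetical names, and nothing in this commit says the eval performs such a conversion.

```python
import re
from typing import Optional

# Hypothetical table: factor that converts each molar unit to millimolar (mM).
# Both the micro sign (U+00B5) and the Greek mu (U+03BC) are accepted.
TO_MM = {"M": 1e3, "mM": 1.0, "µM": 1e-3, "μM": 1e-3, "uM": 1e-3, "nM": 1e-6, "pM": 1e-9}

# A number, optional whitespace, then a unit token, e.g. "3.5 nM" or "10.5 µM".
VALUE_RE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*([A-Za-zµμ]+)\s*$")


def km_to_mm(value: str) -> Optional[float]:
    """Convert a Km cell such as '3.5 nM' to mM; return None for '-' or unrecognised units."""
    match = VALUE_RE.match(value)
    if match is None:
        return None  # e.g. the '-' placeholder required by instruction 3
    number, unit = match.groups()
    factor = TO_MM.get(unit)
    return None if factor is None else float(number) * factor


for cell in ["3.5 nM", "12 nM", "120 nM", "10.5 µM", "-"]:
    print(cell, "->", km_to_mm(cell))
```

Returning `None` instead of raising keeps the placeholder `-` distinguishable from a real value of zero.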

evals/registry/evals/00_scipaper_enzyme_localization.yaml

Lines changed: 0 additions & 16 deletions
This file was deleted.

evals/registry/evals/00_scipaper_enzyme_substrate.yaml

File mode: 100644 → 100755
Lines changed: 6 additions & 6 deletions
@@ -7,13 +7,13 @@ scipaper_enzyme_substrate.val.csv:
   args:
     samples_jsonl: 00_scipaper_enzyme_substrate/samples.jsonl
     instructions: |
-      Please give a complete list of SMILES structures, Km values, Vmax values, target info (protein or cell line), and organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
+      Please give a complete list of Substrate, Comment and Organism of all substrates, Products and Comment of Product in the paper. Usually the substrates' tags are numbers or IUPAC names.
       1. Output in csv format, write units not in header but in the value like "10.5 µM". Quote the value if it has comma! For example:
       ```csv
-      Substrate,Inhibitors, Km value,Km max,Comment,organism,Vmax value,SMILES,Target info,Activating Compound,
-      ATP,Cu2+,0.001 mM,-,-,Homo sapiens,-,-,ATP-linker aldehyde,Carboxybenzaldehyde,
-      p-xylene,NADH,0.004 mM,-,-,Homo sapiens,-,C1CCCCC1,-,Methylbenzaldehyde
-      NADPH,benzaldehyde, 0.12 mM,125 mM,enzyme form ATP,Bos taurus,-,-,NH4+
-
+      Substrate,Comment,Organism,Products,"Comment (Product)"
+      "NADH + H+ + O2","20 mM Tris-HCl(pH 7.0)",Homo sapiens,"NAD+ + H2O", -
+      "D-glucose + 6-phosphate","20 mM Tris-HCl(pH 7.0), 5 mM MgCl2, at 25 ℃",Bos taurus, -, -
+      "D-ribose 6-phosphate", - , Homo sapiens, "glycerol + phosphate", -
       ```
       2. If there are multiple tables, concat them. Don't give me reference or using "...", give me complete table!
+      3. If no relevant information was found in the paper, use '-' to fill in the form in CSV.
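Both prompts depend on the model quoting any value that contains a comma; a single stray quote shifts every later column out of place. A minimal sketch of a row-width check and of writing CSV with Python's standard `csv` module, whose default `QUOTE_MINIMAL` mode quotes such fields automatically. `malformed_rows` and `to_csv` are hypothetical helpers, `skipinitialspace=True` is used because the prompt examples put a space after some commas (as in `, "glycerol + phosphate"`), and the row values are illustrative only.

```python
import csv
import io
from typing import List, Sequence


def malformed_rows(csv_text: str) -> List[int]:
    """Return 1-based numbers of data rows whose field count differs from the header's."""
    # skipinitialspace lets a quoted field that follows ", " still be parsed as quoted.
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    if not rows:
        return []
    width = len(rows[0])
    return [n for n, row in enumerate(rows[1:], start=1) if row and len(row) != width]


def to_csv(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Serialise rows; the default QUOTE_MINIMAL quotes any field containing a comma."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


header = ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"]
rows = [["NADH + H+ + O2", "20 mM Tris-HCl(pH 7.0), 5 mM MgCl2, at 25 ℃", "Homo sapiens", "NAD+ + H2O", "-"]]
text = to_csv(header, rows)
print(text)
print(malformed_rows(text))  # []
print(malformed_rows("Substrate,Comment,Organism,Products,Comment (Product)\nATP,-,Homo sapiens\n"))  # [1]
```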
