Commit 8d2204d

add enzyme information
add km and substrate
1 parent 8d44bbb commit 8d2204d

13 files changed, +79 -60 lines changed

Diff for: evals/registry/data/00_scipaper_enzyme_activate_compound/samples.jsonl

-3
This file was deleted.

Diff for: evals/registry/data/00_scipaper_enzyme_inhibitor/samples.jsonl

-3
This file was deleted.
+27
@@ -0,0 +1,27 @@
+#!/bin/bash
+target_job=$1
+if [[ ${target_job} == "" ]]
+then
+    echo ">>> Error: target_job is not define"
+    exit
+fi
+if [[ ! -f samples.jsonl ]]
+then
+    touch samples.jsonl
+fi
+for paper in /root/uni-finder/enzyme/"${target_job}"/paper/*.pdf
+do
+    echo "find file ${paper}"
+    file_name="${paper##*/}"
+    name=${file_name%.*}
+    key_word=""
+    key_word=$(grep "${name}" samples.jsonl)
+    if [[ ${key_word} == "" ]]
+    then
+        echo "add ${name} to jsonl"
+        sed 's|target_mark|'"${name}"'|g' sample_file | sed 's|target_Job|'"${target_job}"'|g' >> samples.jsonl
+    else
+        echo "${name}: was already in the jsonl"
+    fi
+done
+
+1
@@ -0,0 +1 @@
+{"file_name": "../uni-finder/enzyme/target_Job/paper/target_mark.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/target_mark.pdf", "answerfile_name": "../uni-finder/enzyme/target_Job/answer/target_mark.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/target_mark.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Km Value (mM)"], "index": "Substrate"}
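
The script above fills the one-line template (`sample_file`) once per PDF and appends the result to `samples.jsonl`, skipping papers that are already listed. The sketch below is not part of the commit; it reproduces that flow in a temporary directory so it can be run anywhere. The names `demo_job`, `paper_a`, `paper`, the shortened template, and the `add_sample` helper are assumptions made for illustration. It also swaps the script's plain `grep "${name}"` for a fixed-string match on `/<name>.pdf"`, because a bare substring match would report `paper` as already present once `paper_a` has been added.

```shell
#!/bin/bash
# Sketch: expand a one-line JSON template per paper and append it only once.
# All names below (demo_job, paper_a, paper, add_sample) are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT
cd "${workdir}"

# Shortened stand-in for the commit's template file (sample_file).
cat > sample_file <<'EOF'
{"file_name": "../uni-finder/enzyme/target_Job/paper/target_mark.pdf", "index": "Substrate"}
EOF

target_job="demo_job"
touch samples.jsonl

add_sample() {
    local name=$1
    # -F matches the text literally; -q only sets the exit status.
    # Matching on /<name>.pdf" avoids hits on longer names or on the path.
    if grep -qF "/${name}.pdf\"" samples.jsonl; then
        echo "${name}: was already in the jsonl"
    else
        echo "add ${name} to jsonl"
        sed -e "s|target_mark|${name}|g" \
            -e "s|target_Job|${target_job}|g" sample_file >> samples.jsonl
    fi
}

add_sample paper_a
add_sample paper_a   # second call leaves samples.jsonl unchanged
add_sample paper     # still added, although "paper" is a substring of "paper_a"

cat samples.jsonl
```

The substitution still relies on `sed` with `|` as the delimiter, so a file name containing `|` or `&` would need escaping first.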
+3
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b1d54b5a0607f2e1992cdc213309440f24bb630dc3a3b57bc939e32dd47079aa
+size 6846

Diff for: evals/registry/data/00_scipaper_enzyme_localization/samples.jsonl

-3
This file was deleted.
@@ -0,0 +1,27 @@
+#!/bin/bash
+target_job=$1
+if [[ ${target_job} == "" ]]
+then
+    echo ">>> Error: target_job is not define"
+    exit
+fi
+if [[ ! -f samples.jsonl ]]
+then
+    touch samples.jsonl
+fi
+for paper in /root/uni-finder/enzyme/"${target_job}"/paper/*.pdf
+do
+    echo "find file ${paper}"
+    file_name="${paper##*/}"
+    name=${file_name%.*}
+    key_word=""
+    key_word=$(grep "${name}" samples.jsonl)
+    if [[ ${key_word} == "" ]]
+    then
+        echo "add ${name} to jsonl"
+        sed 's|target_mark|'"${name}"'|g' sample_file | sed 's|target_Job|'"${target_job}"'|g' >> samples.jsonl
+    else
+        echo "${name}: was already in the jsonl"
+    fi
+done
+
@@ -0,0 +1 @@
+{"file_name": "../uni-finder/enzyme/target_Job/paper/target_mark.pdf", "file_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/target_mark.pdf", "answerfile_name": "../uni-finder/enzyme/target_Job/answer/target_mark.csv", "answerfile_link": "https://dp-filetrans-bj.oss-cn-beijing.aliyuncs.com/changjunhan/target_mark.csv", "compare_fields": ["Substrate", "Comment", "Organism", "Products", "Comment (Product)"], "index": "Substrate"}
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6316846852a855013f98ee678e945582013c1269fcad311c8e933859ade77c68
-size 1919
+oid sha256:78a3b4fbbfdb149b3420f6aec13b8022e9becc6ea16370b5f2dbd23fd429c848
+size 7815

Diff for: evals/registry/evals/00_scipaper_enzyme_activate_compound.yaml

-18
This file was deleted.
@@ -1,18 +1,20 @@
-scipaper_enzyme_inhibitor:
-  id: scipaper_enzyme_inhibitor.val.csv
+scipaper_enzyme_km:
+  id: scipaper_enzyme_km.val.csv
   metrics: [accuracy]
 
-scipaper_enzyme_inhibitor.val.csv:
+scipaper_enzyme_km.val.csv:
   class: evals.elsuite.rag_table_extract:TableExtract
   args:
-    samples_jsonl: 00_scipaper_enzyme_inhibitor/samples.jsonl
+    samples_jsonl: 00_scipaper_enzyme_km/samples.jsonl
     instructions: |
-      Please give a complete list of Inhibitor, Commentand Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
+      Please give a complete list of Substrate, Commentand Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
       1. Output in csv format, write units not in header but in the value like "10.5 µM". Quote the value if it has comma! For example:
       ```csv
-      Inhibitor,Comment,Organism
-      ATP,"competitive inhibition of verapamil-dependent ATPase-activity",Homo sapiens
-      p-xylene,"11.4 mM, slight inhibitor",Bos taurus
-      NH4+, 0.002 mM,Bos taurus
+      Substrate,Comment,Organism,Km Value
+      ATP,"competitive inhibition of verapamil-dependent ATPase-activity",Homo sapiens, 3.5 nM
+      p-xylene,"20 mM Tris-HCl(pH 7.0), 5 mM MgCl2, at 25 ℃"",Bos taurus, 12 nM
+      D-ribose 6-phosphate, - , Homo sapiens, 120 nM
       ```
       2. If there are multiple tables, concat them. Don't give me reference or using "...", give me complete table!
+      3. If no relevant information was found in the paper, use '-' to fill in the form in CSV.
+

Diff for: evals/registry/evals/00_scipaper_enzyme_localization.yaml

-16
This file was deleted.

Diff for: evals/registry/evals/00_scipaper_enzyme_substrate.yaml

+7-6
@@ -7,13 +7,14 @@ scipaper_enzyme_substrate.val.csv:
   args:
     samples_jsonl: 00_scipaper_enzyme_substrate/samples.jsonl
     instructions: |
-      Please give a complete list of SMILES structures, Km values, Vmax values, target info (protein or cell line), and organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
+      Please give a complete list of Substrate, Commentand Organism of all substrates, Products and Comment of Product in the paper. Usually the substrates' tags are numbers or IUPAC names.
       1. Output in csv format, write units not in header but in the value like "10.5 µM". Quote the value if it has comma! For example:
       ```csv
-      Substrate,Inhibitors, Km value,Km max,Comment,organism,Vmax value,SMILES,Target info,Activating Compound,
-      ATP,Cu2+,0.001 mM,-,-,Homo sapiens,-,-,ATP-linker aldehyde,Carboxybenzaldehyde,
-      p-xylene,NADH,0.004 mM,-,-,Homo sapiens,-,C1CCCCC1,-,Methylbenzaldehyde
-      NADPH,benzaldehyde, 0.12 mM,125 mM,enzyme form ATP,Bos taurus,-,-,NH4+
-
+      Substrate,Comment,Organism,Products,"Comment (Product)"
+      "NADH + H+ + O2","20 mM Tris-HCl(pH 7.0)",Homo sapiens,"NAD+ + H2O", -
+      "D-glucose + 6-phosphate","20 mM Tris-HCl(pH 7.0), 5 mM MgCl2, at 25 ℃"",Bos taurus, -
+      "D-ribose 6-phosphate", - , Homo sapiens, "glycerol + phosphate", -
       ```
       2. If there are multiple tables, concat them. Don't give me reference or using "...", give me complete table!
+      3. If no relevant information was found in the paper, use '-' to fill in the form in CSV.
+
