
Commit 200da9e

Merge pull request #223 from ModelSEED/db_documentation
Solr documentation / examples
2 parents 685a77f + 6d65793 commit 200da9e

12 files changed, +357 -201 lines changed

Biochemistry/COMPOUNDS.md

+56
@@ -0,0 +1,56 @@
# ModelSEED Biochemistry Compounds

The compounds are found in the compounds.tsv and compounds.json files. Their fields are described here:

1. **id**: Unique ID for the compound in the format cpdNNNNN, where NNNNN is a five-digit number (e.g. cpd00001)
2. **abbreviation**: Short name of the compound
3. **name**: Long name of the compound
4. **formula**: Chemical formula in Hill notation, in the protonation state that matches the reported charge
5. **mass**: Mass of the compound, or "null" when unknown
6. **source**: Source database of the compound (currently the only source is ModelSEED)
7. **inchikey**: InChIKey of the compound, the hashed form of its IUPAC International Chemical Identifier (InChI)
8. **charge**: Electric charge of the compound
9. **is_core**: True if the compound is in the core biochemistry (currently all compounds are set to true)
10. **is_obsolete**: True if the compound is obsolete and replaced by a different compound (currently all compounds are set to false)
11. **linked_compound**: Semicolon-separated list of IDs of compounds related to this compound, or "null" if not specified (used to link an obsolete compound to its replacement)
12. **is_cofactor**: True if the compound is a cofactor (currently all compounds are set to false)
13. **deltag**: Change in free energy of the compound, or "null" when unknown
14. **deltagerr**: Estimated error of the deltag value, or "null" when unknown
15. **pka**: Acid dissociation constants of the compound (see below for the format)
16. **pkb**: Base dissociation constants of the compound (see below for the format)
17. **abstract_compound**: True if the compound is an abstraction of a chemical concept (currently all compounds are set to null)
18. **comprised_of**: or "null" if not specified (currently all compounds are set to null)
19. **aliases**: Semicolon-separated list of alternative names of the compound, or "null" if not specified (see below for the format)
20. **smiles**: Structure of the compound in Simplified Molecular-Input Line-Entry System (SMILES) format
21. **notes**: Abbreviated notation used to store derived information about the compound
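
As a minimal sketch of reading these fields, the Python below loads rows from a tab-separated file with the standard `csv` module and normalizes a few of them. It assumes the following: compounds.tsv has a header row named after the fields above, missing values are the literal string "null", and charge is written as an integer. `read_compounds` and `normalize_compound` are hypothetical helpers, and the sample row is illustrative rather than copied from the file.

```python
import csv
import io

# Assumed column handling, based only on the field descriptions above.
LIST_FIELDS = {"linked_compound"}  # semicolon-separated compound IDs
INT_FIELDS = {"charge"}
FLOAT_FIELDS = {"mass", "deltag", "deltagerr"}


def normalize_compound(row):
    """Convert one raw row (all strings) into Python values.

    The literal "null" becomes None, list fields are split on ";",
    and the numeric fields listed above are converted.
    """
    out = {}
    for key, value in row.items():
        if value == "null":
            out[key] = None
        elif key in LIST_FIELDS:
            out[key] = value.split(";")
        elif key in INT_FIELDS:
            out[key] = int(value)
        elif key in FLOAT_FIELDS:
            out[key] = float(value)
        else:
            out[key] = value
    return out


def read_compounds(handle):
    """Yield normalized rows from a tab-separated file with a header line.

    QUOTE_NONE keeps the double quotes inside the aliases field verbatim
    instead of treating them as CSV quoting.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield normalize_compound(row)


# Illustrative input with a subset of the columns (not copied from compounds.tsv).
sample = "id\tname\tformula\tcharge\tlinked_compound\ncpd00001\tH2O\tH2O\t0\tnull\n"
rows = list(read_compounds(io.StringIO(sample)))
print(rows[0])
```

Reading with `csv.QUOTE_NONE` matters here because the aliases field contains literal double quotes. The default CSV dialect would otherwise try to interpret them as quoting.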

### Format of pka and pkb

The pka and pkb fields are in this format:

    fragment:atom:value

where "fragment" is the index of the molecular fragment, "atom" is the index of the atom, and "value" is the pKa or pKb value. The fragment index is almost always 1; only a few compounds have structures made up of more than one distinct fragment. The values for multiple atoms are separated by semicolons. For example, this is the pka for NAD:

    1:17:1.8;1:18:2.56;1:6:12.32;1:25:11.56;1:35:13.12
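
The pka/pkb format can be parsed with plain string splitting. The hypothetical `parse_pk` helper below is a sketch that assumes exactly three colon-separated parts per entry. It parses the NAD value above and picks the entry with the lowest pKa, which is the most acidic site listed.

```python
def parse_pk(field):
    """Parse a pka/pkb field into (fragment, atom, value) tuples.

    Treating "" or "null" as "no values" is an assumption; the format
    description does not say how a missing value is written.
    """
    if field in ("", "null"):
        return []
    entries = []
    for item in field.split(";"):
        fragment, atom, value = item.split(":")
        entries.append((int(fragment), int(atom), float(value)))
    return entries


nad_pka = "1:17:1.8;1:18:2.56;1:6:12.32;1:25:11.56;1:35:13.12"
entries = parse_pk(nad_pka)
# Lowest pKa value, i.e. the most acidic site listed for NAD.
most_acidic = min(entries, key=lambda e: e[2])
print(most_acidic)  # (1, 17, 1.8)
```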

### Format of aliases

An alias is in this format:

    "source:value"

where "source" is the name of the alternative database and "value" is the name or ID in that database. Multiple aliases are separated by semicolons. For example, this is the list of aliases for Cobamide (cpd00181):

    "KEGG:C00210";"name:Cobamide";"searchname:cobamide";"ModelSEED:cpd00181";"KBase:kb|cpd.181"
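
The sketch below parses aliases, assuming every alias is wrapped in double quotes as in the Cobamide example. The hypothetical `parse_aliases` helper extracts the quoted strings with a regular expression instead of splitting on semicolons. It then splits each alias on its first colon only, since values such as `kb|cpd.181` are free-form.

```python
import re

# Each alias is a double-quoted "source:value" string.
ALIAS = re.compile(r'"([^"]*)"')


def parse_aliases(field):
    """Group aliases by source database: {source: [values]}.

    Each alias is split on its first colon only, so a value that itself
    contains ":" stays intact. "null" contains no quoted strings, so it
    yields an empty dict.
    """
    grouped = {}
    for alias in ALIAS.findall(field):
        source, _, value = alias.partition(":")
        grouped.setdefault(source, []).append(value)
    return grouped


cobamide = '"KEGG:C00210";"name:Cobamide";"searchname:cobamide";"ModelSEED:cpd00181";"KBase:kb|cpd.181"'
aliases = parse_aliases(cobamide)
print(aliases["KBase"])  # ['kb|cpd.181']
```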

### Notes abbreviations

### Sources

Biochemistry/REACTIONS.md

+94
@@ -0,0 +1,94 @@
# ModelSEED Biochemistry Reactions

The reactions are found in the reactions.tsv and reactions.json files. Their fields are described here:

1. **id**: Unique ID for the reaction in the format rxnNNNNN, where NNNNN is a five-digit number (e.g. rxn03789)
2. **abbreviation**: Short name of the reaction
3. **name**: Long name of the reaction
4. **code**: Definition of the reaction using compound IDs, before protonation (see below for the format)
5. **stoichiometry**: Definition of the reaction in stoichiometry format (see below for the format)
6. **is_transport**: True if the reaction is a transport reaction
7. **equation**: Definition of the reaction using compound IDs, after protonation (see below for the format)
8. **definition**: Definition of the reaction using compound names (see below for the format)
9. **reversibility**: Reversibility of the reaction, where ">" means the reaction proceeds left to right, "<" means right to left, "=" means in both directions, and "?" means unknown (_Need a better description of reversibility and direction_)
10. **direction**: Direction of the reaction, where ">" means left to right, "<" means right to left, and "=" means in both directions
11. **abstract_reaction**: _Need definition_ or "null" if not specified (currently all reactions are set to null)
12. **pathways**: Pathways the reaction is part of, or "null" if not specified (currently all reactions are set to null)
13. **aliases**: Semicolon-separated list of alternative names of the reaction, or "null" if not specified (same format as in the compounds file)
14. **ec_numbers**: Enzyme Commission numbers of the enzymes that catalyze the reaction, or "null" if not specified (currently all reactions are set to null)
15. **deltag**: Change in free energy of the reaction, or 10000000 when unknown
16. **deltagerr**: Estimated error of the deltag value, or 10000000 when unknown
17. **compound_ids**: Semicolon-separated list of IDs of the compounds involved in the reaction
18. **status**: Status of the reaction, with multiple values delimited by a "|" character (see below for details)
19. **is_obsolete**: True if the reaction is obsolete and replaced by a different reaction
20. **linked_reaction**: Semicolon-separated list of IDs of reactions related to this reaction, or "null" if not specified (used to link an obsolete reaction to its replacement)
21. **notes**: Abbreviated notation used to store derived information about the reaction
22. **source**: Source database of the reaction (currently the only source is ModelSEED)
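
Unknown deltag and deltagerr values in the reactions file are stored as 10000000 rather than "null". Numeric code that averages or filters them should therefore map that sentinel to a missing value first. The sketch below does this. `parse_deltag` and `parse_compound_ids` are hypothetical helper names, and the example IDs are simply those that appear in the rxn00001 definition.

```python
# The reactions file uses 10000000 (not "null") for an unknown deltag/deltagerr.
UNKNOWN_DELTAG = 10000000.0


def parse_deltag(value):
    """Return deltag or deltagerr as a float, or None for the unknown sentinel."""
    number = float(value)
    return None if number == UNKNOWN_DELTAG else number


def parse_compound_ids(value):
    """Split the semicolon-separated compound_ids field into a list."""
    return value.split(";") if value and value != "null" else []


print(parse_deltag("10000000"))  # None
print(parse_compound_ids("cpd00001;cpd00012;cpd00009;cpd00067"))
```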

### Format of stoichiometry field

### Format of reaction definition using compound IDs

Each compound participating in the reaction is in this format:

    (n) cpdid[m]

where "n" is the compound coefficient, "cpdid" is the compound ID, and "m" is a compartment index number. Compounds are separated by "+", and the reactant and product compounds are separated by the direction symbol. This is the format of the code and equation fields. For example, this is the definition of reaction rxn00001:

    (1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0] + (1) cpd00067[0]
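
The ID-based format can be parsed with two regular expressions: one for the direction symbol and one for each `(n) cpdid[m]` term. This is a sketch, not ModelSEED's own parser. `parse_equation` is a hypothetical name. The set of direction symbols (`<=>`, `=>`, `<=`) and the purely numeric compartment index are assumptions.

```python
import re

# One "(n) cpdid[m]" term, e.g. "(2) cpd00009[0]".
TERM = re.compile(r"\((\d+(?:\.\d+)?)\)\s+(cpd\d{5})\[(\d+)\]")
# Assumed direction symbols; "<=>" is tried first so it is not split as "<=" or "=>".
ARROW = re.compile(r"\s(<=>|=>|<=)\s")


def parse_equation(text):
    """Split an ID-based definition into (reactants, arrow, products).

    Each side is a list of (coefficient, cpdid, compartment) tuples.
    """
    left, arrow, right = ARROW.split(text, maxsplit=1)

    def terms(side):
        return [(float(n), cpd, int(m)) for n, cpd, m in TERM.findall(side)]

    return terms(left), arrow, terms(right)


rxn00001 = "(1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0] + (1) cpd00067[0]"
reactants, arrow, products = parse_equation(rxn00001)
print(arrow)     # <=>
print(products)  # [(2.0, 'cpd00009', 0), (1.0, 'cpd00067', 0)]
```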

### Format of reaction definition using compound names

Each compound participating in the reaction is in this format:

    (n) cpdname[m]

where "n" is the compound coefficient, "cpdname" is the compound name, and "m" is a compartment index number. Compounds are separated by "+", and the reactant and product compounds are separated by the direction symbol. For example, this is the definition of reaction rxn00001:

    (1) H2O[0] + (1) PPi[0] <=> (2) Phosphate[0] + (1) H+[0]
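
Compound names can contain characters such as "+" (as in `H+`), which makes splitting the name-based form fragile. Instead of parsing it, this sketch goes the other way and renders the form from structured (coefficient, name, compartment) terms. `format_definition` is a hypothetical helper. The call below reproduces the rxn00001 example above.

```python
def format_definition(reactants, products, arrow="<=>"):
    """Render (coefficient, name, compartment) terms in the name-based format.

    Coefficients use the "g" format, so 1 and 1.0 both print as "1".
    """
    def side(terms):
        return " + ".join(f"({n:g}) {name}[{m}]" for n, name, m in terms)

    return f"{side(reactants)} {arrow} {side(products)}"


print(format_definition(
    [(1, "H2O", 0), (1, "PPi", 0)],
    [(2, "Phosphate", 0), (1, "H+", 0)],
))
# (1) H2O[0] + (1) PPi[0] <=> (2) Phosphate[0] + (1) H+[0]
```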

### Format of reaction stoichiometry

Each compound participating in the reaction is in this format:

    n:cpdid:m:i:"cpdname"

where "n" is the compound coefficient (negative for a reactant, positive for a product), "cpdid" is the compound ID, "m" is the compartment index number, "i" is the community index number (this may no longer be needed), and "cpdname" is the compound name. Compounds are separated by semicolons. For example, this is the stoichiometry of reaction rxn00001:

    -1:cpd00001:0:0:"H2O";-1:cpd00012:0:0:"PPi";2:cpd00009:0:0:"Phosphate";1:cpd00067:0:0:"H+"
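
The sketch below parses the stoichiometry field with one regular expression per entry. It reads each name from inside its quotes, which avoids splitting on ":" or ";" characters that a compound name might contain. `parse_stoichiometry` and `split_sides` are hypothetical names, and the input is the rxn00001 stoichiometry above.

```python
import re

# One 'n:cpdid:m:i:"cpdname"' entry; the name is read from inside its quotes.
ENTRY = re.compile(r'(-?\d+(?:\.\d+)?):(cpd\d{5}):(\d+):(\d+):"([^"]*)"')


def parse_stoichiometry(text):
    """Return (coefficient, cpdid, compartment, community, name) tuples."""
    return [
        (float(n), cpd, int(m), int(i), name)
        for n, cpd, m, i, name in ENTRY.findall(text)
    ]


def split_sides(entries):
    """Negative coefficients are reactants; positive ones are products."""
    reactants = [e for e in entries if e[0] < 0]
    products = [e for e in entries if e[0] > 0]
    return reactants, products


stoich = '-1:cpd00001:0:0:"H2O";-1:cpd00012:0:0:"PPi";2:cpd00009:0:0:"Phosphate";1:cpd00067:0:0:"H+"'
entries = parse_stoichiometry(stoich)
reactants, products = split_sides(entries)
print([e[4] for e in reactants])  # ['H2O', 'PPi']
```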

### Reaction status values

The reaction status field is a string with one or more values separated by a "|" character. A reaction has multiple values when it was updated during mass and charge balancing. If "OK" is one of the values, the reaction is valid, and the additional values describe the changes made to balance it. The status values are:

* OK means the reaction is valid. If "OK" is the only value, the reaction was valid with no changes.
* MI means there is a mass imbalance. The remainder of the string after the first colon lists the unbalanced atoms and the number of each atom needed to balance the reaction, with multiple atoms separated by a "/" character. A positive number means the righthand side of the reaction has more of that atom, and a negative number means the lefthand side has more.
* CI means there is a charge imbalance. A positive number after the first colon means the righthand side of the reaction has a larger charge, and a negative number means the lefthand side has a larger charge.
* HB means the reaction has been balanced by adding hydrogen to it.
* EMPTY means the reactants and products cancel out completely.
* CPDFORMERROR means at least one compound either has no formula or has an invalid formula.
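
The status string can be decoded by splitting on "|" and then on the first colon of each value. The hypothetical `parse_status` sketch below assumes each code appears at most once per string, because a repeated code would overwrite the earlier one. It reads MI and CI amounts as numbers.

```python
def parse_status(status):
    """Split a status string on "|" into {code: detail}.

    MI details become {atom: count}, CI details become a number, and codes
    without a detail (such as OK) map to None.
    """
    result = {}
    for part in status.split("|"):
        code, _, detail = part.partition(":")
        if code == "MI":
            atoms = {}
            for pair in detail.split("/"):
                atom, _, count = pair.partition(":")
                atoms[atom] = float(count)
            result[code] = atoms
        elif code == "CI":
            result[code] = float(detail)
        else:
            result[code] = detail or None
    return result


print(parse_status("MI:C:-1/H:-4/O:-2"))  # {'MI': {'C': -1.0, 'H': -4.0, 'O': -2.0}}
```

With this helper, a validity check reduces to `"OK" in parse_status(value)`.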

For example, rxn00277 has this definition:

    (1) Glycine[0] <=> (1) HCN[0]

which, written with compound formulas, is:

    (1) C2H5NO2 <=> (1) CHN

Its status shows a mass imbalance, with one extra carbon, four extra hydrogen, and two extra oxygen atoms on the lefthand side of the reaction:

    MI:C:-1/H:-4/O:-2
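
The sign convention in this example can be checked by recomputing the imbalance from the formulas, subtracting the reactant atom counts from the product atom counts. This is a sketch with hypothetical helpers. `parse_formula` only handles simple formulas without parentheses or charges. Listing elements alphabetically is an assumption that happens to match the status shown above.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as "C2H5NO2" (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def mass_imbalance(reactants, products):
    """Return an MI string (products minus reactants), or "" if balanced.

    reactants and products are lists of (coefficient, formula) pairs.
    Elements are listed alphabetically (an assumed ordering).
    """
    net = Counter()
    for coef, formula in products:
        for element, n in parse_formula(formula).items():
            net[element] += coef * n
    for coef, formula in reactants:
        for element, n in parse_formula(formula).items():
            net[element] -= coef * n
    parts = [f"{el}:{net[el]}" for el in sorted(net) if net[el] != 0]
    return "MI:" + "/".join(parts) if parts else ""


# rxn00277: (1) C2H5NO2 <=> (1) CHN
print(mass_imbalance([(1, "C2H5NO2")], [(1, "CHN")]))  # MI:C:-1/H:-4/O:-2
```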

Similarly, rxn00008 has this definition:

    (2) H2O[0] <=> (1) H2O2[0] + (2) H+[0]

Its status shows a charge imbalance, with the righthand side of the reaction having the larger charge:

    CI:2
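
The charge check works the same way: it is the products' total charge minus the reactants' total charge. In this sketch the charges (0 for H2O and H2O2, +1 for H+) are passed in by hand. `charge_imbalance` is a hypothetical helper.

```python
def charge_imbalance(reactants, products):
    """Return the products' total charge minus the reactants' total charge.

    reactants and products are lists of (coefficient, charge) pairs.
    """
    right = sum(coef * charge for coef, charge in products)
    left = sum(coef * charge for coef, charge in reactants)
    return right - left


# rxn00008: (2) H2O[0] <=> (1) H2O2[0] + (2) H+[0]
diff = charge_imbalance([(2, 0)], [(1, 0), (2, 1)])
status = f"CI:{diff}" if diff else ""
print(status)  # CI:2
```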
