2. Tutorial
2.1. Introduction to Component Contributions
All of eQuilibrator’s Gibbs energy estimates are based on the Component Contribution method [1], which combines Group Contribution [2] [3] [4] [5] with the more accurate Reactant Contribution [6] by decomposing each reaction into two parts and applying one of the methods to each of them. This method gives priority to the reactant contributions over group contributions while guaranteeing that all estimates are consistent, i.e. will not violate the first law of thermodynamics.
This tutorial focuses on usage and applications, and these do not require understanding the underlying calculations. However, a reader who is interested in learning how Component Contribution works, would benefit from reading the original publication [1].
2.2. Parsing chemical formulas
In this section, we will learn how to convert strings containing reaction formulae
to Reaction
objects, using the parsing functions in equilibrator-api
.
First, one must create an object from the ComponentContribution
class:
from equilibrator_api import ComponentContribution, Q_
cc = ComponentContribution()
Calling the constructor can take a few seconds, since it is loading the entire database
of compounds into RAM. Important notice: when you do this for the first time after installing
equilibrator-api
, this database will be downloaded from a remote website (Zenodo) and
that can take about 10 minutes (or more, depending on your bandwidth).
Notice that we also imported equilibrator_api.Q_
, which is a shorthand to the Quantity
object that
will help us define physical values with proper units.
The default aqueous conditions are pH of 7.5, ionic strenght of 250 mM, and pMg of 3.
You can change any or all of them by direct assignment:
cc.p_h = Q_(7.0)
cc.p_mg = Q_(2.5)
cc.ionic_strength = Q_("0.1M")
Note that changing these aqueous conditions can be also be done later on (e.g. for checking how sensitive a reaction’s Gibbs energy is to pH).
2.2.1. Creating a Compound
object
First, we will see how to find compounds in the database (which we call the compound_cache
).
The simplest way is to have an accession from on of the supported repositories:
Namespace |
Pattern |
Example |
---|---|---|
metanetx.chemical |
MNXM{int} |
metanetx.chemical:MNXM5 |
bigg.metabolite |
{str} |
bigg.metabolite:h2o |
kegg |
{C/D/G}{int:05d} |
kegg:C00001 |
chebi |
CHEBI:{int} |
chebi:CHEBI:30616 |
sabiork.compound |
{int} |
sabiork.compound:34 |
metacyc.compound |
{str} or CPD-{int} |
metacyc.compound:ATP |
hmdb |
HMDB{int} |
hmdb:HMDB0000538 |
swisslipid |
SLM:{int} |
swisslipid:SLM:000000002 |
reactome |
R-ALL-{int} |
reactome:R-ALL-113592 |
lipidmaps |
LM**{int} |
lipidmaps:LMFA00000001 |
seed |
cpd{int:05d} |
seed:cpd00002 |
For example, we are interested in the compound ATP and we found it in ChEBI
(with the identifier CHEBI:30616
). We can now use the following command:
atp_compound = cc.get_compound("chebi:CHEBI:30616")
The first colon separates the namespace from the ID, while the second one is just part of the identifier itself. Not all databases will have two colons, for example
atp_compound = cc.get_compound("kegg:C00002")
Now that we created a compound, we can print some of its properties:
print(f"InChI = {atp_compound.inchi}")
print(f"InChIKey = {atp_compound.inchi_key}")
print(f"formula = {atp_compound.formula}")
print(f"molecular mass = {atp_compound.mass}")
print(f"pKas = {atp_compound.dissociation_constants}")
If you don’t have an database accession for your compound, you can also get it using the InChI:
acetate_compound = cc.get_compound_by_inchi("InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1")
2.2.2. Searching for a Compound
If you don’t have an exact identifier (or full InChI), you have two options for searching the database: by InChIKey or by common name.
To search by name, simply use the function search_compound
. The search is approximate
(using N-grams) and tries to find the best match in terms of edit-distance (similar to
the I’m Feeling Lucky option on Google search). However,
it can be unreliable for long compound names (and a bit slow for large datasets).
acetate_compound = cc.search_compound("acetat")
The other search option is based on InChIKeys. search_compound_by_inchi_key
returns
a list of all the matching compounds whose InChIKey matches the query as their prefix.
For example:
cc.search_compound_by_inchi_key("MNQZXJOMYWMBOU")
returns a list of 3 compound objects, corresponding to D-glyceraldehyde, L-glyceraldehyde, and glyceraldehyde (with unspecific chirality).
2.2.3. Using the parse_reaction_formula
function
Now, we can parse our first formula using the method parse_reaction_formula
.
In the following example, we use the BiGG database for the compound IDs to create the ATPase reaction:
atpase_reaction = cc.parse_reaction_formula(
"bigg.metabolite:atp + bigg.metabolite:h2o = bigg.metabolite:adp + bigg.metabolite:pi"
)
Note that the namespace bigg.metabolite
has to be repeated before each compound identifier,
and that several namespaces can be combined within the same reaction. For example:
atpase_reaction = cc.parse_reaction_formula(
"chebi:CHEBI:30616 + kegg:C00001 = hmdb:HMDB01341 + seed:cpd00009"
)
We highly recommend that you check that the reaction is atomically balanced (conserves atoms) and charge balanced (redox neutral). We’ve found that it’s easy to accidentally write unbalanced reactions in this format (e.g. forgetting to balance water is a common mistake) and so we always check ourselves. Calling the function atpase_reaction.is_balanced()
returns a boolean with the answer.
Now, try to create more reactions of your own using parse_reaction_formula
and make sure they are balanced.
2.2.4. Using the search_reaction
function
Although parse_reaction_formula
is the most common way to create reactions,
there are a few alternatives that fit some specific use-cases.
One option is to use the search_reaction
function, which works similarly to
search_compound
by choosing the best match for each of the compound names.
For example:
formate_dehydrogenase_reaction = cc.search_reaction("formate + NAD = CO2 + NADH")
In this case, the resulting reaction object is correct (and the reaction is indeed chemically balanced). However, things can become more fuzzy when dealing with compounds that have many versions with similar names, such as sugars with multiple chiral centers. For instance,
glucose_isomerase_reaction = cc.search_reaction("beta-D-glucose = alpha-D-glucose")
In this case, the β-D-glucose (InChIKey = WQZGKKKJIJFFOK-VFUOTHLCSA-N) is correctly identified, but the product chosen by eQuilibrator is D-glucose (InChIKey = WQZGKKKJIJFFOK-GASJEMHNSA-N) and not α-D-glucose (InChIKey = WQZGKKKJIJFFOK-DVKNGEFBSA-N). These mistakes can be unpredictable and depend on the synonym lists in various chemical databases which are not always well-curated. There is absolutely no guarantee that the first hit would be the intended compound. If you must use the reaction search option, it is vital that you carefully go through the results and make sure no mistakes have been made. As demonstrated in the last example, simply checking the chemical balancing is not enough.
2.2.5. Using the Reaction
constructor
The third option for creating a Reaction
object is by first creating your own
set of Compound
objects (e.g. using get_compound
based on one of the namespaces).
For example, one can store them in a dictionary with the IDs as the keys.
The main input argument for the constructor is a dictionary with Compound
as keys and
the stoichiometric coefficients (float) as the values.
from equilibrator_api import Reaction
compound_ids = ["h2o", "adp", "atp", "pi"]
compound_dict = {cid : cc.get_compound(f"bigg.metabolite:{cid}") for cid in compound_ids}
atpase_reaction = Reaction({
compound_dict["atp"]: -1,
compound_dict["h2o"]: -1,
compound_dict["adp"]: 1,
compound_dict["pi"]: 1,
})
2.3. Estimating standard Gibbs energies
Once you have a Reaction
object, you can use it to get estimates for the Gibbs energy.
2.3.1. Transformed standards Gibbs energy of a single reaction
Run the following command:
standard_dg_prime = cc.standard_dg_prime(atpase_reaction)
The return value is a Measurement
object (from the pint
package) which contains the
mean estimated value and its uncertainty (in terms of standard deviation).
2.3.2. Transformed standard Gibbs energies of multiple reactions
Often when working with metabolic models, we need to work with more than one reaction at a time. Although running standard_dg_prime()
on each reaction is possible, this means that we ignore any data about the dependencies between the estimates. In reality, errors in reaction Gibbs energies are highly correlated, due to the fact that they often have metabolites in common. Furthermore, the Component Contribution method uses sub-structures to estimate ΔGs and these are even more coupled (for example, consider two transaminase reactions where the vector of group changes is identical, even if the reactants themselves are different).
As an example, we start by defining three additional reactions:
shikimate_kinase_reaction = cc.parse_reaction_formula(
"bigg.metabolite:skm + bigg.metabolite:atp = bigg.metabolite:skm5p + bigg.metabolite:adp"
)
tyramine_dehydrogenase_reaction = cc.parse_reaction_formula(
"kegg:C00483 + bigg.metabolite:nad + bigg.metabolite:h2o = kegg:C04227 + bigg.metabolite:nadh"
)
fluorene_dehydrogenase_reaction = cc.parse_reaction_formula(
"kegg:C07715 + bigg.metabolite:nad + bigg.metabolite:h2o = kegg:C06711 + bigg.metabolite:nadh"
)
reactions = [atpase_reaction, shikimate_kinase_reaction, tyramine_dehydrogenase_reaction, fluorene_dehydrogenase_reaction]
for r in reactions:
assert r.is_balanced(), "One of the reactions is not chemically balanced"
Now that we store all our reactions in a list, we can use the standard_dg_prime_multi
function, which is a batch version of the standard_dg_prime
function:
(
standard_dg_prime, dg_uncertainty
) = cc.standard_dg_prime_multi(reactions, uncertainty_representation="cov")
Since the software calculates the full covariance matrix for the uncertinaty of the estimates,
we cannot use the Measurement
class (which stores only one standard deviation per estimate). Therefore, there are two return values:
standard_dg_prime
(Quantity) - an array containing the CC estimation of the reactions’ standard transformed energies
dg_uncertainty
(Quantity) - the uncertainty covariance matrix, which can be given in 4 different formats (determined by theuncertinaty_representation
flag):
cov
- the full covariance matrix
precision
- precision matrix (i.e. the inverse of the covariance matrix)
sqrt
- a sqaure root of the covariance, based on the uncertainty vectors
fullrank
- full-rank square root of the covariance which is a compressed form of the sqrt result
In the above example, we used the cov
option, which means that dg_uncertainty
stores the full covariance matrix:
ATPase |
Shikimate |
Tyramine |
Fluorene |
|
---|---|---|---|---|
kinase |
dehydrogenase |
dehydrogenase |
||
ATPase |
0.09 |
0.02 |
0.05 |
0.05 |
Shikimate kinase |
0.02 |
8.07 |
-0.42 |
-0.69 |
Tyramine dehydrogenase |
0.05 |
-0.42 |
0.73 |
0.68 |
Fluorene dehydrogenase |
0.05 |
-0.69 |
0.68 |
2.24 |
All values are in units of \(kJ^2 / mol^2\). The diagonal values are the uncertainties of single reactions (the square of the standard deviation), while the off-diagonal values represent the covariances between reactions. For a detailed explanation of the importance of these values and how to use them in sampling and optimization problems, see Covariance.
2.3.3. Multi-compartment reactions
Reactions that involve membrane transport are a bit more complicated than single-compartment reactions. First, we need to indicate the location of each reactant (i.e. in which of the two compartments it is present). In addition, we need to provide extra information, such as the electrostatic potential, the pH/I/pMg on both sides of the membrane, and the amount of protons that are being transported together with the rest of the compounds.
The calculations are based on Haraldsdottir et al. [7], specitically by adding the following term to the calculated value of \(\Delta G'\)
\(-N_H \cdot RT\ln\left( 10^{\Delta pH} \right) - Q \cdot F \Delta \Phi\)
where R is the gas constant, T is the temperature (in Kalvin), F is Faraday’s constant (the total electric charge of one mol of electrons – 96.5 kC/mol), ΔpH is the difference in pH between initial and final compartment, \(N_H\) is the net number of hydrogen ions transported from initial to final compartment, and Q is the stoichiometric coefficient of the transported charges.
This code example below shows how to estimate \(\Delta_{r}G'^{\circ}\) for glucose uptake through the phosphotransferase system. The result is -44.8 \(\pm\) 0.6 kJ/mol. Note that we only account for uncertainty stemming from the component-contribution method. All other estimates (e.g. based on electrostatic forces) are assumed to be precise:
from equilibrator_api import ComponentContribution, Q_
cytoplasmic_p_h = Q_(7.5)
cytoplasmic_ionic_strength = Q_("250 mM")
periplasmic_p_h = Q_(7.0)
periplasmic_ionic_strength = Q_("200 mM")
e_potential_difference = Q_("0.15 V")
cytoplasmic_reaction = "bigg.metabolite:pep = bigg.metabolite:g6p + bigg.metabolite:pyr"
periplasmic_reaction = "bigg.metabolite:glc__D = "
cc = ComponentContribution()
cc.p_h = cytoplasmic_p_h
cc.ionic_strength = cytoplasmic_ionic_strength
standard_dg_prime = cc.multicompartmental_standard_dg_prime(
cc.parse_reaction_formula(cytoplasmic_reaction),
cc.parse_reaction_formula(periplasmic_reaction),
e_potential_difference=e_potential_difference,
p_h_outer=periplasmic_p_h,
ionic_strength_outer=periplasmic_ionic_strength,
)
print(standard_dg_prime)