2. Tutorial

2.1. Introduction to Component Contributions

All of eQuilibrator’s Gibbs energy estimates are based on the Component Contribution method [1], which combines Group Contribution [2] [3] [4] [5] with the more accurate Reactant Contribution [6] by decomposing each reaction into two parts and applying one of the methods to each of them. This method gives priority to the reactant contributions over group contributions while guaranteeing that all estimates are consistent, i.e. will not violate the first law of thermodynamics.

This tutorial focuses on usage and applications, and these do not require understanding the underlying calculations. However, a reader who is interested in learning how Component Contribution works, would benefit from reading the original publication [1].

2.2. Parsing chemical formulas

In this section, we will learn how to convert strings containing reaction formulae to Reaction objects, using the parsing functions in equilibrator-api.

First, one must create an object from the ComponentContribution class:

from equilibrator_api import ComponentContribution, Q_
cc = ComponentContribution()

Calling the constructor can take a few seconds, since it is loading the entire database of compounds into RAM. Important notice: when you do this for the first time after installing equilibrator-api, this database will be downloaded from a remote website (Zenodo) and that can take about 10 minutes (or more, depending on your bandwidth).

Notice that we also imported equilibrator_api.Q_, which is a shorthand to the Quantity object that will help us define physical values with proper units. The default aqueous conditions are pH of 7.5, ionic strenght of 250 mM, and pMg of 3. You can change any or all of them by direct assignment:

cc.p_h = Q_(7.0)
cc.p_mg = Q_(2.5)
cc.ionic_strength = Q_("0.1M")

Note that changing these aqueous conditions can be also be done later on (e.g. for checking how sensitive a reaction’s Gibbs energy is to pH).

2.2.1. Creating a Compound object

First, we will see how to find compounds in the database (which we call the compound_cache). The simplest way is to have an accession from on of the supported repositories:

Namespace

Pattern

Example

metanetx.chemical

MNXM{int}

metanetx.chemical:MNXM5

bigg.metabolite

{str}

bigg.metabolite:h2o

kegg

{C/D/G}{int:05d}

kegg:C00001

chebi

CHEBI:{int}

chebi:CHEBI:30616

sabiork.compound

{int}

sabiork.compound:34

metacyc.compound

{str} or CPD-{int}

metacyc.compound:ATP

hmdb

HMDB{int}

hmdb:HMDB0000538

swisslipid

SLM:{int}

swisslipid:SLM:000000002

reactome

R-ALL-{int}

reactome:R-ALL-113592

lipidmaps

LM**{int}

lipidmaps:LMFA00000001

seed

cpd{int:05d}

seed:cpd00002

For example, we are interested in the compound ATP and we found it in ChEBI (with the identifier CHEBI:30616). We can now use the following command:

atp_compound = cc.get_compound("chebi:CHEBI:30616")

The first colon separates the namespace from the ID, while the second one is just part of the identifier itself. Not all databases will have two colons, for example

atp_compound = cc.get_compound("kegg:C00002")

Now that we created a compound, we can print some of its properties:

print(f"InChI = {atp_compound.inchi}")
print(f"InChIKey = {atp_compound.inchi_key}")
print(f"formula = {atp_compound.formula}")
print(f"molecular mass = {atp_compound.mass}")
print(f"pKas = {atp_compound.dissociation_constants}")

If you don’t have an database accession for your compound, you can also get it using the InChI:

acetate_compound = cc.get_compound_by_inchi("InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1")

2.2.2. Searching for a Compound

If you don’t have an exact identifier (or full InChI), you have two options for searching the database: by InChIKey or by common name.

To search by name, simply use the function search_compound. The search is approximate (using N-grams) and tries to find the best match in terms of edit-distance (similar to the I’m Feeling Lucky option on Google search). However, it can be unreliable for long compound names (and a bit slow for large datasets).

acetate_compound = cc.search_compound("acetat")

The other search option is based on InChIKeys. search_compound_by_inchi_key returns a list of all the matching compounds whose InChIKey matches the query as their prefix. For example:

cc.search_compound_by_inchi_key("MNQZXJOMYWMBOU")

returns a list of 3 compound objects, corresponding to D-glyceraldehyde, L-glyceraldehyde, and glyceraldehyde (with unspecific chirality).

2.2.3. Using the parse_reaction_formula function

Now, we can parse our first formula using the method parse_reaction_formula. In the following example, we use the BiGG database for the compound IDs to create the ATPase reaction:

atpase_reaction = cc.parse_reaction_formula(
    "bigg.metabolite:atp + bigg.metabolite:h2o = bigg.metabolite:adp + bigg.metabolite:pi"
)

Note that the namespace bigg.metabolite has to be repeated before each compound identifier, and that several namespaces can be combined within the same reaction. For example:

atpase_reaction = cc.parse_reaction_formula(
    "chebi:CHEBI:30616 + kegg:C00001 = hmdb:HMDB01341 + seed:cpd00009"
)

We highly recommend that you check that the reaction is atomically balanced (conserves atoms) and charge balanced (redox neutral). We’ve found that it’s easy to accidentally write unbalanced reactions in this format (e.g. forgetting to balance water is a common mistake) and so we always check ourselves. Calling the function atpase_reaction.is_balanced() returns a boolean with the answer.

Now, try to create more reactions of your own using parse_reaction_formula and make sure they are balanced.

2.2.4. Using the search_reaction function

Although parse_reaction_formula is the most common way to create reactions, there are a few alternatives that fit some specific use-cases.

One option is to use the search_reaction function, which works similarly to search_compound by choosing the best match for each of the compound names. For example:

formate_dehydrogenase_reaction = cc.search_reaction("formate + NAD = CO2 + NADH")

In this case, the resulting reaction object is correct (and the reaction is indeed chemically balanced). However, things can become more fuzzy when dealing with compounds that have many versions with similar names, such as sugars with multiple chiral centers. For instance,

glucose_isomerase_reaction = cc.search_reaction("beta-D-glucose = alpha-D-glucose")

In this case, the β-D-glucose (InChIKey = WQZGKKKJIJFFOK-VFUOTHLCSA-N) is correctly identified, but the product chosen by eQuilibrator is D-glucose (InChIKey = WQZGKKKJIJFFOK-GASJEMHNSA-N) and not α-D-glucose (InChIKey = WQZGKKKJIJFFOK-DVKNGEFBSA-N). These mistakes can be unpredictable and depend on the synonym lists in various chemical databases which are not always well-curated. There is absolutely no guarantee that the first hit would be the intended compound. If you must use the reaction search option, it is vital that you carefully go through the results and make sure no mistakes have been made. As demonstrated in the last example, simply checking the chemical balancing is not enough.

2.2.5. Using the Reaction constructor

The third option for creating a Reaction object is by first creating your own set of Compound objects (e.g. using get_compound based on one of the namespaces). For example, one can store them in a dictionary with the IDs as the keys.

The main input argument for the constructor is a dictionary with Compound as keys and the stoichiometric coefficients (float) as the values.

from equilibrator_api import Reaction
compound_ids = ["h2o", "adp", "atp", "pi"]
compound_dict = {cid : cc.get_compound(f"bigg.metabolite:{cid}") for cid in compound_ids}
atpase_reaction = Reaction({
    compound_dict["atp"]: -1,
    compound_dict["h2o"]: -1,
    compound_dict["adp"]: 1,
    compound_dict["pi"]: 1,
})

2.3. Estimating standard Gibbs energies

Once you have a Reaction object, you can use it to get estimates for the Gibbs energy.

2.3.1. Transformed standards Gibbs energy of a single reaction

Run the following command:

standard_dg_prime = cc.standard_dg_prime(atpase_reaction)

The return value is a Measurement object (from the pint package) which contains the mean estimated value and its uncertainty (in terms of standard deviation).

2.3.2. Transformed standard Gibbs energies of multiple reactions

Often when working with metabolic models, we need to work with more than one reaction at a time. Although running standard_dg_prime() on each reaction is possible, this means that we ignore any data about the dependencies between the estimates. In reality, errors in reaction Gibbs energies are highly correlated, due to the fact that they often have metabolites in common. Furthermore, the Component Contribution method uses sub-structures to estimate ΔGs and these are even more coupled (for example, consider two transaminase reactions where the vector of group changes is identical, even if the reactants themselves are different).

As an example, we start by defining three additional reactions:

shikimate_kinase_reaction = cc.parse_reaction_formula(
    "bigg.metabolite:skm + bigg.metabolite:atp = bigg.metabolite:skm5p + bigg.metabolite:adp"
)
tyramine_dehydrogenase_reaction = cc.parse_reaction_formula(
    "kegg:C00483 + bigg.metabolite:nad + bigg.metabolite:h2o = kegg:C04227 + bigg.metabolite:nadh"
)
fluorene_dehydrogenase_reaction = cc.parse_reaction_formula(
    "kegg:C07715 + bigg.metabolite:nad + bigg.metabolite:h2o = kegg:C06711 + bigg.metabolite:nadh"
)
reactions = [atpase_reaction, shikimate_kinase_reaction, tyramine_dehydrogenase_reaction, fluorene_dehydrogenase_reaction]

for r in reactions:
    assert r.is_balanced(), "One of the reactions is not chemically balanced"

Now that we store all our reactions in a list, we can use the standard_dg_prime_multi function, which is a batch version of the standard_dg_prime function:

(
    standard_dg_prime, dg_uncertainty
) = cc.standard_dg_prime_multi(reactions, uncertainty_representation="cov")

Since the software calculates the full covariance matrix for the uncertinaty of the estimates, we cannot use the Measurement class (which stores only one standard deviation per estimate). Therefore, there are two return values:

  • standard_dg_prime (Quantity) - an array containing the CC estimation of the reactions’ standard transformed energies

  • dg_uncertainty (Quantity) - the uncertainty covariance matrix, which can be given in 4 different formats (determined by the uncertinaty_representation flag):

    • cov - the full covariance matrix

    • precision - precision matrix (i.e. the inverse of the covariance matrix)

    • sqrt - a sqaure root of the covariance, based on the uncertainty vectors

    • fullrank - full-rank square root of the covariance which is a compressed form of the sqrt result

In the above example, we used the cov option, which means that dg_uncertainty stores the full covariance matrix:

ATPase

Shikimate

Tyramine

Fluorene

kinase

dehydrogenase

dehydrogenase

ATPase

0.09

0.02

0.05

0.05

Shikimate kinase

0.02

8.07

-0.42

-0.69

Tyramine dehydrogenase

0.05

-0.42

0.73

0.68

Fluorene dehydrogenase

0.05

-0.69

0.68

2.24

All values are in units of \(kJ^2 / mol^2\). The diagonal values are the uncertainties of single reactions (the square of the standard deviation), while the off-diagonal values represent the covariances between reactions. For a detailed explanation of the importance of these values and how to use them in sampling and optimization problems, see Covariance.

2.3.3. Multi-compartment reactions

Reactions that involve membrane transport are a bit more complicated than single-compartment reactions. First, we need to indicate the location of each reactant (i.e. in which of the two compartments it is present). In addition, we need to provide extra information, such as the electrostatic potential, the pH/I/pMg on both sides of the membrane, and the amount of protons that are being transported together with the rest of the compounds.

The calculations are based on Haraldsdottir et al. [7], specitically by adding the following term to the calculated value of \(\Delta G'\)

\(-N_H \cdot RT\ln\left( 10^{\Delta pH} \right) - Q \cdot F \Delta \Phi\)

where R is the gas constant, T is the temperature (in Kalvin), F is Faraday’s constant (the total electric charge of one mol of electrons – 96.5 kC/mol), ΔpH is the difference in pH between initial and final compartment, \(N_H\) is the net number of hydrogen ions transported from initial to final compartment, and Q is the stoichiometric coefficient of the transported charges.

This code example below shows how to estimate \(\Delta_{r}G'^{\circ}\) for glucose uptake through the phosphotransferase system. The result is -44.8 \(\pm\) 0.6 kJ/mol. Note that we only account for uncertainty stemming from the component-contribution method. All other estimates (e.g. based on electrostatic forces) are assumed to be precise:

from equilibrator_api import ComponentContribution, Q_

cytoplasmic_p_h = Q_(7.5)
cytoplasmic_ionic_strength = Q_("250 mM")
periplasmic_p_h = Q_(7.0)
periplasmic_ionic_strength = Q_("200 mM")
e_potential_difference = Q_("0.15 V")
cytoplasmic_reaction = "bigg.metabolite:pep = bigg.metabolite:g6p + bigg.metabolite:pyr"
periplasmic_reaction = "bigg.metabolite:glc__D = "

cc = ComponentContribution()
cc.p_h = cytoplasmic_p_h
cc.ionic_strength = cytoplasmic_ionic_strength
standard_dg_prime = cc.multicompartmental_standard_dg_prime(
    cc.parse_reaction_formula(cytoplasmic_reaction),
    cc.parse_reaction_formula(periplasmic_reaction),
    e_potential_difference=e_potential_difference,
    p_h_outer=periplasmic_p_h,
    ionic_strength_outer=periplasmic_ionic_strength,
)

print(standard_dg_prime)

2.4. References