Tutorial ======== Introduction to Component Contributions *************************************** All of eQuilibrator's Gibbs energy estimates are based on the *Component Contribution* method :footcite:`noor_consistent_2013`, which combines *Group Contribution* :footcite:`mavrovouniotis_group_1988` :footcite:`mavrovouniotis_group_1990` :footcite:`mavrovouniotis_estimation_1991` :footcite:`jankowski_group_2008` with the more accurate *Reactant Contribution* :footcite:`noor_integrated_2012` by decomposing each reaction into two parts and applying one of the methods to each of them. This method gives priority to the reactant contributions over group contributions while guaranteeing that all estimates are consistent, i.e. will not violate the first law of thermodynamics. This tutorial focuses on usage and applications, and these do not require understanding the underlying calculations. However, a reader who is interested in learning how Component Contribution works, would benefit from reading the original publication :footcite:`noor_consistent_2013`. Parsing chemical formulas ************************* In this section, we will learn how to convert strings containing reaction formulae to ``Reaction`` objects, using the parsing functions in ``equilibrator-api``. First, one must create an object from the ``ComponentContribution`` class:: from equilibrator_api import ComponentContribution, Q_ cc = ComponentContribution() Calling the constructor can take a few seconds, since it is loading the entire database of compounds into RAM. **Important notice:** when you do this for the first time after installing ``equilibrator-api``, this database will be downloaded from a remote website (Zenodo) and that can take about 10 minutes (or more, depending on your bandwidth). Notice that we also imported ``equilibrator_api.Q_``, which is a shorthand to the ``Quantity`` object that will help us define physical values with proper units. The default aqueous conditions are pH of 7.5, ionic strenght of 250 mM, and pMg of 3. You can change any or all of them by direct assignment: .. code-block:: python cc.p_h = Q_(7.0) cc.p_mg = Q_(2.5) cc.ionic_strength = Q_("0.1M") Note that changing these aqueous conditions can be also be done later on (e.g. for checking how sensitive a reaction's Gibbs energy is to pH). Creating a ``Compound`` object ------------------------------ First, we will see how to find compounds in the database (which we call the ``compound_cache``). The simplest way is to have an accession from on of the supported repositories: ================= ================== ============================= Namespace Pattern Example ================= ================== ============================= metanetx.chemical MNXM{int} metanetx.chemical:MNXM5 bigg.metabolite {str} bigg.metabolite:h2o kegg {C/D/G}{int:05d} kegg:C00001 chebi CHEBI:{int} chebi:CHEBI:30616 sabiork.compound {int} sabiork.compound:34 metacyc.compound {str} or CPD-{int} metacyc.compound:ATP hmdb HMDB{int} hmdb:HMDB0000538 swisslipid SLM:{int} swisslipid:SLM:000000002 reactome R-ALL-{int} reactome:R-ALL-113592 lipidmaps LM**{int} lipidmaps:LMFA00000001 seed cpd{int:05d} seed:cpd00002 ================= ================== ============================= For example, we are interested in the compound *ATP* and we found it in `ChEBI `_ (with the identifier ``CHEBI:30616``). We can now use the following command: .. code-block:: python atp_compound = cc.get_compound("chebi:CHEBI:30616") The first colon separates the namespace from the ID, while the second one is just part of the identifier itself. Not all databases will have two colons, for example .. code-block:: python atp_compound = cc.get_compound("kegg:C00002") Now that we created a compound, we can print some of its properties: .. code-block:: python print(f"InChI = {atp_compound.inchi}") print(f"InChIKey = {atp_compound.inchi_key}") print(f"formula = {atp_compound.formula}") print(f"molecular mass = {atp_compound.mass}") print(f"pKas = {atp_compound.dissociation_constants}") If you don't have an database accession for your compound, you can also get it using the InChI: .. code-block:: python acetate_compound = cc.get_compound_by_inchi("InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1") Searching for a ``Compound`` ---------------------------- If you don't have an exact identifier (or full InChI), you have two options for searching the database: by InChIKey or by common name. To search by name, simply use the function ``search_compound``. The search is approximate (using N-grams) and tries to find the best match in terms of edit-distance (similar to the *I'm Feeling Lucky* option on Google search). However, it can be unreliable for long compound names (and a bit slow for large datasets). .. code-block:: python acetate_compound = cc.search_compound("acetat") The other search option is based on InChIKeys. ``search_compound_by_inchi_key`` returns a list of all the matching compounds whose InChIKey matches the query as their prefix. For example: .. code-block:: python cc.search_compound_by_inchi_key("MNQZXJOMYWMBOU") returns a list of 3 compound objects, corresponding to D-glyceraldehyde, L-glyceraldehyde, and glyceraldehyde (with unspecific chirality). Using the ``parse_reaction_formula`` function --------------------------------------------- Now, we can parse our first formula using the method ``parse_reaction_formula``. In the following example, we use the BiGG database for the compound IDs to create the ATPase reaction: .. code-block:: python atpase_reaction = cc.parse_reaction_formula( "bigg.metabolite:atp + bigg.metabolite:h2o = bigg.metabolite:adp + bigg.metabolite:pi" ) Note that the namespace ``bigg.metabolite`` has to be repeated before each compound identifier, and that several namespaces can be combined within the same reaction. For example: .. code-block:: python atpase_reaction = cc.parse_reaction_formula( "chebi:CHEBI:30616 + kegg:C00001 = hmdb:HMDB01341 + seed:cpd00009" ) We highly recommend that you check that the reaction is atomically balanced (conserves atoms) and charge balanced (redox neutral). We've found that it's easy to accidentally write unbalanced reactions in this format (e.g. forgetting to balance water is a common mistake) and so we always check ourselves. Calling the function ``atpase_reaction.is_balanced()`` returns a boolean with the answer. Now, try to create more reactions of your own using ``parse_reaction_formula`` and make sure they are balanced. Using the ``search_reaction`` function -------------------------------------- Although ``parse_reaction_formula`` is the most common way to create reactions, there are a few alternatives that fit some specific use-cases. One option is to use the ``search_reaction`` function, which works similarly to ``search_compound`` by choosing the best match for each of the compound names. For example: .. code-block:: python formate_dehydrogenase_reaction = cc.search_reaction("formate + NAD = CO2 + NADH") In this case, the resulting reaction object is correct (and the reaction is indeed chemically balanced). However, things can become more fuzzy when dealing with compounds that have many versions with similar names, such as sugars with multiple chiral centers. For instance, .. code-block:: python glucose_isomerase_reaction = cc.search_reaction("beta-D-glucose = alpha-D-glucose") In this case, the β-D-glucose (InChIKey = WQZGKKKJIJFFOK-VFUOTHLCSA-N) is correctly identified, but the product chosen by eQuilibrator is D-glucose (InChIKey = WQZGKKKJIJFFOK-GASJEMHNSA-N) and not α-D-glucose (InChIKey = WQZGKKKJIJFFOK-DVKNGEFBSA-N). These mistakes can be unpredictable and depend on the synonym lists in various chemical databases which are not always well-curated. There is absolutely no guarantee that the first hit would be the intended compound. If you must use the reaction search option, it is vital that you carefully go through the results and make sure no mistakes have been made. As demonstrated in the last example, simply checking the chemical balancing is not enough. Using the ``Reaction`` constructor ---------------------------------- The third option for creating a ``Reaction`` object is by first creating your own set of ``Compound`` objects (e.g. using ``get_compound`` based on one of the namespaces). For example, one can store them in a dictionary with the IDs as the keys. The main input argument for the constructor is a dictionary with ``Compound`` as keys and the stoichiometric coefficients (float) as the values. .. code-block:: python from equilibrator_api import Reaction compound_ids = ["h2o", "adp", "atp", "pi"] compound_dict = {cid : cc.get_compound(f"bigg.metabolite:{cid}") for cid in compound_ids} atpase_reaction = Reaction({ compound_dict["atp"]: -1, compound_dict["h2o"]: -1, compound_dict["adp"]: 1, compound_dict["pi"]: 1, }) Estimating standard Gibbs energies ********************************** Once you have a ``Reaction`` object, you can use it to get estimates for the Gibbs energy. Transformed standards Gibbs energy of a single reaction ------------------------------------------------------- Run the following command:: standard_dg_prime = cc.standard_dg_prime(atpase_reaction) The return value is a ``Measurement`` object (from the ``pint`` package) which contains the mean estimated value and its uncertainty (in terms of standard deviation). Transformed standard Gibbs energies of multiple reactions --------------------------------------------------------- Often when working with metabolic models, we need to work with more than one reaction at a time. Although running ``standard_dg_prime()`` on each reaction is possible, this means that we ignore any data about the dependencies between the estimates. In reality, errors in reaction Gibbs energies are highly correlated, due to the fact that they often have metabolites in common. Furthermore, the Component Contribution method uses sub-structures to estimate ΔGs and these are even more coupled (for example, consider two transaminase reactions where the vector of group changes is identical, even if the reactants themselves are different). As an example, we start by defining three additional reactions: .. code-block:: python shikimate_kinase_reaction = cc.parse_reaction_formula( "bigg.metabolite:skm + bigg.metabolite:atp = bigg.metabolite:skm5p + bigg.metabolite:adp" ) tyramine_dehydrogenase_reaction = cc.parse_reaction_formula( "kegg:C00483 + bigg.metabolite:nad + bigg.metabolite:h2o = kegg:C04227 + bigg.metabolite:nadh" ) fluorene_dehydrogenase_reaction = cc.parse_reaction_formula( "kegg:C07715 + bigg.metabolite:nad + bigg.metabolite:h2o = kegg:C06711 + bigg.metabolite:nadh" ) reactions = [atpase_reaction, shikimate_kinase_reaction, tyramine_dehydrogenase_reaction, fluorene_dehydrogenase_reaction] for r in reactions: assert r.is_balanced(), "One of the reactions is not chemically balanced" Now that we store all our reactions in a list, we can use the ``standard_dg_prime_multi`` function, which is a batch version of the ``standard_dg_prime`` function: .. code-block:: python ( standard_dg_prime, dg_uncertainty ) = cc.standard_dg_prime_multi(reactions, uncertainty_representation="cov") Since the software calculates the full covariance matrix for the uncertinaty of the estimates, we cannot use the ``Measurement`` class (which stores only one standard deviation per estimate). Therefore, there are two return values: - ``standard_dg_prime`` (Quantity) - an array containing the CC estimation of the reactions' standard transformed energies - ``dg_uncertainty`` (Quantity) - the uncertainty covariance matrix, which can be given in 4 different formats (determined by the ``uncertinaty_representation`` flag): - ``cov`` - the full covariance matrix - ``precision`` - precision matrix (i.e. the inverse of the covariance matrix) - ``sqrt`` - a sqaure root of the covariance, based on the uncertainty vectors - ``fullrank`` - full-rank square root of the covariance which is a compressed form of the sqrt result In the above example, we used the ``cov`` option, which means that ``dg_uncertainty`` stores the full covariance matrix: +--------------------------+------+----------------+----------------------+----------------------+ | |ATPase|Shikimate |Tyramine |Fluorene | +--------------------------+------+----------------+----------------------+----------------------+ | | |kinase |dehydrogenase |dehydrogenase | +==========================+======+================+======================+======================+ |**ATPase** | 0.09| 0.02 | 0.05 | 0.05 | +--------------------------+------+----------------+----------------------+----------------------+ |**Shikimate kinase** | 0.02| 8.07 | -0.42 | -0.69 | +--------------------------+------+----------------+----------------------+----------------------+ |**Tyramine dehydrogenase**| 0.05| -0.42 | 0.73 | 0.68 | +--------------------------+------+----------------+----------------------+----------------------+ |**Fluorene dehydrogenase**| 0.05| -0.69 | 0.68 | 2.24 | +--------------------------+------+----------------+----------------------+----------------------+ All values are in units of :math:`kJ^2 / mol^2`. The diagonal values are the uncertainties of single reactions (the square of the standard deviation), while the off-diagonal values represent the covariances between reactions. For a detailed explanation of the importance of these values and how to use them in sampling and optimization problems, see :ref:`Covariance`. Multi-compartment reactions --------------------------- Reactions that involve membrane transport are a bit more complicated than single-compartment reactions. First, we need to indicate the location of each reactant (i.e. in which of the two compartments it is present). In addition, we need to provide extra information, such as the electrostatic potential, the pH/I/pMg on both sides of the membrane, and the amount of protons that are being transported together with the rest of the compounds. The calculations are based on Haraldsdottir et al. :footcite:`haraldsdottir_quantitative_2012`, specitically by adding the following term to the calculated value of :math:`\Delta G'` :math:`-N_H \cdot RT\ln\left( 10^{\Delta pH} \right) - Q \cdot F \Delta \Phi` where *R* is the gas constant, *T* is the temperature (in Kalvin), *F* is Faraday's constant (the total electric charge of one mol of electrons -- 96.5 kC/mol), *ΔpH* is the difference in pH between initial and final compartment, :math:`N_H` is the net number of hydrogen ions transported from initial to final compartment, and *Q* is the stoichiometric coefficient of the transported charges. This code example below shows how to estimate :math:`\Delta_{r}G'^{\circ}` for glucose uptake through the phosphotransferase system. The result is -44.8 :math:`\pm` 0.6 kJ/mol. Note that we only account for uncertainty stemming from the component-contribution method. All other estimates (e.g. based on electrostatic forces) are assumed to be precise: .. code-block:: python from equilibrator_api import ComponentContribution, Q_ cytoplasmic_p_h = Q_(7.5) cytoplasmic_ionic_strength = Q_("250 mM") periplasmic_p_h = Q_(7.0) periplasmic_ionic_strength = Q_("200 mM") e_potential_difference = Q_("0.15 V") cytoplasmic_reaction = "bigg.metabolite:pep = bigg.metabolite:g6p + bigg.metabolite:pyr" periplasmic_reaction = "bigg.metabolite:glc__D = " cc = ComponentContribution() cc.p_h = cytoplasmic_p_h cc.ionic_strength = cytoplasmic_ionic_strength standard_dg_prime = cc.multicompartmental_standard_dg_prime( cc.parse_reaction_formula(cytoplasmic_reaction), cc.parse_reaction_formula(periplasmic_reaction), e_potential_difference=e_potential_difference, p_h_outer=periplasmic_p_h, ionic_strength_outer=periplasmic_ionic_strength, ) print(standard_dg_prime) References ********** .. footbibliography::