3. Covariance

The uncertainties of the free energies estimated with eQuilibrator are often correlated. Sometimes we have moieties whose formation energy has high uncertainty (such as coenzyme-A, Figure 1A), but this uncertainty cancels out in reactions where the moiety is present on both sides. In contrast, there are cases where reaction energies cannot be determined because of completely uncharacterized compounds (such as flavodoxin, Figure 1B), but using explicit formation energies reveals couplings between multiple reactions. Thus, it is often unclear whether one should use the domain of reaction energies or of formation energies. With eQuilibrator 3.0, we encourage the usage of the covariance matrix of the uncertainty when modeling multiple reactions. This matrix fully captures the correlations in the uncertainty of all quantities, and always constrains the values at least as much as when using independent uncertainties. In Figure 1, only using the covariance matrix allows to determine reaction directions in both examples. Importantly, the covariance can be used in the domains of formation as well as reaction energies without loss of information on the reaction energies. In this section we summarize how the convariance matrix can be used in sampling and constraint based methods.

_images/covariance_fig1.svg

Figure 1: Examples for the importance of the covariance in the estimation uncertainty found in the iML1515 E. coli model. (A) Homoserine O-succinyltransferase (HSST) and 3-oxoadipyl-CoA thiolase (3OXCOAT) both convert succinyl-CoA (succoa) to CoA. Because of the uncertainty in the \(\Delta_f G'^\circ\) of CoA, computing reaction energies from independent \(\Delta_f G'^\circ\) estimates results in a large uncertainty (orange). As the \(\Delta_f G'^\circ\) of succoa and CoA are strongly correlated, direct estimates of \(\Delta_r G'^\circ\) have smaller uncertainty (blue) comparable to the uncertainty obtained using the covariance matrix of either \(\Delta_f G'^\circ\) or \(\Delta_r G'^\circ\) (cyan). (B) Pyruvate synthase (POR5) converts acetyl-CoA (accoa) into pyruvate (pyr) by oxidizing flavodoxin which, in iML1515, can be regenerated only through oxidation of NADPH (FLDR2). The \(\Delta_r G'^\circ\) of both reactions is unknown. However, \(\Delta_f G'^\circ\) of flxr and flxso must have the same value in both reactions, leading to strong correlation in the uncertainty of \(\Delta_r G'^\circ\). Thus, textit{in-vivo} synthesis of pyruvate through POR5 is unfavorable. (C) Covariances of \(\Delta_f G'^\circ\) and \(\Delta_r G'^\circ\) yield the same information about reaction energies. The small uncertainty in \(\Delta_r G'^\circ\) for HSST and 3OXCOAT matches the correlation in \(\Delta_f G'^\circ\) of succoa and coa, while the coupling of FLDR2 and POR5 through flavodoxin is captured by the covariance in the \(\Delta_r G'^\circ\) of the two reactions.

3.1. Random Sampling

Consider a reaction network with stoichiometric matrix \(\bar{X}\). The number of degrees of freedom \(\bar{q}\) in the uncertainty is often smaller than the number of reactions \(n\) (note that \(\bar{q} \leq q = 669\)). Thus, it is convenient to represent the uncertainty with a random vector \(\mathbf{m} \in \mathbb{R}^{\bar{q}}\) following the standard normal distribution \(\mathcal{N}(0, I)\) and a square root \(Q(\bar{X}) \in \mathbb{R}^{n \times \bar{q}}\) of the covariance \(\Sigma(\bar{X})\) [1], such that

\[\begin{split}\mathbf{m} &\sim \mathcal{N}(0, I) \\ \Delta_{r}G'^{\circ} &= \Delta_{r}G'^{\circ}(\bar{X}) + Q(\bar{X}) \mathbf{m}\end{split}\]

where \(I\) is the \(\bar{q}\)-dimensional identity matrix. While \(Q(\bar{X})\) can be computed from the eigenvalue decomposition of \(\Sigma(\bar{X})\), this is sensitive to numerical issues if \(\bar{X}\) is large. Instead, eQuilibrator computes \(Q(\bar{X})\) directly, providing a numerically accurate result.

In order to draw random samples of the Gibbs free energies we can first draw samples of \(\mathbf{m}\) using standard methods and then compute the corresponding free energies using the above equations.

See Random sampling for a code example.

3.2. Constraint-based models

In a constraint-based setting, we can use the same formulation to define a quadratic constraint to bound free energies to a desired confidence level \(\alpha\):

\[\begin{split}||\mathbf{m}||_2^2 &\leq \chi^2_{\bar{q};\alpha} \\ \Delta_{r}G'^{\circ} &= \Delta_{r}G'^{\circ}(\bar{X}) + Q(\bar{X}) \mathbf{m}\end{split}\]

where \(\chi^2_{\bar{q};\alpha}\) is the PPF (percent point function, or quantile function) of the \(\chi^2\)-distribution with \(\bar{q}\) degrees of freedom. In Python it can be calculated using scipy.stats.chi2.ppf().

When quadratic constraints cannot be used, one can replace the first inequality with upper and lower bounds for each \(m_i\) separately, corresponding to a confidence interval \(\alpha\) on each individual degree of freedom in the uncertainty:

\[|m_i| \leq \sqrt{\chi^2_{1;\alpha}} \qquad \forall \; 1\leq i\leq \bar{q}\]

Although simpler, this formulation should be used with care. Uncertainties are multivariate estimates and independent bounds can over-constrain the free energies, in particular for large networks. For example, when \(\bar{q}\) = 50 and \(\alpha\) = 0.95, the bounds define a confidence region on \(\mathbf{m}\) with an overly restrictive confidence level \(\alpha^{\bar{q}}\) = 0.08.

See Constraint-based modeling for a code example.

3.3. References