3. Covariance
The uncertainties of the free energies estimated with eQuilibrator are often correlated. Sometimes we have moieties whose formation energy has high uncertainty (such as coenzyme-A, Figure 1A), but this uncertainty cancels out in reactions where the moiety is present on both sides. In contrast, there are cases where reaction energies cannot be determined because of completely uncharacterized compounds (such as flavodoxin, Figure 1B), but using explicit formation energies reveals couplings between multiple reactions. Thus, it is often unclear whether one should use the domain of reaction energies or of formation energies. With eQuilibrator 3.0, we encourage the usage of the covariance matrix of the uncertainty when modeling multiple reactions. This matrix fully captures the correlations in the uncertainty of all quantities, and always constrains the values at least as much as when using independent uncertainties. In Figure 1, only using the covariance matrix allows to determine reaction directions in both examples. Importantly, the covariance can be used in the domains of formation as well as reaction energies without loss of information on the reaction energies. In this section we summarize how the convariance matrix can be used in sampling and constraint based methods.
3.1. Random Sampling
Consider a reaction network with stoichiometric matrix \(\bar{X}\). The number of degrees of freedom \(\bar{q}\) in the uncertainty is often smaller than the number of reactions \(n\) (note that \(\bar{q} \leq q = 669\)). Thus, it is convenient to represent the uncertainty with a random vector \(\mathbf{m} \in \mathbb{R}^{\bar{q}}\) following the standard normal distribution \(\mathcal{N}(0, I)\) and a square root \(Q(\bar{X}) \in \mathbb{R}^{n \times \bar{q}}\) of the covariance \(\Sigma(\bar{X})\) [1], such that
where \(I\) is the \(\bar{q}\)-dimensional identity matrix. While \(Q(\bar{X})\) can be computed from the eigenvalue decomposition of \(\Sigma(\bar{X})\), this is sensitive to numerical issues if \(\bar{X}\) is large. Instead, eQuilibrator computes \(Q(\bar{X})\) directly, providing a numerically accurate result.
In order to draw random samples of the Gibbs free energies we can first draw samples of \(\mathbf{m}\) using standard methods and then compute the corresponding free energies using the above equations.
See Random sampling for a code example.
3.2. Constraint-based models
In a constraint-based setting, we can use the same formulation to define a quadratic constraint to bound free energies to a desired confidence level \(\alpha\):
where \(\chi^2_{\bar{q};\alpha}\) is the PPF (percent point function, or quantile function) of the \(\chi^2\)-distribution with \(\bar{q}\) degrees of freedom. In Python it can be calculated using scipy.stats.chi2.ppf()
.
When quadratic constraints cannot be used, one can replace the first inequality with upper and lower bounds for each \(m_i\) separately, corresponding to a confidence interval \(\alpha\) on each individual degree of freedom in the uncertainty:
Although simpler, this formulation should be used with care. Uncertainties are multivariate estimates and independent bounds can over-constrain the free energies, in particular for large networks. For example, when \(\bar{q}\) = 50 and \(\alpha\) = 0.95, the bounds define a confidence region on \(\mathbf{m}\) with an overly restrictive confidence level \(\alpha^{\bar{q}}\) = 0.08.
See Constraint-based modeling for a code example.