{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Local cache\n", "\n", "The `LocalCompoundCache` class, found in `local_compound_cache`, provides methods to generate `Compound` objects and to store and retrieve them in a local component-contribution database.\n", "\n", "This notebook highlights the following use cases:\n", "\n", "1. Adding compounds with `add_compounds` and retrieving them from the coco namespace\n", "2. Adding and retrieving compounds with `get_compounds`\n", "3. Options that control the behavior of `get_compounds` and `add_compounds`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Requirements\n", "\n", "- equilibrator-assets: `!pip install equilibrator-assets`\n", "- openbabel: `!pip install openbabel-wheel` or `!conda install -c conda-forge openbabel`\n", "- ChemAxon (with a valid license): `cxcalc` must be on the `PATH`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize the local compound cache" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2024-01-28T13:17:50.166430296Z", "start_time": "2024-01-28T13:17:49.132185676Z" }, "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "from equilibrator_assets.local_compound_cache import LocalCompoundCache\n", "lc = LocalCompoundCache()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating a new local cache\n", "\n", "The local cache must start as a copy of the default Zenodo compound cache.\n", "\n", "*You can skip this cell if the local cache already exists.*" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2024-01-28T13:17:50.198696225Z", "start_time": "2024-01-28T13:17:50.170143700Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "compounds.sqlite already exists.\n", "Delete existing file and replace?\n" ] }, { "name": "stdin", "output_type": "stream", "text": [
"Proceed? (yes/no): yes\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Deleting compounds.sqlite\n", "Copying default Zenodo compound cache to compounds.sqlite\n" ] } ], "source": [ "# Copies the default Zenodo compounds.sqlite cache to the given file location\n", "# If that file already exists, the user is prompted to delete it\n", "lc.generate_local_cache_from_default_zenodo('compounds.sqlite')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading an already existing local cache" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.173271336Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading compounds from compounds.sqlite\n" ] } ], "source": [ "# load the local cache from the .sqlite database\n", "lc.load_cache('compounds.sqlite')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating and adding compounds to the coco namespace\n", "\n", "`add_compounds` takes a DataFrame of compound information, generates new compounds from it, and adds them to the database. Each compound requires three properties:\n", "\n", "1. `struct` - a SMILES string representing the compound structure\n", "2. `coco_id` - an ID (string) enabling use with the equilibrator-api parser, e.g. `my_compound` can be accessed using `coco:my_compound`\n", "3. 
`name` - the name of the compound as it will appear in plots for analyses such as Max-min Driving Force (MDF)\n", "\n", "To generate compounds, provide a DataFrame in the following format:\n", "\n", "| struct | coco_id | name |\n", "| :----- | :------ | :--- |\n", "| CCO | etoh | Ethanol |\n", "| C/C1=CC(\\O)=C/C(=O)O1 | TAL | Triacetic Acid Lactone |\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.175579658Z" } }, "outputs": [], "source": [ "def display_compound_result(cpd_result, print_identifiers: bool = True):\n", "    print(\"structure =\", cpd_result.structure)\n", "    print(\"method = \", cpd_result.method)\n", "    print(\"status = \", cpd_result.status)\n", "    if cpd_result.compound is not None:\n", "        print(\"Compound ID =\", cpd_result.compound.id)\n", "        print(\"pK_a =\", cpd_result.compound.dissociation_constants)\n", "        print(\"pK_Mg =\", cpd_result.compound.magnesium_dissociation_constants)\n", "        print(\"InChIKey =\", cpd_result.compound.inchi_key)\n", "        print(\"standardized SMILES =\", cpd_result.compound.smiles)\n", "\n", "        if print_identifiers:\n", "            print(\"\\nidentifiers\\n-----------\")\n", "            for _id in cpd_result.compound.identifiers:\n", "                print(_id.registry.namespace, \":\", _id.accession)\n", "\n", "        print(\"\\nmicrospecies\\n------------\")\n", "        for _ms in cpd_result.compound.microspecies:\n", "            print(f\"charge = {_ms.charge}, number of H+ = {_ms.number_protons}, number of Mg2+ = {_ms.number_magnesiums}, ΔΔG/RT = {_ms.ddg_over_rt:.2f}\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.179217786Z" }, "tags": [] }, "outputs": [], "source": [ "# Build an example DataFrame of compounds to add\n", "# 3A4HA is already in the database, but a custom coco ID and name\n", "# can still be added for it\n", "compound_df = pd.DataFrame(\n", "    data=[\n", "        [\"OC(=O)C1=CC(NC(=O)C2=CC=CC=C2)=C(O)C=C1\", \"3B4HA\", \"3-Benzamido-4-hydroxybenzoic acid\"],\n", "        [\"NC1=C(O)C=CC(=C1)C(O)=O\", \"3A4HA\", \"3-Amino-4-hydroxybenzoic acid\"]\n", "    ],\n", "    columns=[\"struct\", \"coco_id\", \"name\"]\n", ")\n", "\n", "lc.add_compounds(compound_df, mol_format=\"smiles\")\n", "# the added compound has the ID 3B4HA, can be accessed as coco:3B4HA,\n", "# and appears as 3-Benzamido-4-hydroxybenzoic acid in plots\n", "cpd_results = lc.get_compounds([\"OC(=O)C1=CC(NC(=O)C2=CC=CC=C2)=C(O)C=C1\"])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.224772825Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "structure = OC(=O)C1=CC(NC(=O)C2=CC=CC=C2)=C(O)C=C1\n", "method = database\n", "status = valid\n", "Compound ID = 694325\n", "pK_a = [8.92, 4.23]\n", "pK_Mg = []\n", "InChIKey = RKCVLDMDZASBEO-UHFFFAOYSA-N\n", "standardized SMILES = OC1=C(NC(=O)C2=CC=CC=C2)C=C(C=C1)C([O-])=O\n", "\n", "identifiers\n", "-----------\n", "coco : 3B4HA\n", "synonyms : 3-Benzamido-4-hydroxybenzoic acid\n", "\n", "microspecies\n", "------------\n", "charge = -2, number of H+ = 9, number of Mg2+ = 0, ΔΔG/RT = 20.54\n", "charge = -1, number of H+ = 10, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "charge = 0, number of H+ = 11, number of Mg2+ = 0, ΔΔG/RT = -9.74\n" ] } ], "source": [ "display_compound_result(cpd_results[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the coco namespace to define reactions with `equilibrator-api`\n", "This section combines `equilibrator_api` with the `LocalCompoundCache` so that custom compounds can be used in reactions." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.225218834Z" }, "tags": [] }, "outputs": [], "source": [ "from equilibrator_api import ComponentContribution, Q_\n", "# the local cache is passed to ComponentContribution\n", "cc = ComponentContribution(ccache=lc.ccache)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.225573781Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ΔG0 = (39.1 +/- 3.4) kilojoule / mole\n", "ΔG'0 = (-2.0 +/- 3.4) kilojoule / mole\n", "ΔG'm = (-19.1 +/- 3.4) kilojoule / mole\n" ] } ], "source": [ "# use coco:ID to access the user-specified coco namespace\n", "rxn = cc.parse_reaction_formula(\"coco:3B4HA + kegg:C00001 = coco:3A4HA + kegg:C00180\")\n", "if not rxn.is_balanced():\n", "    print('%s is not balanced' % rxn)\n", "\n", "cc.p_h = Q_(7)  # set pH\n", "cc.ionic_strength = Q_(\"100 mM\")  # set ionic strength\n", "\n", "print(f\"ΔG0 = {cc.standard_dg(rxn)}\")\n", "print(f\"ΔG'0 = {cc.standard_dg_prime(rxn)}\")\n", "print(f\"ΔG'm = {cc.physiological_dg_prime(rxn)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `get_compounds` to directly generate `Compound` objects\n", "The `get_compounds` method accepts a single string or a list of strings, each a molecular structure in SMILES or InChI format.\n", "The database is queried for each molecule, and any structure that is not found is generated and inserted into the database. A list of results is returned, one per input structure; each result holds the `Compound` together with the lookup method and status.\n", "\n", "Generated compounds are assigned an ID that is one greater than the current largest ID."
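, "\n", "\n", "In the `get_compounds` results printed below, the ΔΔG/RT values of microspecies that differ by a single proton (and contain no Mg2+) are consistent with the listed pK_a values: adding one proton lowers ΔΔG/RT by ln(10)·pK_a, where pK_a is that of the corresponding acid/base pair. This relation is inferred from the printed numbers rather than quoted from the library documentation. With $n$ the number of H+ in a microspecies:\n", "\n", "$$\n", "\\frac{\\Delta\\Delta G_{n+1}}{RT} = \\frac{\\Delta\\Delta G_{n}}{RT} - \\ln(10)\\,\\mathrm{p}K_a\n", "$$\n", "\n", "For acetic acid, $\\mathrm{p}K_a = 4.54$ and $\\ln(10) \\times 4.54 \\approx 10.45$, which matches ΔΔG/RT = -10.45 for the neutral species relative to acetate."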
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.225817449Z" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "==============================\n", "*** Open Babel Warning in InChI code\n", " #1 :Omitted undefined stereo\n", "==============================\n", "*** Open Babel Warning in InChI code\n", " #1 :Omitted undefined stereo\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--------------------------------------------------------------------------------\n", "structure = CC(=O)O\n", "method = database\n", "status = valid\n", "Compound ID = 28\n", "pK_a = [4.54]\n", "pK_Mg = [MagnesiumDissociationConstant(compound_id=28, number_protons=3, number_magnesiums=1)]\n", "InChIKey = QTBSBXVTEAMEQO-UHFFFAOYSA-M\n", "standardized SMILES = CC([O-])=O\n", "\n", "microspecies\n", "------------\n", "charge = -1, number of H+ = 3, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "charge = 0, number of H+ = 4, number of Mg2+ = 0, ΔΔG/RT = -10.45\n", "charge = 1, number of H+ = 3, number of Mg2+ = 1, ΔΔG/RT = -186.15\n", "--------------------------------------------------------------------------------\n", "structure = CC(O)C(=O)O\n", "method = database\n", "status = valid\n", "Compound ID = 2667\n", "pK_a = [3.78]\n", "pK_Mg = []\n", "InChIKey = JVTAAEKCZFNVCJ-UHFFFAOYSA-M\n", "standardized SMILES = CC(O)C([O-])=O\n", "\n", "microspecies\n", "------------\n", "charge = -1, number of H+ = 5, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "charge = 0, number of H+ = 6, number of Mg2+ = 0, ΔΔG/RT = -8.70\n", "--------------------------------------------------------------------------------\n", "structure = CCCOP(=O)(O)O\n", "method = database\n", "status = valid\n", "Compound ID = 694326\n", "pK_a = [6.84, 1.82]\n", "pK_Mg = []\n", "InChIKey = MHZDONKZSXBOGL-UHFFFAOYSA-N\n", "standardized SMILES = CCCOP([O-])([O-])=O\n", "\n", "microspecies\n", "------------\n", "charge = -2, number of 
H+ = 7, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "charge = -1, number of H+ = 8, number of Mg2+ = 0, ΔΔG/RT = -15.75\n", "charge = 0, number of H+ = 9, number of Mg2+ = 0, ΔΔG/RT = -19.94\n", "--------------------------------------------------------------------------------\n", "structure = OCC(N)C(O)CO\n", "method = database\n", "status = valid\n", "Compound ID = 694327\n", "pK_a = [13.69, 8.92]\n", "pK_Mg = []\n", "InChIKey = PMLGQXIKBPFHJZ-UHFFFAOYSA-N\n", "standardized SMILES = [NH3+]C(CO)C(O)CO\n", "\n", "microspecies\n", "------------\n", "charge = -1, number of H+ = 10, number of Mg2+ = 0, ΔΔG/RT = 52.06\n", "charge = 0, number of H+ = 11, number of Mg2+ = 0, ΔΔG/RT = 20.54\n", "charge = 1, number of H+ = 12, number of Mg2+ = 0, ΔΔG/RT = 0.00\n" ] } ], "source": [ "cpd_results = lc.get_compounds([\"CC(=O)O\", \"CC(O)C(=O)O\", \"CCCOP(=O)(O)O\", \"OCC(N)C(O)CO\"])\n", "\n", "for cpd_res in cpd_results:\n", "    print(\"-\" * 80)\n", "    display_compound_result(cpd_res, print_identifiers=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Highlighting local cache persistence\n", "Compounds remain in the local cache between runs. To demonstrate this, two new compounds are added to the local cache and assigned IDs. The cache is then reloaded and the compounds are queried in reverse order, showing that each ID stays with its compound."
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2024-01-28T13:17:50.290021697Z", "start_time": "2024-01-28T13:17:50.225999990Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before Reload\n", "--------------------------------------------------------------------------------\n", "structure = C(CC)CCOP(=O)(O)O\n", "method = database\n", "status = valid\n", "Compound ID = 694328\n", "pK_a = [6.83, 1.81]\n", "pK_Mg = []\n", "InChIKey = NVTPMUHPCAUGCB-UHFFFAOYSA-N\n", "standardized SMILES = CCCCCOP([O-])([O-])=O\n", "\n", "identifiers\n", "-----------\n", "synonyms : 694328\n", "\n", "microspecies\n", "------------\n", "charge = -2, number of H+ = 11, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "charge = -1, number of H+ = 12, number of Mg2+ = 0, ΔΔG/RT = -15.73\n", "charge = 0, number of H+ = 13, number of Mg2+ = 0, ΔΔG/RT = -19.89\n", "--------------------------------------------------------------------------------\n", "structure = C(CCC)CCOP(=O)(O)O\n", "method = database\n", "status = valid\n", "Compound ID = 694329\n", "pK_a = [6.83, 1.81]\n", "pK_Mg = []\n", "InChIKey = PHNWGDTYCJFUGZ-UHFFFAOYSA-N\n", "standardized SMILES = CCCCCCOP([O-])([O-])=O\n", "\n", "identifiers\n", "-----------\n", "synonyms : 694329\n", "\n", "microspecies\n", "------------\n", "charge = -2, number of H+ = 13, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "charge = -1, number of H+ = 14, number of Mg2+ = 0, ΔΔG/RT = -15.73\n", "charge = 0, number of H+ = 15, number of Mg2+ = 0, ΔΔG/RT = -19.89\n", "\n", "\n", "Loading compounds from compounds.sqlite\n", "\n", "\n", "After Reload\n", "--------------------------------------------------------------------------------\n", "structure = C(CCC)CCOP(=O)(O)O\n", "method = database\n", "status = valid\n", "Compound ID = 694329\n", "pK_a = [6.83, 1.81]\n", "pK_Mg = []\n", "InChIKey = PHNWGDTYCJFUGZ-UHFFFAOYSA-N\n", "standardized SMILES = CCCCCCOP([O-])([O-])=O\n", "\n",
"identifiers\n", "-----------\n", "synonyms : 694329\n", "\n", "microspecies\n", "------------\n", "charge = -2, number of H+ = 13, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "charge = -1, number of H+ = 14, number of Mg2+ = 0, ΔΔG/RT = -15.73\n", "charge = 0, number of H+ = 15, number of Mg2+ = 0, ΔΔG/RT = -19.89\n", "--------------------------------------------------------------------------------\n", "structure = C(CC)CCOP(=O)(O)O\n", "method = database\n", "status = valid\n", "Compound ID = 694328\n", "pK_a = [6.83, 1.81]\n", "pK_Mg = []\n", "InChIKey = NVTPMUHPCAUGCB-UHFFFAOYSA-N\n", "standardized SMILES = CCCCCOP([O-])([O-])=O\n", "\n", "identifiers\n", "-----------\n", "synonyms : 694328\n", "\n", "microspecies\n", "------------\n", "charge = -2, number of H+ = 11, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "charge = -1, number of H+ = 12, number of Mg2+ = 0, ΔΔG/RT = -15.73\n", "charge = 0, number of H+ = 13, number of Mg2+ = 0, ΔΔG/RT = -19.89\n" ] } ], "source": [ "# get two new compounds; they are generated and added to the local cache\n", "cpds_before = lc.get_compounds([\"C(CC)CCOP(=O)(O)O\", \"C(CCC)CCOP(=O)(O)O\"])\n", "\n", "print('Before Reload')\n", "for cpd in cpds_before:\n", "    print(\"-\" * 80)\n", "    display_compound_result(cpd)\n", "\n", "print(\"\\n\")\n", "# reload the cache\n", "lc.ccache.session.close()\n", "lc.load_cache('compounds.sqlite')\n", "print(\"\\n\")\n", "\n", "# query the compounds in reverse order\n", "# the IDs stay with their InChIKeys, showing that compounds persist in the local cache\n", "cpds_after = lc.get_compounds([\"C(CCC)CCOP(=O)(O)O\", \"C(CC)CCOP(=O)(O)O\"])\n", "\n", "print('After Reload')\n", "for cpd in cpds_after:\n", "    print(\"-\" * 80)\n", "    display_compound_result(cpd)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring More Options of `add_compounds` and `get_compounds`\n", "Several options further control the behavior of `get_compounds`, as explained below:\n", "\n", "1. Varying the InChIKey blocks used for searches\n", "2. 
Handling compound creation errors\n", "    - Investigating the Log\n", "    - Bypassing ChemAxon\n", "    - Inserting Empty Compounds\n", "    - Returning Failed Compounds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### InChIKey block control over searches\n", "The `connectivity_only` option of `get_compounds` restricts a search to the first block of the InChIKey, which encodes the molecular connectivity; otherwise, the first two blocks are used.\n", "\n", "The example below uses D-glucose and L-glucose. As expected, the connectivity-only searches return the same compound for both, because the two stereoisomers share the first InChIKey block. The `True`/`False` value printed with each search indicates whether the returned compound ID appears in `TRAINING_IDS`, the compound IDs of the component-contribution training data." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.226180707Z" } }, "outputs": [], "source": [ "cc = ComponentContribution()\n", "TRAINING_IDS = cc.predictor.params.train_G.index\n", "\n", "d_glucose_con = lc.get_compounds(['C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O'], connectivity_only=True)[0]\n", "d_glucose = lc.get_compounds(['C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O'], connectivity_only=False)[0]\n", "l_glucose_con = lc.get_compounds(['O[C@@H]1[C@@H](O)[C@@H](OC(O)[C@H]1O)CO'], connectivity_only=True)[0]\n", "l_glucose = lc.get_compounds(['O[C@@H]1[C@@H](O)[C@@H](OC(O)[C@H]1O)CO'], connectivity_only=False)[0]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "start_time": "2024-01-28T13:17:50.226536167Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "D-Glucose Search\n", "Two InChI Key blocks: False\n", "structure = C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O\n", "method = database\n", "status = valid\n", "Compound ID = 93\n", "pK_a = [13.58, 12.69, 11.3]\n", "pK_Mg = []\n", "InChIKey = WQZGKKKJIJFFOK-DVKNGEFBSA-N\n", "standardized SMILES = OC[C@H]1O[C@H](O)[C@H](O)[C@@H](O)[C@@H]1O\n", "\n", "microspecies\n", "------------\n", "charge = -3, number of H+ = 9, number of Mg2+ = 0, ΔΔG/RT = 86.51\n", "charge = -2, number of H+ = 10, number of Mg2+ = 0, ΔΔG/RT = 
55.24\n", "charge = -1, number of H+ = 11, number of Mg2+ = 0, ΔΔG/RT = 26.02\n", "charge = 0, number of H+ = 12, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "\n", "Connectivity Only: True\n", "structure = C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O\n", "method = database\n", "status = valid\n", "Compound ID = 43\n", "pK_a = [13.58, 12.69, 11.3]\n", "pK_Mg = []\n", "InChIKey = WQZGKKKJIJFFOK-GASJEMHNSA-N\n", "standardized SMILES = OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O\n", "\n", "microspecies\n", "------------\n", "charge = -3, number of H+ = 9, number of Mg2+ = 0, ΔΔG/RT = 86.51\n", "charge = -2, number of H+ = 10, number of Mg2+ = 0, ΔΔG/RT = 55.24\n", "charge = -1, number of H+ = 11, number of Mg2+ = 0, ΔΔG/RT = 26.02\n", "charge = 0, number of H+ = 12, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "\n", "\n", "L-Glucose Search\n", "Two InChI Key blocks: False\n", "structure = O[C@@H]1[C@@H](O)[C@@H](OC(O)[C@H]1O)CO\n", "method = database\n", "status = valid\n", "Compound ID = 11639\n", "pK_a = [13.58, 12.69, 11.3]\n", "pK_Mg = []\n", "InChIKey = WQZGKKKJIJFFOK-ZZWDRFIYSA-N\n", "standardized SMILES = OC[C@@H]1OC(O)[C@@H](O)[C@H](O)[C@H]1O\n", "\n", "microspecies\n", "------------\n", "charge = -3, number of H+ = 9, number of Mg2+ = 0, ΔΔG/RT = 86.51\n", "charge = -2, number of H+ = 10, number of Mg2+ = 0, ΔΔG/RT = 55.24\n", "charge = -1, number of H+ = 11, number of Mg2+ = 0, ΔΔG/RT = 26.02\n", "charge = 0, number of H+ = 12, number of Mg2+ = 0, ΔΔG/RT = 0.00\n", "\n", "Connectivity Only: True\n", "structure = O[C@@H]1[C@@H](O)[C@@H](OC(O)[C@H]1O)CO\n", "method = database\n", "status = valid\n", "Compound ID = 43\n", "pK_a = [13.58, 12.69, 11.3]\n", "pK_Mg = []\n", "InChIKey = WQZGKKKJIJFFOK-GASJEMHNSA-N\n", "standardized SMILES = OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O\n", "\n", "microspecies\n", "------------\n", "charge = -3, number of H+ = 9, number of Mg2+ = 0, ΔΔG/RT = 86.51\n", "charge = -2, number of H+ = 10, number of Mg2+ = 0, ΔΔG/RT = 55.24\n", "charge = -1, 
number of H+ = 11, number of Mg2+ = 0, ΔΔG/RT = 26.02\n", "charge = 0, number of H+ = 12, number of Mg2+ = 0, ΔΔG/RT = 0.00\n" ] } ], "source": [ "print(\"D-Glucose Search\")\n", "print(f\"Two InChI Key blocks: {d_glucose.compound.id in TRAINING_IDS}\")\n", "display_compound_result(d_glucose, print_identifiers=False)\n", "\n", "print(f\"\\nConnectivity Only: {d_glucose_con.compound.id in TRAINING_IDS}\")\n", "display_compound_result(d_glucose_con, print_identifiers=False)\n", "\n", "print('\\n')\n", "print(\"L-Glucose Search\")\n", "print(f\"Two InChI Key blocks: {l_glucose.compound.id in TRAINING_IDS}\")\n", "display_compound_result(l_glucose, print_identifiers=False)\n", "\n", "print(f\"\\nConnectivity Only: {l_glucose_con.compound.id in TRAINING_IDS}\")\n", "display_compound_result(l_glucose_con, print_identifiers=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Handling Compound Creation Errors\n", "Sometimes a compound cannot be decomposed, either because of ChemAxon errors or because its structure is invalid. Two options, `bypass_chemaxon` and `save_empty_compounds`, provide workarounds for these errors.\n", "\n", "Setting `bypass_chemaxon=True` attempts to create a compound from the user-specified structure while bypassing ChemAxon. If the compound cannot be decomposed even with `bypass_chemaxon=True`, it can still be saved as an empty compound by specifying `save_empty_compounds=True`.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" } }, "nbformat": 4, "nbformat_minor": 4 }