{ "cells": [ { "cell_type": "markdown", "id": "5b311c85", "metadata": {}, "source": [ "# Cookbook: Every way to make a `Molecule`" ] }, { "cell_type": "markdown", "id": "05bf5732", "metadata": {}, "source": [ "Every pathway through the OpenFF Toolkit boils down to four steps:\n", "\n", "1. Using other tools, assemble a graph of a molecule, including all of its atoms, bonds, bond orders, formal charges, and stereochemistry[^rs]\n", "2. Use that information to construct a [`Molecule`](openff.toolkit.topology.Molecule)\n", "3. Combine a number of `Molecule` objects to construct a [`Topology`](openff.toolkit.topology.Topology)\n", "4. Call [`ForceField.create_openmm_system(topology)`](openff.toolkit.typing.engines.smirnoff.forcefield.ForceField.create_openmm_system) to create an OpenMM [`System`](simtk.openmm.openmm.System) (or, in the near future, an OpenFF [`Interchange`](https://github.com/openforcefield/openff-interchange) for painless conversion to all sorts of MD formats)\n", "\n", "So let's take a look at every way there is to construct a molecule! We'll use zwitterionic L-alanine as an example biomolecule with all the tricky bits: a stereocenter, non-zero formal charges, and bonds of different orders.\n", "\n", "[^rs]: Note that this stereochemistry must be defined on the *graph* of the molecule. It's not good enough to just have co-ordinates with the correct stereochemistry. But if you have the co-ordinates, you can try getting the stereochemistry automatically with `rdkit` or `openeye`, if you dare!" 
] }, { "cell_type": "code", "execution_count": null, "id": "0a3d1e21", "metadata": { "tags": [ "remove-stderr", "remove-input", "remove-stdout" ] }, "outputs": [], "source": [ "# Imports\n", "from openff.toolkit.topology import Molecule, Topology\n", "from openff.toolkit.typing.engines.smirnoff import ForceField\n", "\n", "# Hide tracebacks for simpler errors\n", "import sys\n", "ipython = get_ipython()\n", "\n", "def hide_traceback(exc_tuple=None, filename=None, tb_offset=None,\n", " exception_only=False, running_compiled_code=False):\n", " etype, value, tb = sys.exc_info()\n", " value.__cause__ = None # suppress chained exceptions\n", " return ipython._showtraceback(etype, value, ipython.InteractiveTB.get_exception_only(etype, value))\n", "\n", "ipython.showtraceback = hide_traceback\n" ] }, { "cell_type": "markdown", "id": "37d692a1", "metadata": {}, "source": [ "## From SMILES\n", "\n", "SMILES is the classic way to create a `Molecule`. SMILES is a widely-used compact textual representation of arbitrary molecules. 
This lets us specify an exact molecule, including stereochemistry and bond orders, very easily, though it may not be the most human-readable format.\n", "\n", "The [`Molecule.from_smiles()`](openff.toolkit.topology.Molecule.from_smiles) method is used to create a `Molecule` from a SMILES code.\n", "\n", "### Implicit hydrogens SMILES" ] }, { "cell_type": "code", "execution_count": null, "id": "71199513", "metadata": {}, "outputs": [], "source": [ "zw_l_alanine = Molecule.from_smiles(\"C[C@H]([NH3+])C(=O)[O-]\")\n", "\n", "zw_l_alanine.visualize()" ] }, { "cell_type": "markdown", "id": "59fe1b58", "metadata": {}, "source": [ "### Explicit hydrogens SMILES" ] }, { "cell_type": "code", "execution_count": null, "id": "07e0584e", "metadata": {}, "outputs": [], "source": [ "smiles_explicit_h = Molecule.from_smiles(\n", " \"[H][C]([H])([H])[C@@]([H])([C](=[O])[O-])[N+]([H])([H])[H]\", \n", " hydrogens_are_explicit=True\n", ")\n", "\n", "assert zw_l_alanine.is_isomorphic_with(smiles_explicit_h)\n", "\n", "smiles_explicit_h.visualize()" ] }, { "cell_type": "markdown", "id": "6e1adb21", "metadata": {}, "source": [ "### Mapped SMILES\n", "\n", "By default, no guarantees are made about the indexing of atoms from a SMILES string. If the indexing is important, a mapped SMILES string may be used. In this case, hydrogens must be explicit. Note that mapped SMILES indices start at 1 while Python indexing starts at 0, so the atom mapped as 1 becomes `atoms[0]`." 
] }, { "cell_type": "code", "execution_count": null, "id": "178de2c5", "metadata": {}, "outputs": [], "source": [ "mapped_smiles = Molecule.from_mapped_smiles(\n", " \"[H:10][C:2]([H:7])([H:8])[C@@:4]([H:9])([C:3](=[O:5])[O-:6])[N+:1]([H:11])([H:12])[H:13]\"\n", ")\n", "\n", "assert zw_l_alanine.is_isomorphic_with(mapped_smiles)\n", "\n", "assert mapped_smiles.atoms[0].atomic_number == 7 # First index is the Nitrogen\n", "assert all([a.atomic_number==1 for a in mapped_smiles.atoms[6:]]) # Final indices are all H\n", "\n", "mapped_smiles.visualize()" ] }, { "cell_type": "markdown", "id": "6897144e", "metadata": {}, "source": [ "### SMILES without stereochemistry\n", "\n", "The Toolkit won't accept an ambiguous SMILES. This SMILES could be L- or D- alanine; rather than guess, the Toolkit throws an error:" ] }, { "cell_type": "code", "execution_count": null, "id": "ec669ad6", "metadata": { "scrolled": true, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "smiles_non_isomeric = Molecule.from_smiles(\n", " \"CC([NH3+])C(=O)[O-]\"\n", ")" ] }, { "cell_type": "markdown", "id": "821c4687", "metadata": {}, "source": [ "We can downgrade this error to a warning with the `allow_undefined_stereo` argument. This will not apply an improper dihedral term to the stereocenter and may lead to simulations with unphysical stereoisomerisation." ] }, { "cell_type": "code", "execution_count": null, "id": "278e3cb1", "metadata": { "tags": [ "remove-stderr" ] }, "outputs": [], "source": [ "smiles_non_isomeric = Molecule.from_smiles(\n", " \"CC([NH3+])C(=O)[O-]\",\n", " allow_undefined_stereo=True\n", ")\n", "\n", "assert not zw_l_alanine.is_isomorphic_with(smiles_non_isomeric)\n", "\n", "smiles_non_isomeric.visualize()" ] }, { "cell_type": "markdown", "id": "da716060", "metadata": {}, "source": [ "## By hand\n", "\n", "You can always construct a `Molecule` by building it up from individual atoms and bonds. 
Other methods are generally easier, but it's a useful fallback for when you need to write your own constructor for an unsupported source format.\n", "\n", "The [`Molecule()`](openff.toolkit.topology.Molecule.__init__) constructor and the [`add_atom()`](openff.toolkit.topology.Molecule.add_atom) and [`add_bond()`](openff.toolkit.topology.Molecule.add_bond) methods are used to construct a `Molecule` by hand." ] }, { "cell_type": "code", "execution_count": null, "id": "aa8746eb", "metadata": {}, "outputs": [], "source": [ "by_hand = Molecule()\n", "by_hand.name = \"Zwitterionic l-Alanine\"\n", "\n", "by_hand.add_atom(\n", " atomic_number = 8, # Atomic number 8 is Oxygen\n", " formal_charge = -1, # Formal negative charge\n", " is_aromatic = False, # Atom is not part of an aromatic system\n", " stereochemistry = None, # Optional argument; \"R\" or \"S\" stereochemistry\n", " name = \"O-\" # Optional argument; descriptive name for the atom\n", ")\n", "by_hand.add_atom(6, 0, False, name=\"C\")\n", "by_hand.add_atom(8, 0, False, name=\"O\")\n", "by_hand.add_atom(6, 0, False, stereochemistry=\"S\", name=\"CA\")\n", "by_hand.add_atom(1, 0, False, name=\"CAH\")\n", "by_hand.add_atom(6, 0, False, name=\"CB\")\n", "by_hand.add_atom(1, 0, False, name=\"HB1\")\n", "by_hand.add_atom(1, 0, False, name=\"HB2\")\n", "by_hand.add_atom(1, 0, False, name=\"HB3\")\n", "by_hand.add_atom(7, +1, False, name=\"N+\")\n", "by_hand.add_atom(1, 0, False, name=\"HN1\")\n", "by_hand.add_atom(1, 0, False, name=\"HN2\")\n", "by_hand.add_atom(1, 0, False, name=\"HN3\")\n", "\n", "\n", "by_hand.add_bond( \n", " atom1 = 0, # First (zero-indexed) atom specified above (\"O-\") \n", " atom2 = 1, # Second atom specified above (\"C\")\n", " bond_order = 1, # Single bond\n", " is_aromatic = False, # Bond is not aromatic\n", " stereochemistry = None, # Optional argument; \"E\" or \"Z\" stereochemistry\n", " fractional_bond_order = None # Optional argument; Wiberg (or similar) bond order\n", ")\n", 
"by_hand.add_bond( 1, 2, 2, False) # C = O\n", "by_hand.add_bond( 1, 3, 1, False) # C - CA\n", "by_hand.add_bond( 3, 4, 1, False) # CA - CAH\n", "by_hand.add_bond( 3, 5, 1, False) # CA - CB\n", "by_hand.add_bond( 5, 6, 1, False) # CB - HB1\n", "by_hand.add_bond( 5, 7, 1, False) # CB - HB2\n", "by_hand.add_bond( 5, 8, 1, False) # CB - HB3\n", "by_hand.add_bond( 3, 9, 1, False) # CB - N+\n", "by_hand.add_bond( 9, 10, 1, False) # N+ - HN1\n", "by_hand.add_bond( 9, 11, 1, False) # N+ - HN2\n", "by_hand.add_bond( 9, 12, 1, False) # N+ - HN3\n", "\n", "assert zw_l_alanine.is_isomorphic_with(by_hand)\n", "\n", "by_hand.visualize()" ] }, { "cell_type": "markdown", "id": "15931dd9", "metadata": {}, "source": [ "### From a dictionary\n", "\n", "Rather than build up the `Molecule` one method at a time, the [`Molecule.from_dict()`](openff.toolkit.topology.Molecule.from_dict) method can construct a `Molecule` in one shot from a Python `dict` that describes the molecule in question. This allows `Molecule` objects to be written to and read from disk in any format that can be interpreted as a `dict`; this mechanism underlies the [`from_bson()`](openff.toolkit.topology.Molecule.from_bson), [`from_json()`](openff.toolkit.topology.Molecule.from_json), [`from_messagepack()`](openff.toolkit.topology.Molecule.from_messagepack), [`from_pickle()`](openff.toolkit.topology.Molecule.from_pickle), [`from_toml()`](openff.toolkit.topology.Molecule.from_toml), [`from_xml()`](openff.toolkit.topology.Molecule.from_xml), and [`from_yaml()`](openff.toolkit.topology.Molecule.from_yaml) methods.\n", "\n", "This format can get very verbose, as it is intended for serialization, so this example uses hydrogen cyanide rather than alanine." 
] }, { "cell_type": "code", "execution_count": null, "id": "dc756a7f", "metadata": {}, "outputs": [], "source": [ "molecule_dict = {\n", " \"name\": \"\",\n", " \"atoms\": [\n", " {\n", " \"atomic_number\": 1,\n", " \"formal_charge\": 0,\n", " \"is_aromatic\": False,\n", " \"stereochemistry\": None,\n", " \"name\": \"H\",\n", " },\n", " {\n", " \"atomic_number\": 6,\n", " \"formal_charge\": 0,\n", " \"is_aromatic\": False,\n", " \"stereochemistry\": None,\n", " \"name\": \"C\",\n", " },\n", " {\n", " \"atomic_number\": 7,\n", " \"formal_charge\": 0,\n", " \"is_aromatic\": False,\n", " \"stereochemistry\": None,\n", " \"name\": \"N\",\n", " },\n", " ],\n", " \"virtual_sites\": [],\n", " \"bonds\": [\n", " {\n", " \"atom1\": 0,\n", " \"atom2\": 1,\n", " \"bond_order\": 1,\n", " \"is_aromatic\": False,\n", " \"stereochemistry\": None,\n", " \"fractional_bond_order\": None,\n", " },\n", " {\n", " \"atom1\": 1,\n", " \"atom2\": 2,\n", " \"bond_order\": 3,\n", " \"is_aromatic\": False,\n", " \"stereochemistry\": None,\n", " \"fractional_bond_order\": None,\n", " },\n", " ],\n", " \"properties\": {},\n", " \"conformers\": None,\n", " \"partial_charges\": None,\n", " \"partial_charges_unit\": None,\n", "}\n", "\n", "from_dictionary = Molecule.from_dict(molecule_dict)\n", "\n", "from_dictionary.visualize()" ] }, { "cell_type": "markdown", "id": "1b50a800", "metadata": {}, "source": [ "## From a file\n", "\n", "We can construct a `Molecule` from a file or file-like object with the [`from_file()`](openff.toolkit.topology.Molecule.from_file) method. We're a bit constrained in what file formats we can accept, because they need to provide all the information needed to construct the molecular graph; not just coordinates, but also elements, formal charges, bond orders, and stereochemistry." ] }, { "cell_type": "markdown", "id": "79aec960", "metadata": {}, "source": [ "### From SDF file\n", "\n", "We generally recommend the SDF format. 
The SDF file used here can be found [on GitHub](https://github.com/openforcefield/openff-toolkit/blob/master/docs/users/zw_l_alanine.sdf)" ] }, { "cell_type": "code", "execution_count": null, "id": "a78e4e0f", "metadata": {}, "outputs": [], "source": [ "sdf_path = Molecule.from_file(\"zw_l_alanine.sdf\")\n", "assert zw_l_alanine.is_isomorphic_with(sdf_path)\n", "sdf_path.visualize()" ] }, { "cell_type": "markdown", "id": "3508a6ed", "metadata": {}, "source": [ "### From SDF file object\n", "\n", "`from_file()` can also take a file object, rather than a path. Note that the object must be in binary mode!" ] }, { "cell_type": "code", "execution_count": null, "id": "1cecd71c", "metadata": {}, "outputs": [], "source": [ "with open(\"zw_l_alanine.sdf\", mode=\"rb\") as file:\n", " sdf_object = Molecule.from_file(file, file_format=\"SDF\")\n", " \n", "assert zw_l_alanine.is_isomorphic_with(sdf_object)\n", "sdf_object.visualize()" ] }, { "cell_type": "markdown", "id": "02653974", "metadata": {}, "source": [ "### From PDB file\n", "\n", "Using PDB files is not recommended, even if they have CONECT records, as they do not provide stereoisomeric information or bond orders. The RDKit backend assumes that bond orders are 1, so the toolkit refuses to use it:" ] }, { "cell_type": "code", "execution_count": null, "id": "f199d45f", "metadata": { "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "from openff.toolkit.utils.toolkits import RDKitToolkitWrapper\n", "\n", "pdb = Molecule.from_file(\"zw_l_alanine.pdb\", \"pdb\", toolkit_registry=RDKitToolkitWrapper())" ] }, { "cell_type": "markdown", "id": "3713f85f", "metadata": {}, "source": [ "OpenEye can infer bond orders and stereochemistry from the structure. This is not recommended, as it can make mistakes that may be difficult to catch. 
Note also that this requires an OpenEye license, as OpenEye is proprietary software.\n", "\n", "If we instead provide a SMILES code, a PDB file can be used to populate the `Molecule` object's `conformers` attribute and provide atom ordering, as well as check that the SMILES code matches the PDB file. This method is the recommended way to create a `Molecule` from a PDB file. The PDB file used here can be found [on GitHub](https://github.com/openforcefield/openff-toolkit/blob/master/docs/users/zw_l_alanine.pdb).\n", "\n", ":::{important}\n", "Note that the Toolkit doesn't guarantee that the coordinates in the PDB are correctly assigned to atoms. It makes an effort, but you should check that the results are reasonable.\n", ":::" ] }, { "cell_type": "code", "execution_count": null, "id": "1d003955", "metadata": {}, "outputs": [], "source": [ "pdb_with_smiles = Molecule.from_pdb_and_smiles(\n", " \"zw_l_alanine.pdb\", \n", " \"C[C@H]([NH3+])C(=O)[O-]\"\n", ") \n", "\n", "assert zw_l_alanine.is_isomorphic_with(pdb_with_smiles)\n", "\n", "pdb_with_smiles.visualize()" ] }, { "cell_type": "markdown", "id": "cbdde2f3", "metadata": {}, "source": [ "## Other string identification formats\n", "\n", "The OpenFF Toolkit supports a few text-based molecular identity formats other than SMILES ([see above](#from-smiles))." ] }, { "cell_type": "markdown", "id": "bcd5fce1", "metadata": {}, "source": [ "### From InChI\n", "\n", "The [`Molecule.from_inchi()`](openff.toolkit.topology.Molecule.from_inchi) method constructs a `Molecule` from an IUPAC [InChI](https://iupac.org/who-we-are/divisions/division-details/inchi/) string. Note that InChI cannot distinguish the zwitterionic form of alanine from the neutral form (see section 13.2 of the [InChI Technical FAQ](https://www.inchi-trust.org/technical-faq-2/)), so the toolkit defaults to the neutral form.\n", "\n", ":::{warning}\n", "The OpenFF Toolkit makes no guarantees about the atomic ordering produced by the `from_inchi` method. 
InChI is not intended to be an interchange format.\n", ":::" ] }, { "cell_type": "code", "execution_count": null, "id": "1ea7bb81", "metadata": {}, "outputs": [], "source": [ "inchi = Molecule.from_inchi(\"InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1\") \n", "\n", "inchi.visualize()" ] }, { "cell_type": "markdown", "id": "93592822", "metadata": {}, "source": [ "### From IUPAC name\n", "\n", "The [`Molecule.from_iupac()`](openff.toolkit.topology.Molecule.from_iupac) method constructs a `Molecule` from an IUPAC name.\n", "\n", ":::{important}\n", "This code requires the OpenEye toolkit.\n", ":::" ] }, { "cell_type": "code", "execution_count": null, "id": "839994a0", "metadata": { "tags": [ "remove-stderr", "raises-exception" ] }, "outputs": [], "source": [ "iupac = Molecule.from_iupac(\"(2S)-2-azaniumylpropanoate\")\n", "\n", "assert zw_l_alanine.is_isomorphic_with(iupac)\n", "\n", "iupac.visualize()" ] }, { "cell_type": "markdown", "id": "14d8a068", "metadata": {}, "source": [ "## Remapping an existing `Molecule`\n", "\n", "Most `Molecule` creation methods don't specify the ordering of atoms in the new `Molecule`. The [`Molecule.remap()`](openff.toolkit.topology.Molecule.remap) method allows a new ordering to be applied to an existing `Molecule`.\n", "\n", "See also [Mapped SMILES](#mapped-smiles).\n", "\n", ":::{warning}\n", "The `Molecule.remap()` method is experimental and subject to change. 
\n", ":::" ] }, { "cell_type": "code", "execution_count": null, "id": "c9a343c2", "metadata": {}, "outputs": [], "source": [ "# Note that this mapping is off-by-one from the mapping taken \n", "# by the remap method, as Python indexing is 0-based but SMILES\n", "# is 1-based\n", "print(\"Before remapping:\", zw_l_alanine.to_smiles(mapped=True))\n", "\n", "# Flip the positions of the oxygen atoms\n", "remapped = zw_l_alanine.remap({0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 5, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12})\n", "\n", "print(\"After remapping: \", remapped.to_smiles(mapped=True))\n", "\n", "# Doesn't affect the identity of the molecule\n", "assert zw_l_alanine.is_isomorphic_with(remapped)\n", "remapped.visualize()" ] }, { "cell_type": "markdown", "id": "d5e4330a", "metadata": {}, "source": [ "## Via `Topology` objects\n", "\n", "The [`Topology`](openff.toolkit.topology.Topology) class represents a biomolecular system; it is analogous to the similarly named objects in GROMACS, MDTraj or OpenMM. Notably, it does not include co-ordinates and may represent multiple copies of a particular molecular species or even more complex mixtures of molecules. `Topology` objects are usually built up one species at a time from `Molecule` objects.\n", "\n", "The [`Molecule.from_topology()`](openff.toolkit.topology.Molecule.from_topology) method constructs a `Molecule` from a `Topology`. This is usually going backwards, but the method does allow construction of `Molecule` objects from a few sources that represent molecular mixtures, like the aforementioned `Topology` or `System`.\n", "\n", "Constructor methods that are available for `Topology` but not `Molecule` generally require a `Molecule` to be provided via the `unique_molecules` keyword argument. The provided `Molecule` is used to provide the identity of the molecule, including aromaticity, bond orders, formal charges, and so forth. 
These methods therefore don't provide a route to the graph of the molecule, but can be useful for reordering atoms to match another software package." ] }, { "cell_type": "markdown", "id": "95eab062", "metadata": {}, "source": [ "### From an OpenMM `Topology`\n", "\n", "The [`Topology.from_openmm()`](openff.toolkit.topology.Topology.from_openmm) method constructs an OpenFF `Topology` from an OpenMM [`Topology`](simtk.openmm.app.topology.Topology). The method requires that all the unique molecules in the `Topology` are provided as OpenFF `Molecule` objects, as the structure of an OpenMM `Topology` doesn't include the concept of a molecule. When using this method to create a `Molecule`, this limitation means that the method really only offers a pathway to reorder the atoms of a `Molecule` to match that of the OpenMM `Topology`." ] }, { "cell_type": "code", "execution_count": null, "id": "40645664", "metadata": {}, "outputs": [], "source": [ "from simtk.openmm.app.pdbfile import PDBFile\n", "\n", "openmm_topology = PDBFile('zw_l_alanine.pdb').getTopology()\n", "openff_topology = Topology.from_openmm(openmm_topology, unique_molecules=[zw_l_alanine])\n", "\n", "from_openmm_topology = Molecule.from_topology(openff_topology)\n", "\n", "print(zw_l_alanine.to_smiles(mapped=True))\n", "print(from_openmm_topology.to_smiles(mapped=True))\n", "\n", "from_openmm_topology.visualize()" ] }, { "cell_type": "markdown", "id": "39e06787", "metadata": {}, "source": [ "### From an MDTraj `Topology`\n", "\n", "The [`Topology.from_mdtraj()`](openff.toolkit.topology.Topology.from_mdtraj) method constructs an OpenFF `Topology` from an MDTraj [`Topology`](mdtraj.Topology). The method requires that all the unique molecules in the `Topology` are provided as OpenFF `Molecule` objects to ensure that the graph of the molecule is correct. 
When using this method to create a `Molecule`, this limitation means that the method really only offers a pathway to reorder the atoms of a `Molecule` to match the ordering in the MDTraj `Topology`." ] }, { "cell_type": "code", "execution_count": null, "id": "68ee9ed2", "metadata": {}, "outputs": [], "source": [ "from mdtraj import load_pdb\n", "\n", "mdtraj_topology = load_pdb('zw_l_alanine.pdb').topology\n", "openff_topology = Topology.from_mdtraj(mdtraj_topology, unique_molecules=[zw_l_alanine])\n", "\n", "from_mdtraj_topology = Molecule.from_topology(openff_topology)\n", "\n", "print(zw_l_alanine.to_smiles(mapped=True))\n", "print(from_mdtraj_topology.to_smiles(mapped=True))\n", "\n", "from_mdtraj_topology.visualize()" ] }, { "cell_type": "markdown", "id": "29c39bd1", "metadata": {}, "source": [ "## From Toolkit objects\n", "\n", "The OpenFF Toolkit calls out to other software to perform low-level tasks like reading SMILES or files. These external software packages are called toolkits, and presently include [RDKit](https://www.rdkit.org/) and the [OpenEye Toolkit](https://www.eyesopen.com/toolkit-development). OpenFF `Molecule` objects can be created from the equivalent objects in these toolkits." ] }, { "cell_type": "markdown", "id": "c6f2af84", "metadata": {}, "source": [ "### From RDKit `Mol`\n", "\n", "The [`Molecule.from_rdkit()`](openff.toolkit.topology.Molecule.from_rdkit) method converts an [`rdkit.Chem.rdchem.Mol`](rdkit.Chem.rdchem.Mol) object to an OpenFF `Molecule`." 
] }, { "cell_type": "code", "execution_count": null, "id": "fae232e6", "metadata": {}, "outputs": [], "source": [ "from rdkit import Chem\n", "rdmol = Chem.MolFromSmiles(\"C[C@H]([NH3+])C([O-])=O\")\n", "\n", "print(\"rdmol is of type\", type(rdmol))\n", "\n", "from_rdmol = Molecule.from_rdkit(rdmol)\n", "\n", "assert zw_l_alanine.is_isomorphic_with(from_rdmol)\n", "from_rdmol.visualize()" ] }, { "cell_type": "markdown", "id": "6fe44fac", "metadata": {}, "source": [ "### From OpenEye `OEMol`\n", "\n", "The [`Molecule.from_openeye()`](openff.toolkit.topology.Molecule.from_openeye) method converts an object that inherits from [`openeye.oechem.OEMolBase`](https://docs.eyesopen.com/toolkits/python/oechemtk/OEChemClasses/OEMolBase.html) to an OpenFF `Molecule`." ] }, { "cell_type": "code", "execution_count": null, "id": "c186257f", "metadata": {}, "outputs": [], "source": [ "from openeye import oechem\n", "\n", "oemol = oechem.OEGraphMol()\n", "oechem.OESmilesToMol(oemol, \"C[C@H]([NH3+])C([O-])=O\")\n", "\n", "assert isinstance(oemol, oechem.OEMolBase)\n", "\n", "from_oemol = Molecule.from_openeye(oemol)\n", "\n", "assert zw_l_alanine.is_isomorphic_with(from_oemol)\n", "from_oemol.visualize()" ] }, { "cell_type": "markdown", "id": "15765778", "metadata": {}, "source": [ "## From QCArchive\n", "\n", "[QCArchive](https://qcarchive.molssi.org/) is a repository of quantum chemical calculations on small molecules. The [`Molecule.from_qcschema()`](openff.toolkit.topology.Molecule.from_qcschema) method creates a `Molecule` from a record from the archive. 
Because the identity of a molecule can change over the course of a QC calculation, the Toolkit accepts records only if they contain a hydrogen-mapped SMILES code.\n", "\n", ":::{note}\n", "These examples use molecules other than L-alanine because of their availability in QCArchive.\n", ":::" ] }, { "cell_type": "markdown", "id": "0f143a62", "metadata": {}, "source": [ "### From a QCArchive molecule record\n", "\n", "The [`Molecule.from_qcschema()`](openff.toolkit.topology.Molecule.from_qcschema) method can take a molecule record queried from the QCArchive and create a `Molecule` from it." ] }, { "cell_type": "code", "execution_count": null, "id": "8e026b0f", "metadata": {}, "outputs": [], "source": [ "from qcportal import FractalClient\n", "\n", "client = FractalClient()\n", "query = client.query_molecules(molecular_formula=\"C16H20N3O5\")\n", "\n", "from_qcarchive = Molecule.from_qcschema(query[0])\n", " \n", "from_qcarchive.visualize()" ] }, { "cell_type": "markdown", "id": "3f1991fb", "metadata": {}, "source": [ "### From a QCArchive optimisation record\n", "\n", "`Molecule.from_qcschema()` can also take an optimisation record and create the corresponding `Molecule`." 
] }, { "cell_type": "code", "execution_count": null, "id": "edd1fa9e", "metadata": {}, "outputs": [], "source": [ "optimization_dataset = client.get_collection(\n", " \"OptimizationDataset\",\n", " \"SMIRNOFF Coverage Set 1\"\n", ")\n", "dimethoxymethanol_optimization = optimization_dataset.get_entry('coc(o)oc-0')\n", "\n", "from_optimisation = Molecule.from_qcschema(dimethoxymethanol_optimization)\n", "\n", "from_optimisation.visualize()" ] } ], "metadata": { "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "08e16c2c73ef4ac4a787b30b441e2e7b": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "14c22e59a2ae49ec84dde8720e98d0b8": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "width": "34px" } }, "277cdcd24a494da5ba7e760a21598122": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "LinkModel", "state": { "source": [ "IPY_MODEL_e5615f7c0b6f4533b48a02529062c0e6", "value" ], "target": [ "IPY_MODEL_6a029f2e84fd477bae314907cf2e005c", "value" ] } }, "2ec14012be23440883e52d43582eb8c9": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "SliderStyleModel", "state": { "description_width": "" } }, "4935c92b167d4b729fbc084d8e557e0b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "LinkModel", "state": { "source": [ "IPY_MODEL_6a029f2e84fd477bae314907cf2e005c", "max" ], "target": [ "IPY_MODEL_99dcce34cf374921a6f43e46c4a4196e", "max_frame" ] } }, 
"49ddfb47a55c4a3b90ff121ac69c25c9": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "LinkModel", "state": { "source": [ "IPY_MODEL_e5615f7c0b6f4533b48a02529062c0e6", "max" ], "target": [ "IPY_MODEL_99dcce34cf374921a6f43e46c4a4196e", "max_frame" ] } }, "4dbc876c32d84f438d1602ec2de88647": { "model_module": "nglview-js-widgets", "model_module_version": "3.0.1", "model_name": "ColormakerRegistryModel", "state": { "_msg_ar": [], "_msg_q": [], "_ready": true, "layout": "IPY_MODEL_08e16c2c73ef4ac4a787b30b441e2e7b" } }, "54f53a2162ed4274ab9b12e5d3e56f18": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "58f6b1d22ecc4a01a8c3909de1bfda2e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ButtonModel", "state": { "icon": "compress", "layout": "IPY_MODEL_14c22e59a2ae49ec84dde8720e98d0b8", "style": "IPY_MODEL_c06029f8e10b4a1299efea13fa2eb483" } }, "6a029f2e84fd477bae314907cf2e005c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "IntSliderModel", "state": { "layout": "IPY_MODEL_54f53a2162ed4274ab9b12e5d3e56f18", "max": 1, "style": "IPY_MODEL_2ec14012be23440883e52d43582eb8c9" } }, "6f050442d6c842318ee15566846ef686": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "771443f6186140aa804aa38e3031f952": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "819e31b0e73646bc8315f258c8712eb7": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "LinkModel", "state": { "source": [ "IPY_MODEL_e5615f7c0b6f4533b48a02529062c0e6", "value" ], "target": [ "IPY_MODEL_99dcce34cf374921a6f43e46c4a4196e", "frame" ] } }, "99dcce34cf374921a6f43e46c4a4196e": { "model_module": "nglview-js-widgets", 
"model_module_version": "3.0.1", "model_name": "NGLModel", "state": { "_camera_orientation": [ 22.810875528904155, 11.021739938823039, 4.410248991160865, 0, -10.534564284612403, 23.19768433819542, -3.4864719297669513, 0, -5.4728422941330335, 1.285993742067655, 25.093013114354285, 0, -0.07200002670288086, -0.10500001907348633, -0.11549997329711914, 1 ], "_camera_str": "orthographic", "_gui_theme": null, "_ibtn_fullscreen": "IPY_MODEL_58f6b1d22ecc4a01a8c3909de1bfda2e", "_igui": null, "_iplayer": "IPY_MODEL_cbddac53f5da4b3386e5bf926524f036", "_ngl_color_dict": {}, "_ngl_coordinate_resource": {}, "_ngl_full_stage_parameters": { "ambientColor": 14540253, "ambientIntensity": 0.2, "backgroundColor": "white", "cameraEyeSep": 0.3, "cameraFov": 40, "cameraType": "perspective", "clipDist": 10, "clipFar": 100, "clipNear": 0, "fogFar": 100, "fogNear": 50, "hoverTimeout": 0, "impostor": true, "lightColor": 14540253, "lightIntensity": 1, "mousePreset": "default", "panSpeed": 1, "quality": "medium", "rotateSpeed": 2, "sampleLevel": 0, "tooltip": true, "workerDefault": true, "zoomSpeed": 1.2 }, "_ngl_msg_archive": [ { "args": [ { "binary": false, "data": "MODEL 1\nHETATM 1 C1 UNL 1 -3.367 0.454 0.079 1.00 0.00 C \nHETATM 2 C2 UNL 1 -2.602 1.535 -0.295 1.00 0.00 C \nHETATM 3 C3 UNL 1 -1.231 1.477 -0.084 1.00 0.00 C \nHETATM 4 C4 UNL 1 -0.636 0.365 0.486 1.00 0.00 C \nHETATM 5 C5 UNL 1 -1.450 -0.687 0.840 1.00 0.00 C \nHETATM 6 C6 UNL 1 -2.830 -0.664 0.644 1.00 0.00 C \nHETATM 7 C7 UNL 1 0.825 0.297 0.714 1.00 0.00 C \nHETATM 8 C8 UNL 1 1.557 -0.296 -0.486 1.00 0.00 C \nHETATM 9 C9 UNL 1 3.000 -0.325 -0.168 1.00 0.00 C \nHETATM 10 O1 UNL 1 3.604 -1.407 -0.072 1.00 0.00 O \nHETATM 11 O2 UNL 1 3.693 0.851 0.024 1.00 0.00 O \nHETATM 12 N1 UNL 1 1.050 -1.653 -0.643 1.00 0.00 N \nHETATM 13 H1 UNL 1 -4.434 0.494 -0.083 1.00 0.00 H \nHETATM 14 H2 UNL 1 -3.047 2.399 -0.735 1.00 0.00 H \nHETATM 15 H3 UNL 1 -0.645 2.330 -0.380 1.00 0.00 H \nHETATM 16 H4 UNL 1 -1.035 -1.571 1.286 1.00 0.00 H 
\nHETATM 17 H5 UNL 1 -3.463 -1.505 0.929 1.00 0.00 H \nHETATM 18 H6 UNL 1 1.097 -0.281 1.622 1.00 0.00 H \nHETATM 19 H7 UNL 1 1.214 1.336 0.887 1.00 0.00 H \nHETATM 20 H8 UNL 1 1.362 0.286 -1.391 1.00 0.00 H \nHETATM 21 H9 UNL 1 4.578 0.909 0.537 1.00 0.00 H \nHETATM 22 H10 UNL 1 1.636 -2.189 -1.322 1.00 0.00 H \nHETATM 23 H11 UNL 1 1.123 -2.155 0.272 1.00 0.00 H \nCONECT 1 2 2 6 13\nCONECT 2 3 14\nCONECT 3 4 4 15\nCONECT 4 5 7\nCONECT 5 6 6 16\nCONECT 6 17\nCONECT 7 8 18 19\nCONECT 8 9 12 20\nCONECT 9 10 10 11\nCONECT 11 21\nCONECT 12 22 23\nENDMDL\nMODEL 2\nHETATM 1 C1 UNL 1 3.122 0.840 0.504 1.00 0.00 C \nHETATM 2 C2 UNL 1 2.197 1.676 -0.062 1.00 0.00 C \nHETATM 3 C3 UNL 1 1.012 1.186 -0.555 1.00 0.00 C \nHETATM 4 C4 UNL 1 0.687 -0.161 -0.508 1.00 0.00 C \nHETATM 5 C5 UNL 1 1.656 -0.982 0.078 1.00 0.00 C \nHETATM 6 C6 UNL 1 2.845 -0.492 0.572 1.00 0.00 C \nHETATM 7 C7 UNL 1 -0.560 -0.734 -1.010 1.00 0.00 C \nHETATM 8 C8 UNL 1 -1.597 -0.830 0.066 1.00 0.00 C \nHETATM 9 C9 UNL 1 -1.935 0.478 0.645 1.00 0.00 C \nHETATM 10 O1 UNL 1 -1.053 1.137 1.236 1.00 0.00 O \nHETATM 11 O2 UNL 1 -3.206 1.008 0.556 1.00 0.00 O \nHETATM 12 N1 UNL 1 -2.834 -1.413 -0.433 1.00 0.00 N \nHETATM 13 H1 UNL 1 4.035 1.233 0.881 1.00 0.00 H \nHETATM 14 H2 UNL 1 2.407 2.729 -0.120 1.00 0.00 H \nHETATM 15 H3 UNL 1 0.295 1.887 -1.008 1.00 0.00 H \nHETATM 16 H4 UNL 1 1.455 -2.054 0.143 1.00 0.00 H \nHETATM 17 H5 UNL 1 3.594 -1.142 1.026 1.00 0.00 H \nHETATM 18 H6 UNL 1 -0.419 -1.693 -1.556 1.00 0.00 H \nHETATM 19 H7 UNL 1 -0.941 -0.024 -1.802 1.00 0.00 H \nHETATM 20 H8 UNL 1 -1.273 -1.494 0.907 1.00 0.00 H \nHETATM 21 H9 UNL 1 -3.387 2.024 0.543 1.00 0.00 H \nHETATM 22 H10 UNL 1 -2.736 -2.388 -0.733 1.00 0.00 H \nHETATM 23 H11 UNL 1 -3.363 -0.791 -1.053 1.00 0.00 H \nCONECT 1 2 2 6 13\nCONECT 2 3 14\nCONECT 3 4 4 15\nCONECT 4 5 7\nCONECT 5 6 6 16\nCONECT 6 17\nCONECT 7 8 18 19\nCONECT 8 9 12 20\nCONECT 9 10 10 11\nCONECT 11 21\nCONECT 12 22 23\nENDMDL\n", "type": "blob" } ], "kwargs": { 
"defaultRepresentation": true, "ext": "pdb" }, "methodName": "loadFile", "reconstruc_color_scheme": false, "target": "Stage", "type": "call_method" } ], "_ngl_original_stage_parameters": { "ambientColor": 14540253, "ambientIntensity": 0.2, "backgroundColor": "white", "cameraEyeSep": 0.3, "cameraFov": 40, "cameraType": "perspective", "clipDist": 10, "clipFar": 100, "clipNear": 0, "fogFar": 100, "fogNear": 50, "hoverTimeout": 0, "impostor": true, "lightColor": 14540253, "lightIntensity": 1, "mousePreset": "default", "panSpeed": 1, "quality": "medium", "rotateSpeed": 2, "sampleLevel": 0, "tooltip": true, "workerDefault": true, "zoomSpeed": 1.2 }, "_ngl_repr_dict": { "0": { "0": { "params": { "aspectRatio": 1.5, "assembly": "default", "bondScale": 0.3, "bondSpacing": 0.75, "clipCenter": { "x": 0, "y": 0, "z": 0 }, "clipNear": 0, "clipRadius": 0, "colorMode": "hcl", "colorReverse": false, "colorScale": "", "colorScheme": "element", "colorValue": 9474192, "cylinderOnly": false, "defaultAssembly": "", "depthWrite": true, "diffuse": 16777215, "diffuseInterior": false, "disableImpostor": false, "disablePicking": false, "flatShaded": false, "interiorColor": 2236962, "interiorDarkening": 0, "lazy": false, "lineOnly": false, "linewidth": 2, "matrix": { "elements": [ 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 ] }, "metalness": 0, "multipleBond": "off", "opacity": 1, "openEnded": true, "quality": "high", "radialSegments": 20, "radiusData": {}, "radiusScale": 2, "radiusSize": 0.15, "radiusType": "size", "roughness": 0.4, "sele": "", "side": "double", "sphereDetail": 2, "useInteriorColor": true, "visible": true, "wireframe": false }, "type": "ball+stick" } } }, "_ngl_serialize": false, "_ngl_version": "2.0.0-dev.36", "_ngl_view_id": [ "7CB702E7-220F-44BE-A927-D6D5DCFF620E" ], "_player_dict": {}, "_scene_position": {}, "_scene_rotation": {}, "_synced_model_ids": [], "_synced_repr_model_ids": [], "_view_height": "", "_view_width": "", "background": "white", "frame": 0, 
"gui_style": null, "layout": "IPY_MODEL_cc26efdb80e848fabb511db5c50159a0", "max_frame": 1, "n_components": 1, "picked": {} } }, "9ca77a345a3842969d13976ed63eddb9": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ImageModel", "state": { "layout": "IPY_MODEL_771443f6186140aa804aa38e3031f952", "width": "900.0" } }, "a43ee2b371934d8e9c17ce73887c1002": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "c06029f8e10b4a1299efea13fa2eb483": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ButtonStyleModel", "state": {} }, "cbddac53f5da4b3386e5bf926524f036": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_e5615f7c0b6f4533b48a02529062c0e6", "IPY_MODEL_6a029f2e84fd477bae314907cf2e005c" ], "layout": "IPY_MODEL_6f050442d6c842318ee15566846ef686" } }, "cc26efdb80e848fabb511db5c50159a0": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "d0f930a83f094fe1bba0d36e96bb8348": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "e5615f7c0b6f4533b48a02529062c0e6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "PlayModel", "state": { "layout": "IPY_MODEL_d0f930a83f094fe1bba0d36e96bb8348", "max": 1, "style": "IPY_MODEL_a43ee2b371934d8e9c17ce73887c1002" } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }