1. Installing the CDPL Python Bindings
To follow this tutorial, the CDPL Python bindings must be installed on your computer. The most straightforward way is to install the latest official release from PyPI using the pip command as follows:
pip install cdpkit
Other ways to install the Python bindings are described in section Installation.
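A quick way to confirm that the installation worked is to check whether Python can locate the top-level CDPL package. The following is a minimal sketch that uses only the standard library; the helper name cdpl_available() is made up for this illustration.

```python
import importlib.util


def cdpl_available() -> bool:
    """Return True if the top-level CDPL package can be located by Python."""
    return importlib.util.find_spec("CDPL") is not None


if cdpl_available():
    print("CDPL bindings found")
else:
    print("CDPL bindings missing; try: pip install cdpkit")
```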
2. CDPL Package Overview
The CDPL comprises several sub-packages each providing functionality related to a certain aspect of chem- and pharmacoinformatics. The following table lists all available sub-packages together with a brief description of the kind of functionality they provide:
| Package | Contents |
|---|---|
| CDPL.Base | Core classes defining a software framework for functionality implemented in the other CDPL packages |
| CDPL.Util | Implementations of useful general purpose algorithms, containers, function objects and free functions |
| CDPL.Math | Data structures, algorithms and functions related to mathematics |
| CDPL.Chem | Infrastructure for the in-memory representation, I/O and basic processing of molecular structures and reactions |
| CDPL.MolProp | Functionality for the calculation/prediction of physicochemical and topological atom, bond and molecule properties |
| CDPL.Biomol | Functionality for the I/O and processing of biological macromolecules |
| CDPL.Descr | Functionality for the generation and processing of pharmacophore and molecule descriptors |
| CDPL.Pharm | Infrastructure for pharmacophore representation, I/O, perception, processing, alignment and screening |
| CDPL.Shape | Infrastructure for Gaussian volume-based molecular shape representation, processing, alignment and screening |
| CDPL.ForceField | Implementation of MMFF94(s) for molecule conformer energy calculation and 3D structure optimization |
| CDPL.ConfGen | Functionality for molecule 3D structure and conformer ensemble generation |
| CDPL.Grid | Infrastructure for grid data storage, I/O and processing |
| CDPL.GRAIL | Functionality for the generation of GRAIL data sets [11] and GRADE descriptors [18] |
| CDPL.Vis | Functionality for molecule, reaction and 3D pharmacophore visualization |
3. Basic Concepts
3.1. Dynamic Properties
The CDPL stores properties associated with certain types of data like molecules, atoms, bonds, pharmacophores, etc. not as ordinary data members of the implementing classes but as key:value pairs in a dictionary (similar to the __dict__ attribute of Python objects). This design decision was made due to several advantages of this approach:
Flexibility and extensibility: new properties can be defined at runtime by user code
Class instance specific property values can be stored directly in the dictionary of the instance they are associated with, no external accompanying data structures are required for storing user-defined properties unknown to the CDPL
Whether the value of a particular property is available can be determined easily by checking if the dictionary contains a corresponding entry. In contrast, C++ class data members (note that CDPL Python objects just wrap corresponding C++ class instances!) exist in memory as soon as a class instance has been constructed and from that point on always hold some value. This is particularly problematic for properties that cannot be assigned a reasonable default value.
All CDPL classes supporting this kind of dynamic property storage are derived from class CDPL.Base.PropertyContainer which provides methods for property value lookup, storage, removal, iteration, existence testing and counting. Properties are identified by unique keys of type CDPL.Base.LookupKey that are created on the fly during the CDPL initialization phase. Keys of pre-defined CDPL properties are exported as static attributes of classes that follow the naming scheme CDPL.<PN>.<CN>Property. <PN> denotes the CDPL sub-package name (see table above) and <CN> is the name of a child class of CDPL.Base.PropertyContainer for which these properties have been defined (example: atom property keys accessible via class CDPL.Chem.AtomProperty). Property values can be of virtually any type and get stored in the dictionary as instances of the data wrapper class CDPL.Base.Any.
Since CDPL.Base.PropertyContainer methods acting upon a particular property always demand the key of the property as argument and setter/getter methods in addition require knowledge of the value type, corresponding code is not only tedious to write but also hard to read and error prone. Therefore, each CDPL sub-package that introduces properties also provides four free functions (at package level) per property that encapsulate the low-level CDPL.Base.PropertyContainer method calls. These functions internally not only specify the correct property key and value type but also constrain the type of the CDPL.Base.PropertyContainer subclass the property has been introduced for. CDPL.Chem.getOrder(), CDPL.Chem.setOrder(), CDPL.Chem.hasOrder() and CDPL.Chem.clearOrder() represent an example of such four functions that are provided for the property CDPL.Chem.BondProperty.ORDER of CDPL.Chem.Bond instances using integer as value type. Using property getter functions (like CDPL.Chem.getOrder()) has the additional benefit that they will, if one has been defined, automatically return a default value for unset properties. Defined property default values are exported and accessible as static attributes of classes that follow the naming scheme CDPL.<PN>.<CN>PropertyDefault (example: CDPL.Chem.BondPropertyDefault; for the meaning of <PN> and <CN> see text above).
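The pattern described above can be illustrated with a small, self-contained toy model. It is a sketch of the idea only, not the CDPL implementation: the classes ToyKey and ToyPropertyContainer, their method names, the four free functions and the default value of 1 are all assumptions made for this illustration.

```python
class ToyKey:
    """Toy stand-in for a unique lookup key (hypothetical, not CDPL.Base.LookupKey)."""

    def __init__(self, name):
        self.name = name


class ToyPropertyContainer:
    """Toy model: property values live in a per-instance dictionary."""

    def __init__(self):
        self._props = {}

    def set_value(self, key, value):
        self._props[key] = value

    def get_value(self, key):
        return self._props[key]  # raises KeyError if the property is unset

    def has_value(self, key):
        return key in self._props

    def remove_value(self, key):
        self._props.pop(key, None)


ORDER = ToyKey("ORDER")  # hypothetical key for a bond order property
ORDER_DEFAULT = 1        # assumed default value, chosen for illustration


# Four free functions that hide the key and the value type from the caller.
def set_order(bond, order):
    bond.set_value(ORDER, int(order))


def has_order(bond):
    return bond.has_value(ORDER)


def clear_order(bond):
    bond.remove_value(ORDER)


def get_order(bond):
    # The getter falls back to the default when no value has been stored.
    return bond.get_value(ORDER) if bond.has_value(ORDER) else ORDER_DEFAULT


bond = ToyPropertyContainer()
print(has_order(bond), get_order(bond))  # unset: the default is returned
set_order(bond, 2)
print(has_order(bond), get_order(bond))  # set: the stored value is returned
```

Calling code never touches the key or the dictionary directly, which is the readability gain the convenience functions are meant to provide.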
3.2. Control-Parameters
Control-parameters are used for the runtime configuration of arbitrary functionality in a generic, flexible and functionality independent way (in the CDPL mainly used by the data I/O and visualization code). The implementation and usage of the control-parameter infrastructure largely parallels the one for dynamic properties:
Control-parameters are identified via unique instances of class CDPL.Base.LookupKey
Values can be of any type and are stored in a dictionary as CDPL.Base.Any objects
Keys of pre-defined control-parameters are exported as static attributes of classes that follow the naming scheme CDPL.<PN>.ControlParameter (<PN> = CDPL sub-package name, example: CDPL.Chem.ControlParameter)
Four convenience functions are provided for each control-parameter introduced by a package
CDPL classes employing the control-parameter infrastructure (directly or indirectly) are derived from class CDPL.Base.ControlParameterContainer. The class provides methods which are similar to those found in CDPL.Base.PropertyContainer but also offers methods (setParent() and getParent()) that allow CDPL.Base.ControlParameterContainer instances to be connected in a parent-child manner. This way tree-like hierarchies of CDPL.Base.ControlParameterContainer instances for resolving parameter value requests can be built. If a requested parameter value is not stored in a given container, the request gets automatically forwarded to the registered parent container which may again forward the request to its parent until a value is found or the root of the tree has been reached. Furthermore, methods are provided which allow the registration of user-defined functions or function objects that get called on events such as parameter value change (methods registerParameterChangedCallback() and unregisterParameterChangedCallback()), parameter value removal (methods registerParameterRemovedCallback() and unregisterParameterRemovedCallback()) and parent change (methods registerParentChangedCallback() and unregisterParentChangedCallback()).
A notable difference between dynamic properties and control-parameters is that the latter always possess a default value which gets returned by the associated getter function if a parameter value has not been explicitly set. Control-parameter default values are exported and accessible as static attributes of classes that follow the naming scheme CDPL.<PN>.ControlParameterDefault (<PN> = CDPL sub-package name; example: CDPL.Chem.ControlParameterDefault).
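The parent-child value resolution and the default fallback described in this section can be sketched with a toy container. This is a conceptual model under stated assumptions, not the CDPL API: the class ToyParamContainer, its method names, the key EXAMPLE_KEY and the default EXAMPLE_DEFAULT are all hypothetical.

```python
class ToyParamContainer:
    """Toy container: local values plus an optional parent for fallback lookups."""

    def __init__(self, parent=None):
        self._values = {}
        self._parent = parent
        self._changed_callbacks = []

    def set_parent(self, parent):
        self._parent = parent

    def register_changed_callback(self, func):
        self._changed_callbacks.append(func)

    def set_parameter(self, key, value):
        self._values[key] = value
        for func in self._changed_callbacks:
            func(key, value)  # notify listeners about the changed value

    def find_parameter(self, key):
        """Return (True, value) from the nearest container holding key, else (False, None)."""
        container = self
        while container is not None:
            if key in container._values:
                return True, container._values[key]
            container = container._parent  # forward the request to the parent
        return False, None


EXAMPLE_KEY = object()       # hypothetical unique parameter key
EXAMPLE_DEFAULT = "default"  # hypothetical default value


def get_example(container):
    # Getter function: returns the default if no container in the chain has a value.
    found, value = container.find_parameter(EXAMPLE_KEY)
    return value if found else EXAMPLE_DEFAULT


root = ToyParamContainer()
child = ToyParamContainer(parent=root)
changes = []
child.register_changed_callback(lambda key, value: changes.append(value))

print(get_example(child))  # nothing set anywhere: the default is returned
root.set_parameter(EXAMPLE_KEY, "from root")
print(get_example(child))  # resolved through the parent
child.set_parameter(EXAMPLE_KEY, "from child")
print(get_example(child))  # the locally stored value takes precedence
```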
3.3. Data I/O Framework
Classes implementing the input/output of data of a certain type in a particular format (e.g. molecular structures in SD-file format) are derived from abstract base classes that follow the naming scheme CDPL.<PN>.<DT>ReaderBase and CDPL.<PN>.<DT>WriterBase, respectively. <PN> denotes the CDPL sub-package name and <DT> is the name of the data type to read or write (e.g. classes CDPL.Chem.MoleculeReaderBase and CDPL.Chem.MolecularGraphWriterBase). These base classes are all derived from the abstract class CDPL.Base.DataIOBase which itself is derived from CDPL.Base.ControlParameterContainer. Instances of concrete classes implementing the I/O of data in a particular format thus support the configuration of their runtime behavior by control-parameters (see CDPL.Chem.ControlParameter for examples). The names of the format-specific classes all follow the scheme CDPL.<PN>.<FID><DT>Reader and CDPL.<PN>.<FID><DT>Writer, respectively where <PN> denotes the CDPL sub-package name, <FID> is a format identifier (usually a characteristic file extension) and <DT> is the name of the data type to read or write (e.g. CDPL.Chem.SDFMoleculeReader and CDPL.Chem.SDFMolecularGraphWriter).
Data reader classes all expect an instance of class CDPL.Base.IStream and data writer classes an instance of CDPL.Base.OStream as argument to their constructor. These stream-based I/O classes represent abstract storage devices which allow the same code to handle I/O to files, in-memory strings, or custom adaptor devices that perform arbitrary operations (e.g. compression) on the fly. Concrete types of storage devices are implemented by dedicated subclasses of CDPL.Base.IStream and CDPL.Base.OStream such as class CDPL.Base.FileIOStream for file I/O and CDPL.Base.StringIOStream for in-memory string data I/O, respectively.
Since files represent the most dealt-with kind of data storage, file I/O-specific variants of reader/writer classes are provided that make reading/writing data from/to files more convenient. These classes follow the naming scheme CDPL.<PN>.File<FID><DT>Reader and CDPL.<PN>.File<FID><DT>Writer (for the meaning of <PN>, <FID> and <DT> see text above). Instead of an instance of CDPL.Base.IStream/CDPL.Base.OStream they accept the path to a file as constructor argument and thus circumvent the need to explicitly create and manage instances of class CDPL.Base.FileIOStream.
Each data format implemented by the CDPL is described by an instance of class CDPL.Base.DataFormat which stores and gives access to relevant format-specific information such as common file-extensions or mime-type. Pre-defined data format descriptors are exported as static attributes of classes following the naming scheme CDPL.<PN>.DataFormat where <PN> is the name of the CDPL sub-package implementing the format (e.g. CDPL.Chem.DataFormat).
The link between a CDPL.Base.DataFormat instance describing a particular data format and associated classes implementing the reading/writing of data in this format gets established by dedicated input- and output-handler classes. These classes provide factory methods to create a reader/writer class instance for a given file path or CDPL.Base.IStream/CDPL.Base.OStream instance and follow the naming scheme CDPL.<PN>.<FID><DT>InputHandler and CDPL.<PN>.<FID><DT>OutputHandler, respectively (for the meaning of <PN>, <FID> and <DT> see text above; examples: CDPL.Chem.SDFMoleculeInputHandler, CDPL.Chem.SMILESMolecularGraphOutputHandler). For each data format supported by the CDPL an input- and/or output-handler class instance is registered at a data type-specific singleton class named CDPL.<PN>.<DT>IOManager (for the meaning of <PN> and <DT> see text above; example: CDPL.Chem.MoleculeIOManager). Amongst others, the I/O manager classes provide methods to look up a registered handler instance for a given file extension, mime-type or CDPL.Base.DataFormat object. This way it is possible to, e.g., write code that creates a reader class instance for the input of data from a file where the actual data format is determined later on at runtime.
In order to facilitate the writing of data format-independent code the CDPL provides special reader and writer classes that perform the runtime lookup of a suitable input/output handler and reader/writer class instantiation automatically. The classes follow the naming scheme CDPL.<PN>.<DT>Reader and CDPL.<PN>.<DT>Writer, respectively (examples: CDPL.Chem.MoleculeReader and CDPL.Chem.MolecularGraphWriter). The constructors of the classes expect the data source/sink to be provided as a CDPL.Base.IStream/CDPL.Base.OStream instance or specified as path to a file. If a file path is specified, an attempt is made to deduce the data format from the file name’s extension. Optionally, a characteristic file extension string or a CDPL.Base.DataFormat instance can be provided in case the file extension is missing or unknown to the CDPL. If the data source/sink is provided as a CDPL.Base.IStream/CDPL.Base.OStream instance then the explicit specification of the data format is mandatory.
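The handler lookup idea can be sketched in plain Python. The code below is a toy model of the mechanism, not the CDPL classes: the registry HANDLERS, the two record-splitting functions and read_records() are hypothetical, and the "readers" merely split text into records instead of parsing molecules. It mirrors two rules stated above: the format is deduced from the file extension when a path is given, and it must be specified explicitly when a stream is given.

```python
import io
import os


def read_smi_records(stream):
    """Toy SMILES-like reader: one record per non-empty line."""
    return [line.strip() for line in stream if line.strip()]


def read_sdf_records(stream):
    """Toy SD-file-like reader: records are terminated by a '$$$$' line."""
    records, current = [], []
    for line in stream:
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
        else:
            current.append(line)
    return records


# Hypothetical registry mapping a file extension to a handler function.
HANDLERS = {"smi": read_smi_records, "sdf": read_sdf_records}


def lookup_handler(fmt):
    try:
        return HANDLERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"no handler registered for format {fmt!r}") from None


def read_records(source, fmt=None):
    """Read records from a file path or an open text stream."""
    if isinstance(source, (str, os.PathLike)):
        if fmt is None:
            # Deduce the format from the file name's extension.
            fmt = os.path.splitext(os.fspath(source))[1].lstrip(".")
        handler = lookup_handler(fmt)
        with open(source) as stream:
            return handler(stream)
    if fmt is None:
        raise ValueError("a format must be given explicitly for stream sources")
    return lookup_handler(fmt)(source)


print(read_records(io.StringIO("CCO\n\nc1ccncc1\n"), fmt="smi"))
```

Because the caller only names a format, the same calling code keeps working when further handlers are added to the registry.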
4. Working with Molecules
4.1. In-memory Representation of Molecular Structures
The CDPL models molecular structures as undirected graphs where atoms represent the graph nodes and bonds the edges. Concrete data structures for the in-memory representation of atoms, bonds and molecular graphs implement a hierarchy of interfaces (abstract classes) that specify all necessary methods for common operations like atom/bond addition, removal, access, membership testing, counting, and so on.
The following table provides an overview of the most relevant interfaces and data structures provided by the CDPL for molecular data representation and processing:
| Class Name | Class Type | Parent Class(es) | Description |
|---|---|---|---|
| CDPL.Chem.Entity3D | Interface | CDPL.Base.PropertyContainer | Represents an arbitrary entity that can have a position in 3D space |
| CDPL.Chem.Entity3DContainer | Interface | None | Represents a collection of CDPL.Chem.Entity3D instances and specifies methods for read-only instance access and querying their number |
| CDPL.Chem.AtomContainer | Interface | CDPL.Chem.Entity3DContainer | Represents a collection of CDPL.Chem.Atom instances and specifies methods for read-only instance access, querying their number, collection membership testing and ordering |
| CDPL.Chem.BondContainer | Interface | None | Represents a collection of CDPL.Chem.Bond instances and specifies methods for read-only instance access, querying their number, collection membership testing and ordering |
| CDPL.Chem.Atom | Interface | CDPL.Chem.Entity3D, CDPL.Chem.AtomContainer, CDPL.Chem.BondContainer | Represents an atom in molecular structures/graphs and provides additional connectivity and ownership related methods |
| CDPL.Chem.BasicAtom | Implementation | CDPL.Chem.Atom | Default implementation of the CDPL.Chem.Atom interface |
| CDPL.Chem.Bond | Interface | CDPL.Chem.AtomContainer | Represents a bond connecting two atoms in molecular structures/graphs, specifies additional connectivity and ownership related methods |
| CDPL.Chem.BasicBond | Implementation | CDPL.Chem.Bond | Default implementation of the CDPL.Chem.Bond interface |
| CDPL.Chem.MolecularGraph | Interface | CDPL.Chem.AtomContainer, CDPL.Chem.BondContainer, CDPL.Base.PropertyContainer | Represents an arbitrary molecular graph described by lists of CDPL.Chem.Atom and CDPL.Chem.Bond instances |
| CDPL.Chem.Molecule | Interface | CDPL.Chem.MolecularGraph | Extends the CDPL.Chem.MolecularGraph interface by methods for atom and bond creation as well as methods for assignment of and merging with other molecular graphs |
| CDPL.Chem.BasicMolecule | Implementation | CDPL.Chem.Molecule | Default implementation of the CDPL.Chem.Molecule interface |
| CDPL.Chem.Fragment | Implementation | CDPL.Chem.MolecularGraph | Stores references (not copies!) to CDPL.Chem.Atom and CDPL.Chem.Bond objects owned/managed by one or more CDPL.Chem.Molecule instances |
4.2. Representation of Molecule Substructures
From scratch, a molecular graph can only be constructed via an instance of class CDPL.Chem.Molecule. Adding atoms and bonds by calling dedicated methods (see next section) will create new CDPL.Chem.Atom and CDPL.Chem.Bond objects which from that point on are owned and managed by the creating CDPL.Chem.Molecule instance. For the specification of arbitrary sets of CDPL.Chem.Atom and CDPL.Chem.Bond objects that belong to one or more CDPL.Chem.Molecule instance(s) the CDPL.Chem package provides the class CDPL.Chem.Fragment. Like CDPL.Chem.Molecule, this class also offers methods for adding atoms and bonds except that the methods of CDPL.Chem.Fragment expect existing CDPL.Chem.Atom or CDPL.Chem.Bond instances as argument. These do not get stored as copies but as light-weight references to the original instances which can be retrieved later on by methods for atom/bond access. CDPL.Chem.Molecule as well as CDPL.Chem.Fragment are subclasses of CDPL.Chem.MolecularGraph and instances of both can be processed in the same way by any code that operates on CDPL.Chem.MolecularGraph objects.
4.3. Basic Operations on Molecule Objects
Most of the classes for molecular structure representation, molecular data I/O and functions for basic processing reside in package CDPL.Chem.
import CDPL.Chem as Chem
With this import in place, the code in the remainder of this tutorial can conveniently access all package contents via the prefix Chem.
Furthermore, the CDPL Python bindings implement the Rich Output of Chem.MolecularGraph instances in Jupyter notebooks. Rich output is activated by importing the CDPL.Vis package and will be used in the following code snippets to display the skeletal formula of molecular graphs simply by typing the variable name at the end of a code cell.
import CDPL.Vis
4.3.1. Creation
A Chem.Molecule object not yet having any atoms and bonds can be created by instantiating the class Chem.BasicMolecule (the provided default implementation of the Chem.Molecule interface):
mol = Chem.BasicMolecule()
4.3.2. Querying Atom and Bond Counts
The number of (explicit) atoms can be queried either by accessing the property numAtoms or by calling the method getNumAtoms() which are both provided by the Chem.AtomContainer interface:
mol.numAtoms
# or
#mol.getNumAtoms()
0
In the same manner, the number of explicit bonds can be retrieved by the property numBonds or by calling the method getNumBonds() of the Chem.BondContainer interface:
mol.numBonds
# or
#mol.getNumBonds()
0
4.3.3. Creating Atoms and Bonds
Atoms are created by calling the method addAtom() provided by the Chem.Molecule interface:
a = mol.addAtom()
The method returns a Chem.Atom object which is owned by the creating Chem.Molecule instance mol. The created atom does not yet possess any chemical properties like element, formal charge, and so on. The value of these properties needs to be set explicitly by invoking dedicated property functions which take the atom and the desired value of the property as arguments. For example:
Chem.setType(a, Chem.AtomType.C)
The Chem.setType() function will set the type property of the atom to the atomic number of carbon. The value of the type property can be retrieved by the associated function Chem.getType()
Chem.getType(a)
6
In a similar fashion, bonds are created by calling the method addBond() which expects the indices (zero-based) of the two atoms to connect as arguments:
# add second carbon atom
Chem.setType(mol.addAtom(), Chem.AtomType.C)
b = mol.addBond(0, 1)
The method returns a Chem.Bond object which is also owned and managed by the creating Chem.Molecule instance mol. As with atoms, the created bond does not yet have any properties. To set the bond order to a value of 2 (= double bond) the property function Chem.setOrder() needs to be called:
Chem.setOrder(b, 2)
A previously set bond order property value can be retrieved by the accompanying getter function Chem.getOrder():
Chem.getOrder(b)
2
mol
To create a more complex molecule, e.g. pyridine, from the ethene fragment that is currently described by mol, the following lines will do the trick:
# create missing atoms and set atom types
Chem.setType(mol.addAtom(), Chem.AtomType.C)
Chem.setType(mol.addAtom(), Chem.AtomType.C)
Chem.setType(mol.addAtom(), Chem.AtomType.C)
Chem.setType(mol.addAtom(), Chem.AtomType.N)
# create missing bonds and set orders
Chem.setOrder(mol.addBond(1, 2), 1)
Chem.setOrder(mol.addBond(2, 3), 2)
Chem.setOrder(mol.addBond(3, 4), 1)
Chem.setOrder(mol.addBond(4, 5), 2)
Chem.setOrder(mol.addBond(5, 0), 1)
mol.numBonds
6
mol.numAtoms
6
mol
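For larger rings, typing every addBond()/setOrder() call by hand becomes repetitive. The sketch below generates the atom index pairs and bond orders of a simple ring as plain data; the helper ring_bond_specs() is hypothetical and not part of the CDPL. It only covers the bookkeeping and does not touch any molecule object.

```python
def ring_bond_specs(num_atoms, orders):
    """Return (begin_index, end_index, order) triples that close a ring.

    The orders sequence is cycled, so (2, 1) yields alternating double/single bonds.
    """
    return [(i, (i + 1) % num_atoms, orders[i % len(orders)])
            for i in range(num_atoms)]


pyridine_bonds = ring_bond_specs(6, orders=(2, 1))
print(pyridine_bonds)
# prints [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
```

On a molecule that already holds six atoms but no bonds yet, each triple (i, j, order) could then be applied with Chem.setOrder(mol.addBond(i, j), order), which reproduces the bonds created step by step above.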
4.3.4. Copying Atoms and Bonds
A deep copy of a chemical structure described by a Chem.MolecularGraph instance can be created in several ways. The first option is to pass the Chem.MolecularGraph instance as argument to the constructor of class Chem.BasicMolecule:
mol_copy = Chem.BasicMolecule(mol)
mol_copy
The second possibility is to replace the current atoms and bonds of an existing Chem.Molecule object by calling the method assign() or copy():
mol_copy = Chem.BasicMolecule()
Chem.setType(mol_copy.addAtom(), Chem.AtomType.C)
print(mol_copy.numAtoms)
mol_copy.assign(mol)
# or
# mol_copy.copy(mol)
mol_copy
1
A third option is to call the method clone() of the Chem.MolecularGraph interface on the Chem.Molecule instance to copy:
mol_copy = mol.clone()
assert mol_copy.objectID != mol.objectID
mol_copy
It is also possible to concatenate molecular structures either by calling the method append() or by using the in-place addition operator +=:
mol_copy.append(mol)
mol_copy
mol_copy += mol_copy
mol_copy
4.3.5. Accessing Atoms and Bonds
Atoms and bonds of a molecular structure represented by a Chem.MolecularGraph instance can be accessed by calling the methods getAtom() (Chem.AtomContainer interface) and getBond() (Chem.BondContainer interface), respectively. These methods expect the zero-based index of the atom/bond in the parent molecular graph’s atom/bond list as argument. Valid atom/bond indices are in the range [0, getNumAtoms())/[0, getNumBonds()). Specifying an index outside the allowed range will trigger an exception.
Example: Counting atom types and bond orders
type_counts = {}
order_counts = {}
for i in range(0, mol.numAtoms):
    atom = mol.getAtom(i)
    atom_type = Chem.getType(atom)
    if atom_type in type_counts:
        type_counts[atom_type] += 1
    else:
        type_counts[atom_type] = 1
for i in range(0, mol.numBonds):
    bond = mol.getBond(i)
    bond_order = Chem.getOrder(bond)
    if bond_order in order_counts:
        order_counts[bond_order] += 1
    else:
        order_counts[bond_order] = 1
print(f'Atom types: {type_counts}')
print(f'Bond orders: {order_counts}')
Atom types: {6: 5, 7: 1}
Bond orders: {2: 3, 1: 3}
Atoms and bonds can also be accessed in a sequential manner by iterating over the corresponding atom and bond lists. The atom sequence can be retrieved via the Chem.MolecularGraph interface by calling the method getAtoms() or accessing the property atoms, and the bond sequence by calling the method getBonds() or accessing the property bonds. The following code is an alternative version of the example above that employs sequential atom/bond access:
type_counts = {}
order_counts = {}
for atom in mol.atoms:
    atom_type = Chem.getType(atom)
    if atom_type in type_counts:
        type_counts[atom_type] += 1
    else:
        type_counts[atom_type] = 1
for bond in mol.bonds:
    bond_order = Chem.getOrder(bond)
    if bond_order in order_counts:
        order_counts[bond_order] += 1
    else:
        order_counts[bond_order] = 1
print(f'Atom types: {type_counts}')
print(f'Bond orders: {order_counts}')
Atom types: {6: 5, 7: 1}
Bond orders: {2: 3, 1: 3}
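The manual dictionary bookkeeping in the two examples above can be condensed with collections.Counter from the Python standard library. The helper count_property_values() is a hypothetical convenience function, and the list of integers is stand-in data so that the sketch runs without any molecule object.

```python
from collections import Counter


def count_property_values(items, getter):
    """Count how often each value returned by getter(item) occurs in items."""
    return dict(Counter(getter(item) for item in items))


# Stand-in data: atomic numbers of the six ring atoms built in this tutorial.
atom_types = [6, 6, 6, 6, 6, 7]
print(count_property_values(atom_types, lambda t: t))
# prints {6: 5, 7: 1}
```

With the molecule from this tutorial, the equivalent calls would presumably be count_property_values(mol.atoms, Chem.getType) and count_property_values(mol.bonds, Chem.getOrder), since both sequences are iterable and the property functions take a single atom or bond as argument.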
4.3.6. Removing all Atoms and Bonds
Atoms, bonds and properties can be removed completely by calling the method clear():
print(f'Num. atoms before clear(): {mol_copy.numAtoms}')
print(f'Num. bonds before clear(): {mol_copy.numBonds}')
mol_copy.clear()
print(f'Num. atoms after clear(): {mol_copy.numAtoms}')
print(f'Num. bonds after clear(): {mol_copy.numBonds}')
Num. atoms before clear(): 24
Num. bonds before clear(): 24
Num. atoms after clear(): 0
Num. bonds after clear(): 0
4.3.7. Removing single Atoms and Bonds
Single atoms and bonds can be removed by calling the methods removeAtom() and removeBond(), respectively. The methods expect the zero-based index of the atom/bond in the molecule’s atom/bond list as argument. Valid atom/bond indices are in the range [0, getNumAtoms())/[0, getNumBonds()). Specifying an index outside the allowed range will raise an exception.
mol_copy.assign(mol)
print(f'Num. atoms before removeAtom(1): {mol_copy.numAtoms}')
print(f'Num. bonds before removeAtom(1): {mol_copy.numBonds}')
# remove 2nd atom
mol_copy.removeAtom(1)
print(f'Num. atoms after removeAtom(1): {mol_copy.numAtoms}')
print(f'Num. bonds after removeAtom(1): {mol_copy.numBonds}')
mol_copy
Num. atoms before removeAtom(1): 6
Num. bonds before removeAtom(1): 6
Num. atoms after removeAtom(1): 5
Num. bonds after removeAtom(1): 4
As can be seen, the removal of an atom automatically triggers the removal of all incident bonds. This is necessary to maintain molecular graph integrity. Removal of a bond, on the other hand, will only affect the bond count:
mol_copy.assign(mol)
print(f'Num. atoms before removeBond(2): {mol_copy.numAtoms}')
print(f'Num. bonds before removeBond(2): {mol_copy.numBonds}')
# remove 3rd bond
mol_copy.removeBond(2)
print(f'Num. atoms after removeBond(2): {mol_copy.numAtoms}')
print(f'Num. bonds after removeBond(2): {mol_copy.numBonds}')
mol_copy
Num. atoms before removeBond(2): 6
Num. bonds before removeBond(2): 6
Num. atoms after removeBond(2): 6
Num. bonds after removeBond(2): 5
Warning
Chem.Atom or Chem.Bond instances that are removed from their parent Chem.Molecule instance become invalid and performing any operations on such instances (e.g. method calls via variables still referencing them) results in undefined behavior!
4.3.8. Removing multiple Atoms and Bonds
Multiple atoms and bonds can be removed at once with the help of a Chem.Fragment instance that specifies the atoms and bonds to remove. After adding atoms and bonds to the Chem.Fragment instance their removal is initiated either by calling the method remove() with the fragment object as argument or by in-place subtraction (-=) of the fragment object:
mol_copy = mol.clone()
frag = Chem.Fragment()
frag.addAtom(mol_copy.getAtom(0))
frag.addBond(mol_copy.getBond(1)) # this will also add the bonded atoms!
print(f'Num. fragment atoms: {frag.numAtoms}')
print(f'Num. fragment bonds: {frag.numBonds}')
print(f'Num. atoms before remove(frag): {mol_copy.numAtoms}')
print(f'Num. bonds before remove(frag): {mol_copy.numBonds}')
mol_copy.remove(frag)
# or
#mol_copy -= frag
print(f'Num. atoms after remove(frag): {mol_copy.numAtoms}')
print(f'Num. bonds after remove(frag): {mol_copy.numBonds}')
mol_copy
Num. fragment atoms: 3
Num. fragment bonds: 1
Num. atoms before remove(frag): 6
Num. bonds before remove(frag): 6
Num. atoms after remove(frag): 3
Num. bonds after remove(frag): 2
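The bond counts reported in the removal examples follow from the rule that removing an atom also removes its incident bonds. The sketch below reproduces that arithmetic on plain index pairs; surviving_bonds() is a hypothetical helper and works on tuples only, not on CDPL objects.

```python
def surviving_bonds(bond_pairs, removed_atoms):
    """Return the bonds that do not touch any of the removed atom indices."""
    removed = set(removed_atoms)
    return [(a, b) for a, b in bond_pairs if a not in removed and b not in removed]


# Ring bonds of the six-membered ring built in this tutorial, as atom index pairs.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(len(surviving_bonds(ring, [1])))        # one atom removed: 4 bonds remain
print(len(surviving_bonds(ring, [0, 1, 2])))  # three atoms removed: 2 bonds remain
```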
4.3.9. Testing Atom and Bond Ownership
Whether a particular Chem.Atom instance belongs to a given Chem.Molecule instance can be checked either by calling the method containsAtom() (Chem.AtomContainer interface) or by the membership test operator in as follows:
mol_copy.assign(mol)
mol.containsAtom(mol.atoms[0])
True
mol.getAtom(0) in mol
True
mol.containsAtom(mol_copy.getAtom(0))
False
mol_copy.atoms[0] in mol
False
Similarly, a Chem.Bond instance membership test can be performed by calling the method containsBond() (Chem.BondContainer interface) or by using the in operator:
mol.containsBond(mol.bonds[0])
True
mol.getBond(0) in mol
True
mol.containsBond(mol_copy.getBond(0))
False
mol_copy.bonds[0] in mol
False
4.3.10. Retrieving Atom and Bond Indices
The index of a Chem.Atom instance in the atom list of the parent Chem.Molecule instance can be retrieved by passing the atom as argument to the method getAtomIndex() (Chem.AtomContainer interface). In a similar manner, the index of a Chem.Bond instance can be determined by calling the method getBondIndex() (Chem.BondContainer interface):
mol.getAtomIndex(mol.getAtom(3))
3
mol.getBondIndex(mol.getBond(2))
2
Warning
The attempt to retrieve the Chem.Atom or Chem.Bond instance index on a Chem.Molecule instance that is not the parent will raise an exception!
Examples:
mol.getAtomIndex(mol_copy.atoms[0])
---------------------------------------------------------------------------
ItemNotFound Traceback (most recent call last)
<ipython-input-413-835c00aa411f> in <module>
----> 1 mol.getAtomIndex(mol_copy.atoms[0])
ItemNotFound: BasicMolecule: argument atom not part of the molecule
mol.getBondIndex(mol_copy.bonds[1])
---------------------------------------------------------------------------
ItemNotFound Traceback (most recent call last)
<ipython-input-414-ae6b58adf8f3> in <module>
----> 1 mol.getBondIndex(mol_copy.bonds[1])
ItemNotFound: BasicMolecule: argument bond not part of the molecule
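When it is not known in advance whether an atom belongs to a given container, the exception can be avoided by testing membership first. The sketch below shows this pattern with a duck-typed helper, atom_index_or_none(), which is hypothetical. The class ListBackedContainer is a stand-in written only so that the example runs without the CDPL; it mimics the two method names containsAtom() and getAtomIndex() used in this tutorial and nothing else.

```python
class ListBackedContainer:
    """Hypothetical stand-in that mimics containsAtom() and getAtomIndex()."""

    def __init__(self, atoms):
        self._atoms = list(atoms)

    def containsAtom(self, atom):
        return any(a is atom for a in self._atoms)

    def getAtomIndex(self, atom):
        for i, a in enumerate(self._atoms):
            if a is atom:
                return i
        raise LookupError("argument atom not part of the container")


def atom_index_or_none(container, atom):
    """Return the index of atom in container, or None if it is not a member."""
    if container.containsAtom(atom):
        return container.getAtomIndex(atom)
    return None


first, second, foreign = object(), object(), object()
container = ListBackedContainer([first, second])

print(atom_index_or_none(container, second))   # member: its index is returned
print(atom_index_or_none(container, foreign))  # not a member: None instead of an exception
```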
4.3.11. Processing Bonds
Chem.Bond is a subclass of Chem.AtomContainer and methods/properties of the latter can thus be used to access the two bonded Chem.Atom objects in the same way as it was done for the parent Chem.Molecule instance:
bond = mol.getBond(2)
bond.numAtoms
2
mol.getAtomIndex(bond.getAtom(0))
2
mol.getAtomIndex(bond.getAtom(1))
3
Like class Chem.MolecularGraph, Chem.Bond provides the property atoms and the method getAtoms() which both give access to the atom pair sequence:
mol.getAtomIndex(bond.atoms[0])
2
mol.getAtomIndex(bond.getAtoms()[1])
3
Additionally, the first atom (index=0) can be retrieved directly by calling the method getBegin() or via the property begin:
mol.getAtomIndex(bond.getBegin())
2
mol.getAtomIndex(bond.begin)
2
The second atom (index=1) can be accessed via the property end or by calling the method getEnd():
mol.getAtomIndex(bond.getEnd())
3
mol.getAtomIndex(bond.end)
3
If one Chem.Atom instance is given, the other instance referenced by the Chem.Bond object can be retrieved by calling the method getNeighbor() as follows:
mol.getAtomIndex(bond.getNeighbor(bond.atoms[0]))
3
Warning
Passing a Chem.Atom instance as argument that is not a member of the bond will trigger an exception!
bond.getNeighbor(mol.atoms[0])
---------------------------------------------------------------------------
ItemNotFound Traceback (most recent call last)
<ipython-input-425-093f4eea5627> in <module>
----> 1 bond.getNeighbor(mol.atoms[0])
ItemNotFound: BasicBond: argument atom not a member
4.3.12. Processing Atom Connections
Chem.Atom subclasses both Chem.AtomContainer and Chem.BondContainer which together provide methods and properties that can be used to access incident bonds and connected atoms.
Example:
for atom in mol.atoms:
    print(f'Atom index: {mol.getAtomIndex(atom)}')
    print(f' Num. connected atoms: {atom.numAtoms}')
    for i in range(atom.numAtoms):
        con_atom = atom.getAtom(i)
        con_bond = atom.getBond(i)
        print(f' Connected atom index: {mol.getAtomIndex(con_atom)}')
        print(f' Bond index: {mol.getBondIndex(con_bond)}')
Atom index: 0
Num. connected atoms: 2
Connected atom index: 1
Bond index: 0
Connected atom index: 5
Bond index: 5
Atom index: 1
Num. connected atoms: 2
Connected atom index: 0
Bond index: 0
Connected atom index: 2
Bond index: 1
Atom index: 2
Num. connected atoms: 2
Connected atom index: 1
Bond index: 1
Connected atom index: 3
Bond index: 2
Atom index: 3
Num. connected atoms: 2
Connected atom index: 2
Bond index: 2
Connected atom index: 4
Bond index: 3
Atom index: 4
Num. connected atoms: 2
Connected atom index: 3
Bond index: 3
Connected atom index: 5
Bond index: 4
Atom index: 5
Num. connected atoms: 2
Connected atom index: 4
Bond index: 4
Connected atom index: 0
Bond index: 5
Additionally, Chem.Atom provides the method getAtoms() and the property atoms for accessing the list of bonded Chem.Atom instances as well as the method getBonds() and the property bonds for corresponding Chem.Bond instance access.
The above code changed to use the mentioned properties:
for atom in mol.atoms:
    print(f'Atom index: {mol.getAtomIndex(atom)}')
    print(f' Num. connected atoms: {atom.numAtoms}')
    for i in range(atom.numAtoms):
        con_atom = atom.atoms[i]
        con_bond = atom.bonds[i]
        print(f' Connected atom index: {mol.getAtomIndex(con_atom)}')
        print(f' Bond index: {mol.getBondIndex(con_bond)}')
Atom index: 0
Num. connected atoms: 2
Connected atom index: 1
Bond index: 0
Connected atom index: 5
Bond index: 5
Atom index: 1
Num. connected atoms: 2
Connected atom index: 0
Bond index: 0
Connected atom index: 2
Bond index: 1
Atom index: 2
Num. connected atoms: 2
Connected atom index: 1
Bond index: 1
Connected atom index: 3
Bond index: 2
Atom index: 3
Num. connected atoms: 2
Connected atom index: 2
Bond index: 2
Connected atom index: 4
Bond index: 3
Atom index: 4
Num. connected atoms: 2
Connected atom index: 3
Bond index: 3
Connected atom index: 5
Bond index: 4
Atom index: 5
Num. connected atoms: 2
Connected atom index: 4
Bond index: 4
Connected atom index: 0
Bond index: 5
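As a minimal sketch of how these accessors combine, the helper below collects the neighbor indices of every atom into a dictionary. The function name build_adjacency is hypothetical (it is not part of CDPL); it relies only on the numAtoms property and the getAtom() and getAtomIndex() methods used in the examples above.

```python
def build_adjacency(molgraph):
    """Map each atom index to the list of indices of its bonded neighbors.

    Hypothetical helper (not part of CDPL). It only assumes that the
    argument and its atoms offer numAtoms, getAtom() and getAtomIndex()
    as shown in the preceding examples.
    """
    adjacency = {}
    for i in range(molgraph.numAtoms):
        atom = molgraph.getAtom(i)
        # Each atom acts as a container of its directly connected atoms
        adjacency[i] = [molgraph.getAtomIndex(atom.getAtom(j))
                        for j in range(atom.numAtoms)]
    return adjacency
```

For the six-membered ring printed above, build_adjacency(mol) would be expected to reproduce the listed connectivity, e.g. atom 0 mapped to [1, 5].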
The Chem.Bond instance that connects two specific atoms can be queried using the Chem.Atom method getBondToAtom(). The method is called on one of the Chem.Atom instances and expects the bonded other Chem.Atom instance as argument:
mol.getBondIndex(mol.atoms[0].getBondToAtom(mol.atoms[5]))
5
Warning
If a Chem.Bond instance connecting the Chem.Atom instance pair does not exist then an exception will be raised!
mol.atoms[0].getBondToAtom(mol.atoms[2])
---------------------------------------------------------------------------
ItemNotFound Traceback (most recent call last)
<ipython-input-429-8b35fac927c4> in <module>
----> 1 mol.atoms[0].getBondToAtom(mol.atoms[2])
ItemNotFound: BasicAtom: argument atom is not a bonded neighbor
Alternatively, the method findBondToAtom() can be used. In contrast to getBondToAtom() the method returns
None if a connecting Chem.Bond instance does not exist:
print(mol.atoms[0].findBondToAtom(mol.atoms[2]))
None
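Because findBondToAtom() signals a missing bond with None rather than an exception, it lends itself to small lookup helpers. The following sketch wraps it in a function that works on atom indices; the name bond_index_between is hypothetical and not part of CDPL.

```python
def bond_index_between(molgraph, atom_idx1, atom_idx2):
    """Return the index of the bond connecting two atoms, or None.

    Hypothetical helper (not part of CDPL). It assumes that
    findBondToAtom() returns None for non-bonded atom pairs, as
    described above.
    """
    atom1 = molgraph.getAtom(atom_idx1)
    atom2 = molgraph.getAtom(atom_idx2)
    bond = atom1.findBondToAtom(atom2)
    if bond is None:
        return None
    return molgraph.getBondIndex(bond)
```

With the molecule used above, bond_index_between(mol, 0, 5) should correspond to the getBondToAtom() result shown earlier, while bond_index_between(mol, 0, 2) should give None.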
4.4. Basic Operations on Fragment Objects
Chem.Fragment (see section Representation of Molecule Substructures) implements the Chem.MolecularGraph interface and thus provides the same methods and properties as Chem.Molecule for accessing/processing the referenced Chem.Atom and Chem.Bond instances (see section Basic Operations on Molecule Objects). The following subsections therefore cover only those methods of Chem.Fragment that are not present in class Chem.Molecule or that deserve a closer look for other reasons.
4.4.1. Creation
An empty Chem.Fragment object not yet referencing any atoms and bonds can be created by:
frag = Chem.Fragment()
frag.numAtoms
0
Chem.Fragment also provides constructors that accept either a Chem.Fragment or a Chem.MolecularGraph instance as argument. These constructors create a Chem.Fragment object that will then reference the same Chem.Atom and Chem.Bond instances as the passed argument:
frag = Chem.Fragment(mol)
frag
As noted in section Representation of Molecule Substructures, Chem.Atom and Chem.Bond
instances added to a Chem.Fragment instance get stored as pointers (not as copies).
Membership tests for Chem.Atom and Chem.Bond instances retrieved from a Chem.Fragment will therefore
evaluate to True when carried out on the source Chem.MolecularGraph object:
atom = frag.getAtom(0)
frag.containsAtom(atom)
True
mol.containsAtom(atom)
True
bond = frag.getBond(0)
frag.containsBond(bond)
True
mol.containsBond(bond)
True
4.4.2. Adding single Atoms and Bonds
For adding individual Chem.Atom and Chem.Bond instances class Chem.Fragment provides the methods
addAtom() and
addBond(), respectively.
For molecular graph consistency reasons adding a Chem.Bond instance also adds the two
Chem.Atom instances referenced by the bond (if not added already). Furthermore, pointers to Chem.Atom and
Chem.Bond instances get stored only once. In case a given Chem.Atom or Chem.Bond instance has already
been added the methods will do nothing and just return False.
Examples:
# atom already present -> False
frag.addAtom(mol.atoms[0])
False
# bond already present -> False
frag.addBond(mol.bonds[0])
False
frag = Chem.Fragment()
print(f'Num. atoms before addBond(): {frag.numAtoms}')
frag.addBond(mol_copy.bonds[0])
print(f'Num. atoms after addBond(): {frag.numAtoms}')
print(f'Num. bonds after addBond(): {frag.numBonds}')
frag
Num. atoms before addBond(): 0
Num. atoms after addBond(): 2
Num. bonds after addBond(): 1
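The boolean return value of addBond() can be used to find out how many bonds were actually new to a fragment. The sketch below is a hypothetical helper (add_bonds is not part of CDPL) and assumes that addBond() returns True whenever a bond was newly added, which the text above only implies.

```python
def add_bonds(frag, bonds):
    """Add several bonds to a fragment and count the newly added ones.

    Hypothetical helper (not part of CDPL). Assumes addBond() returns
    False for bonds that are already present and True otherwise.
    """
    num_added = 0
    for bond in bonds:
        if frag.addBond(bond):
            num_added += 1
    return num_added
```

A call such as add_bonds(frag, mol.bonds) would then report how many of the bonds of mol were not yet referenced by frag.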
4.4.3. Adding multiple Atoms and Bonds
The current lists of Chem.Atom and Chem.Bond instances can be replaced by the method assign() which accepts either a Chem.Fragment or a Chem.MolecularGraph instance as argument:
frag.assign(mol)
frag
The current lists of Chem.Atom and Chem.Bond instances can be extended using the inplace addition operator += with a Chem.MolecularGraph instance specifying the atoms and bonds to add:
frag += mol_copy
frag
Note that only those Chem.Atom and Chem.Bond instances will be added that are not already part of the Chem.Fragment instance:
# fragment remains unaltered
frag += mol_copy
frag
4.4.4. Exchanging Atom and Bond Lists
The current lists of Chem.Atom and Chem.Bond instances of two Chem.Fragment instances can be mutually exchanged by calling the method swap() on one of the instances providing the other instance as argument:
frag2 = Chem.Fragment(mol)
frag.swap(frag2)
frag
frag2
4.4.5. Removing single Atoms and Bonds
Single atoms and bonds can be removed by calling the methods removeAtom() and removeBond(), respectively. The methods expect the Chem.Atom/Chem.Bond instance to remove or the zero-based index as argument. Valid atom/bond indices are in the range [0, getNumAtoms())/[0, getNumBonds()). Specifying an index outside the allowed range will raise an exception.
Examples:
print(f'Num. atoms before removeAtom(1): {frag.numAtoms}')
print(f'Num. bonds before removeAtom(1): {frag.numBonds}')
# remove 2nd atom
frag.removeAtom(1)
print(f'Num. atoms after removeAtom(1): {frag.numAtoms}')
print(f'Num. bonds after removeAtom(1): {frag.numBonds}')
frag
Num. atoms before removeAtom(1): 6
Num. bonds before removeAtom(1): 6
Num. atoms after removeAtom(1): 5
Num. bonds after removeAtom(1): 4
In order to maintain molecular graph consistency, removing an atom automatically triggers the removal of all incident bonds. Removal of a bond has no side effect on the atom count:
print(f'Num. atoms before removeBond(2): {frag.numAtoms}')
print(f'Num. bonds before removeBond(2): {frag.numBonds}')
# remove 3rd bond
frag.removeBond(frag.bonds[2])
print(f'Num. atoms after removeBond(2): {frag.numAtoms}')
print(f'Num. bonds after removeBond(2): {frag.numBonds}')
frag
Num. atoms before removeBond(2): 5
Num. bonds before removeBond(2): 4
Num. atoms after removeBond(2): 5
Num. bonds after removeBond(2): 3
When the removal of a Chem.Atom or Chem.Bond instance is attempted that is not part of the
Chem.Fragment instance then the corresponding methods return False to indicate that the removal
operation failed:
frag.removeAtom(mol_copy.getAtom(0))
False
frag.removeBond(mol_copy.getBond(0))
False
frag
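When several atoms are to be removed by index, the order of removal matters: since valid indices always span [0, getNumAtoms()), the indices of the remaining atoms presumably shift down after each removal. The sketch below therefore processes the indices in descending order. The function remove_atoms_by_index is a hypothetical helper, not part of CDPL, and the index shift is an assumption derived from the index range stated above.

```python
def remove_atoms_by_index(frag, indices):
    """Remove the atoms at the given zero-based indices from a fragment.

    Hypothetical helper (not part of CDPL). Indices are processed from
    highest to lowest so that a removal never changes the position of
    an atom that still has to be removed (assumes indices of later
    atoms shift down by one after each removal).
    """
    for idx in sorted(set(indices), reverse=True):
        # An out-of-range index raises an exception, as described above
        frag.removeAtom(idx)
```

Calling remove_atoms_by_index(frag, [1, 3]) is intended to remove the atoms that had the indices 1 and 3 before the call.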
4.4.6. Removing multiple Atoms and Bonds
Multiple Chem.Atom and Chem.Bond instances can be removed at once via inplace subtraction (-=) of a Chem.MolecularGraph instance:
frag.assign(mol)
frag += mol_copy
frag
frag2.clear()
frag2.addBond(mol_copy.getBond(0))
frag2.addBond(mol_copy.getBond(1))
frag -= frag2
frag
Attempting to remove Chem.Atom and Chem.Bond instances that are not part of the Chem.Fragment instance will have no effect:
frag -= frag2
frag
4.5. Reading Molecule Data
4.5.1. Parsing String Data
SMILES and SMARTS
For the parsing of SMILES strings the CDPL.Chem package provides the built-in utility function Chem.parseSMILES(). The function returns a Chem.BasicMolecule object representing the chemical structure encoded by the given SMILES string. For example:
mol = Chem.parseSMILES('c1c(C(=O)O)ccc(CNN)c1')
mol
A similar function called Chem.parseSMARTS() can be used to parse and prepare SMARTS patterns for substructure searching:
mol = Chem.parseSMARTS('c1:c:[n,o,s]:c:c:1-[C:2](-,=[*])-,=O')
mol
Other formats
The general procedure for the construction of molecules from string data in one of the supported formats (including SMILES and SMARTS) is as follows:
1. Create an instance of class Base.StringIOStream that wraps the string and serves as input data source for the next steps.
2. Create a suitable Chem.MoleculeReaderBase subclass instance that will perform the format-specific decoding of the molecule data in step 3.
3. Call the read() method of the created data reader providing an instance of class Chem.BasicMolecule for the storage of the read molecular structure as argument.
Molecule data readers for a specific format (Step 2) can be created in two ways:
Via class Chem.MoleculeReader providing the Base.StringIOStream instance (Step 1) and a data format specifier (= file extension or one of the data format descriptors defined in class Chem.DataFormat) as constructor arguments.
Direct instantiation of a format-specific subclass of Chem.MoleculeReaderBase (e.g. Chem.MOL2MoleculeReader implementing the Sybyl MOL2 format input).
Example: Reading a molecule from a string providing data in MDL SDF format
import CDPL.Base as Base
sdf_data = """5950
12162506342D
13 12 0 1 0 0 0 0 0999 V2000
5.1350 -0.2500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 1.2500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.5369 0.2500 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
3.4030 -0.2500 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
3.4030 -1.2500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 0.2500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.4030 0.3700 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
2.7830 -1.2500 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
3.4030 -1.8700 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.0230 -1.2500 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 -0.0600 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
2.5369 0.8700 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
5.6720 0.0600 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 6 1 0 0 0 0
1 13 1 0 0 0 0
2 6 2 0 0 0 0
4 3 1 6 0 0 0
3 11 1 0 0 0 0
3 12 1 0 0 0 0
4 5 1 0 0 0 0
4 6 1 0 0 0 0
4 7 1 0 0 0 0
5 8 1 0 0 0 0
5 9 1 0 0 0 0
5 10 1 0 0 0 0
M END
> <PUBCHEM_COMPOUND_CID>
5950
$$$$
"""
ios = Base.StringIOStream(sdf_data)
reader = Chem.MoleculeReader(ios, 'sdf')
# or
#reader = Chem.MoleculeReader(ios, Chem.DataFormat.SDF)
# or
#reader = Chem.SDFMoleculeReader(ios)
reader.read(mol)
mol
4.5.2. Reading Data Files
Reading molecules from files also requires the creation of a Chem.MoleculeReaderBase subclass instance that performs the actual format-specific data decoding work. As with string data, several options exist:
1. Instantiation of class Chem.MoleculeReader passing the path to the file as constructor argument. When just a path is provided as argument then the data format will be determined automatically from the file extension. To override this behavior, a second argument specifying the actual file extension string to use (e.g. sdf, smi, mol2, ...) or one of the data format descriptors defined in class Chem.DataFormat has to be provided.
2. Instantiation of class Chem.MoleculeReader passing an instance of class Base.FileIOStream that was created for the file as the first and a format specifier as the second argument. The format specification can be a characteristic file extension or one of the data format descriptors defined in class Chem.DataFormat.
3. Direct instantiation of a format-specific subclass of Chem.MoleculeReaderBase (e.g. Chem.SDFMoleculeReader implementing reading MDL SD-file format data) that accepts an instance of class Base.FileIOStream as constructor argument.
4. Direct instantiation of a format-specific subclass of Chem.MoleculeReaderBase (e.g. Chem.FileSDFMoleculeReader) that accepts a file path as constructor argument.
# - Option 1 -
reader = Chem.MoleculeReader('/path/to/input/file.sdf')
# or
reader = Chem.MoleculeReader('/path/to/input/file', 'smi')
# or
reader = Chem.MoleculeReader('/path/to/input/file', Chem.DataFormat.SMILES)
# - Option 2 -
reader = Chem.MoleculeReader(Base.FileIOStream('/path/to/input/file'), 'sdf')
# or
reader = Chem.MoleculeReader(Base.FileIOStream('/path/to/input/file'), Chem.DataFormat.SDF)
# - Option 3 -
reader = Chem.MOL2MoleculeReader(Base.FileIOStream('/path/to/input/file'))
# - Option 4 -
reader = Chem.FileSDFMoleculeReader('/path/to/input/file')
4.5.3. Sequential Molecule Reading
Given a properly initialized Chem.MoleculeReaderBase subclass instance, molecules can be read in the order provided by the input data by repeatedly calling the read() method. If there are no more molecules to read, the return value of the method will evaluate to False:
smi_data = """c1n(ccn1)c1ccc(cc1)c1ccc(n1c1c(cc(cc1)C(=O)N)C)CCC(=O)[O-] 022_3QJ5_A
CNC(=O)[C@H](C(C)(C)C)NC(=O)[C@@H]([C@H](C)N([O-])C=O)CCCc1ccccc1 023_2WO9_B
N1N(C(c2c(C=1Nc1cc([nH]n1)C)ccc(N1CC[NH+](CC1)C)c2)=O)C(C)C 027_3PIX_A
"""
ios = Base.StringIOStream(smi_data)
reader = Chem.MoleculeReader(ios, 'smi')
mol_count = 0
while reader.read(mol_copy):
    mol_count += 1
print(f'Read {mol_count} molecules')
Read 3 molecules
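The read-until-False loop shown above can be wrapped in a generator so that each record ends up in its own molecule object instead of overwriting a single one. This is only a sketch: iter_molecules and its mol_factory parameter are hypothetical names and not part of CDPL.

```python
def iter_molecules(reader, mol_factory):
    """Yield one freshly created molecule per record of the reader.

    Hypothetical helper (not part of CDPL). mol_factory is any callable
    that returns an empty molecule object accepted by reader.read().
    Iteration stops as soon as read() returns a value evaluating to False.
    """
    while True:
        mol = mol_factory()
        if not reader.read(mol):
            break
        yield mol
```

With the reader created above, the call list(iter_molecules(reader, Chem.BasicMolecule)) would be expected to return one Chem.BasicMolecule instance per input record.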
4.5.4. Random Molecule Access
There is a special version of the read() method of class Chem.MoleculeReaderBase which expects the index (zero-based) of the molecule to read as its first argument. This way molecules can be read in any order, no matter what their order is in the input data. The number of available molecules can be queried either by calling the method getNumRecords() or by accessing the property numRecords.
Example:
ios = Base.StringIOStream(smi_data)
reader = Chem.MoleculeReader(ios, 'smi')
num_mols = reader.getNumRecords()
# or
#num_mols = reader.numRecords
print(f'Number of input molecules: {num_mols}')
Number of input molecules: 3
# read the 2nd molecule
reader.read(1, mol_copy)
mol_copy
# read the 1st molecule
reader.read(0, mol_copy)
mol_copy
Warning
If the index is out of the valid range then a corresponding exception will be thrown!
# there is no 4th molecule
reader.read(3, mol_copy)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-460-4f5078ed4ed6> in <module>
1 # there is no 4th molecule
----> 2 reader.read(3, mol_copy)
IndexError: StreamDataReader: record index out of bounds
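To avoid the exception, the record index can be checked against the number of available records before reading. The following sketch shows one way to do this; read_record is a hypothetical helper and not part of CDPL.

```python
def read_record(reader, index, mol):
    """Read the record at the given zero-based index if it exists.

    Hypothetical helper (not part of CDPL). Returns False without
    touching mol when the index is outside [0, getNumRecords()),
    otherwise performs the indexed read and returns True.
    """
    if 0 <= index < reader.getNumRecords():
        reader.read(index, mol)
        return True
    return False
```

For the three-record input used above, read_record(reader, 3, mol_copy) would simply return False instead of raising IndexError.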
4.6. Essential Properties
The following tables list subsets of exported Chem.Entity3D, Chem.Atom, Chem.Bond and Chem.MolecularGraph properties which are essential for a proper description and follow-up processing of organic molecular structures. Along with each exported property key the CDPL.Chem package provides four functions that allow for a more comfortable and type-safe property value assignment and retrieval as well as removal and existence testing (see section Dynamic Properties for further information).
| Description | Property Key | Property Functions | Value Type | Default Value |
|---|---|---|---|---|
| Bond order | | Chem.setOrder(), Chem.getOrder(), Chem.hasOrder(), Chem.clearOrder() | int | |
| Stereo configuration descriptor | | Chem.setStereoDescriptor(), Chem.getStereoDescriptor(), Chem.hasStereoDescriptor(), Chem.clearStereoDescriptor() | | |
| Ring system membership predicate | | Chem.setRingFlag(), Chem.getRingFlag(), Chem.hasRingFlag(), Chem.clearRingFlag() | bool | - |
| Aromatic ring system membership predicate | | Chem.setAromaticityFlag(), Chem.getAromaticityFlag(), Chem.hasAromaticityFlag(), Chem.clearAromaticityFlag() | bool | - |
| Stereo bond type in skeletal formulas | | Chem.set2DStereoFlag(), Chem.get2DStereoFlag(), Chem.has2DStereoFlag(), Chem.clear2DStereoFlag() | int | |

| Description | Property Key | Property Functions | Value Type | Default Value |
|---|---|---|---|---|
| Arbitrary structure name | | Chem.setName(), Chem.getName(), Chem.hasName(), Chem.clearName() | str | |
| Smallest set of smallest rings (SSSR) | | Chem.setSSSR(), Chem.getSSSR(), Chem.hasSSSR(), Chem.clearSSSR() | | - |
| Molecular graph components | | Chem.setComponents(), Chem.getComponents(), Chem.hasComponents(), Chem.clearComponents() | | - |
| Aromatic atoms and bonds | | Chem.setAromaticSubstructure(), Chem.getAromaticSubstructure(), Chem.hasAromaticSubstructure(), Chem.clearAromaticSubstructure() | | - |
| Arbitrary string data (e.g. from SD-file) | | Chem.setStructureData(), Chem.getStructureData(), Chem.hasStructureData(), Chem.clearStructureData() | | - |
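The has/get function pairs listed above can be combined into a guarded lookup that falls back to a caller-supplied value when a property has not been set. The sketch below is a generic illustration; get_property is a hypothetical helper and not part of CDPL, and how the get functions behave for unset properties is covered in section Dynamic Properties rather than assumed here.

```python
def get_property(obj, has_func, get_func, default=None):
    """Return a property value, or a fallback if the property is not set.

    Hypothetical helper (not part of CDPL). has_func and get_func are a
    matching pair of property functions, e.g. Chem.hasName and
    Chem.getName for molecular graphs.
    """
    if has_func(obj):
        return get_func(obj)
    return default
```

A call such as get_property(mol, Chem.hasName, Chem.getName, default='') would then yield the structure name or an empty string.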
4.7. Writing Molecule Data
4.7.1. Direct String Output
SMILES
For the direct generation of SMILES strings the CDPL.Chem package provides the built-in utility function Chem.generateSMILES(). The function expects a Chem.MolecularGraph instance representing the chemical structure as first argument. Further optional arguments allow the SMILES output to be customized in several respects.
Examples:
Chem.calcBasicProperties(mol, False) # calculate required properties, more on that later
Chem.generateSMILES(mol) # by default, non-canonical SMILES strings are generated
'OC(=O)[C@@H](N)C'
Chem.generateSMILES(mol, True) # second arg. True -> generate canonical SMILES
'C[C@@H](C(O)=O)N'
Chem.generateSMILES(mol, True, False) # third arg. False -> output also standard H-atoms
'[H][C@@](C(=O)O[H])(C([H])([H])[H])N([H])[H]'
InChI
InChI strings can be generated by means of the utility function Chem.generateINCHI(). The function likewise expects a Chem.MolecularGraph instance as its first argument. An optional second argument of type str can be used to pass settings to the InChI generation code (supported options are described here). The third argument controls the dimension of the atom coordinates (0 -> automatic selection, 2 -> 2D or 3 -> 3D) that are output as part of the generated auxiliary information (if enabled by the provided settings, see second example).
Examples:
Chem.generateINCHI(mol)
'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1'
Chem.generateINCHI(mol, '/WarnOnEmptyStructure /NEWPSOFF', 0) # outputs InChI as above + auxiliary information
'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1 AuxInfo=1/1/N:5,4,6,3,1,2/E:(5,6)/it:im/rA:13OONCCCHHHHHHH/rB:;;n3;s4;s1d2s4;s4;s5;s5;s5;s3;s3;s1;/rC:5.135,-.25,0;4.269,1.25,0;2.5369,.25,0;3.403,-.25,0;3.403,-1.25,0;4.269,.25,0;3.403,.37,0;2.783,-1.25,0;3.403,-1.87,0;4.023,-1.25,0;2,-.06,0;2.5369,.87,0;5.672,.06,0;'
InChI Keys
Similarly, the InChI Key of a given Chem.MolecularGraph instance can be generated via the utility function Chem.generateINCHIKey():
Chem.generateINCHIKey(mol)
'QNAYBMKLOCPYGJ-REOHCLBHSA-N'
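A common use of such structure keys is duplicate detection. The following sketch keeps only the first molecular graph seen for each key; unique_by_key is a hypothetical helper (not part of CDPL) and the key function is passed in, so Chem.generateINCHIKey is just one possible choice.

```python
def unique_by_key(molgraphs, key_func):
    """Return the input objects with duplicates (by key) removed.

    Hypothetical helper (not part of CDPL). key_func maps an object to
    a hashable key, e.g. Chem.generateINCHIKey; the first object seen
    for each key is kept and the input order is preserved.
    """
    seen = set()
    unique = []
    for molgraph in molgraphs:
        key = key_func(molgraph)
        if key not in seen:
            seen.add(key)
            unique.append(molgraph)
    return unique
```

For a list mols of molecular graphs, unique_by_key(mols, Chem.generateINCHIKey) would retain one representative per InChI Key.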
Other formats
The general procedure for the output of Chem.MolecularGraph instances as string data encoded in one of the supported formats (including SMILES and InChI) is as follows:
1. Create an instance of class Base.StringIOStream that wraps the string and serves as output data sink for the next steps.
2. Create a suitable Chem.MolecularGraphWriterBase subclass instance that will perform the format-specific encoding of the molecule data in step 3.
3. Call the write() method of the created data writer one or multiple times providing the Chem.MolecularGraph instance(s) to output as argument.
4. Call the close() method after all Chem.MolecularGraph instances have been written.
Molecular graph data writers for a specific format (Step 2) can be created in two ways:
Via class Chem.MolecularGraphWriter providing the Base.StringIOStream instance (Step 1) and a data format specifier (= one of the data format descriptors defined in class Chem.DataFormat) as constructor arguments.
Direct instantiation of a format-specific subclass of Chem.MolecularGraphWriterBase (e.g. Chem.MOL2MolecularGraphWriter implementing Sybyl MOL2 format output).
Example: Generating a string holding the MDL SDF record of a Chem.MolecularGraph instance
ios = Base.StringIOStream()
writer = Chem.MolecularGraphWriter(ios, 'sdf')
# or
#writer = Chem.MolecularGraphWriter(ios, Chem.DataFormat.SDF)
# or
#writer = Chem.SDFMolecularGraphWriter(ios)
writer.write(mol)
writer.close()
sdf_str = ios.getvalue()
# or for retrieving the generated data as bytes object
#sdf_bytes = ios.getbytes()
print(sdf_str)
5950
12162506342D 0 0.00000 0.00000
13 12 0 1 0 999 V2000
5.1350 -0.2500 O 0 0 0 0 0 2 0 0 0
4.2690 1.2500 O 0 0 0 0 0 2 0 0 0
2.5369 0.2500 N 0 0 0 0 0 3 0 0 0
3.4030 -0.2500 C 0 0 1 0 0 4 0 0 0
3.4030 -1.2500 C 0 0 0 0 0 4 0 0 0
4.2690 0.2500 C 0 0 0 0 0 4 0 0 0
3.4030 0.3700 H 0 0 0 0 0 1 0 0 0
2.7830 -1.2500 H 0 0 0 0 0 1 0 0 0
3.4030 -1.8700 H 0 0 0 0 0 1 0 0 0
4.0230 -1.2500 H 0 0 0 0 0 1 0 0 0
2.0000 -0.0600 H 0 0 0 0 0 1 0 0 0
2.5369 0.8700 H 0 0 0 0 0 1 0 0 0
5.6720 0.0600 H 0 0 0 0 0 1 0 0 0
1 6 1 0 0 0
1 13 1 0 0 0
2 6 2 0 0 0
4 3 1 6 0 0
3 11 1 0 0 0
3 12 1 0 0 0
4 5 1 0 0 0
4 6 1 0 0 0
4 7 1 0 0 0
5 8 1 0 0 0
5 9 1 0 0 0
5 10 1 0 0 0
M END
> <PUBCHEM_COMPOUND_CID>
5950
$$$$
4.7.2. File Output
Writing Chem.MolecularGraph instance data to file storage also requires the creation of a Chem.MolecularGraphWriterBase subclass instance that performs the actual format-specific data encoding work. As with string data output, several options exist:
1. Instantiation of class Chem.MolecularGraphWriter passing the path to the file as constructor argument. When just a path is provided as argument then the data format will be determined automatically from the file extension. To override this behavior, a second argument specifying the actual file extension string to use (e.g. sdf, smi, mol2, ...) or one of the data format descriptors defined in class Chem.DataFormat has to be provided.
2. Instantiation of class Chem.MolecularGraphWriter passing an instance of class Base.FileIOStream that was created for the file as the first and a format specifier as the second argument. The format specification can be a characteristic file extension or one of the data format descriptors defined in class Chem.DataFormat.
3. Direct instantiation of a format-specific subclass of Chem.MolecularGraphWriterBase (e.g. Chem.SDFMolecularGraphWriter implementing writing MDL SD-file data) that accepts an instance of class Base.FileIOStream as constructor argument.
4. Direct instantiation of a format-specific subclass of Chem.MolecularGraphWriterBase (e.g. Chem.FileSDFMolecularGraphWriter) that accepts a file path as constructor argument.
# - Option 1 -
writer = Chem.MolecularGraphWriter('/path/to/output/file.sdf')
# or
writer = Chem.MolecularGraphWriter('/path/to/output/file', 'smi')
# or
writer = Chem.MolecularGraphWriter('/path/to/output/file', Chem.DataFormat.SMILES)
# - Option 2 -
writer = Chem.MolecularGraphWriter(Base.FileIOStream('/path/to/output/file', 'w'), 'sdf')
# or
writer = Chem.MolecularGraphWriter(Base.FileIOStream('/path/to/output/file', 'w'), Chem.DataFormat.SDF)
# - Option 3 -
writer = Chem.MOL2MolecularGraphWriter(Base.FileIOStream('/path/to/output/file', 'w'))
# - Option 4 -
writer = Chem.FileSDFMolecularGraphWriter('/path/to/output/file')
Once a data writer instance has been created, the write() method can be called one or multiple times with the Chem.MolecularGraph instance(s) to output as argument. After all Chem.MolecularGraph instances have been output the close() method needs to be called to flush all written data to disk and close the file (note: although this method will be called automatically by the destructor of the data writer, an explicit call will have an immediate effect and is preferred over a delayed destructor call by the garbage collector happening at an indeterminate point in time).
Example: SMILES output of two Chem.Molecule instances
Chem.calcBasicProperties(mol_copy, False) # calc. required properties
writer = Chem.MolecularGraphWriter('output.smi')
writer.write(mol)
writer.write(mol_copy)
writer.close()
with open('output.smi', 'r') as smi_file:
    print(smi_file.read())
OC(=O)[C@@H](N)C 5950
c1n(ccn1)c1ccc(cc1)c1ccc(n1c1c(cc(cc1)C(=O)N)C)CCC(=O)[O-] 022_3QJ5_A
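Since an explicit close() call is preferred over relying on the destructor, the write-then-close sequence can be wrapped so that the writer is closed even when one of the write() calls raises an exception. The following is a minimal sketch; write_all is a hypothetical helper and not part of CDPL.

```python
def write_all(writer, molgraphs):
    """Write all given molecular graphs and always close the writer.

    Hypothetical helper (not part of CDPL). Returns the number of
    write() calls that completed; close() is called in any case so
    that already written data gets flushed.
    """
    num_written = 0
    try:
        for molgraph in molgraphs:
            writer.write(molgraph)
            num_written += 1
    finally:
        writer.close()
    return num_written
```

The example above could then be expressed as write_all(Chem.MolecularGraphWriter('output.smi'), [mol, mol_copy]).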