1. Molecular Input/Output
1.1. Reading molecules from an input file
A frequent task in cheminformatics is to read molecular data from files in different formats. For this purpose, the CDPL provides data readers for various file formats including SDF, SMILES, MOL, MOL2, CDF, and others.
Note
The CDF format is a binary format and CDPKit’s native file format that can be used to store molecule, reaction, and pharmacophore data in a space and processing time efficient manner.
The class Chem.MoleculeReader is rather versatile and can read molecules stored in any of the supported formats (SDF, SMILES, MOL, MOL2, CDF, etc.) thus eliminating the need to instantiate a format-specific reader implementation. Here is a simple example of how to read molecules from an input file:
Note
The module CDPL.Chem provides functionality for the processing of chemical data.
import CDPL.Chem as Chem
def processMoleculesFromFile(input_file: str) -> None:
"""
Reads molecules from the provided input file and outputs the number of atoms and bonds for each molecule.
Parameters:
- input_file (str): Path to the input file.
"""
# Create reader for input molecules (format specified by file extension)
reader = Chem.MoleculeReader(input_file)
# create an instance of the default implementation of the Chem.Molecule interface
mol = Chem.BasicMolecule()
# Iterate over each molecule and print atom and bond count
try:
while reader.read(mol):
try:
print('Molecule with', mol.numAtoms, 'atoms and', mol.numBonds, 'bonds')
except Exception as e:
sys.exit('Error: processing of molecule failed: ' + str(e))
except Exception as e: # handle exception raised in case of severe read errors
sys.exit('Error: reading molecule failed: ' + str(e))
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
processMoleculesFromFile("path_to_mol_file.mol2")
1.2. Retrieving structure data from MDL SD-files
Analyzing the data associated with a molecule in an SD-file can be done as shown in the following example:
Note
You can either use Chem.MoleculeReader or use the format-specific reader Chem.FileSDFMoleculeReader molecules from an SD-file. Usually, the more convenient and flexible way to read molecule data is to use Chem.MoleculeReader.
import CDPL.Chem as Chem
def processStructureDataFromSD(input_sd_file: str) -> None:
"""
Retrieves the structure data of each molecule in the provided SD file and outputs it to the console.
Parameters:
- input_sd_file (str): Path to the input SD file.
"""
# Create reader for MDL SD-files
reader = Chem.FileSDFMoleculeReader(input_sd_file)
# create an instance of the default implementation of the Chem.Molecule interface
mol = Chem.BasicMolecule()
# Iterate over each molecule in the file and retrieve structure data
try:
while reader.read(mol):
try:
if not Chem.hasStructureData(mol):
print('Error: no structure data available for molecule', Chem.getName(mol))
continue
struct_data = Chem.getStructureData(mol) # retrieve structure data
print('Structure data (%s entries) of molecule \'%s\':\n' % (str(len(struct_data)), Chem.getName(mol)))
for entry in struct_data: # iterate of structure data entries consisting of a header line and the actual data
print('Header:', entry.header)
print('Data:', entry.data)
except Exception as e:
sys.exit('Error: processing of molecule failed: ' + str(e))
except Exception as e: # handle exception raised in case of severe read errors
sys.exit('Error: reading molecule failed: ' + str(e))
##########################################################################
###########################################################################
###########################################################################
################################ Example usage:
processStructureDataFromSD("path_to_sd_file.sdf")
1.3. Writing molecules to an output file
Once you’ve processed or analyzed your molecules, you may want to save them to an output file. In the constructor of the class Chem.MolecularGraphWriter the format of the output file is determined by its file extension, making it easy to save molecules in different formats such as SDF, MOL, MOL2, CDF, and others without having to write format-specific code.
Note
Ensure that the file extension you provide matches the standard desired output format. For instance, use .sdf for MDL Structure-Data Files, .mol for MDL Molfiles, and so on.
Here’s a simple example of how a list of molecules can be written to an output file:
import CDPL.Chem as Chem
def molsToFiles(mols: list[Chem.BasicMolecule], output_file: str) -> None:
"""
Writes a list of molecules to the specified output file.
Parameters:
- mols (list): List of CDPKit molecules to write to the output file.
- output_file (str): Path to the output file.
"""
# Create a writer for the output molecules (format specified by file extension)
writer = Chem.MolecularGraphWriter(output_file)
for mol in mols:
writer.write(mol)
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list of BasicMolecules] # Example list of BasicMolecules
molsToFiles(mols, "path_to_output_file.sdf")
1.4. Description of structural atom environments as SMILES strings
For the extraction of the local chemical environment of an atom in a molecular graph the CDPL provides the utility function Chem.getEnvironment. Fur further processing extracted environments can, e.g., be output as SMILES strings as shown in the following example.
import CDPL.Chem as Chem
def printAtomEnv(mols: list[Chem.BasicMolecule]) -> None:
"""
Extracts the atom environments of each atom in the provided list of molecules and outputs them as SMILES strings.
Parameters:
- mols (list): List of CDPKit molecules to process.
"""
for mol in mols:
Chem.calcImplicitHydrogenCounts(mol, False) # calculate implicit hydrogen counts and set corresponding property for all atoms
Chem.perceiveHybridizationStates(mol, False) # perceive atom hybridization states and set corresponding property for all atoms
Chem.perceiveSSSR(mol, False) # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
Chem.setRingFlags(mol, False) # perceive cycles and set corresponding atom and bond properties
Chem.setAromaticityFlags(mol, False) # perceive aromaticity and set corresponding atom and bond properties
frag = Chem.Fragment() # for storing extracted atom environments
print('- Atom environments (radius = 3 bonds)')
for atom in mol.atoms:
Chem.getEnvironment(atom, mol, 3, frag) # extract environment of atom reaching out up to three bonds
Chem.perceiveComponents(frag, False) # perceive molecular graph components (required for SMILES generation)
smiles = Chem.generateSMILES(frag, False, False) # generate non-canonical SMILES string with explicit hydrogen atoms
print('Atom #%s: %s' % (str(mol.getAtomIndex(atom)), smiles))
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list of BasicMolecules] # Example list of BasicMolecules
printAtomEnv(mols)
1.5. ChEMBL molecule standardization and parent structure extraction
The CDPL provides a convenient way to standardize molecules using its implementaion of the ChEMBL standardization pipeline [15]. This process ensures that molecules will be represented in a consistent and standardized manner when they enter downstream processing steps.
import CDPL.Chem as Chem
def standardize(chembl_proc: Chem.ChEMBLStandardizer, in_mol: Chem.Molecule, out_mol: Chem.Molecule, proc_excluded: bool, extract_parent: bool) -> Chem.ChEMBLStandardizer.ChangeFlags:
"""
Performs ChEMBL molecule standardization and parent structure extraction (optional) for a given input molecule using a provided Chem.ChEMBLStandardizer instance.
Parameters:
- chembl_proc (Chem.ChEMBLStandardizer): Instance of the Chem.ChEMBLStandardizer class.
- in_mol (Chem.Molecule): Input molecule to standardize.
- out_mol (Chem.Molecule): Output molecule to store the standardized molecule.
- proc_excluded (bool): If True, molecules flagged as excluded will be processed.
- extract_parent (bool): If True, the parent structure will be extracted.
Returns:
- Chem.ChEMBLStandardizer.ChangeFlags: Flags indicating the carried out modifications.
"""
# here, the standardization is carried out on a copy of the read input molecule
# (if only one molecule instance gets provided as argument, modifications will be made in-place)
change_flags = chembl_proc.standardize(in_mol, out_mol, proc_excluded)
if extract_parent: # perform parent structure extraction (optional)
change_flags &= ~Chem.ChEMBLStandardizer.EXCLUDED # clear excluded flag possibly set by the standardization
# procedure (might change after salt stripping)
change_flags |= chembl_proc.getParent(out_mol) # extract parent structure (in-place) and add information
# about the carried out modifcations
return change_flags
def getListOfChangesString(change_flags: Chem.ChEMBLStandardizer.ChangeFlags, verbose: bool = False) -> str:
"""
Returns a string listing the carried out modifications.
Parameters:
- change_flags (Chem.ChEMBLStandardizer.ChangeFlags): Flags indicating the carried out modifications.
- verbose (bool): If True, the string will contain a detailed list of the carried out modifications.
Returns:
- str: String listing the carried out modifications.
"""
if not verbose:
return None
changes = ' Carried out modifications:'
# List of possible changes
change_list = [
(Chem.ChEMBLStandardizer.EXPLICIT_HYDROGENS_REMOVED, 'Explicit hydrogens removed'),
(Chem.ChEMBLStandardizer.UNKNOWN_STEREO_STANDARDIZED, 'Undefined stereocenter information standardized'),
(Chem.ChEMBLStandardizer.BONDS_KEKULIZED, 'Kekule structure generated'),
(Chem.ChEMBLStandardizer.STRUCTURE_NORMALIZED, 'Functional groups normalized'),
(Chem.ChEMBLStandardizer.CHARGES_REMOVED, 'Number of charged atoms reduced'),
(Chem.ChEMBLStandardizer.TARTRATE_STEREO_CLEARED, 'Configuration of chiral tartrate atoms set to undefined'),
(Chem.ChEMBLStandardizer.STRUCTURE_2D_CORRECTED, '2D structure corrected'),
(Chem.ChEMBLStandardizer.ISOTOPE_INFO_CLEARED, 'Isotope information cleared'),
(Chem.ChEMBLStandardizer.SALT_COMPONENTS_REMOVED, 'Salt components removed'),
(Chem.ChEMBLStandardizer.SOLVENT_COMPONENTS_REMOVED, 'Solvent components removed'),
(Chem.ChEMBLStandardizer.DUPLICATE_COMPONENTS_REMOVED, 'Duplicate components removed')
]
for flag, description in change_list:
if change_flags & flag:
changes += '\n * ' + description
return changes
def getLogMessage(change_flags: Chem.ChEMBLStandardizer.ChangeFlags, proc_excluded: bool, extract_parent: bool, mol_id: str, verbose: bool = False) -> str:
"""
Returns a log message describing the carried out modifications.
Parameters:
- change_flags (Chem.ChEMBLStandardizer.ChangeFlags): Flags indicating the carried out modifications.
- proc_excluded (bool): If True, molecules flagged as excluded will be processed.
- extract_parent (bool): If True, the parent structure will be extracted.
- mol_id (str): Identifier of the molecule.
Returns:
- str: Log message describing the carried out modifications.
"""
if (change_flags & Chem.ChEMBLStandardizer.EXCLUDED) and proc_excluded:
return f'Molecule {mol_id}: discarded (flagged as excluded)'
if not proc_excluded and (change_flags & Chem.ChEMBLStandardizer.EXCLUDED):
return f'Molecule {mol_id}: forwarded unchanged (flagged as excluded)'
if change_flags:
return f'Molecule {mol_id}: modified\n{getListOfChangesString(change_flags, verbose)}'
return f'Molecule {mol_id}: forwarded unchanged'
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list of BasicMolecules] # Example list of BasicMolecules
chembl_proc = Chem.ChEMBLStandardizer()
for mol in mols:
in_mol = mol
out_mol = Chem.BasicMolecule()
change_flags = standardize(chembl_proc, in_mol, out_mol, proc_excluded=True, extract_parent=True)
mol_id = Chem.getName(in_mol).strip() or f'Molecule_{mols.index(mol)}'
log_msg = getLogMessage(change_flags, proc_excluded=True, extract_parent=True, mol_id=mol_id, verbose=True)
print(log_msg)
1.6. Standardization/prediction of protonation states
In the realm of computational chemistry and molecular modeling, understanding and predicting the behavior of molecules often hinges on the finer details. One such critical detail is the protonation state of the functional groups in a molecule. Protonation states dictate formal charges of atoms and have an impact on molecular 3D structure and reactivity. Knowing the pre-dominant protonation states of functional groups under certain condition is therefore vital for methods predicting drug binding, enzymatic reactions, etc.
The class Chem.ProtonationStateStandardizer implements several protonation state generation algorithms. The following code snippet shows how to generate protonation/formal charge states of acidic/basic functional groups that are likely to occur under physiological conditions.
import CDPL.Chem as Chem
mols = [list of BasicMolecules] # Example list of BasicMolecules
# create and initialize an instance of the class Chem.ProtonationStateStandardizer which
# implements the protonation state generation algorithm
prot_state_gen = Chem.ProtonationStateStandardizer()
# process molecules one after the other
for mol in mols:
# compose a simple molecule identifier
mol_id = Chem.getName(mol).strip()
if mol_id == '':
mol_id = '#' + str(i) # fallback if name is empty
else:
mol_id = '\'%s\' (#%s)' % (mol_id, str(i))
try:
# protonate/deprotonate functional groups for phys. conditions
prot_state_gen.standardize(mol, Chem.ProtonationStateStandardizer.PHYSIOLOGICAL_CONDITION_STATE)
# enforce an update of the molecule components list (structure might have changed)
Chem.perceiveComponents(mol, True)
except Exception as e:
sys.exit('Error: processing or output of molecule %s failed: %s' % (mol_id, str(e)))
writer.close()
2. 3D Structure and Conformer Ensemble Generation
In chemistry, conformational isomerism is a form of stereoisomerism in which the isomers can be interconverted just by rotations about formally single bonds. While any two arrangements of atoms in a molecule that differ by rotation about single bonds can be referred to as different conformations, conformations that correspond to local minima on the potential energy surface are specifically called conformational isomers or conformers.
In the realm of cheminformatics and molecular modeling many methods and algorithms require 3D structures or even whole conformer ensembles as input. For a seamless integration with such methods CDPKit provides functionality in package CDPL.ConfGen that allows to generate both single low-energy 3D structures and diverse conformer ensembles solely from connection table information. The high-quality conformer ensemble generation tool included in CDPKit is called CONFORGE. For details on its implementation and performance see [5].
2.1. Generating single low-energy 3D structures
Here’s how you can generate a single low-energy 3D structure for a molecule:
import CDPL.Chem as Chem
import CDPL.ConfGen as ConfGen
def generate3dConformation(mol: Chem.Molecule, struct_gen: ConfGen.StructureGenerator) -> int:
"""
Generates a low-energy 3D structure of the argument molecule using the provided initialized ConfGen.StructureGenerator instance.
Parameters:
- mol (Chem.Molecule): Molecule to generate a 3D structure for.
- struct_gen (ConfGen.StructureGenerator): Instance of the ConfGen.StructureGenerator class.
Returns:
- int: Status code indicating the success of the 3D structure generation.
"""
# prepare the molecule for 3D structure generation
ConfGen.prepareForConformerGeneration(mol)
# generate the 3D structure
status = struct_gen.generate(mol)
# if sucessful, store the generated conformer ensemble as
# per atom 3D coordinates arrays (= the way conformers are represented in CDPKit)
if status == ConfGen.ReturnCode.SUCCESS:
struct_gen.setCoordinates(mol)
# return status code
return status
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
max_time = 3600 # Max. allowed molecule processing time in seconds (default: 3600 sec)
mols = [list of BasicMolecules] # Example list of BasicMolecules
# create writer for the generated 3D structures (format specified by file extension)
writer = Chem.MolecularGraphWriter("path_to_output_file.sdf")
# export only a single 3D structure (in case of multi-conf. input molecules)
Chem.setMultiConfExportParameter(writer, False)
# create and initialize an instance of the class ConfGen.StructureGenerator which will
# perform the actual 3D structure generation work
struct_gen = ConfGen.StructureGenerator()
struct_gen.settings.timeout = max_time * 1000 # apply the -t argument
# dictionary mapping status codes to human readable strings
status_to_str = { ConfGen.ReturnCode.UNINITIALIZED : 'uninitialized',
ConfGen.ReturnCode.TIMEOUT : 'max. processing time exceeded',
ConfGen.ReturnCode.ABORTED : 'aborted',
ConfGen.ReturnCode.FORCEFIELD_SETUP_FAILED : 'force field setup failed',
ConfGen.ReturnCode.FORCEFIELD_MINIMIZATION_FAILED : 'force field structure refinement failed',
ConfGen.ReturnCode.FRAGMENT_LIBRARY_NOT_SET : 'fragment library not available',
ConfGen.ReturnCode.FRAGMENT_CONF_GEN_FAILED : 'fragment conformer generation failed',
ConfGen.ReturnCode.FRAGMENT_CONF_GEN_TIMEOUT : 'fragment conformer generation timeout',
ConfGen.ReturnCode.FRAGMENT_ALREADY_PROCESSED : 'fragment already processed',
ConfGen.ReturnCode.TORSION_DRIVING_FAILED : 'torsion driving failed',
ConfGen.ReturnCode.CONF_GEN_FAILED : 'conformer generation failed' }
# process molecules one after the other
for mol in mols:
# compose a simple molecule identifier
mol_id = Chem.getName(mol).strip()
if mol_id == '':
mol_id = '#' + str(i) # fallback if name is empty
else:
mol_id = '\'%s\' (#%s)' % (mol_id, str(i))
try:
# generate 3D structure of the read molecule
status = generate3dConformation(mol, struct_gen)
# check for severe error reported by status code
if status == ConfGen.ReturnCode.SUCCESS:
# enforce the output of 3D coordinates in case of MDL file formats
Chem.setMDLDimensionality(mol, 3)
# output the generated 3D structure
if not writer.write(mol):
sys.exit('Error: writing 3D structure of molecule %s failed' % mol_id)
except Exception as e:
sys.exit('Error: 3D structure generation or output for molecule %s failed: %s' % (mol_id, str(e)))
writer.close()
2.2. Generating conformer ensembles
For some applications it is necessary to generate multiple conformations for a molecule, e.g. to assess its flexibility or investigate binding capabilities towards a particular target receptor.
The function generate_conformation_ensembles()
generates a conformer ensemble for a given molecule using an
instance of class ConfGen.ConformerGenerator.
The parameters min_rmsd, e_window, and max_confs are used to control the generated conformer ensemble’s diversity, energy and size.
They can be initialized at the beginning of the example script with the desired values.
import CDPL.Chem as Chem
import CDPL.ConfGen as ConfGen
def generateConformationEnsembles(mol: Chem.BasicMolecule, conf_gen: ConfGen.ConformerGenerator) -> (int, int):
"""
Generates a conformation ensemble for the argument molecule using the provided initialized ConfGen.ConformerGenerator instance.
Parameters:
- mol (Chem.BasicMolecule): Molecule to generate a conformation ensemble for.
- conf_gen (ConfGen.ConformerGenerator): Instance of the ConfGen.ConformerGenerator class.
Returns:
- int: Status code indicating the success of the conformation ensemble generation.
- int: Number of generated conformers.
"""
# prepare the molecule for conformer generation
ConfGen.prepareForConformerGeneration(mol)
# generate the conformer ensemble
status = conf_gen.generate(mol)
num_confs = conf_gen.getNumConformers()
# if sucessful, store the generated conformer ensemble as
# per atom 3D coordinates arrays (= the way conformers are represented in CDPKit)
if status == ConfGen.ReturnCode.SUCCESS or status == ConfGen.ReturnCode.TOO_MUCH_SYMMETRY:
conf_gen.setConformers(mol)
else:
num_confs = 0
# return status code and the number of generated conformers
return (status, num_confs)
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
# Settings
max_time = 3600 # Max. allowed molecule processing time in seconds (default: 3600 sec)
min_rmsd = 0.5 # Output conformer RMSD threshold (default: 0.5)
e_window = 20.0 # Output conformer energy window (default: 20.0)
max_confs = 100 # Max. output ensemble size (default: 100)
mols = [list of BasicMolecules] # Example list of BasicMolecules
# create writer for the generated conformer ensembles (format specified by file extension)
writer = Chem.MolecularGraphWriter("path_to_output_file.sdf")
# create and initialize an instance of the class ConfGen.ConformerGenerator which
# will perform the actual conformer ensemble generation work
conf_gen = ConfGen.ConformerGenerator()
conf_gen.settings.timeout = max_time * 1000 # apply the -t argument
conf_gen.settings.minRMSD = min_rmsd # apply the -r argument
conf_gen.settings.energyWindow = e_window # apply the -e argument
conf_gen.settings.maxNumOutputConformers = max_confs # apply the -n argument
# dictionary mapping status codes to human readable strings
status_to_str = { ConfGen.ReturnCode.UNINITIALIZED : 'uninitialized',
ConfGen.ReturnCode.TIMEOUT : 'max. processing time exceeded',
ConfGen.ReturnCode.ABORTED : 'aborted',
ConfGen.ReturnCode.FORCEFIELD_SETUP_FAILED : 'force field setup failed',
ConfGen.ReturnCode.FORCEFIELD_MINIMIZATION_FAILED : 'force field structure refinement failed',
ConfGen.ReturnCode.FRAGMENT_LIBRARY_NOT_SET : 'fragment library not available',
ConfGen.ReturnCode.FRAGMENT_CONF_GEN_FAILED : 'fragment conformer generation failed',
ConfGen.ReturnCode.FRAGMENT_CONF_GEN_TIMEOUT : 'fragment conformer generation timeout',
ConfGen.ReturnCode.FRAGMENT_ALREADY_PROCESSED : 'fragment already processed',
ConfGen.ReturnCode.TORSION_DRIVING_FAILED : 'torsion driving failed',
ConfGen.ReturnCode.CONF_GEN_FAILED : 'conformer generation failed' }
# process molecules one after the other
for mol in mols:
# compose a simple molecule identifier
mol_id = Chem.getName(mol).strip()
if mol_id == '':
mol_id = '#' + str(i) # fallback if name is empty
else:
mol_id = '\'%s\' (#%s)' % (mol_id, str(i))
try:
# generate conformer ensemble for read molecule
status, num_confs = generateConformationEnsembles(mol, conf_gen)
# output generated ensemble (if available)
if num_confs > 0:
if not writer.write(mol):
sys.exit('Error: output of conformer ensemble for molecule %s failed' % mol_id)
except Exception as e:
sys.exit('Error: conformer ensemble generation or output for molecule %s failed: %s' % (mol_id, str(e)))
writer.close()
3. Pharmacophore Generation and Processing
This section is about the generation and processing of pharmacophore models. The pharmacophore concept is widely used in drug design and cheminformatics and represents a versatile tool to understand and describe interactions between ligands and their biological targets.
Note
Functionality for the generation and processing of pharmacophore models resides in package CDPL.Pharm. Currently available pharmacophore input/output formats are LigandScout’s PML and CDPKit’s native CDF format.
3.1. Pharmacophore generation for single-conformer molecules
The following example script generates a pharmacophore model for each input molecule and outputs the pharmacophore data in PML format.
import CDPL.Chem as Chem
import CDPL.Pharm as Pharm
def generatePharmacophore(mol: Chem.Molecule) -> Pharm.Pharmacophore:
"""
Generates the pharmacophore of the molecule.
Parameters:
- mol (Chem.Molecule): Molecule to generate a pharmacophore for.
Returns:
- Pharm.Pharmacophore: Pharmacophore of the argument molecule.
"""
Pharm.prepareForPharmacophoreGeneration(mol) # first call utility function preparing the molecule for pharmacophore generation
ph4_gen = Pharm.DefaultPharmacophoreGenerator() # create an instance of the pharmacophore generator default implementation
ph4 = Pharm.BasicPharmacophore() # create an instance of the default implementation of the Pharm.Pharmacophore interface
ph4_name = Chem.getName(mol) # use the name of the input molecule as pharmacophore name
ph4_gen.generate(mol, ph4) # generate the pharmacophore
Pharm.setName(ph4, ph4_name) # set the pharmacophore name
return ph4
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list of BasicMolecules] # Example list of BasicMolecules
# create writer for the generated pharmacophores (format specified by file extension)
writer = Pharm.FeatureContainerWriter("path_to_output_file.pml")
for mol in mols:
# compose a simple molecule identifier
mol_id = Chem.getName(mol).strip()
if mol_id == '':
mol_id = '#' + str(i) # fallback if name is empty
else:
mol_id = '\'%s\' (#%s)' % (mol_id, str(i))
try:
ph4 = generatePharmacophore(mol) # generate pharmacophore
if not writer.write(ph4): # output pharmacophore
sys.exit('Error: writing generated pharmacophore %s failed' % mol_id)
except Exception as e:
sys.exit('Error: pharmacophore generation or output for molecule %s failed: %s' % (mol_id, str(e)))
writer.close()
3.2. Pharmacophore generation for multi-conformer molecules
In the following example script a pharmacophore model for each conformer of the input molecule will be generated. The name of the generated pharmacophore is set to the name of the corresponding molecule plus a suffix specifying the conformer index.
import CDPL.Chem as Chem
import CDPL.Pharm as Pharm
def generatePharmacophore(mol: Chem.Molecule, conf_idx: int) -> Pharm.Pharmacophore:
"""
Generates the pharmacophore of the molecule using atom coordinates of the specified conformation.
Parameters:
- mol (Chem.Molecule): Molecule to generate a pharmacophore for.
- conf_idx (int): Index of the conformation to use for the pharmacophore generation.
Returns:
- Pharm.Pharmacophore: Pharmacophore of the argument molecule using coordinates of the specifies conformer.
"""
if conf_idx < 1: # for a new molecule
Pharm.prepareForPharmacophoreGeneration(mol) # first call utility function preparing the molecule for pharmacophore generation
ph4_gen = Pharm.DefaultPharmacophoreGenerator() # create an instance of the pharmacophore generator default implementation
ph4 = Pharm.BasicPharmacophore() # create an instance of the default implementation of the Pharm.Pharmacophore interface
ph4_name = Chem.getName(mol) # use the name of the input molecule as pharmacophore name
# use atom 3D coordinates of the specified conf.
ph4_gen.setAtom3DCoordinatesFunction(Chem.AtomConformer3DCoordinatesFunctor(conf_idx))
# append conformer index to the pharmacophore name
ph4_name += '#' + str(conf_idx)
ph4_gen.generate(mol, ph4) # generate the pharmacophore
Pharm.setName(ph4, ph4_name) # set the pharmacophore name
return ph4
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list of BasicMolecules] # Example list of BasicMolecules
# create writer for the generated pharmacophores (format specified by file extension)
writer = Pharm.FeatureContainerWriter("path_to_output_file.pml")
for mol in mols:
# compose a simple molecule identifier
mol_id = Chem.getName(mol).strip()
if mol_id == '':
mol_id = '#' + str(i) # fallback if name is empty
else:
mol_id = '\'%s\' (#%s)' % (mol_id, str(i))
num_confs = Chem.getNumConformations(mol)
try:
for conf_idx in range(num_confs): # for each conformer
ph4 = generatePharmacophore(mol, conf_idx) # generate pharmacophore
if not writer.write(ph4): # output pharmacophore
sys.exit('Error: writing generated pharmacophore %s failed' % mol_id)
except Exception as e:
sys.exit('Error: pharmacophore generation or output for molecule %s failed: %s' % (mol_id, str(e)))
writer.close()
3.3. Generating ligand-receptor interaction pharmacophores
The following example shows how to generate 3D pharmacophore models that describe observed interactions between a molecule and proximal binding site residues. The script also demonstrates how to read and preprocess biological macromolecules.
Note
The receptor structure can be provided in the formats *.mol2, *.pdb, or *.mmtf
import CDPL.Chem as Chem
import CDPL.Pharm as Pharm
def processReceptorStructure(path: str, strip_res_list: bool) -> Chem.Molecule:
"""
Reads and preprocesses the specified receptor structure.
Parameters:
- path (str): Path to the receptor structure file.
- strip_res_list (bool): Whitespace separated list of PDB three-letter codes specifying residues to remove from the receptor structure (e.g. an existing ligand).
Returns:
- Chem.Molecule: Receptor structure.
"""
# create reader for receptor structure (format specified by file extension)
reader = Chem.MoleculeReader("path_to_receptor_structure_file.pdb")")
sup_fmts = [ Chem.DataFormat.MOL2,
Biomol.DataFormat.PDB,
Biomol.DataFormat.MMTF ]
# check if the format is supported by this script
if reader.getDataFormat() not in sup_fmts:
sys.exit('Error: receptor input file format \'%s\' not supported' % name_and_ext[1])
rec_mol = Chem.BasicMolecule() # create an instance of the default implementation of the
# Chem.Molecule interface that will store the receptor struct.
try:
if not reader.read(rec_mol): # read receptor structure
sys.exit('Error: reading receptor structure failed')
except Exception as e:
sys.exit('Error: reading receptor structure failed:\n' + str(e))
# preprocess the receptor structure (removal of residues and
# calculation of properties required by the pharm. generation procedure)
try:
# if structure comes from an MOL2 file, convert MOL2 residue data into PDB-style data
if reader.getDataFormat() == Chem.DataFormat.MOL2:
Biomol.convertMOL2ToPDBResidueInfo(rec_mol, True)
rem_atoms = False
# delete atoms belonging to residues that should be stripped
if strip_res_list:
atoms_to_rem = Chem.Fragment() # will store the atoms to delete
res_to_strip = { tlc.upper() for tlc in strip_res_list }
for atom in rec_mol.atoms: # identify and note atoms belonging to the stripped residues
if Biomol.getResidueCode(atom).upper() in res_to_strip:
atoms_to_rem.addAtom(atom)
if atoms_to_rem.numAtoms > 0:
rec_mol -= atoms_to_rem # delete atoms from the receptor structure
rem_atoms = True
# prepares the receptor structure for pharmacophore generation
Chem.perceiveSSSR(rec_mol, rem_atoms)
Chem.setRingFlags(rec_mol, rem_atoms)
Chem.calcImplicitHydrogenCounts(rec_mol, rem_atoms)
Chem.perceiveHybridizationStates(rec_mol, rem_atoms)
Chem.setAromaticityFlags(rec_mol, rem_atoms)
if Chem.makeHydrogenComplete(rec_mol): # make implicit hydrogens (if any) explicit
Chem.calcHydrogen3DCoordinates(rec_mol) # calculate 3D coordinates for the added expl. hydrogens
Biomol.setHydrogenResidueSequenceInfo(rec_mol, False) # set residue information for the added expl. hydrogens
MolProp.calcAtomHydrophobicities(rec_mol, False) # calculate atom hydrophobicity values (needed for hydrophobic
# pharm. feature generation)
except Exception as e:
sys.exit('Error: processing of receptor structure failed: ' + str(e))
return rec_mol
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
# Settings
strip_res_list = False # Whitespace separated list of PDB three-letter codes specifying residues to remove from the receptor structure (e.g. an existing ligand)
gen_x_vols = False # Generate exclusion volume spheres on pharm. feature atoms of interacting residues
lig_mols = [list of BasicMolecules] # Example list of ligand molecules
rec_mol = processReceptorStructure(path_to_receptor_structure_file, strip_res_list) # read and preprocess the receptor structure
ph4_writer = Pharm.FeatureContainerWriter("path_to_pha_file.pml") # create writer for the generated pharmacophores
# (format specified by file extension)
ia_ph4 = Pharm.BasicPharmacophore() # create an instance of the default implementation of the Pharm.Pharmacophore
# interface that will store the generated pharmacophores
ph4_gen = Pharm.InteractionPharmacophoreGenerator() # create an instance of the pharmacophore generator
ph4_gen.addExclusionVolumes(gen_x_vols) # specify whether to generate exclusion volume spheres
# on pharm. feature atoms of interacting residues
# process ligand molecules one after the other
for lig_mol in lig_mols:
mol_id = Chem.getName(lig_mol).strip() # compose a simple ligand identifier for messages
if mol_id == '':
mol_id = '#' + str(i) # fallback if name is empty or not available
else:
mol_id = '\'%s\' (#%s)' % (mol_id, str(i))
try:
Pharm.prepareForPharmacophoreGeneration(lig_mol) # make ligand ready for pharm. generation
ph4_gen.generate(lig_mol, rec_mol, ia_ph4, True) # generate the pharmacophore (True = extract ligand environment residues on-the-fly)
try:
if not ph4_writer.write(ia_ph4): # output pharmacophore
sys.exit('Error: writing interaction pharmacophore of molecule %s failed: %s' % (mol_id, str(e)))
except Exception as e: # handle exception raised in case of severe write errors
sys.exit('Error: writing interaction pharmacophore of molecule %s failed: %s' % (mol_id, str(e)))
except Exception as e: # handle exception raised in case of severe processing errors
sys.exit('Error: interaction pharmacophore generation for molecule %s failed: %s' % (mol_id, str(e)))
ph4_writer.close()
3.4. Alignment of molecules to a reference pharmacophore
The following example shows how to align a set of molecules to a reference pharmacophore.
The function readRefPharmacophore()
reads the reference pharmacophore from a specified file.
The function genPharmacophore()
generates and returns the pharmacophore of a molecule.
The function clearFeatureOrientations()
removes feature orientation information and sets the feature geometry to Pharm.FeatureGeometry.SPHERE
.
The following variables control the alignment process and the reported results:
pos_only: Controls whether only the position of features is considered during alignment
min_pose_rmsd: Controls the minimum required RMSD between two consecutively output molecule alignment poses
num_out_almnts: Is used to control the number of top-ranked alignment solutions to output per molecule (default: best alignment solution only)
exhaustive: Specifies whether an exhaustive alignment search should be performed
import CDPL.Chem as Chem
import CDPL.Pharm as Pharm
def readRefPharmacophore(filename: str) -> Pharm.Pharmacophore:
"""
Reads and returns the specified alignment reference pharmacophore.
Parameters:
- filename (str): Name of the file storing the reference pharmacophore.
Returns:
- Pharm.Pharmacophore: Reference pharmacophore.
"""
# create pharmacophore reader instance
reader = Pharm.PharmacophoreReader(filename)
# create an instance of the default implementation of the Pharm.Pharmacophore interface
ph4 = Pharm.BasicPharmacophore()
try:
if not reader.read(ph4): # read reference pharmacophore
sys.exit('Error: reading reference pharmacophore failed')
except Exception as e: # handle exception raised in case of severe read errors
sys.exit('Error: reading reference pharmacophore failed: ' + str(e))
return ph4
def generatePharmacophore(mol: Chem.Molecule) -> Pharm.Pharmacophore:
"""
Generates the pharmacophore of the given molecule.
Parameters:
- mol (Chem.Molecule): Molecule to generate a pharmacophore for.
Returns:
- Pharm.Pharmacophore: Pharmacophore of the argument molecule.
"""
Pharm.prepareForPharmacophoreGeneration(mol) # call utility function preparing the molecule for pharmacophore generation
ph4_gen = Pharm.DefaultPharmacophoreGenerator() # create an instance of the pharmacophore generator default implementation
ph4 = Pharm.BasicPharmacophore() # create an instance of the default implementation of the Pharm.Pharmacophore interface
ph4_gen.generate(mol, ph4) # generate the pharmacophore
return ph4
def clearFeatureOrientations(ph4: Pharm.BasicPharmacophore) -> None:
"""
Removes feature orientation informations and sets the feature geometry to Pharm.FeatureGeometry.SPHERE.
Parameters:
- ph4 (Pharm.BasicPharmacophore): Pharmacophore to edit.
"""
for ftr in ph4:
Pharm.clearOrientation(ftr)
Pharm.setGeometry(ftr, Pharm.FeatureGeometry.SPHERE)
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
# Settings
pos_only = True # True = only position of features is considered during alignment
num_out_almnts = 1 # Number of top-ranked alignment solutions to output per molecule (default: best alignment solution only)
min_pose_rmsd = 0.0 # Minimum required RMSD between two consecutively output molecule alignment poses
exhaustive = False # Perform an exhaustive alignment search (default: False)
mols = [list of BasicMolecules] # Example list of BasicMolecules
# read the reference pharmacophore
ref_ph4 = readRefPharmacophore("path_to_reference_pharmacophore.pml")
# create writer for aligned molecules (format specified by file extension)
mol_writer = Chem.MolecularGraphWriter("path_to_output_file.sdf")
# create instance of class implementing the pharmacophore alignment algorithm
almnt = Pharm.PharmacophoreAlignment(True) # True = aligned features have to be within the tolerance spheres of the ref. features
if pos_only: # clear feature orientation information
clearFeatureOrientations(ref_ph4)
almnt.addFeatures(ref_ph4, True) # set reference features (True = first set = reference)
almnt.performExhaustiveSearch(exhaustive) # set minimum number of top. mapped feature pairs
# create pharmacophore fit score calculator instance
almnt_score = Pharm.PharmacophoreFitScore()
# process molecules one after the other
for mol in mols:
# compose a simple molecule identifier
mol_id = Chem.getName(mol).strip()
if mol_id == '':
mol_id = '#' + str(i) # fallback if name is empty
else:
mol_id = '\'%s\' (#%s)' % (mol_id, str(i))
try:
mol_ph4 = generatePharmacophore(mol) # generate input molecule pharmacophore
if pos_only: # clear feature orientation information
clearFeatureOrientations(mol_ph4)
almnt.clearEntities(False) # clear features of previously aligned pharmacophore
almnt.addFeatures(mol_ph4, False) # specify features of the pharmacophore to align
almnt_solutions = [] # stores the found alignment solutions
while almnt.nextAlignment(): # iterate over all alignment solutions that can be found
score = almnt_score(ref_ph4, mol_ph4, almnt.getTransform()) # calculate alignment score
xform = Math.Matrix4D(almnt.getTransform()) # make a copy of the alignment transformation (mol. ph4 -> ref. ph4)
almnt_solutions.append((score, xform))
saved_coords = Math.Vector3DArray() # create data structure for storing 3D coordinates
Chem.get3DCoordinates(mol, saved_coords) # save the original atom coordinates
struct_data = None
if Chem.hasStructureData(mol): # get existing structure data if available
struct_data = Chem.getStructureData(mol)
else: # otherwise create and set new structure data
struct_data = Chem.StringDataBlock()
Chem.setStructureData(mol, strut)
# add alignment score entry to struct. data
struct_data.addEntry('<PharmFitScore>', '')
output_cnt = 0
last_pose = None
# order solutions by desc. alignment score
almnt_solutions = sorted(almnt_solutions, key=lambda entry: entry[0], reverse=True)
# output molecule alignment poses until the max. number of best output solutions has been reached
for solution in almnt_solutions:
if output_cnt == num_out_almnts:
break
curr_pose = Math.Vector3DArray(saved_coords)
Math.transform(curr_pose, solution[1]) # transform atom coordinates
# check whether the current pose is 'different enough' from
# the last pose to qualify for output
if min_pose_rmsd > 0.0 and last_pose and Math.calcRMSD(last_pose, curr_pose) < min_pose_rmsd:
continue
# apply the transformed atom coordinates
Chem.set3DCoordinates(mol, curr_pose)
# store alignment score in the struct. data entry
struct_data[len(struct_data) - 1].setData(format(solution[0], '.4f'))
try:
if not mol_writer.write(mol): # output the alignment pose of the molecule
sys.exit('Error: writing alignment pose of molecule %s failed: %s' % (mol_id, str(e)))
except Exception as e: # handle exception raised in case of severe write errors
sys.exit('Error: writing alignment pose of molecule %s failed: %s' % (mol_id, str(e)))
last_pose = curr_pose
output_cnt += 1
except Exception as e:
sys.exit('Error: pharmacophore alignment of molecule %s failed: %s' % (mol_id, str(e)))
mol_writer.close()
3.5. Retrieving information about pharmacophores and features
The script below demonstrates how to retrieve basic properties of pharmacophore features.
The function print_pharmacophore_properties()
outputs all (available) properties of the features stored
in the given feature container. Built-in feature properties are feature type, geometry, tolerance, weight, and
hydrophobicity.
import CDPL.Chem as Chem
import CDPL.Pharm as Pharm
def print_pharmacophore_properties(ph4: Pharm.FeatureContainer) -> None:
"""
Outputs all (available) properties of the features stored in the given feature container.
Parameters:
- ph4 (Pharm.FeatureContainer): Feature container to process.
"""
ftr_type_str = { Pharm.FeatureType.UNKNOWN : 'UNKNOWN',
Pharm.FeatureType.HYDROPHOBIC : 'HYDROPHOBIC',
Pharm.FeatureType.AROMATIC : 'AROMATIC',
Pharm.FeatureType.NEGATIVE_IONIZABLE : 'NEGATIVE_IONIZABLE',
Pharm.FeatureType.POSITIVE_IONIZABLE : 'POSITIVE_IONIZABLE',
Pharm.FeatureType.H_BOND_DONOR : 'H_BOND_DONOR',
Pharm.FeatureType.H_BOND_ACCEPTOR : 'H_BOND_ACCEPTOR',
Pharm.FeatureType.HALOGEN_BOND_DONOR : 'HALOGEN_BOND_DONOR',
Pharm.FeatureType.HALOGEN_BOND_ACCEPTOR : 'HALOGEN_BOND_ACCEPTOR',
Pharm.FeatureType.EXCLUSION_VOLUME : 'EXCLUSION_VOLUME' }
geom_str = { Pharm.FeatureGeometry.UNDEF : 'UNDEF',
Pharm.FeatureGeometry.SPHERE : 'SPHERE',
Pharm.FeatureGeometry.VECTOR : 'VECTOR',
Pharm.FeatureGeometry.PLANE : 'PLANE' }
print('Composition of pharmacophore \'%s\':' % Pharm.getName(ph4))
for i in range(0, len(ph4)):
ftr = ph4[i]
print(' - Feature #%s:' % str(i))
print(' - Type: %s' % ftr_type_str[Pharm.getType(ftr)])
print(' - Geometry: %s' % geom_str[Pharm.getGeometry(ftr)])
print(' - Tolerance: %s' % Pharm.getTolerance(ftr))
print(' - Weight: %s' % Pharm.getWeight(ftr))
print(' - Optional: %s' % Pharm.getOptionalFlag(ftr))
print(' - Disabled: %s' % Pharm.getDisabledFlag(ftr))
print(' - Length: %s' % Pharm.getLength(ftr))
print(' - Hydrophobicity: %s' % Pharm.getHydrophobicity(ftr))
if Chem.has3DCoordinates(ftr): # Pharm.Feature derives from Chem.Entity3D - therefore a function from the Chem package is used here!
print(' - Position: %s' % Chem.get3DCoordinates(ftr))
if Pharm.hasOrientation(ftr):
print(' - Orientation: %s' % Pharm.getOrientation(ftr))
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
# create reader for input pharmacophores (format specified by file extension)
reader = Pharm.PharmacophoreReader("path_to_input_file.pml")
# create an instance of the default implementation of the Pharm.Pharmacophore interface
ph4 = Pharm.BasicPharmacophore()
# process pharmacophores one after the other until the end of input has been reached
try:
while reader.read(ph4):
try:
print_pharmacophore_properties(ph4)
except Exception as e:
sys.exit('Error: processing of pharmacophore failed: ' + str(e))
except Exception as e: # handle exception raised in case of severe read errors
sys.exit('Error: reading pharmacophore failed: ' + str(e))
4. Calculation of Atom Properties
The CDPL provides a wide panel of pre-defined atom and bond properties. All of these properties can be retrieved and set/calculated by corresponding function calls as demonstrated in the following example scripts.
4.1. Calculation of atom classification properties
Atomic properties provide basic chemical information about each atom of a molecule. Such properties are e.g. chemical element, formal charge, implicit hydrogen counts, hybridization states, aromaticity, and so on. The following code snippet shows how to calculate and retrieve properties that provide higher-order information about the atoms in a molecular graph.
Note
The following examples use functionality provided by the CDPL.Chem and CDPL.MolProp package.
import sys
import os
import CDPL.Chem as Chem
import CDPL.MolProp as MolProp
def outputProperties(molgraph: Chem.MolecularGraph) -> None:
"""
Outputs the corresponding properties of each atom of the provided molecular graph.
Parameters:
- molgraph (Chem.MolecularGraph): Molecular graph to process.
"""
Chem.calcImplicitHydrogenCounts(molgraph, False) # calculate implicit hydrogen counts and set corresponding property for all atoms
Chem.perceiveHybridizationStates(molgraph, False) # perceive atom hybridization states and set corresponding property for all atoms
Chem.perceiveSSSR(molgraph, False) # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
Chem.setRingFlags(molgraph, False) # perceive cycles and set corresponding atom and bond properties
Chem.setAromaticityFlags(molgraph, False) # perceive aromaticity and set corresponding atom and bond properties
MolProp.perceiveHBondDonorAtomTypes(molgraph, False) # perceive H-bond donor atom types and set corresponding atom properties
MolProp.perceiveHBondAcceptorAtomTypes(molgraph, False) # perceive H-bond acceptor atom types and set corresponding atom properties
hba_type_str = { MolProp.HBondAcceptorAtomType.UNDEF : 'UNDEF',
MolProp.HBondAcceptorAtomType.NONE : 'NONE',
MolProp.HBondAcceptorAtomType.O_H2O : 'O_H2O',
MolProp.HBondAcceptorAtomType.O_UREA : 'O_UREA',
MolProp.HBondAcceptorAtomType.O_BARBITURIC_ACID : 'O_BARBITURIC_ACID',
MolProp.HBondAcceptorAtomType.O_URIC_ACID : 'O_URIC_ACID',
MolProp.HBondAcceptorAtomType.O_ETHER : 'O_ETHER',
MolProp.HBondAcceptorAtomType.O_AMIDE : 'O_AMIDE',
MolProp.HBondAcceptorAtomType.O_N_OXIDE : 'O_N_OXIDE',
MolProp.HBondAcceptorAtomType.O_ACID : 'O_ACID',
MolProp.HBondAcceptorAtomType.O_ESTER : 'O_ESTER',
MolProp.HBondAcceptorAtomType.O_SULFOXIDE : 'O_SULFOXIDE',
MolProp.HBondAcceptorAtomType.O_NITRO : 'O_NITRO',
MolProp.HBondAcceptorAtomType.O_SELEN_OXIDE : 'O_SELEN_OXIDE',
MolProp.HBondAcceptorAtomType.O_ALDEHYD : 'O_ALDEHYD',
MolProp.HBondAcceptorAtomType.O_KETONE : 'O_KETONE',
MolProp.HBondAcceptorAtomType.O_ALCOHOL : 'O_ALCOHOL',
MolProp.HBondAcceptorAtomType.N_NH3 : 'N_NH3',
MolProp.HBondAcceptorAtomType.N_DIAMINE : 'N_DIAMINE',
MolProp.HBondAcceptorAtomType.N_MONO_DI_NITRO_ANILINE : 'N_MONO_DI_NITRO_ANILINE',
MolProp.HBondAcceptorAtomType.N_TRI_NITRO_ANILINE : 'N_TRI_NITRO_ANILINE',
MolProp.HBondAcceptorAtomType.N_HALOGENO_ANILINE : 'N_HALOGENO_ANILINE',
MolProp.HBondAcceptorAtomType.N_ANILINE : 'N_ANILINE',
MolProp.HBondAcceptorAtomType.N_NITRILE : 'N_NITRILE',
MolProp.HBondAcceptorAtomType.N_AZOLE : 'N_AZOLE',
MolProp.HBondAcceptorAtomType.N_AMINE : 'N_AMINE',
MolProp.HBondAcceptorAtomType.N_AMIDINE : 'N_AMIDINE',
MolProp.HBondAcceptorAtomType.N_AZO : 'N_AZO',
MolProp.HBondAcceptorAtomType.N_AZINE : 'N_AZINE',
MolProp.HBondAcceptorAtomType.N_DIAZINE : 'N_DIAZINE',
MolProp.HBondAcceptorAtomType.N_IMINE : 'N_IMINE',
MolProp.HBondAcceptorAtomType.S_SULFIDE : 'S_SULFIDE',
MolProp.HBondAcceptorAtomType.S_THIOUREA : 'S_THIOUREA',
MolProp.HBondAcceptorAtomType.P_MONO_DI_PHOSPHINE : 'P_MONO_DI_PHOSPHINE',
MolProp.HBondAcceptorAtomType.P_TRI_PHOSPHINE : 'P_TRI_PHOSPHINE' }
hbd_type_str = { MolProp.HBondDonorAtomType.UNDEF : 'UNDEF',
MolProp.HBondDonorAtomType.NONE : 'NONE',
MolProp.HBondDonorAtomType.I_HI : 'I_HI',
MolProp.HBondDonorAtomType.BR_HBR : 'BR_HBR',
MolProp.HBondDonorAtomType.CL_HCL : 'CL_HCL',
MolProp.HBondDonorAtomType.S_HSCN : 'S_HSCN',
MolProp.HBondDonorAtomType.F_HF : 'F_HF',
MolProp.HBondDonorAtomType.H_H2 : 'H_H2',
MolProp.HBondDonorAtomType.C_HCN : 'C_HCN',
MolProp.HBondDonorAtomType.C_ETHINE : 'C_ETHINE',
MolProp.HBondDonorAtomType.N_HN3 : 'N_HN3',
MolProp.HBondDonorAtomType.N_NH3 : 'N_NH3',
MolProp.HBondDonorAtomType.N_NH4 : 'N_NH4',
MolProp.HBondDonorAtomType.N_AMINE : 'N_AMINE',
MolProp.HBondDonorAtomType.N_AMINIUM : 'N_AMINIUM',
MolProp.HBondDonorAtomType.N_ANILINE : 'N_ANILINE',
MolProp.HBondDonorAtomType.N_MONO_DI_NITRO_ANILINE : 'N_MONO_DI_NITRO_ANILINE',
MolProp.HBondDonorAtomType.N_TRI_NITRO_ANILINE : 'N_TRI_NITRO_ANILINE',
MolProp.HBondDonorAtomType.N_PYRROLE : 'N_PYRROLE',
MolProp.HBondDonorAtomType.N_AMIDE : 'N_AMIDE',
MolProp.HBondDonorAtomType.N_IMINE : 'N_IMINE',
MolProp.HBondDonorAtomType.N_IMINIUM : 'N_IMINIUM',
MolProp.HBondDonorAtomType.S_H2S : 'S_H2S',
MolProp.HBondDonorAtomType.S_HS : 'S_HS',
MolProp.HBondDonorAtomType.S_THIOL : 'S_THIOL',
MolProp.HBondDonorAtomType.O_H3PO4 : 'O_H3PO4',
MolProp.HBondDonorAtomType.O_H2CO3 : 'O_H2CO3',
MolProp.HBondDonorAtomType.O_HCO3 : 'O_HCO3',
MolProp.HBondDonorAtomType.O_H2O2 : 'O_H2O2',
MolProp.HBondDonorAtomType.O_H2O : 'O_H2O',
MolProp.HBondDonorAtomType.O_CF3SO3H : 'O_CF3SO3H',
MolProp.HBondDonorAtomType.O_HCLO4 : 'O_HCLO4',
MolProp.HBondDonorAtomType.O_H2SO4 : 'O_H2SO4',
MolProp.HBondDonorAtomType.O_HNO3 : 'O_HNO3',
MolProp.HBondDonorAtomType.O_HSO4 : 'O_HSO4',
MolProp.HBondDonorAtomType.O_HNO2 : 'O_HNO2',
MolProp.HBondDonorAtomType.O_NH2OH : 'O_NH2OH',
MolProp.HBondDonorAtomType.O_H2PO4 : 'O_H2PO4',
MolProp.HBondDonorAtomType.O_H3BO3 : 'O_H3BO3',
MolProp.HBondDonorAtomType.O_H4SIO4 : 'O_H4SIO4',
MolProp.HBondDonorAtomType.O_HPO4 : 'O_HPO4',
MolProp.HBondDonorAtomType.O_H2BO3 : 'O_H2BO3',
MolProp.HBondDonorAtomType.O_HO : 'O_HO',
MolProp.HBondDonorAtomType.O_SULFONIC_ACID : 'O_SULFONIC_ACID',
MolProp.HBondDonorAtomType.O_MONO_DI_NITRO_PHENOL : 'O_MONO_DI_NITRO_PHENOL',
MolProp.HBondDonorAtomType.O_HALOGENO_ALCOHOL : 'O_HALOGENO_ALCOHOL',
MolProp.HBondDonorAtomType.O_ALCOHOL : 'O_ALCOHOL',
MolProp.HBondDonorAtomType.O_TRI_NITRO_PHENOL : 'O_TRI_NITRO_PHENOL',
MolProp.HBondDonorAtomType.O_HALOGENO_PHENOL : 'O_HALOGENO_PHENOL',
MolProp.HBondDonorAtomType.O_PHENOL : 'O_PHENOL',
MolProp.HBondDonorAtomType.O_CARBOXYLIC_ACID : 'O_CARBOXYLIC_ACID',
MolProp.HBondDonorAtomType.O_HALOGENO_CARBOXYCLIC_ACID : 'O_HALOGENO_CARBOXYCLIC_ACID',
MolProp.HBondDonorAtomType.O_ENOL : 'O_ENOL',
MolProp.HBondDonorAtomType.O_OXIME : 'O_OXIME',
MolProp.HBondDonorAtomType.O_CL5_PHENOL : 'O_CL5_PHENOL' }
for atom in molgraph.atoms:
print('- Atom #%s' % str(molgraph.getAtomIndex(atom)))
print('\tIs std. hydrogen: %s' % str(MolProp.isOrdinaryHydrogen(atom, molgraph)))
print('\tIs heavy atom: %s' % str(MolProp.isHeavy(atom)))
print('\tIs unsaturated: %s' % str(MolProp.isUnsaturated(atom, molgraph)))
print('\tIs H-bond acceptor: %s' % str(MolProp.isHBondAcceptor(atom, molgraph)))
print('\tH-bond acceptor type: %s' % hba_type_str[MolProp.getHBondAcceptorType(atom)])
print('\tIs H-bond donor: %s' % str(MolProp.isHBondDonor(atom, molgraph)))
print('\tH-bond donor type: %s' % hbd_type_str[MolProp.getHBondDonorType(atom)])
print('\tIs carbonyl carbon: %s' % str(MolProp.isCarbonylLikeAtom(atom, molgraph, True, True)))
print('\tIs amide carbon: %s' % str(MolProp.isAmideCenterAtom(atom, molgraph, True, True)))
print('\tIs amide nitrogen: %s' % str(MolProp.isAmideNitrogen(atom, molgraph, True, True)))
print('\tIs invertible nitrogen: %s' % str(MolProp.isInvertibleNitrogen(atom, molgraph)))
print('\tIs planar nitrogen: %s' % str(MolProp.isPlanarNitrogen(atom, molgraph)))
if len(sys.argv) < 2:
sys.exit('Usage: %s <input mol. file>' % sys.argv[0])
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list of BasicMolecules] # Example list of BasicMolecules
# process molecules one after the other
for mol in mols:
try:
outputProperties(mol)
except Exception as e:
sys.exit('Error: processing of molecule failed: ' + str(e))
4.2. Calculation of connectivity properties
The code snippet below shows to calculate various atom properties that depend on their connectivity to other atoms of the molecular graph such as the number of connected carbon atoms, heteroatoms, halogens, heavy atoms, chain atoms, ring atoms, aromatic atoms, and many more.
import sys
import os
import CDPL.Chem as Chem
import CDPL.MolProp as MolProp
def outputProperties(molgraph: Chem.MolecularGraph) -> None:
"""
Outputs the corresponding properties of each atom of the provided molecular graph.
Parameters:
- molgraph (Chem.MolecularGraph): Molecular graph to process.
"""
Chem.calcImplicitHydrogenCounts(molgraph, False) # calculate implicit hydrogen counts and set corresponding property for all atoms
Chem.perceiveHybridizationStates(molgraph, False) # perceive atom hybridization states and set corresponding property for all atoms
Chem.perceiveSSSR(molgraph, False) # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
Chem.setRingFlags(molgraph, False) # perceive cycles and set corresponding atom and bond properties
Chem.setAromaticityFlags(molgraph, False) # perceive aromaticity and set corresponding atom and bond properties
vsepr_geom_str = { MolProp.CoordinationGeometry.UNDEF : 'UNDEF',
MolProp.CoordinationGeometry.NONE : 'NONE',
MolProp.CoordinationGeometry.LINEAR : 'LINEAR',
MolProp.CoordinationGeometry.TRIGONAL_PLANAR : 'TRIGONAL_PLANAR',
MolProp.CoordinationGeometry.TETRAHEDRAL : 'TETRAHEDRAL',
MolProp.CoordinationGeometry.TRIGONAL_BIPYRAMIDAL : 'TRIGONAL_BIPYRAMIDAL',
MolProp.CoordinationGeometry.OCTAHEDRAL : 'OCTAHEDRAL',
MolProp.CoordinationGeometry.PENTAGONAL_BIPYRAMIDAL : 'PENTAGONAL_BIPYRAMIDAL',
MolProp.CoordinationGeometry.SQUARE_ANTIPRISMATIC : 'SQUARE_ANTIPRISMATIC',
MolProp.CoordinationGeometry.BENT : 'BENT',
MolProp.CoordinationGeometry.TRIGONAL_PYRAMIDAL : 'TRIGONAL_PYRAMIDAL',
MolProp.CoordinationGeometry.SQUARE_PLANAR : 'SQUARE_PLANAR',
MolProp.CoordinationGeometry.SQUARE_PYRAMIDAL : 'SQUARE_PYRAMIDAL',
MolProp.CoordinationGeometry.T_SHAPED : 'T_SHAPED',
MolProp.CoordinationGeometry.SEESAW : 'SEESAW',
MolProp.CoordinationGeometry.PENTAGONAL_PYRAMIDAL : 'PENTAGONAL_PYRAMIDAL',
MolProp.CoordinationGeometry.PENTAGONAL_PLANAR : 'PENTAGONAL_PLANAR' }
for atom in molgraph.atoms:
print('- Atom #%s' % str(molgraph.getAtomIndex(atom)))
print('\tNum. connected std. hydrogens (incl. impl. H): %s' % str(MolProp.getOrdinaryHydrogenCount(atom, molgraph)))
print('\tNum. connected carbon atoms: %s' % str(MolProp.getExplicitAtomCount(atom, molgraph, Chem.AtomType.C)))
print('\tNum. connected heteroatoms: %s' % str(MolProp.getExplicitAtomCount(atom, molgraph, Chem.AtomType.HET, False)))
print('\tNum. connected halogens: %s' % str(MolProp.getExplicitAtomCount(atom, molgraph, Chem.AtomType.X, False)))
print('\tNum. connected heavy atoms: %s' % str(MolProp.getHeavyAtomCount(atom, molgraph)))
print('\tNum. connected chain atoms (excl. impl. H): %s' % str(MolProp.getExplicitChainAtomCount(atom, molgraph)))
print('\tNum. connected chain atoms (incl. impl. H): %s' % str(MolProp.getChainAtomCount(atom, molgraph)))
print('\tNum. connected ring atoms: %s' % str(MolProp.getRingAtomCount(atom, molgraph)))
print('\tNum. connected aromatic atoms: %s' % str(MolProp.getAromaticAtomCount(atom, molgraph)))
print('\tNum. incident bonds (excl. impl. H): %s' % str(MolProp.getExplicitBondCount(atom, molgraph)))
print('\tNum. incident bonds (incl. impl. H): %s' % str(MolProp.getBondCount(atom, molgraph)))
print('\tNum. incident single bonds (excl. impl. H): %s' % str(MolProp.getExplicitBondCount(atom, molgraph, 1)))
print('\tNum. incident single bonds (incl. impl. H): %s' % str(MolProp.getBondCount(atom, molgraph, 1)))
print('\tNum. incident double bonds: %s' % str(MolProp.getBondCount(atom, molgraph, 2)))
print('\tNum. incident triple bonds: %s' % str(MolProp.getBondCount(atom, molgraph, 3)))
print('\tNum. incident chain bonds (excl. impl. H): %s' % str(MolProp.getExplicitChainBondCount(atom, molgraph)))
print('\tNum. incident chain bonds (incl. impl. H): %s' % str(MolProp.getChainBondCount(atom, molgraph)))
print('\tNum. incident ring bonds (incl. impl. H): %s' % str(MolProp.getRingBondCount(atom, molgraph)))
print('\tNum. incident aromatic bonds (incl. impl. H): %s' % str(MolProp.getAromaticBondCount(atom, molgraph)))
print('\tNum. incident heavy atom bonds (incl. impl. H): %s' % str(MolProp.getHeavyBondCount(atom, molgraph)))
print('\tNum. incident rotatable bonds (incl. impl. H): %s' % str(MolProp.getRotatableBondCount(atom, molgraph, False, False)))
print('\tValence (excl. impl. H): %s' % str(MolProp.calcExplicitValence(atom, molgraph)))
print('\tValence (incl. impl. H): %s' % str(MolProp.calcValence(atom, molgraph)))
print('\tSteric number: %s' % str(MolProp.calcStericNumber(atom, molgraph)))
print('\tVSEPR coordination geometry: %s' % vsepr_geom_str[MolProp.getVSEPRCoordinationGeometry(atom, molgraph)])
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list of BasicMolecules] # Example list of BasicMolecules
# process molecules one after the other
for mol in mols:
try:
outputProperties(mol)
except Exception as e:
sys.exit('Error: processing of molecule failed: ' + str(e))
5. Calculation of Molecular Structure Descriptors
The calculation of molecule structure descriptors is one of the fundamental operations in cheminformatics. Such descriptors e.g. allow for a modeling and prediction of various structure-dependent properties with mathematical methods. However, they can also be put to use for numerous other applications such as:
Search for molecules that are structurally similar to a query molcule
Pre-filtering step for substructure searching in large chemical databases
Identifying the presence of particular functional groups or fragments in molecules
Note
The CDPL.Descr package provides functionality for the generation of various well-known molecule descriptors and fingerprints.
5.1. Extended connectivity fingerprints (ECFPs)
Morgan circular fingerprints, also known as extended connectivity fingerprints (ECFPs) [3], are a type of structural fingerprint that encodes the local chemical environment of each atom in a molecule as a particular bit in a bitset. They are widely used in cheminformatics for various tasks, such as similarity searching, virtual screening, and machine learning.
The following code snippet calculates and outputs the ECFP fingerprints of the provided molecules. The ECFP generation process can be influenced by the following parameters:
num_bits: The number of bits of the fingerprint (default: 1024)
radius: Max. atom environment radius in number of bonds (default: 2)
inc_hs: Whether to include explicit hydrogens (by default, the fingerprint is generated for the H-deplete molecular graph)
inc_config: Whether to include atom chirality (by default, the fingerprint is generated for the H-deplete molecular graph)
import CDPL.Descr as Descr
import CDPL.Util as Util
def genECFP(mol: Chem.Molecule, num_bits: int, radius: int, inc_hs: bool, inc_config: bool) -> Util.BitSet:
"""
Generates the binary ECFP for the given molecule.
Parameters:
- mol (Chem.Molecule): Molecule to process.
- num_bits (int): Number of bits of the fingerprint.
- radius (int): Max. atom environment radius in number of bonds.
- inc_hs (bool): Whether to include explicit hydrogens.
- inc_config (bool): Whether to include atom chirality.
Returns:
- Util.BitSet: The generated fingerprint.
"""
Chem.calcImplicitHydrogenCounts(mol, False) # calculate implicit hydrogen counts (if not yet done)
Chem.perceiveHybridizationStates(mol, False) # perceive atom hybridization states and set corresponding property for all atoms
Chem.setRingFlags(mol, False) # perceive cycles and set corresponding atom and bond properties
Chem.perceiveSSSR(mol, False) # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
Chem.setAromaticityFlags(mol, False) # perceive aromaticity and set corresponding atom and bond properties
ecfp_gen = Descr.CircularFingerprintGenerator() # create ECFP generator instance
if inc_config:
ecfp_gen.includeChirality(True) # allow atom chirality to have an impact on the ECFP generation
Chem.calcAtomStereoDescriptors(mol, False) # calculate atom stereo descriptors and set corresponding property for all atoms
if inc_hs:
ecfp_gen.includeHydrogens(True) # include explicit hydrogens in the ECFP generation
Chem.makeHydrogenComplete(mol) # make any implicit hydrogens explicit
fp = Util.BitSet() # create fingerprint bitset
fp.resize(num_bits) # set desired fingerprint size
ecfp_gen.setNumIterations(radius) # set num. iterations (=atom. env. radius)
ecfp_gen.generate(mol) # extract chracteristic structural features
ecfp_gen.setFeatureBits(fp) # set bits associated with the extracted structural features
# if needed, fp could be converted into a numpy single precision float array as follows:
# fp = numpy.array(fp, dtype=numpy.float32)
return fp
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
# Settings
num_bits = 1024 # fingerprint size
radius = 4 # atom environment radius
inc_hs = True # include explicit hydrogens
inc_config = True # include atom chirality
mols = [list of BasicMolecules] # Example list of BasicMolecules
# process molecules one after the other
for mol in mols:
try:
fp = genECFP(mol, num_bits, radius, inc_hs, inc_config)
# do something useful with the fingerprint
except Exception as e:
sys.exit('Error: processing of molecule failed: ' + str(e))
out_file.close()
5.2. FAME atom environment fingerprints
This type of fingerprint encodes the local environment of individual atoms up to a configurable maximum bond-path length. The descriptor was developed for the classification of atoms in the well-known site of metabolism prediction software FAME [16].
The following code snippet calculates and outputs the FAME descriptor for each atom of the provided molecules with the parameter radius specifying the max. atom environment radius in number of bonds (default: 2).
import CDPL.Chem as Chem
def genFAMEDescriptor(ctr_atom: Chem.Atom, molgraph: Chem.MolecularGraph, radius: int) -> numpy.array:
"""
Generates the FAME descriptor for the given atom of the provided molecule.
Parameters:
- ctr_atom (Chem.Atom): Atom for which the FAME descriptor is to be calculated.
- molgraph (Chem.MolecularGraph): Molecule to process.
- radius (int): Max. atom environment radius in number of bonds.
Returns:
- numpy.array: The generated FAME descriptor.
"""
env = Chem.Fragment() # for storing of extracted environment atoms
descr = numpy.zeros((Chem.SybylAtomType.MAX_TYPE + 1) * (radius + 1))
Chem.getEnvironment(ctr_atom, molgraph, radius, env) # extract environment of center atom reaching
# out up to 'radius' bonds
for atom in env.atoms: # iterate over extracted environment atoms
sybyl_type = Chem.getSybylType(atom) # retrieve Sybyl type of environment atom
top_dist = Chem.getTopologicalDistance(ctr_atom, atom, molgraph) # get top. distance between center atom and environment atom
descr[top_dist * (Chem.SybylAtomType.MAX_TYPE + 1) + sybyl_type] += 1 # instead of 1 (= Sybyl type presence) also any other numeric atom
# property could be summed up here
return descr
def procMolecule(molgraph: Chem.MolecularGraph) -> None:
"""
Processes the provided molecule.
Parameters:
- molgraph (Chem.MolecularGraph): Molecule to process.
"""
Chem.calcImplicitHydrogenCounts(molgraph, False) # calculate implicit hydrogen counts and set corresponding property for all atoms
Chem.perceiveHybridizationStates(molgraph, False) # perceive atom hybridization states and set corresponding property for all atoms
Chem.perceiveSSSR(molgraph, False) # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
Chem.setRingFlags(molgraph, False) # perceive cycles and set corresponding atom and bond properties
Chem.setAromaticityFlags(molgraph, False) # perceive aromaticity and set corresponding atom and bond properties
Chem.perceiveSybylAtomTypes(molgraph, False) # perceive Sybyl atom types and set corresponding property for all atoms
Chem.calcTopologicalDistanceMatrix(molgraph, False) # calculate topological distance matrix and store as Chem.MolecularGraph property
for atom in molgraph.atoms:
descr = genFAMEDescriptor(atom, molgraph, 5) # generate atom environment descriptor using a radius of five bonds
print(descr)
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list of BasicMolecules] # Example list of BasicMolecules
# process molecules one after the other
for mol in mols:
try:
procMolecule(mol)
except Exception as e:
sys.exit('Error: processing of molecule failed: ' + str(e))
6. Force Field Calculations
Force fields are computational methods that estimate the forces between atoms within molecules and also between molecules. Since they allow for a computationally inexpensive, but nevertheless relatively accurate calculation of the potential energy of a system of atoms, they are essential for many tasks in the field of molecular modeling such as conformer generation and molecular dynamics (MD) simulations. One of the widely recognized force fields is the Merck Molecular Force Field (MMFF94) [6] which has been designed to be applicable to a broad range of organic molecules and thus found widespread use.
6.1. Calculation of MMFF94 atom charges
The MMFF94 force field provides a method and associated parameter set for the calculation of partial atomic charges. They are not only crucial for the estimation of intra- and intermolecular electrostatic interactions but can also be used, e.g., as atom descriptors for ML-based model building.
The following code snippet calculates and outputs the MMFF94 charges of the atoms for a given list of molecules:
Note
Force field related functionality is provided via the CDPL.ForceField package.
import CDPL.Chem as Chem
import CDPL.ForceField as ForceField
def calc_and_output_charges(mol: Chem.BasicMolecule) -> None:
"""
Calculates and outputs the MMFF94 charges of the atoms for the provided molecule.
Parameters:
mol (Chem.BasicMolecule): The molecule for which the MMFF94 charges are to be calculated.
"""
# Various preprocessing steps to prepare the molecule
Chem.calcImplicitHydrogenCounts(mol, False) # calculate implicit hydrogen counts and set corresponding property for all atoms
Chem.makeHydrogenComplete(mol) # make all implicit hydrogens explicit
Chem.perceiveHybridizationStates(mol, False) # perceive atom hybridization states and set corresponding property for all atoms
Chem.perceiveSSSR(mol, False) # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
Chem.setRingFlags(mol, False) # perceive cycles and set corresponding atom and bond properties
Chem.setAromaticityFlags(mol, False) # perceive aromaticity and set corresponding atom and bond properties
ForceField.perceiveMMFF94AromaticRings(mol, False) # perceive aromatic rings according to the MMFF94 aroamticity model and store data as Chem.MolecularGraph property
ForceField.assignMMFF94AtomTypes(mol, False, False) # perceive MMFF94 atom types (tolerant mode) set corresponding property for all atoms
ForceField.assignMMFF94BondTypeIndices(mol, False, False) # perceive MMFF94 bond types (tolerant mode) set corresponding property for all bonds
ForceField.calcMMFF94AtomCharges(mol, False, False) # calculate MMFF94 atom charges (tolerant mode) set corresponding property for all atoms
print('- MMFF94 partial charges')
for atom in mol.atoms:
print('Atom #%s: %s' % (str(atom.getIndex()), str(ForceField.getMMFF94Charge(atom))))
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
mols = [list_of_BasicMolecules]
for mol in mols:
calc_and_output_charges(mol)
7. Substructure Searching and Matching
Substructure searching (a subgraph isomorphism problem) is one of the most fundamental operations in cheminformatics and allows for the identification of molecules that contain a specific structural motif or pattern. Substructure searching needs to be performed in various application areas, such as:
Drug Discovery: Identifying molecules that contain a particular pharmacophore or motif
Chemical Database Querying: Filtering large chemical databases to retrieve molecules of interest
Reactivity Analysis: Identifying the presence of particular functional groups or fragments in molecules
7.1. Filtering molecules based on a SMARTS pattern
The SMARTS notation (SMiles ARbitrary Target Specification) is a language used to describe structural patterns. It’s an extension of the SMILES notation and allows for more complex and specific pattern descriptions. The CDPL provides substructure searching functionality via the class Chem.SubstructureSearch. The code snippet below demonstrates how this class can be used to identify molecules that match a specific structural motif described by a SMARTS pattern.
import CDPL.Chem as Chem
def filterMoleculesBySmarts(input_file: str, output_file: str, smarts_pattern: str, quiet: bool = False) -> None:
"""
Filters molecules from the input file that match the provided SMARTS pattern and writes them to the output file.
Parameters:
input_file (str): Path to the input file containing the molecules to be filtered.
output_file (str): Path to the output file to which the filtered molecules are written.
smarts_pattern (str): SMARTS pattern describing the structural motif to be matched.
quiet (bool): If set to True, no progress information is printed to the console.
"""
try:
sub_srch_ptn = Chem.parseSMARTS(smarts_pattern)
Chem.initSubstructureSearchQuery(sub_srch_ptn, False)
except Exception as e:
print(f'Error: parsing of SMARTS pattern failed: {str(e)}')
return
substr_srch = Chem.SubstructureSearch(sub_srch_ptn)
reader = Chem.MoleculeReader(input_file)
writer = Chem.MolecularGraphWriter(output_file)
mol = Chem.BasicMolecule()
i = 1
try:
while reader.read(mol):
mol_id = Chem.getName(mol).strip() or f'Molecule_{i}'
Chem.initSubstructureSearchTarget(mol, False)
if substr_srch.mappingExists(mol):
if not quiet:
print(f' -> substructure found, forwarding molecule {mol_id} to output file')
writer.write(mol)
elif not quiet:
print(f' -> substructure not found in molecule {mol_id}')
i += 1
except Exception as e:
print(f'Error: {str(e)}')
writer.close()
###########################################################################
###########################################################################
###########################################################################
################################ Example usage:
filterMoleculesBySmarts("input.sdf", "output.sdf", "[#6]1:[#6]:[#6]:[#6]:[#6]:[#6]:1")