Molecular Input/Output
=======================

Reading molecules from an input file
------------------------------------


A frequent task in cheminformatics is to read molecular data from files in different formats.
For this purpose, the CDPL provides data readers for various file formats including *SDF*, *SMILES*, *MOL*, *MOL2*, *CDF*, and others.

.. note::
    The CDF format is a binary format and CDPKit's native file format that can be used to store molecule, reaction, and pharmacophore data in a space and processing time efficient manner.

The class `Chem.MoleculeReader <../cdpl_api_doc/python_api_doc/classCDPL_1_1Chem_1_1MoleculeReader.html>`_ is rather versatile and can read molecules stored in
any of the supported formats (SDF, SMILES, MOL, MOL2, CDF, etc.) thus eliminating the need to instantiate a format-specific reader implementation.
Here is a simple example of how to read molecules from an input file:

.. note::
    The module `CDPL.Chem <../cdpl_api_doc/python_api_doc/namespaceCDPL_1_1Chem.html>`_ provides functionality for the processing of chemical data.

.. code-block:: python

   import CDPL.Chem as Chem

    def processMoleculesFromFile(input_file: str) -> None:
        """
        Reads molecules from the provided input file and outputs the number of atoms and bonds for each molecule.

        Parameters:
        - input_file (str): Path to the input file.
        """
        # Create reader for input molecules (format specified by file extension)
        reader = Chem.MoleculeReader(input_file)

        # create an instance of the default implementation of the Chem.Molecule interface
        mol = Chem.BasicMolecule()
        
        # Iterate over each molecule and print atom and bond count
        try:
            while reader.read(mol): 
                try:
                    print('Molecule with', mol.numAtoms, 'atoms and', mol.numBonds, 'bonds')
                except Exception as e:
                    sys.exit('Error: processing of molecule failed: ' + str(e))
                    
        except Exception as e: # handle exception raised in case of severe read errors
            sys.exit('Error: reading molecule failed: ' + str(e))

            
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    
    processMoleculesFromFile("path_to_mol_file.mol2")

Retrieving structure data from MDL SD-files
-------------------------------------------

Analyzing the data associated with a molecule in an SD-file can be done as shown 
in the following example:

.. note::
    You can either use `Chem.MoleculeReader <../cdpl_api_doc/python_api_doc/classCDPL_1_1Chem_1_1MoleculeReader.html>`_
    or use the format-specific reader `Chem.FileSDFMoleculeReader <../cdpl_api_doc/python_api_doc/classCDPL_1_1Chem_1_1FileSDFMoleculeReader.html>`_
    molecules from an SD-file. Usually, the more convenient and flexible way to read molecule data is to use Chem.MoleculeReader.

.. code-block:: python

   import CDPL.Chem as Chem

   def processStructureDataFromSD(input_sd_file: str) -> None:
   """
    Retrieves the structure data of each molecule in the provided SD file and outputs it to the console.

    Parameters:
    - input_sd_file (str): Path to the input SD file.
    """
        # Create reader for MDL SD-files
        reader = Chem.FileSDFMoleculeReader(input_sd_file)

        # create an instance of the default implementation of the Chem.Molecule interface
        mol = Chem.BasicMolecule()

        # Iterate over each molecule in the file and retrieve structure data
        try:
            while reader.read(mol): 
                try:
                    if not Chem.hasStructureData(mol):
                        print('Error: no structure data available for molecule', Chem.getName(mol))
                        continue
                    
                    struct_data = Chem.getStructureData(mol) # retrieve structure data

                    print('Structure data (%s entries) of molecule \'%s\':\n' % (str(len(struct_data)), Chem.getName(mol)))
                    
                    for entry in struct_data: # iterate of structure data entries consisting of a header line and the actual data
                        print('Header:', entry.header)
                        print('Data:', entry.data)
                except Exception as e:
                    sys.exit('Error: processing of molecule failed: ' + str(e))
                    
        except Exception as e: # handle exception raised in case of severe read errors
            sys.exit('Error: reading molecule failed: ' + str(e))

            
    ##########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    
    processStructureDataFromSD("path_to_sd_file.sdf")

Writing molecules to an output file
-----------------------------------

Once you've processed or analyzed your molecules, you may want to save them to an output file. 
In the constructor of the class `Chem.MolecularGraphWriter <../cdpl_api_doc/python_api_doc/classCDPL_1_1Chem_1_1MolecularGraphWriter.html>`_ the format of the output
file is determined by its file extension, making it easy to save molecules in different formats such as SDF, MOL, MOL2, CDF, and others without having to write
format-specific code.

.. note::
   Ensure that the file extension you provide matches the standard desired output format. For instance, use `.sdf` for MDL Structure-Data Files, `.mol` for MDL Molfiles, and so on.

Here's a simple example of how a list of molecules can be written to an output file:

.. code-block:: python

   import CDPL.Chem as Chem

   def molsToFiles(mols: list[Chem.BasicMolecule], output_file: str) -> None:
       """
       Writes a list of molecules to the specified output file.

       Parameters:
       - mols (list): List of CDPKit molecules to write to the output file.
       - output_file (str): Path to the output file.
       """
       # Create a writer for the output molecules (format specified by file extension)
       writer = Chem.MolecularGraphWriter(output_file)

       for mol in mols:
           writer.write(mol)

           
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:

    mols = [list of BasicMolecules]  # Example list of BasicMolecules
    molsToFiles(mols, "path_to_output_file.sdf")


Description of structural atom environments as SMILES strings
-------------------------------------------------------------

For the extraction of the local chemical environment of an atom in a molecular graph the CDPL provides the utility function
`Chem.getEnvironment <../cdpl_api_doc/python_api_doc/namespaceCDPL_1_1Chem.html#a342acefab39f6928df4c67c2d86c209a>`_.
Fur further processing extracted environments can, e.g., be output as SMILES strings as shown in the following example.

.. code-block:: python

    import CDPL.Chem as Chem

    def printAtomEnv(mols: list[Chem.BasicMolecule]) -> None:
        """
        Extracts the atom environments of each atom in the provided list of molecules and outputs them as SMILES strings.

        Parameters:
        - mols (list): List of CDPKit molecules to process.
        """

        for mol in mols:
            Chem.calcImplicitHydrogenCounts(mol, False)  # calculate implicit hydrogen counts and set corresponding property for all atoms
            Chem.perceiveHybridizationStates(mol, False) # perceive atom hybridization states and set corresponding property for all atoms
            Chem.perceiveSSSR(mol, False)                # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
            Chem.setRingFlags(mol, False)                # perceive cycles and set corresponding atom and bond properties
            Chem.setAromaticityFlags(mol, False)         # perceive aromaticity and set corresponding atom and bond properties

            frag = Chem.Fragment()                       # for storing extracted atom environments

            print('- Atom environments (radius = 3 bonds)')

            for atom in mol.atoms:
                Chem.getEnvironment(atom, mol, 3, frag)     # extract environment of atom reaching out up to three bonds
                Chem.perceiveComponents(frag, False)        # perceive molecular graph components (required for SMILES generation)

                smiles = Chem.generateSMILES(frag, False, False) # generate non-canonical SMILES string with explicit hydrogen atoms

                print('Atom #%s: %s' % (str(mol.getAtomIndex(atom)), smiles))

                
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    
    mols = [list of BasicMolecules] # Example list of BasicMolecules
    printAtomEnv(mols)

ChEMBL molecule standardization and parent structure extraction
---------------------------------------------------------------

The CDPL provides a convenient way to standardize molecules using its implementaion of the ChEMBL standardization pipeline :cite:`Bento2020`. 
This process ensures that molecules will be represented in a consistent and standardized manner when they enter downstream processing steps.

.. code-block:: python

    import CDPL.Chem as Chem

    def standardize(chembl_proc: Chem.ChEMBLStandardizer, in_mol: Chem.Molecule, out_mol: Chem.Molecule, proc_excluded: bool, extract_parent: bool) -> Chem.ChEMBLStandardizer.ChangeFlags:
        """
        Performs ChEMBL molecule standardization and parent structure extraction (optional) for a given input molecule using a provided Chem.ChEMBLStandardizer instance.
        
        Parameters:
        - chembl_proc (Chem.ChEMBLStandardizer): Instance of the Chem.ChEMBLStandardizer class.
        - in_mol (Chem.Molecule): Input molecule to standardize.
        - out_mol (Chem.Molecule): Output molecule to store the standardized molecule.
        - proc_excluded (bool): If True, molecules flagged as excluded will be processed.
        - extract_parent (bool): If True, the parent structure will be extracted.

        Returns:
        - Chem.ChEMBLStandardizer.ChangeFlags: Flags indicating the carried out modifications.
        """
        # here, the standardization is carried out on a copy of the read input molecule
        # (if only one molecule instance gets provided as argument, modifications will be made in-place)
        change_flags = chembl_proc.standardize(in_mol, out_mol, proc_excluded)

        if extract_parent: # perform parent structure extraction (optional)
            change_flags &= ~Chem.ChEMBLStandardizer.EXCLUDED  # clear excluded flag possibly set by the standardization
                                                           # procedure (might change after salt stripping)
            change_flags |= chembl_proc.getParent(out_mol)     # extract parent structure (in-place) and add information
                                                           # about the carried out modifcations
        return change_flags

    def getListOfChangesString(change_flags: Chem.ChEMBLStandardizer.ChangeFlags, verbose: bool = False) -> str:
        """
        Returns a string listing the carried out modifications.

        Parameters:
        - change_flags (Chem.ChEMBLStandardizer.ChangeFlags): Flags indicating the carried out modifications.
        - verbose (bool): If True, the string will contain a detailed list of the carried out modifications.

        Returns:
        - str: String listing the carried out modifications.
        """
        if not verbose:
            return None

        changes = '   Carried out modifications:'

        # List of possible changes
        change_list = [
            (Chem.ChEMBLStandardizer.EXPLICIT_HYDROGENS_REMOVED, 'Explicit hydrogens removed'),
            (Chem.ChEMBLStandardizer.UNKNOWN_STEREO_STANDARDIZED, 'Undefined stereocenter information standardized'),
            (Chem.ChEMBLStandardizer.BONDS_KEKULIZED, 'Kekule structure generated'),
            (Chem.ChEMBLStandardizer.STRUCTURE_NORMALIZED, 'Functional groups normalized'),
            (Chem.ChEMBLStandardizer.CHARGES_REMOVED, 'Number of charged atoms reduced'),
            (Chem.ChEMBLStandardizer.TARTRATE_STEREO_CLEARED, 'Configuration of chiral tartrate atoms set to undefined'),
            (Chem.ChEMBLStandardizer.STRUCTURE_2D_CORRECTED, '2D structure corrected'),
            (Chem.ChEMBLStandardizer.ISOTOPE_INFO_CLEARED, 'Isotope information cleared'),
            (Chem.ChEMBLStandardizer.SALT_COMPONENTS_REMOVED, 'Salt components removed'),
            (Chem.ChEMBLStandardizer.SOLVENT_COMPONENTS_REMOVED, 'Solvent components removed'),
            (Chem.ChEMBLStandardizer.DUPLICATE_COMPONENTS_REMOVED, 'Duplicate components removed')
        ]

        for flag, description in change_list:
            if change_flags & flag:
                changes += '\n    * ' + description

        return changes

    def getLogMessage(change_flags: Chem.ChEMBLStandardizer.ChangeFlags, proc_excluded: bool, extract_parent: bool, mol_id: str, verbose: bool = False) -> str:
        """
        Returns a log message describing the carried out modifications.

        Parameters:
        - change_flags (Chem.ChEMBLStandardizer.ChangeFlags): Flags indicating the carried out modifications.
        - proc_excluded (bool): If True, molecules flagged as excluded will be processed.
        - extract_parent (bool): If True, the parent structure will be extracted.
        - mol_id (str): Identifier of the molecule.

        Returns:
        - str: Log message describing the carried out modifications.
        """
        if (change_flags & Chem.ChEMBLStandardizer.EXCLUDED) and proc_excluded:
            return f'Molecule {mol_id}: discarded (flagged as excluded)'

        if not proc_excluded and (change_flags & Chem.ChEMBLStandardizer.EXCLUDED):
            return f'Molecule {mol_id}: forwarded unchanged (flagged as excluded)'

        if change_flags:
            return f'Molecule {mol_id}: modified\n{getListOfChangesString(change_flags, verbose)}'

        return f'Molecule {mol_id}: forwarded unchanged'

        
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    mols = [list of BasicMolecules]  # Example list of BasicMolecules

    chembl_proc = Chem.ChEMBLStandardizer()

    for mol in mols:
        in_mol = mol
        out_mol = Chem.BasicMolecule()
        change_flags = standardize(chembl_proc, in_mol, out_mol, proc_excluded=True, extract_parent=True)
        mol_id = Chem.getName(in_mol).strip() or f'Molecule_{mols.index(mol)}'
        log_msg = getLogMessage(change_flags, proc_excluded=True, extract_parent=True, mol_id=mol_id, verbose=True)
        print(log_msg)


Standardization/prediction of protonation states
------------------------------------------------

In the realm of computational chemistry and molecular modeling, understanding and predicting the behavior of molecules
often hinges on the finer details. One such critical detail is the protonation state of the functional groups in a molecule. 
Protonation states dictate formal charges of atoms and have an impact on molecular 3D structure and reactivity. Knowing the
pre-dominant protonation states of functional groups under certain condition is therefore vital for methods predicting
drug binding, enzymatic reactions, etc.  

The class `Chem.ProtonationStateStandardizer <../cdpl_api_doc/python_api_doc/classCDPL_1_1Chem_1_1ProtonationStateStandardizer.html>`_
implements several protonation state generation algorithms.
The following code snippet shows how to generate protonation/formal charge states of acidic/basic functional groups that are likely
to occur under physiological conditions.

.. code-block:: python

    import CDPL.Chem as Chem

    mols = [list of BasicMolecules]  # Example list of BasicMolecules 

    # create and initialize an instance of the class Chem.ProtonationStateStandardizer which
    # implements the protonation state generation algorithm
    prot_state_gen = Chem.ProtonationStateStandardizer()
    
    # process molecules one after the other
    for mol in mols:
        # compose a simple molecule identifier
        mol_id = Chem.getName(mol).strip() 

        if mol_id == '':
            mol_id = '#' + str(i) # fallback if name is empty
        else:
            mol_id = '\'%s\' (#%s)' % (mol_id, str(i))

        try:
            # protonate/deprotonate functional groups for phys. conditions
            prot_state_gen.standardize(mol, Chem.ProtonationStateStandardizer.PHYSIOLOGICAL_CONDITION_STATE)

            # enforce an update of the molecule components list (structure might have changed)
            Chem.perceiveComponents(mol, True)
                
        except Exception as e:
            sys.exit('Error: processing or output of molecule %s failed: %s' % (mol_id, str(e)))

    writer.close()


3D Structure and Conformer Ensemble Generation
==============================================

In chemistry, conformational isomerism is a form of stereoisomerism in which the isomers can be interconverted just by rotations about formally single
bonds. While any two arrangements of atoms in a molecule that differ by rotation about single bonds can be referred to as different conformations,
conformations that correspond to local minima on the potential energy surface are specifically called conformational isomers or conformers.

In the realm of cheminformatics and molecular modeling many methods and algorithms require 3D structures or even whole
conformer ensembles as input. 
For a seamless integration with such methods CDPKit provides functionality in package `CDPL.ConfGen <../cdpl_api_doc/python_api_doc/namespaceCDPL_1_1ConfGen.html>`_ that allows 
to generate both single low-energy 3D structures and diverse conformer ensembles solely from connection table information.
The high-quality conformer ensemble generation tool included in CDPKit is called *CONFORGE*. For details on its implementation and performance
see :cite:`doi:10.1021/acs.jcim.3c00563`.

Generating single low-energy 3D structures
------------------------------------------

Here's how you can generate a single low-energy 3D structure for a molecule:

.. code-block:: python

   import CDPL.Chem as Chem
   import CDPL.ConfGen as ConfGen

    def generate3dConformation(mol: Chem.Molecule, struct_gen: ConfGen.StructureGenerator) -> int:
        """
        Generates a low-energy 3D structure of the argument molecule using the provided initialized ConfGen.StructureGenerator instance.

        Parameters:
        - mol (Chem.Molecule): Molecule to generate a 3D structure for.
        - struct_gen (ConfGen.StructureGenerator): Instance of the ConfGen.StructureGenerator class.

        Returns:
        - int: Status code indicating the success of the 3D structure generation.
        """
        # prepare the molecule for 3D structure generation
        ConfGen.prepareForConformerGeneration(mol) 

        # generate the 3D structure
        status = struct_gen.generate(mol)             

        # if sucessful, store the generated conformer ensemble as
        # per atom 3D coordinates arrays (= the way conformers are represented in CDPKit)
        if status == ConfGen.ReturnCode.SUCCESS:
            struct_gen.setCoordinates(mol)                
            
        # return status code
        return status

        
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:

    max_time = 3600 # Max. allowed molecule processing time in seconds (default: 3600 sec)

    mols = [list of BasicMolecules]  # Example list of BasicMolecules

    # create writer for the generated 3D structures (format specified by file extension)
    writer = Chem.MolecularGraphWriter("path_to_output_file.sdf") 

    # export only a single 3D structure (in case of multi-conf. input molecules)
    Chem.setMultiConfExportParameter(writer, False)
    
    # create and initialize an instance of the class ConfGen.StructureGenerator which will
    # perform the actual 3D structure generation work
    struct_gen = ConfGen.StructureGenerator()

    struct_gen.settings.timeout = max_time * 1000 # apply the -t argument

    # dictionary mapping status codes to human readable strings
    status_to_str = { ConfGen.ReturnCode.UNINITIALIZED                  : 'uninitialized',
                      ConfGen.ReturnCode.TIMEOUT                        : 'max. processing time exceeded',
                      ConfGen.ReturnCode.ABORTED                        : 'aborted',
                      ConfGen.ReturnCode.FORCEFIELD_SETUP_FAILED        : 'force field setup failed',
                      ConfGen.ReturnCode.FORCEFIELD_MINIMIZATION_FAILED : 'force field structure refinement failed',
                      ConfGen.ReturnCode.FRAGMENT_LIBRARY_NOT_SET       : 'fragment library not available',
                      ConfGen.ReturnCode.FRAGMENT_CONF_GEN_FAILED       : 'fragment conformer generation failed',
                      ConfGen.ReturnCode.FRAGMENT_CONF_GEN_TIMEOUT      : 'fragment conformer generation timeout',
                      ConfGen.ReturnCode.FRAGMENT_ALREADY_PROCESSED     : 'fragment already processed',
                      ConfGen.ReturnCode.TORSION_DRIVING_FAILED         : 'torsion driving failed',
                      ConfGen.ReturnCode.CONF_GEN_FAILED                : 'conformer generation failed' }
    
    # process molecules one after the other 
    for mol in mols:
        # compose a simple molecule identifier
        mol_id = Chem.getName(mol).strip() 

        if mol_id == '':
            mol_id = '#' + str(i) # fallback if name is empty
        else:
            mol_id = '\'%s\' (#%s)' % (mol_id, str(i))

        try:
            # generate 3D structure of the read molecule
            status = generate3dConformation(mol, struct_gen) 

            # check for severe error reported by status code
            if status == ConfGen.ReturnCode.SUCCESS:
                # enforce the output of 3D coordinates in case of MDL file formats
                Chem.setMDLDimensionality(mol, 3)

                # output the generated 3D structure                    
                if not writer.write(mol):   
                    sys.exit('Error: writing 3D structure of molecule %s failed' % mol_id)
                        
        except Exception as e:
            sys.exit('Error: 3D structure generation or output for molecule %s failed: %s' % (mol_id, str(e)))

    writer.close()

Generating conformer ensembles
------------------------------

For some applications it is necessary to generate multiple conformations for a molecule, e.g. to assess its flexibility or investigate binding 
capabilities towards a particular target receptor.

The function ``generate_conformation_ensembles()`` generates a conformer ensemble for a given molecule using an
instance of class `ConfGen.ConformerGenerator <../cdpl_api_doc/python_api_doc/classCDPL_1_1ConfGen_1_1ConformerGenerator.html>`_.
The parameters *min_rmsd*, *e_window*, and *max_confs* are used to control the generated conformer ensemble's diversity, energy and size.
They can be initialized at the beginning of the example script with the desired values. 

.. code-block:: python

   import CDPL.Chem as Chem
   import CDPL.ConfGen as ConfGen

    def generateConformationEnsembles(mol: Chem.BasicMolecule, conf_gen: ConfGen.ConformerGenerator) -> (int, int):
        """
        Generates a conformation ensemble for the argument molecule using the provided initialized ConfGen.ConformerGenerator instance.
        
        Parameters:
        - mol (Chem.BasicMolecule): Molecule to generate a conformation ensemble for.
        - conf_gen (ConfGen.ConformerGenerator): Instance of the ConfGen.ConformerGenerator class.

        Returns:
        - int: Status code indicating the success of the conformation ensemble generation.
        - int: Number of generated conformers.
        """
        # prepare the molecule for conformer generation
        ConfGen.prepareForConformerGeneration(mol) 

        # generate the conformer ensemble
        status = conf_gen.generate(mol)             
        num_confs = conf_gen.getNumConformers()
        
        # if sucessful, store the generated conformer ensemble as
        # per atom 3D coordinates arrays (= the way conformers are represented in CDPKit)
        if status == ConfGen.ReturnCode.SUCCESS or status == ConfGen.ReturnCode.TOO_MUCH_SYMMETRY:
            conf_gen.setConformers(mol)                
        else:
            num_confs = 0
            
        # return status code and the number of generated conformers
        return (status, num_confs)


    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:

    # Settings
    max_time = 3600 # Max. allowed molecule processing time in seconds (default: 3600 sec)
    min_rmsd = 0.5 # Output conformer RMSD threshold (default: 0.5)
    e_window = 20.0 # Output conformer energy window (default: 20.0)
    max_confs = 100 # Max. output ensemble size (default: 100)

    mols = [list of BasicMolecules]  # Example list of BasicMolecules

    # create writer for the generated conformer ensembles (format specified by file extension)
    writer = Chem.MolecularGraphWriter("path_to_output_file.sdf") 

    # create and initialize an instance of the class ConfGen.ConformerGenerator which
    # will perform the actual conformer ensemble generation work
    conf_gen = ConfGen.ConformerGenerator()

    conf_gen.settings.timeout = max_time * 1000          # apply the -t argument
    conf_gen.settings.minRMSD = min_rmsd                 # apply the -r argument
    conf_gen.settings.energyWindow = e_window            # apply the -e argument
    conf_gen.settings.maxNumOutputConformers = max_confs # apply the -n argument

    # dictionary mapping status codes to human readable strings
    status_to_str = { ConfGen.ReturnCode.UNINITIALIZED                  : 'uninitialized',
                      ConfGen.ReturnCode.TIMEOUT                        : 'max. processing time exceeded',
                      ConfGen.ReturnCode.ABORTED                        : 'aborted',
                      ConfGen.ReturnCode.FORCEFIELD_SETUP_FAILED        : 'force field setup failed',
                      ConfGen.ReturnCode.FORCEFIELD_MINIMIZATION_FAILED : 'force field structure refinement failed',
                      ConfGen.ReturnCode.FRAGMENT_LIBRARY_NOT_SET       : 'fragment library not available',
                      ConfGen.ReturnCode.FRAGMENT_CONF_GEN_FAILED       : 'fragment conformer generation failed',
                      ConfGen.ReturnCode.FRAGMENT_CONF_GEN_TIMEOUT      : 'fragment conformer generation timeout',
                      ConfGen.ReturnCode.FRAGMENT_ALREADY_PROCESSED     : 'fragment already processed',
                      ConfGen.ReturnCode.TORSION_DRIVING_FAILED         : 'torsion driving failed',
                      ConfGen.ReturnCode.CONF_GEN_FAILED                : 'conformer generation failed' }

   
    # process molecules one after the other
    for mol in mols:
        # compose a simple molecule identifier
        mol_id = Chem.getName(mol).strip() 

        if mol_id == '':
            mol_id = '#' + str(i) # fallback if name is empty
        else:
            mol_id = '\'%s\' (#%s)' % (mol_id, str(i))

        try:
            # generate conformer ensemble for read molecule
            status, num_confs = generateConformationEnsembles(mol, conf_gen) 

            # output generated ensemble (if available)
            if num_confs > 0:
                if not writer.write(mol):   
                    sys.exit('Error: output of conformer ensemble for molecule %s failed' % mol_id)
                        
        except Exception as e:
            sys.exit('Error: conformer ensemble generation or output for molecule %s failed: %s' % (mol_id, str(e)))

    writer.close()


Pharmacophore Generation and Processing
=======================================

This section is about the generation and processing of pharmacophore models.
The pharmacophore concept is widely used in drug design and cheminformatics and represents a versatile tool to understand and
describe interactions between ligands and their biological targets.

.. note::
    Functionality for the generation and processing of pharmacophore models resides in package
    `CDPL.Pharm <../cdpl_api_doc/python_api_doc/namespaceCDPL_1_1Pharm.html>`_. 
    Currently available pharmacophore input/output formats are `LigandScout's <https://www.inteligand.com/ligandscout>`_ *PML* and
    CDPKit's native *CDF* format.

Pharmacophore generation for single-conformer molecules
-------------------------------------------------------

The following example script generates a pharmacophore model for each input molecule
and outputs the pharmacophore data in PML format.

.. code-block:: python

    import CDPL.Chem as Chem
    import CDPL.Pharm as Pharm

    def generatePharmacophore(mol: Chem.Molecule) -> Pharm.Pharmacophore:
        """
        Generates the pharmacophore of the molecule.
        
        Parameters:
        - mol (Chem.Molecule): Molecule to generate a pharmacophore for.

        Returns:
        - Pharm.Pharmacophore: Pharmacophore of the argument molecule.
        """
        Pharm.prepareForPharmacophoreGeneration(mol)    # first call utility function preparing the molecule for pharmacophore generation
            
        ph4_gen = Pharm.DefaultPharmacophoreGenerator() # create an instance of the pharmacophore generator default implementation
        ph4 = Pharm.BasicPharmacophore()                # create an instance of the default implementation of the Pharm.Pharmacophore interface
        ph4_name = Chem.getName(mol)                    # use the name of the input molecule as pharmacophore name
        
        ph4_gen.generate(mol, ph4)          # generate the pharmacophore
        Pharm.setName(ph4, ph4_name)        # set the pharmacophore name

        return ph4

        
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    mols = [list of BasicMolecules]  # Example list of BasicMolecules

    # create writer for the generated pharmacophores (format specified by file extension)
    writer = Pharm.FeatureContainerWriter("path_to_output_file.pml")
 
    for mol in mols:
        # compose a simple molecule identifier
        mol_id = Chem.getName(mol).strip() 

        if mol_id == '':
            mol_id = '#' + str(i) # fallback if name is empty
        else:
            mol_id = '\'%s\' (#%s)' % (mol_id, str(i))

        try:
            ph4 = generatePharmacophore(mol)         # generate pharmacophore

            if not writer.write(ph4):   # output pharmacophore
                sys.exit('Error: writing generated pharmacophore %s failed' % mol_id)
                        
        except Exception as e:
           sys.exit('Error: pharmacophore generation or output for molecule %s failed: %s' % (mol_id, str(e)))

    writer.close()

Pharmacophore generation for multi-conformer molecules
------------------------------------------------------

In the following example script a pharmacophore model for each conformer of the input molecule will be generated.
The name of the generated pharmacophore is set to the name of the corresponding molecule plus a suffix specifying the
conformer index.

.. code-block:: python

    import CDPL.Chem as Chem
    import CDPL.Pharm as Pharm

    def generatePharmacophore(mol: Chem.Molecule, conf_idx: int) -> Pharm.Pharmacophore:
        """
        Generates the pharmacophore of the molecule using atom coordinates of the specified conformation.
        
        Parameters:
        - mol (Chem.Molecule): Molecule to generate a pharmacophore for.
        - conf_idx (int): Index of the conformation to use for the pharmacophore generation.

        Returns:
        - Pharm.Pharmacophore: Pharmacophore of the argument molecule using coordinates of the specifies conformer.
        """
        if conf_idx < 1:                                    # for a new molecule
            Pharm.prepareForPharmacophoreGeneration(mol)    # first call utility function preparing the molecule for pharmacophore generation
            
        ph4_gen = Pharm.DefaultPharmacophoreGenerator()     # create an instance of the pharmacophore generator default implementation
        ph4 = Pharm.BasicPharmacophore()                    # create an instance of the default implementation of the Pharm.Pharmacophore interface
        ph4_name = Chem.getName(mol)                        # use the name of the input molecule as pharmacophore name

        # use atom 3D coordinates of the specified conf.
        ph4_gen.setAtom3DCoordinatesFunction(Chem.AtomConformer3DCoordinatesFunctor(conf_idx)) 

        # append conformer index to the pharmacophore name
        ph4_name += '#' + str(conf_idx)
            
        ph4_gen.generate(mol, ph4)          # generate the pharmacophore
        Pharm.setName(ph4, ph4_name)        # set the pharmacophore name

        return ph4

        
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    mols = [list of BasicMolecules]  # Example list of BasicMolecules

    # create writer for the generated pharmacophores (format specified by file extension)
    writer = Pharm.FeatureContainerWriter("path_to_output_file.pml")

    for mol in mols:
       # compose a simple molecule identifier
       mol_id = Chem.getName(mol).strip() 

       if mol_id == '':
           mol_id = '#' + str(i) # fallback if name is empty
       else:
           mol_id = '\'%s\' (#%s)' % (mol_id, str(i))

       num_confs = Chem.getNumConformations(mol)

       try:
           for conf_idx in range(num_confs):               # for each conformer
               ph4 = generatePharmacophore(mol, conf_idx)  # generate pharmacophore

               if not writer.write(ph4):   # output pharmacophore
                   sys.exit('Error: writing generated pharmacophore %s failed' % mol_id)
                        
       except Exception as e:
           sys.exit('Error: pharmacophore generation or output for molecule %s failed: %s' % (mol_id, str(e)))

    writer.close()
    
Generating ligand-receptor interaction pharmacophores
-----------------------------------------------------

The following example shows how to generate 3D pharmacophore models that describe observed interactions between
a molecule and proximal binding site residues.
The script also demonstrates how to read and preprocess biological macromolecules.

.. note::
    The receptor structure can be provided in the formats \*.mol2, \*.pdb, or \*.mmtf

.. code-block:: python

    import CDPL.Chem as Chem
    import CDPL.Pharm as Pharm

    def processReceptorStructure(path: str, strip_res_list: bool) -> Chem.Molecule:
        """
        Reads and preprocesses the specified receptor structure.

        Parameters:
        - path (str): Path to the receptor structure file.
        - strip_res_list (bool): Whitespace separated list of PDB three-letter codes specifying residues to remove from the receptor structure (e.g. an existing ligand).

        Returns:
        - Chem.Molecule: Receptor structure.

        """

        # create reader for receptor structure (format specified by file extension)
        reader = Chem.MoleculeReader("path_to_receptor_structure_file.pdb")") 
        
        sup_fmts = [ Chem.DataFormat.MOL2,
                    Biomol.DataFormat.PDB,
                    Biomol.DataFormat.MMTF ]

        # check if the format is supported by this script 
        if reader.getDataFormat() not in sup_fmts:   
            sys.exit('Error: receptor input file format \'%s\' not supported' % name_and_ext[1])

        rec_mol = Chem.BasicMolecule()    # create an instance of the default implementation of the
                                          # Chem.Molecule interface that will store the receptor struct.
        try:
            if not reader.read(rec_mol):  # read receptor structure
                sys.exit('Error: reading receptor structure failed')

        except Exception as e:
            sys.exit('Error: reading receptor structure failed:\n' + str(e))            

        # preprocess the receptor structure (removal of residues and
        # calculation of properties required by the pharm. generation procedure)
        try:
            # if structure comes from an MOL2 file, convert MOL2 residue data into PDB-style data
            if reader.getDataFormat() == Chem.DataFormat.MOL2: 
                Biomol.convertMOL2ToPDBResidueInfo(rec_mol, True)

            rem_atoms = False

            # delete atoms belonging to residues that should be stripped
            if strip_res_list:            
                atoms_to_rem = Chem.Fragment() # will store the atoms to delete
                res_to_strip = { tlc.upper() for tlc in strip_res_list }
            
                for atom in rec_mol.atoms:     # identify and note atoms belonging to the stripped residues
                    if Biomol.getResidueCode(atom).upper() in res_to_strip:
                        atoms_to_rem.addAtom(atom)

                if atoms_to_rem.numAtoms > 0:
                    rec_mol -= atoms_to_rem    # delete atoms from the receptor structure
                    rem_atoms = True

            # prepares the receptor structure for pharmacophore generation
            Chem.perceiveSSSR(rec_mol, rem_atoms)
            Chem.setRingFlags(rec_mol, rem_atoms)
            Chem.calcImplicitHydrogenCounts(rec_mol, rem_atoms)
            Chem.perceiveHybridizationStates(rec_mol, rem_atoms)
            Chem.setAromaticityFlags(rec_mol, rem_atoms)

            if Chem.makeHydrogenComplete(rec_mol):                    # make implicit hydrogens (if any) explicit
                Chem.calcHydrogen3DCoordinates(rec_mol)               # calculate 3D coordinates for the added expl. hydrogens
                Biomol.setHydrogenResidueSequenceInfo(rec_mol, False) # set residue information for the added expl. hydrogens

            MolProp.calcAtomHydrophobicities(rec_mol, False)          # calculate atom hydrophobicity values (needed for hydrophobic
                                                                    # pharm. feature generation)
        except Exception as e:
            sys.exit('Error: processing of receptor structure failed: ' + str(e))            

        return rec_mol

        
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    
    # Settings
    strip_res_list = False # Whitespace separated list of PDB three-letter codes specifying residues to remove from the receptor structure (e.g. an existing ligand)
    gen_x_vols = False     # Generate exclusion volume spheres on pharm. feature atoms of interacting residues
    
    
    lig_mols = [list of BasicMolecules]  # Example list of ligand molecules

    rec_mol = processReceptorStructure(path_to_receptor_structure_file, strip_res_list)  # read and preprocess the receptor structure
    ph4_writer = Pharm.FeatureContainerWriter("path_to_pha_file.pml")                    # create writer for the generated pharmacophores
                                                                                         # (format specified by file extension)

    ia_ph4 = Pharm.BasicPharmacophore()     # create an instance of the default implementation of the Pharm.Pharmacophore
                                            # interface that will store the generated pharmacophores

    ph4_gen = Pharm.InteractionPharmacophoreGenerator() # create an instance of the pharmacophore generator

    ph4_gen.addExclusionVolumes(gen_x_vols) # specify whether to generate exclusion volume spheres 
                                            # on pharm. feature atoms of interacting residues

    # process ligand molecules one after the other 
    for lig_mol in lig_mols:
        mol_id = Chem.getName(lig_mol).strip() # compose a simple ligand identifier for messages

        if mol_id == '':
            mol_id = '#' + str(i)  # fallback if name is empty or not available
        else:
            mol_id = '\'%s\' (#%s)' % (mol_id, str(i))

        try:
            Pharm.prepareForPharmacophoreGeneration(lig_mol) # make ligand ready for pharm. generation

            ph4_gen.generate(lig_mol, rec_mol, ia_ph4, True) # generate the pharmacophore (True = extract ligand environment residues on-the-fly)

            try:
                if not ph4_writer.write(ia_ph4): # output pharmacophore
                    sys.exit('Error: writing interaction pharmacophore of molecule %s failed: %s' % (mol_id, str(e)))

            except Exception as e:               # handle exception raised in case of severe write errors
                sys.exit('Error: writing interaction pharmacophore of molecule %s failed: %s' % (mol_id, str(e)))
                
        except Exception as e:                   # handle exception raised in case of severe processing errors
            sys.exit('Error: interaction pharmacophore generation for molecule %s failed: %s' % (mol_id, str(e)))

    ph4_writer.close()

Alignment of molecules to a reference pharmacophore
---------------------------------------------------

The following example shows how to align a set of molecules to a reference pharmacophore.

The function ``readRefPharmacophore()`` reads the reference pharmacophore from a specified file.
The function ``genPharmacophore()`` generates and returns the pharmacophore of a molecule.
The function ``clearFeatureOrientations()`` removes feature orientation information and sets the feature geometry to ``Pharm.FeatureGeometry.SPHERE``.

The following variables control the alignment process and the reported results:

- *pos_only*: Controls whether only the position of features is considered during alignment
- *min_pose_rmsd*: Controls the minimum required RMSD between two consecutively output molecule alignment poses
- *num_out_almnts*: Is used to control the number of top-ranked alignment solutions to output per molecule (default: best alignment solution only)
- *exhaustive*: Specifies whether an exhaustive alignment search should be performed

.. code-block:: python

    import CDPL.Chem as Chem
    import CDPL.Pharm as Pharm

    def readRefPharmacophore(filename: str) -> Pharm.Pharmacophore:
        """
        Reads and returns the specified alignment reference pharmacophore.

        Parameters:
        - filename (str): Name of the file storing the reference pharmacophore.

        Returns:
        - Pharm.Pharmacophore: Reference pharmacophore.
        """
        # create pharmacophore reader instance
        reader = Pharm.PharmacophoreReader(filename)

        # create an instance of the default implementation of the Pharm.Pharmacophore interface
        ph4 = Pharm.BasicPharmacophore()

        try:
            if not reader.read(ph4): # read reference pharmacophore
                sys.exit('Error: reading reference pharmacophore failed')
                    
        except Exception as e: # handle exception raised in case of severe read errors
            sys.exit('Error: reading reference pharmacophore failed: ' + str(e))

        return ph4

    def generatePharmacophore(mol: Chem.Molecule) -> Pharm.Pharmacophore:
        """
        Generates the pharmacophore of the given molecule.

        Parameters:
        - mol (Chem.Molecule): Molecule to generate a pharmacophore for.

        Returns:
        - Pharm.Pharmacophore: Pharmacophore of the argument molecule.
        """

        Pharm.prepareForPharmacophoreGeneration(mol)       # call utility function preparing the molecule for pharmacophore generation
            
        ph4_gen = Pharm.DefaultPharmacophoreGenerator()    # create an instance of the pharmacophore generator default implementation
        ph4 = Pharm.BasicPharmacophore()                   # create an instance of the default implementation of the Pharm.Pharmacophore interface

        ph4_gen.generate(mol, ph4)                         # generate the pharmacophore

        return ph4

    def clearFeatureOrientations(ph4: Pharm.BasicPharmacophore) -> None:
        """
        Removes feature orientation informations and sets the feature geometry to Pharm.FeatureGeometry.SPHERE.
        
        Parameters:
        - ph4 (Pharm.BasicPharmacophore): Pharmacophore to edit.
        """
        for ftr in ph4:
            Pharm.clearOrientation(ftr)
            Pharm.setGeometry(ftr, Pharm.FeatureGeometry.SPHERE)

            
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:

    # Settings
    pos_only = True     # True = only position of features is considered during alignment
    num_out_almnts = 1  # Number of top-ranked alignment solutions to output per molecule (default: best alignment solution only)
    min_pose_rmsd = 0.0 # Minimum required RMSD between two consecutively output molecule alignment poses
    exhaustive = False  # Perform an exhaustive alignment search (default: False)

    mols = [list of BasicMolecules]  # Example list of BasicMolecules

    # read the reference pharmacophore
    ref_ph4 = readRefPharmacophore("path_to_reference_pharmacophore.pml") 

    # create writer for aligned molecules (format specified by file extension)
    mol_writer = Chem.MolecularGraphWriter("path_to_output_file.sdf") 

    # create instance of class implementing the pharmacophore alignment algorithm
    almnt = Pharm.PharmacophoreAlignment(True) # True = aligned features have to be within the tolerance spheres of the ref. features

    if pos_only:                          # clear feature orientation information
        clearFeatureOrientations(ref_ph4)
    
    almnt.addFeatures(ref_ph4, True)               # set reference features (True = first set = reference)
    almnt.performExhaustiveSearch(exhaustive) # set minimum number of top. mapped feature pairs
    
    # create pharmacophore fit score calculator instance
    almnt_score = Pharm.PharmacophoreFitScore()
    
    # process molecules one after the other
    for mol in mols:
        # compose a simple molecule identifier
        mol_id = Chem.getName(mol).strip() 

        if mol_id == '':
            mol_id = '#' + str(i)  # fallback if name is empty
        else:
            mol_id = '\'%s\' (#%s)' % (mol_id, str(i))

        try:
            mol_ph4 = generatePharmacophore(mol)    # generate input molecule pharmacophore

            if pos_only:                            # clear feature orientation information
                clearFeatureOrientations(mol_ph4)

            almnt.clearEntities(False)         # clear features of previously aligned pharmacophore
            almnt.addFeatures(mol_ph4, False)  # specify features of the pharmacophore to align

            almnt_solutions = []               # stores the found alignment solutions
                
            while almnt.nextAlignment():                                     # iterate over all alignment solutions that can be found
                score = almnt_score(ref_ph4, mol_ph4, almnt.getTransform())  # calculate alignment score
                xform = Math.Matrix4D(almnt.getTransform())                  # make a copy of the alignment transformation (mol. ph4 -> ref. ph4) 

                almnt_solutions.append((score, xform))

            saved_coords = Math.Vector3DArray()      # create data structure for storing 3D coordinates

            Chem.get3DCoordinates(mol, saved_coords) # save the original atom coordinates

            struct_data = None

            if Chem.hasStructureData(mol):           # get existing structure data if available
                struct_data = Chem.getStructureData(mol)
            else:                                    # otherwise create and set new structure data
                struct_data = Chem.StringDataBlock()

                Chem.setStructureData(mol, strut)

            # add alignment score entry to struct. data
            struct_data.addEntry('<PharmFitScore>', '') 
                
            output_cnt = 0
            last_pose = None
                
            # order solutions by desc. alignment score
            almnt_solutions = sorted(almnt_solutions, key=lambda entry: entry[0], reverse=True)

            # output molecule alignment poses until the max. number of best output solutions has been reached
            for solution in almnt_solutions:
                if output_cnt == num_out_almnts:
                    break

                curr_pose = Math.Vector3DArray(saved_coords)

                Math.transform(curr_pose, solution[1])  # transform atom coordinates

                # check whether the current pose is 'different enough' from
                # the last pose to qualify for output
                if min_pose_rmsd > 0.0 and last_pose and Math.calcRMSD(last_pose, curr_pose) < min_pose_rmsd:
                    continue

                # apply the transformed atom coordinates
                Chem.set3DCoordinates(mol, curr_pose)  

                # store alignment score in the struct. data entry
                struct_data[len(struct_data) - 1].setData(format(solution[0], '.4f'))     
                    
                try:
                    if not mol_writer.write(mol): # output the alignment pose of the molecule
                        sys.exit('Error: writing alignment pose of molecule %s failed: %s' % (mol_id, str(e)))

                except Exception as e: # handle exception raised in case of severe write errors
                    sys.exit('Error: writing alignment pose of molecule %s failed: %s' % (mol_id, str(e)))

                last_pose = curr_pose
                output_cnt += 1

        except Exception as e:
            sys.exit('Error: pharmacophore alignment of molecule %s failed: %s' % (mol_id, str(e)))

    mol_writer.close()

Retrieving information about pharmacophores and features
--------------------------------------------------------

The script below demonstrates how to retrieve basic properties of pharmacophore features.
The function ``print_pharmacophore_properties()`` outputs all (available) properties of the features stored
in the given feature container. Built-in feature properties are feature type, geometry, tolerance, weight, and
hydrophobicity.

.. code-block:: python

    import CDPL.Chem as Chem
    import CDPL.Pharm as Pharm

    def print_pharmacophore_properties(ph4: Pharm.FeatureContainer) -> None: 
        """
        Outputs all (available) properties of the features stored in the given feature container.

        Parameters:
        - ph4 (Pharm.FeatureContainer): Feature container to process.
        """
        ftr_type_str = { Pharm.FeatureType.UNKNOWN               : 'UNKNOWN',
                         Pharm.FeatureType.HYDROPHOBIC           : 'HYDROPHOBIC',
                         Pharm.FeatureType.AROMATIC              : 'AROMATIC',
                         Pharm.FeatureType.NEGATIVE_IONIZABLE    : 'NEGATIVE_IONIZABLE',
                         Pharm.FeatureType.POSITIVE_IONIZABLE    : 'POSITIVE_IONIZABLE',
                         Pharm.FeatureType.H_BOND_DONOR          : 'H_BOND_DONOR',
                         Pharm.FeatureType.H_BOND_ACCEPTOR       : 'H_BOND_ACCEPTOR',
                         Pharm.FeatureType.HALOGEN_BOND_DONOR    : 'HALOGEN_BOND_DONOR',
                         Pharm.FeatureType.HALOGEN_BOND_ACCEPTOR : 'HALOGEN_BOND_ACCEPTOR',
                         Pharm.FeatureType.EXCLUSION_VOLUME      : 'EXCLUSION_VOLUME' }
    
        geom_str = { Pharm.FeatureGeometry.UNDEF   : 'UNDEF',
                     Pharm.FeatureGeometry.SPHERE  : 'SPHERE',
                     Pharm.FeatureGeometry.VECTOR  : 'VECTOR',
                     Pharm.FeatureGeometry.PLANE   : 'PLANE' }

        print('Composition of pharmacophore \'%s\':' % Pharm.getName(ph4))

        for i in range(0, len(ph4)):
            ftr = ph4[i]

            print(' - Feature #%s:' % str(i))
            print('  - Type: %s' % ftr_type_str[Pharm.getType(ftr)])
            print('  - Geometry: %s' % geom_str[Pharm.getGeometry(ftr)])
            print('  - Tolerance: %s' % Pharm.getTolerance(ftr))
            print('  - Weight: %s' % Pharm.getWeight(ftr))
            print('  - Optional: %s' % Pharm.getOptionalFlag(ftr))
            print('  - Disabled: %s' % Pharm.getDisabledFlag(ftr))
            print('  - Length: %s' % Pharm.getLength(ftr))
            print('  - Hydrophobicity: %s' % Pharm.getHydrophobicity(ftr))

            if Chem.has3DCoordinates(ftr):         # Pharm.Feature derives from Chem.Entity3D - therefore a function from the Chem package is used here!
                print('  - Position: %s' % Chem.get3DCoordinates(ftr))
    
            if Pharm.hasOrientation(ftr):
                print('  - Orientation: %s' % Pharm.getOrientation(ftr))


    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:

    # create reader for input pharmacophores (format specified by file extension)
    reader = Pharm.PharmacophoreReader("path_to_input_file.pml") 

    # create an instance of the default implementation of the Pharm.Pharmacophore interface
    ph4 = Pharm.BasicPharmacophore()

    # process pharmacophores one after the other until the end of input has been reached
    try:
        while reader.read(ph4):
            try:
                print_pharmacophore_properties(ph4)
            except Exception as e:
                sys.exit('Error: processing of pharmacophore failed: ' + str(e))
                
    except Exception as e: # handle exception raised in case of severe read errors
        sys.exit('Error: reading pharmacophore failed: ' + str(e))



Calculation of Atom Properties
==============================

The CDPL provides a wide panel of pre-defined atom and bond properties. All of these properties can be retrieved and set/calculated
by corresponding function calls as demonstrated in the following example scripts.

Calculation of atom classification properties
---------------------------------------------

Atomic properties provide basic chemical information about each atom of a molecule. Such properties are
e.g. chemical element, formal charge, implicit hydrogen counts, hybridization states, aromaticity, and so on.
The following code snippet shows how to calculate and retrieve properties that provide higher-order information about
the atoms in a molecular graph.

.. note::
   The following examples use functionality provided by the `CDPL.Chem <../cdpl_api_doc/python_api_doc/namespaceCDPL_1_1Chem.html>`_ and
   `CDPL.MolProp <../cdpl_api_doc/python_api_doc/namespaceCDPL_1_1MolProp.html>`_ package.

.. code-block:: python

    import sys
    import os

    import CDPL.Chem as Chem
    import CDPL.MolProp as MolProp

    def outputProperties(molgraph: Chem.MolecularGraph) -> None:
        """
        Outputs the corresponding properties of each atom of the provided molecular graph.

        Parameters:
        - molgraph (Chem.MolecularGraph): Molecular graph to process.
        """
        Chem.calcImplicitHydrogenCounts(molgraph, False)  # calculate implicit hydrogen counts and set corresponding property for all atoms
        Chem.perceiveHybridizationStates(molgraph, False) # perceive atom hybridization states and set corresponding property for all atoms
        Chem.perceiveSSSR(molgraph, False)                # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
        Chem.setRingFlags(molgraph, False)                # perceive cycles and set corresponding atom and bond properties
        Chem.setAromaticityFlags(molgraph, False)         # perceive aromaticity and set corresponding atom and bond properties
        MolProp.perceiveHBondDonorAtomTypes(molgraph, False) # perceive H-bond donor atom types and set corresponding atom properties
        MolProp.perceiveHBondAcceptorAtomTypes(molgraph, False) # perceive H-bond acceptor atom types and set corresponding atom properties

        hba_type_str = { MolProp.HBondAcceptorAtomType.UNDEF                   : 'UNDEF',
                        MolProp.HBondAcceptorAtomType.NONE                    : 'NONE',
                        MolProp.HBondAcceptorAtomType.O_H2O                   : 'O_H2O',
                        MolProp.HBondAcceptorAtomType.O_UREA                  : 'O_UREA',
                        MolProp.HBondAcceptorAtomType.O_BARBITURIC_ACID       : 'O_BARBITURIC_ACID',
                        MolProp.HBondAcceptorAtomType.O_URIC_ACID             : 'O_URIC_ACID',
                        MolProp.HBondAcceptorAtomType.O_ETHER                 : 'O_ETHER',
                        MolProp.HBondAcceptorAtomType.O_AMIDE                 : 'O_AMIDE',
                        MolProp.HBondAcceptorAtomType.O_N_OXIDE               : 'O_N_OXIDE',
                        MolProp.HBondAcceptorAtomType.O_ACID                  : 'O_ACID',
                        MolProp.HBondAcceptorAtomType.O_ESTER                 : 'O_ESTER',
                        MolProp.HBondAcceptorAtomType.O_SULFOXIDE             : 'O_SULFOXIDE',
                        MolProp.HBondAcceptorAtomType.O_NITRO                 : 'O_NITRO',
                        MolProp.HBondAcceptorAtomType.O_SELEN_OXIDE           : 'O_SELEN_OXIDE',
                        MolProp.HBondAcceptorAtomType.O_ALDEHYD               : 'O_ALDEHYD',
                        MolProp.HBondAcceptorAtomType.O_KETONE                : 'O_KETONE',
                        MolProp.HBondAcceptorAtomType.O_ALCOHOL               : 'O_ALCOHOL',
                        MolProp.HBondAcceptorAtomType.N_NH3                   : 'N_NH3',
                        MolProp.HBondAcceptorAtomType.N_DIAMINE               : 'N_DIAMINE',
                        MolProp.HBondAcceptorAtomType.N_MONO_DI_NITRO_ANILINE : 'N_MONO_DI_NITRO_ANILINE',
                        MolProp.HBondAcceptorAtomType.N_TRI_NITRO_ANILINE     : 'N_TRI_NITRO_ANILINE',
                        MolProp.HBondAcceptorAtomType.N_HALOGENO_ANILINE      : 'N_HALOGENO_ANILINE',
                        MolProp.HBondAcceptorAtomType.N_ANILINE               : 'N_ANILINE',
                        MolProp.HBondAcceptorAtomType.N_NITRILE               : 'N_NITRILE',
                        MolProp.HBondAcceptorAtomType.N_AZOLE                 : 'N_AZOLE',
                        MolProp.HBondAcceptorAtomType.N_AMINE                 : 'N_AMINE',
                        MolProp.HBondAcceptorAtomType.N_AMIDINE               : 'N_AMIDINE',
                        MolProp.HBondAcceptorAtomType.N_AZO                   : 'N_AZO',
                        MolProp.HBondAcceptorAtomType.N_AZINE                 : 'N_AZINE',
                        MolProp.HBondAcceptorAtomType.N_DIAZINE               : 'N_DIAZINE',
                        MolProp.HBondAcceptorAtomType.N_IMINE                 : 'N_IMINE',
                        MolProp.HBondAcceptorAtomType.S_SULFIDE               : 'S_SULFIDE',
                        MolProp.HBondAcceptorAtomType.S_THIOUREA              : 'S_THIOUREA',
                        MolProp.HBondAcceptorAtomType.P_MONO_DI_PHOSPHINE     : 'P_MONO_DI_PHOSPHINE',
                        MolProp.HBondAcceptorAtomType.P_TRI_PHOSPHINE         : 'P_TRI_PHOSPHINE' }

        hbd_type_str = { MolProp.HBondDonorAtomType.UNDEF                       : 'UNDEF',
                        MolProp.HBondDonorAtomType.NONE                        : 'NONE',
                        MolProp.HBondDonorAtomType.I_HI                        : 'I_HI',
                        MolProp.HBondDonorAtomType.BR_HBR                      : 'BR_HBR',
                        MolProp.HBondDonorAtomType.CL_HCL                      : 'CL_HCL',
                        MolProp.HBondDonorAtomType.S_HSCN                      : 'S_HSCN',
                        MolProp.HBondDonorAtomType.F_HF                        : 'F_HF',
                        MolProp.HBondDonorAtomType.H_H2                        : 'H_H2',
                        MolProp.HBondDonorAtomType.C_HCN                       : 'C_HCN',
                        MolProp.HBondDonorAtomType.C_ETHINE                    : 'C_ETHINE',
                        MolProp.HBondDonorAtomType.N_HN3                       : 'N_HN3',
                        MolProp.HBondDonorAtomType.N_NH3                       : 'N_NH3',
                        MolProp.HBondDonorAtomType.N_NH4                       : 'N_NH4',
                        MolProp.HBondDonorAtomType.N_AMINE                     : 'N_AMINE',
                        MolProp.HBondDonorAtomType.N_AMINIUM                   : 'N_AMINIUM',
                        MolProp.HBondDonorAtomType.N_ANILINE                   : 'N_ANILINE',
                        MolProp.HBondDonorAtomType.N_MONO_DI_NITRO_ANILINE     : 'N_MONO_DI_NITRO_ANILINE',
                        MolProp.HBondDonorAtomType.N_TRI_NITRO_ANILINE         : 'N_TRI_NITRO_ANILINE',
                        MolProp.HBondDonorAtomType.N_PYRROLE                   : 'N_PYRROLE',
                        MolProp.HBondDonorAtomType.N_AMIDE                     : 'N_AMIDE',
                        MolProp.HBondDonorAtomType.N_IMINE                     : 'N_IMINE',
                        MolProp.HBondDonorAtomType.N_IMINIUM                   : 'N_IMINIUM',
                        MolProp.HBondDonorAtomType.S_H2S                       : 'S_H2S',
                        MolProp.HBondDonorAtomType.S_HS                        : 'S_HS',
                        MolProp.HBondDonorAtomType.S_THIOL                     : 'S_THIOL',
                        MolProp.HBondDonorAtomType.O_H3PO4                     : 'O_H3PO4',
                        MolProp.HBondDonorAtomType.O_H2CO3                     : 'O_H2CO3',
                        MolProp.HBondDonorAtomType.O_HCO3                      : 'O_HCO3',
                        MolProp.HBondDonorAtomType.O_H2O2                      : 'O_H2O2',
                        MolProp.HBondDonorAtomType.O_H2O                       : 'O_H2O',
                        MolProp.HBondDonorAtomType.O_CF3SO3H                   : 'O_CF3SO3H',
                        MolProp.HBondDonorAtomType.O_HCLO4                     : 'O_HCLO4',
                        MolProp.HBondDonorAtomType.O_H2SO4                     : 'O_H2SO4',
                        MolProp.HBondDonorAtomType.O_HNO3                      : 'O_HNO3',
                        MolProp.HBondDonorAtomType.O_HSO4                      : 'O_HSO4',
                        MolProp.HBondDonorAtomType.O_HNO2                      : 'O_HNO2',
                        MolProp.HBondDonorAtomType.O_NH2OH                     : 'O_NH2OH',
                        MolProp.HBondDonorAtomType.O_H2PO4                     : 'O_H2PO4',
                        MolProp.HBondDonorAtomType.O_H3BO3                     : 'O_H3BO3',
                        MolProp.HBondDonorAtomType.O_H4SIO4                    : 'O_H4SIO4',
                        MolProp.HBondDonorAtomType.O_HPO4                      : 'O_HPO4',
                        MolProp.HBondDonorAtomType.O_H2BO3                     : 'O_H2BO3',
                        MolProp.HBondDonorAtomType.O_HO                        : 'O_HO',
                        MolProp.HBondDonorAtomType.O_SULFONIC_ACID             : 'O_SULFONIC_ACID',
                        MolProp.HBondDonorAtomType.O_MONO_DI_NITRO_PHENOL      : 'O_MONO_DI_NITRO_PHENOL',
                        MolProp.HBondDonorAtomType.O_HALOGENO_ALCOHOL          : 'O_HALOGENO_ALCOHOL',
                        MolProp.HBondDonorAtomType.O_ALCOHOL                   : 'O_ALCOHOL',
                        MolProp.HBondDonorAtomType.O_TRI_NITRO_PHENOL          : 'O_TRI_NITRO_PHENOL',
                        MolProp.HBondDonorAtomType.O_HALOGENO_PHENOL           : 'O_HALOGENO_PHENOL',
                        MolProp.HBondDonorAtomType.O_PHENOL                    : 'O_PHENOL',
                        MolProp.HBondDonorAtomType.O_CARBOXYLIC_ACID           : 'O_CARBOXYLIC_ACID',
                        MolProp.HBondDonorAtomType.O_HALOGENO_CARBOXYCLIC_ACID : 'O_HALOGENO_CARBOXYCLIC_ACID',
                        MolProp.HBondDonorAtomType.O_ENOL                      : 'O_ENOL',
                        MolProp.HBondDonorAtomType.O_OXIME                     : 'O_OXIME',
                        MolProp.HBondDonorAtomType.O_CL5_PHENOL                : 'O_CL5_PHENOL' }
        
        for atom in molgraph.atoms:
            print('- Atom #%s' % str(molgraph.getAtomIndex(atom)))
            print('\tIs std. hydrogen: %s' % str(MolProp.isOrdinaryHydrogen(atom, molgraph)))
            print('\tIs heavy atom: %s' % str(MolProp.isHeavy(atom)))
            print('\tIs unsaturated: %s' % str(MolProp.isUnsaturated(atom, molgraph)))
            print('\tIs H-bond acceptor: %s' % str(MolProp.isHBondAcceptor(atom, molgraph)))
            print('\tH-bond acceptor type: %s' % hba_type_str[MolProp.getHBondAcceptorType(atom)])
            print('\tIs H-bond donor: %s' % str(MolProp.isHBondDonor(atom, molgraph)))
            print('\tH-bond donor type: %s' % hbd_type_str[MolProp.getHBondDonorType(atom)])
            print('\tIs carbonyl carbon: %s' % str(MolProp.isCarbonylLikeAtom(atom, molgraph, True, True)))
            print('\tIs amide carbon: %s' % str(MolProp.isAmideCenterAtom(atom, molgraph, True, True)))
            print('\tIs amide nitrogen: %s' % str(MolProp.isAmideNitrogen(atom, molgraph, True, True)))
            print('\tIs invertible nitrogen: %s' % str(MolProp.isInvertibleNitrogen(atom, molgraph)))
            print('\tIs planar nitrogen: %s' % str(MolProp.isPlanarNitrogen(atom, molgraph)))
            
            if len(sys.argv) < 2:
                sys.exit('Usage: %s <input mol. file>' % sys.argv[0])


    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    
    mols = [list of BasicMolecules]  # Example list of BasicMolecules

    # process molecules one after the other
    for mol in mols:
        try:
            outputProperties(mol)
        except Exception as e:
            sys.exit('Error: processing of molecule failed: ' + str(e))

Calculation of connectivity properties
---------------------------------------

The code snippet below shows to calculate various atom properties that depend on their connectivity to other atoms of
the molecular graph such as the number of connected carbon atoms, heteroatoms, halogens, heavy atoms, chain atoms,
ring atoms, aromatic atoms, and many more.

.. code-block:: python

    import sys
    import os

    import CDPL.Chem as Chem
    import CDPL.MolProp as MolProp
    
    def outputProperties(molgraph: Chem.MolecularGraph) -> None:
        """
        Outputs the corresponding properties of each atom of the provided molecular graph.

        Parameters:
        - molgraph (Chem.MolecularGraph): Molecular graph to process.
        """
        Chem.calcImplicitHydrogenCounts(molgraph, False)  # calculate implicit hydrogen counts and set corresponding property for all atoms
        Chem.perceiveHybridizationStates(molgraph, False) # perceive atom hybridization states and set corresponding property for all atoms
        Chem.perceiveSSSR(molgraph, False)                # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
        Chem.setRingFlags(molgraph, False)                # perceive cycles and set corresponding atom and bond properties
        Chem.setAromaticityFlags(molgraph, False)         # perceive aromaticity and set corresponding atom and bond properties

        vsepr_geom_str = { MolProp.CoordinationGeometry.UNDEF                  : 'UNDEF',
                    MolProp.CoordinationGeometry.NONE                   : 'NONE',
                    MolProp.CoordinationGeometry.LINEAR                 : 'LINEAR',
                    MolProp.CoordinationGeometry.TRIGONAL_PLANAR        : 'TRIGONAL_PLANAR',
                    MolProp.CoordinationGeometry.TETRAHEDRAL            : 'TETRAHEDRAL',
                    MolProp.CoordinationGeometry.TRIGONAL_BIPYRAMIDAL   : 'TRIGONAL_BIPYRAMIDAL',
                    MolProp.CoordinationGeometry.OCTAHEDRAL             : 'OCTAHEDRAL',
                    MolProp.CoordinationGeometry.PENTAGONAL_BIPYRAMIDAL : 'PENTAGONAL_BIPYRAMIDAL',
                    MolProp.CoordinationGeometry.SQUARE_ANTIPRISMATIC   : 'SQUARE_ANTIPRISMATIC',
                    MolProp.CoordinationGeometry.BENT                   : 'BENT',
                    MolProp.CoordinationGeometry.TRIGONAL_PYRAMIDAL     : 'TRIGONAL_PYRAMIDAL',
                    MolProp.CoordinationGeometry.SQUARE_PLANAR          : 'SQUARE_PLANAR',
                    MolProp.CoordinationGeometry.SQUARE_PYRAMIDAL       : 'SQUARE_PYRAMIDAL',
                    MolProp.CoordinationGeometry.T_SHAPED               : 'T_SHAPED',
                    MolProp.CoordinationGeometry.SEESAW                 : 'SEESAW',
                    MolProp.CoordinationGeometry.PENTAGONAL_PYRAMIDAL   : 'PENTAGONAL_PYRAMIDAL',
                    MolProp.CoordinationGeometry.PENTAGONAL_PLANAR      : 'PENTAGONAL_PLANAR' }
        
        for atom in molgraph.atoms:
            print('- Atom #%s' % str(molgraph.getAtomIndex(atom)))
            print('\tNum. connected std. hydrogens (incl. impl. H): %s' % str(MolProp.getOrdinaryHydrogenCount(atom, molgraph)))
            print('\tNum. connected carbon atoms: %s' % str(MolProp.getExplicitAtomCount(atom, molgraph, Chem.AtomType.C)))
            print('\tNum. connected heteroatoms: %s' % str(MolProp.getExplicitAtomCount(atom, molgraph, Chem.AtomType.HET, False)))
            print('\tNum. connected halogens: %s' % str(MolProp.getExplicitAtomCount(atom, molgraph, Chem.AtomType.X, False)))
            print('\tNum. connected heavy atoms: %s' % str(MolProp.getHeavyAtomCount(atom, molgraph)))
            print('\tNum. connected chain atoms (excl. impl. H): %s' % str(MolProp.getExplicitChainAtomCount(atom, molgraph)))
            print('\tNum. connected chain atoms (incl. impl. H): %s' % str(MolProp.getChainAtomCount(atom, molgraph)))
            print('\tNum. connected ring atoms: %s' % str(MolProp.getRingAtomCount(atom, molgraph)))
            print('\tNum. connected aromatic atoms: %s' % str(MolProp.getAromaticAtomCount(atom, molgraph)))
            print('\tNum. incident bonds (excl. impl. H): %s' % str(MolProp.getExplicitBondCount(atom, molgraph)))
            print('\tNum. incident bonds (incl. impl. H): %s' % str(MolProp.getBondCount(atom, molgraph)))
            print('\tNum. incident single bonds (excl. impl. H): %s' % str(MolProp.getExplicitBondCount(atom, molgraph, 1)))
            print('\tNum. incident single bonds (incl. impl. H): %s' % str(MolProp.getBondCount(atom, molgraph, 1)))
            print('\tNum. incident double bonds: %s' % str(MolProp.getBondCount(atom, molgraph, 2)))
            print('\tNum. incident triple bonds: %s' % str(MolProp.getBondCount(atom, molgraph, 3)))
            print('\tNum. incident chain bonds (excl. impl. H): %s' % str(MolProp.getExplicitChainBondCount(atom, molgraph)))
            print('\tNum. incident chain bonds (incl. impl. H): %s' % str(MolProp.getChainBondCount(atom, molgraph)))
            print('\tNum. incident ring bonds (incl. impl. H): %s' % str(MolProp.getRingBondCount(atom, molgraph)))
            print('\tNum. incident aromatic bonds (incl. impl. H): %s' % str(MolProp.getAromaticBondCount(atom, molgraph)))
            print('\tNum. incident heavy atom bonds (incl. impl. H): %s' % str(MolProp.getHeavyBondCount(atom, molgraph)))
            print('\tNum. incident rotatable bonds (incl. impl. H): %s' % str(MolProp.getRotatableBondCount(atom, molgraph, False, False)))
            print('\tValence (excl. impl. H): %s' % str(MolProp.calcExplicitValence(atom, molgraph)))
            print('\tValence (incl. impl. H): %s' % str(MolProp.calcValence(atom, molgraph)))
            print('\tSteric number: %s' % str(MolProp.calcStericNumber(atom, molgraph)))
            print('\tVSEPR coordination geometry: %s' % vsepr_geom_str[MolProp.getVSEPRCoordinationGeometry(atom, molgraph)])


    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    
    mols = [list of BasicMolecules]  # Example list of BasicMolecules
    
    # process molecules one after the other
    for mol in mols:
        try:
           outputProperties(mol)
        except Exception as e:
           sys.exit('Error: processing of molecule failed: ' + str(e))


Calculation of Molecular Structure Descriptors
==============================================

The calculation of molecule structure descriptors is one of the fundamental operations in cheminformatics. 
Such descriptors e.g. allow for a modeling and prediction of various structure-dependent properties
with mathematical methods. However, they can also be put to use for numerous other applications such as:

- Search for molecules that are structurally similar to a query molcule
- Pre-filtering step for substructure searching in large chemical databases
- Identifying the presence of particular functional groups or fragments in molecules

.. note::
    The `CDPL.Descr <../cdpl_api_doc/python_api_doc/namespaceCDPL_1_1Descr.html>`_ package provides functionality for the generation
    of various well-known molecule descriptors and fingerprints.

Extended connectivity fingerprints (ECFPs)
------------------------------------------

Morgan circular fingerprints, also known as extended connectivity fingerprints (ECFPs) :cite:`doi:10.1021/ci100050t`, 
are a type of structural fingerprint that encodes the local chemical environment of each atom in a 
molecule as a particular bit in a bitset. They are widely used in cheminformatics for various tasks, such as similarity searching, 
virtual screening, and machine learning.

The following code snippet calculates and outputs the ECFP fingerprints of the provided molecules.
The ECFP generation process can be influenced by the following parameters: 

- *num_bits*: The number of bits of the fingerprint (default: 1024)
- *radius*: Max. atom environment radius in number of bonds (default: 2)
- *inc_hs*: Whether to include explicit hydrogens (by default, the fingerprint is generated for the H-deplete molecular graph)
- *inc_config*: Whether to include atom chirality (by default, the fingerprint is generated for the H-deplete molecular graph)

.. code:: python

    import CDPL.Descr as Descr
    import CDPL.Util as Util

    def genECFP(mol: Chem.Molecule, num_bits: int, radius: int, inc_hs: bool, inc_config: bool) -> Util.BitSet:
        """
        Generates the binary ECFP for the given molecule.

        Parameters:
        - mol (Chem.Molecule): Molecule to process.
        - num_bits (int): Number of bits of the fingerprint.
        - radius (int): Max. atom environment radius in number of bonds.
        - inc_hs (bool): Whether to include explicit hydrogens.
        - inc_config (bool): Whether to include atom chirality.

        Returns:
        - Util.BitSet: The generated fingerprint.
        """

        Chem.calcImplicitHydrogenCounts(mol, False)        # calculate implicit hydrogen counts (if not yet done)
        Chem.perceiveHybridizationStates(mol, False)       # perceive atom hybridization states and set corresponding property for all atoms
        Chem.setRingFlags(mol, False)                      # perceive cycles and set corresponding atom and bond properties
        Chem.perceiveSSSR(mol, False)                      # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
        Chem.setAromaticityFlags(mol, False)               # perceive aromaticity and set corresponding atom and bond properties
        
        ecfp_gen = Descr.CircularFingerprintGenerator()    # create ECFP generator instance

        if inc_config:
            ecfp_gen.includeChirality(True)                # allow atom chirality to have an impact on the ECFP generation
            Chem.calcAtomStereoDescriptors(mol, False)     # calculate atom stereo descriptors and set corresponding property for all atoms

        if inc_hs:        
            ecfp_gen.includeHydrogens(True)                # include explicit hydrogens in the ECFP generation
            Chem.makeHydrogenComplete(mol)                 # make any implicit hydrogens explicit
            
        fp = Util.BitSet()                                 # create fingerprint bitset
        fp.resize(num_bits)                                # set desired fingerprint size

        ecfp_gen.setNumIterations(radius)                  # set num. iterations (=atom. env. radius)
        ecfp_gen.generate(mol)                             # extract chracteristic structural features
        ecfp_gen.setFeatureBits(fp)                        # set bits associated with the extracted structural features

        # if needed, fp could be converted into a numpy single precision float array as follows:
        # fp = numpy.array(fp, dtype=numpy.float32)
        
        return fp

        
    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:

    # Settings
    num_bits = 1024   # fingerprint size
    radius = 4        # atom environment radius
    inc_hs = True     # include explicit hydrogens
    inc_config = True # include atom chirality

    mols = [list of BasicMolecules]  # Example list of BasicMolecules

    # process molecules one after the other
    for mol in mols:
        try:
            fp = genECFP(mol, num_bits, radius, inc_hs, inc_config)

            # do something useful with the fingerprint

        except Exception as e:
            sys.exit('Error: processing of molecule failed: ' + str(e))
   
    out_file.close()

FAME atom environment fingerprints
----------------------------------

This type of fingerprint encodes the local environment of individual atoms up to a configurable maximum bond-path length.
The descriptor was developed for the classification of atoms in the well-known site of metabolism prediction
software *FAME* :cite:`doi:10.1021/acs.jcim.9b00376`.

The following code snippet calculates and outputs the FAME descriptor for each atom of the provided molecules with
the parameter *radius* specifying the max. atom environment radius in number of bonds (default: 2).

.. code:: python

    import CDPL.Chem as Chem

    def genFAMEDescriptor(ctr_atom: Chem.Atom, molgraph: Chem.MolecularGraph, radius: int) -> numpy.array:
        """
        Generates the FAME descriptor for the given atom of the provided molecule.

        Parameters:
        - ctr_atom (Chem.Atom): Atom for which the FAME descriptor is to be calculated.
        - molgraph (Chem.MolecularGraph): Molecule to process.        
        - radius (int): Max. atom environment radius in number of bonds.

        Returns:
        - numpy.array: The generated FAME descriptor.
        """

        env = Chem.Fragment()                                                      # for storing of extracted environment atoms
        descr = numpy.zeros((Chem.SybylAtomType.MAX_TYPE + 1) * (radius + 1))
        
        Chem.getEnvironment(ctr_atom, molgraph, radius, env)                       # extract environment of center atom reaching
                                                                                   # out up to 'radius' bonds
        for atom in env.atoms:                                                     # iterate over extracted environment atoms
            sybyl_type = Chem.getSybylType(atom)                                   # retrieve Sybyl type of environment atom
            top_dist = Chem.getTopologicalDistance(ctr_atom, atom, molgraph)       # get top. distance between center atom and environment atom
            descr[top_dist * (Chem.SybylAtomType.MAX_TYPE + 1) + sybyl_type] += 1  # instead of 1 (= Sybyl type presence) also any other numeric atom
                                                                                   # property could be summed up here
        return descr
            
    def procMolecule(molgraph: Chem.MolecularGraph) -> None: 
        """
        Processes the provided molecule.

        Parameters:
        - molgraph (Chem.MolecularGraph): Molecule to process.
        """
        Chem.calcImplicitHydrogenCounts(molgraph, False)     # calculate implicit hydrogen counts and set corresponding property for all atoms
        Chem.perceiveHybridizationStates(molgraph, False)    # perceive atom hybridization states and set corresponding property for all atoms
        Chem.perceiveSSSR(molgraph, False)                   # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
        Chem.setRingFlags(molgraph, False)                   # perceive cycles and set corresponding atom and bond properties
        Chem.setAromaticityFlags(molgraph, False)            # perceive aromaticity and set corresponding atom and bond properties
        Chem.perceiveSybylAtomTypes(molgraph, False)         # perceive Sybyl atom types and set corresponding property for all atoms
        Chem.calcTopologicalDistanceMatrix(molgraph, False)  # calculate topological distance matrix and store as Chem.MolecularGraph property
    
        for atom in molgraph.atoms:
            descr = genFAMEDescriptor(atom, molgraph, 5)     # generate atom environment descriptor using a radius of five bonds

            print(descr)

    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:

    mols = [list of BasicMolecules]  # Example list of BasicMolecules
    
    # process molecules one after the other
    for mol in mols: 
        try:
            procMolecule(mol)
        except Exception as e:
            sys.exit('Error: processing of molecule failed: ' + str(e))
                

Force Field Calculations
==========================

Force fields are computational methods that estimate the forces between atoms within molecules and also between molecules.
Since they allow for a computationally inexpensive, but nevertheless relatively accurate calculation of the potential energy of a system of atoms,
they are essential for many tasks in the field of molecular modeling such as conformer generation and molecular dynamics (MD) simulations.
One of the widely recognized force fields is the *Merck Molecular Force Field* (MMFF94) 
:cite:`https://doi.org/10.1002/(SICI)1096-987X(199604)17:5/6<490::AID-JCC1>3.0.CO;2-P` which has been designed to be applicable to a
broad range of organic molecules and thus found widespread use.

Calculation of MMFF94 atom charges
----------------------------------

The MMFF94 force field provides a method and associated parameter set for the calculation of partial atomic charges. 
They are not only crucial for the estimation of intra- and intermolecular electrostatic interactions but can also be
used, e.g., as atom descriptors for ML-based model building. 

The following code snippet calculates and outputs the MMFF94 charges of the atoms for a given list of molecules:

.. note::
    Force field related functionality is provided via the `CDPL.ForceField <../cdpl_api_doc/python_api_doc/namespaceCDPL_1_1ForceField.html>`_ package.

.. code-block:: python

    import CDPL.Chem as Chem
    import CDPL.ForceField as ForceField

     def calc_and_output_charges(mol: Chem.BasicMolecule) -> None:
        """
        Calculates and outputs the MMFF94 charges of the atoms for the provided molecule.

        Parameters:
            mol (Chem.BasicMolecule): The molecule for which the MMFF94 charges are to be calculated.
        """
        # Various preprocessing steps to prepare the molecule
        Chem.calcImplicitHydrogenCounts(mol, False)  # calculate implicit hydrogen counts and set corresponding property for all atoms
        Chem.makeHydrogenComplete(mol)               # make all implicit hydrogens explicit
        Chem.perceiveHybridizationStates(mol, False) # perceive atom hybridization states and set corresponding property for all atoms
        Chem.perceiveSSSR(mol, False)                # perceive smallest set of smallest rings and store as Chem.MolecularGraph property
        Chem.setRingFlags(mol, False)                # perceive cycles and set corresponding atom and bond properties
        Chem.setAromaticityFlags(mol, False)         # perceive aromaticity and set corresponding atom and bond properties
    
        ForceField.perceiveMMFF94AromaticRings(mol, False)        # perceive aromatic rings according to the MMFF94 aroamticity model and store data as Chem.MolecularGraph property
        ForceField.assignMMFF94AtomTypes(mol, False, False)       # perceive MMFF94 atom types (tolerant mode) set corresponding property for all atoms
        ForceField.assignMMFF94BondTypeIndices(mol, False, False) # perceive MMFF94 bond types (tolerant mode) set corresponding property for all bonds
        ForceField.calcMMFF94AtomCharges(mol, False, False)       # calculate MMFF94 atom charges (tolerant mode) set corresponding property for all atoms


        print('- MMFF94 partial charges')

        for atom in mol.atoms:
            print('Atom #%s: %s' % (str(atom.getIndex()), str(ForceField.getMMFF94Charge(atom))))

    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:

    mols = [list_of_BasicMolecules]

    for mol in mols:
        calc_and_output_charges(mol)


Substructure Searching and Matching
===================================

Substructure searching (a subgraph isomorphism problem) is one of the most fundamental operations in cheminformatics and allows for the
identification of molecules that contain a specific structural motif or pattern. 
Substructure searching needs to be performed in various application areas, such as:

- *Drug Discovery*: Identifying molecules that contain a particular pharmacophore or motif
- *Chemical Database Querying*: Filtering large chemical databases to retrieve molecules of interest
- *Reactivity Analysis*: Identifying the presence of particular functional groups or fragments in molecules

Filtering molecules based on a SMARTS pattern
---------------------------------------------

The SMARTS notation (SMiles ARbitrary Target Specification) is a language used to describe structural patterns. It's an
extension of the SMILES notation and allows for more complex and specific pattern descriptions.
The CDPL provides substructure searching functionality via the class `Chem.SubstructureSearch <../cdpl_api_doc/python_api_doc/classCDPL_1_1Chem_1_1SubstructureSearch.html>`_.
The code snippet below demonstrates how this class can be used to identify molecules that match a specific structural motif described by a SMARTS pattern.


.. code-block:: python

    import CDPL.Chem as Chem

    def filterMoleculesBySmarts(input_file: str, output_file: str, smarts_pattern: str, quiet: bool = False) -> None:
        """
        Filters molecules from the input file that match the provided SMARTS pattern and writes them to the output file.
        
        Parameters:
            input_file (str): Path to the input file containing the molecules to be filtered.
            output_file (str): Path to the output file to which the filtered molecules are written.
            smarts_pattern (str): SMARTS pattern describing the structural motif to be matched.
            quiet (bool): If set to True, no progress information is printed to the console.
        """
        try:
            sub_srch_ptn = Chem.parseSMARTS(smarts_pattern)
            Chem.initSubstructureSearchQuery(sub_srch_ptn, False)
        except Exception as e:
            print(f'Error: parsing of SMARTS pattern failed: {str(e)}')
            return

        substr_srch = Chem.SubstructureSearch(sub_srch_ptn)
        reader = Chem.MoleculeReader(input_file)
        writer = Chem.MolecularGraphWriter(output_file)
        mol = Chem.BasicMolecule()
        i = 1

        try:
            while reader.read(mol):
                mol_id = Chem.getName(mol).strip() or f'Molecule_{i}'

                Chem.initSubstructureSearchTarget(mol, False)

                if substr_srch.mappingExists(mol):
                    if not quiet:
                        print(f' -> substructure found, forwarding molecule {mol_id} to output file')
                    writer.write(mol)
                elif not quiet:
                    print(f' -> substructure not found in molecule {mol_id}')

                i += 1

        except Exception as e:
            print(f'Error: {str(e)}')

        writer.close()

    ###########################################################################
    ###########################################################################
    ###########################################################################
    ################################ Example usage:
    
    filterMoleculesBySmarts("input.sdf", "output.sdf", "[#6]1:[#6]:[#6]:[#6]:[#6]:[#6]:1")