Chemical Data Processing Library C++ API - Version 1.4.0
CDPL::Descr::CircularFingerprintGenerator Class Reference

Generation of atom-centered circular substructure fingerprints in the spirit of SciTegic's Extended Connectivity Fingerprints (ECFP).

#include <CircularFingerprintGenerator.hpp>

Classes

class DefAtomIdentifierFunctor
 The default functor for the generation of ECFP atom identifiers.
 
class DefBondIdentifierFunctor
 The default functor for the generation of bond identifiers.
 

Public Types

typedef std::function< std::uint64_t(const Chem::Atom &, const Chem::MolecularGraph &)> AtomIdentifierFunction
 Type of the generic functor class used to store user-defined functions or function objects for the generation of atom identifiers.
 
typedef std::function< std::uint64_t(const Chem::Bond &)> BondIdentifierFunction
 Type of the generic functor class used to store user-defined functions or function objects for the generation of bond identifiers.
 

Public Member Functions

 CircularFingerprintGenerator ()
 Constructs the CircularFingerprintGenerator instance.
 
 CircularFingerprintGenerator (const Chem::MolecularGraph &molgraph)
 Constructs the CircularFingerprintGenerator instance and generates the atom-centered circular substructure fingerprint of the molecular graph molgraph.
 
void setAtomIdentifierFunction (const AtomIdentifierFunction &func)
 Specifies a customized function for the generation of initial atom identifiers.
 
void setBondIdentifierFunction (const BondIdentifierFunction &func)
 Specifies a customized function for the generation of initial bond identifiers.
 
void setNumIterations (std::size_t num_iter)
 Specifies the desired number of feature substructure growing iterations.
 
std::size_t getNumIterations () const
 Returns the number of feature substructure growing iterations.
 
void includeHydrogens (bool include)
 Specifies whether hydrogens shall be included in the generated fingerprint.
 
bool hydrogensIncluded () const
 Tells whether hydrogens are considered during fingerprint generation.
 
void includeChirality (bool include)
 Specifies whether atom stereo configurations shall be incorporated into atom identifiers.
 
bool chiralityIncluded () const
 Tells whether atom chirality is considered during fingerprint generation.
 
void generate (const Chem::MolecularGraph &molgraph)
 Generates the atom-centered circular substructure fingerprint of the molecular graph molgraph.
 
void setFeatureBits (Util::BitSet &bs, bool reset=true) const
 Maps previously generated feature identifiers to bit indices and sets the corresponding bits of bs.
 
void setFeatureBits (std::size_t atom_idx, Util::BitSet &bs, bool reset=true) const
 Maps previously generated identifiers of structural features involving the atom specified by atom_idx to bit indices and sets the corresponding bits of bs.
 
std::size_t getNumFeatures () const
 Returns the number of features generated by the most recent call to generate().
 
std::uint64_t getFeatureIdentifier (std::size_t ftr_idx) const
 Returns the identifier of the feature at index ftr_idx.
 
const Util::BitSet & getFeatureSubstructure (std::size_t ftr_idx) const
 Returns the atom-bit mask describing the substructure covered by the feature at index ftr_idx.
 
void getFeatureSubstructure (std::size_t ftr_idx, Chem::Fragment &frag, bool clear=true) const
 Extracts the substructure covered by the feature at index ftr_idx into frag.
 
void getFeatureSubstructures (std::size_t bit_idx, std::size_t bs_size, Chem::FragmentList &frags, bool clear=true) const
 Extracts the substructures of every feature that, when folded into a bitset of size bs_size, maps to the bit index bit_idx.
 

Static Public Attributes

static constexpr unsigned int DEF_ATOM_PROPERTY_FLAGS
 Specifies the default set of atomic properties considered in the generation of atom identifiers by DefAtomIdentifierFunctor.
 
static constexpr unsigned int DEF_BOND_PROPERTY_FLAGS
 Specifies the default set of bond properties considered in the generation of bond identifiers by DefBondIdentifierFunctor.
 

Detailed Description

Generation of atom-centered circular substructure fingerprints in the spirit of SciTegic's Extended Connectivity Fingerprints (ECFP).

Starting from initial atom and bond identifiers (generated either by the built-in DefAtomIdentifierFunctor / DefBondIdentifierFunctor or by user-supplied functions), the generator runs a configurable number of growing iterations (see setNumIterations()). Each iteration derives a new set of feature identifiers from the identifiers of the previous iteration and the connecting bonds, so that successive iterations capture circular substructures of increasing radius. The resulting feature identifiers can be folded into a bitset of any size via setFeatureBits().
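The growing scheme can be illustrated with a small standalone sketch. It is not CDPL's implementation: the toy adjacency-list graph, the hash combiner, the sorting of each atom's (bond, neighbor) environment and the names `Neighbors`, `combine` and `circularFeatures` are assumptions made for illustration, and ECFP's removal of features with duplicate substructures is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy graph: for every atom, a list of (neighbor atom index, bond identifier).
typedef std::vector<std::vector<std::pair<std::size_t, std::uint64_t> > > Neighbors;

// Assumed 64-bit hash combiner (boost-style mixing followed by a splitmix64 finalizer).
inline std::uint64_t combine(std::uint64_t seed, std::uint64_t value)
{
    std::uint64_t x = seed ^ (value + 0x9E3779B97F4A7C15ULL + (seed << 6) + (seed >> 2));
    x ^= x >> 30;
    x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 27;
    x *= 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return x;
}

// Runs num_iter growing iterations starting from the initial atom identifiers and returns
// all feature identifiers: first the initial ones, then those of iterations 1..num_iter.
std::vector<std::uint64_t> circularFeatures(const std::vector<std::uint64_t>& init_ids,
                                            const Neighbors& nbrs, std::size_t num_iter)
{
    std::vector<std::uint64_t> ids(init_ids);
    std::vector<std::uint64_t> features(init_ids);

    for (std::size_t iter = 1; iter <= num_iter; ++iter) {
        std::vector<std::uint64_t> next(ids.size());

        for (std::size_t atom = 0; atom < ids.size(); ++atom) {
            // (bond identifier, neighbor identifier from the previous iteration) pairs,
            // sorted so that the result does not depend on the input atom/bond order
            std::vector<std::pair<std::uint64_t, std::uint64_t> > env;

            for (std::size_t i = 0; i < nbrs[atom].size(); ++i)
                env.push_back(std::make_pair(nbrs[atom][i].second, ids[nbrs[atom][i].first]));

            std::sort(env.begin(), env.end());

            // seed with the iteration number and the atom's own previous identifier
            std::uint64_t h = combine(iter, ids[atom]);

            for (std::size_t i = 0; i < env.size(); ++i)
                h = combine(combine(h, env[i].first), env[i].second);

            next[atom] = h;
        }

        ids.swap(next);
        features.insert(features.end(), ids.begin(), ids.end());
    }

    return features;
}
```

Because each atom's environment is sorted before hashing, renumbering the atoms of a molecule leaves the multiset of generated identifiers unchanged, which is what makes such identifiers usable as a fingerprint.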

See also
[STECFP]

Member Typedef Documentation

◆ AtomIdentifierFunction

Type of the generic functor class used to store user-defined functions or function objects for the generation of atom identifiers.

Functions or function objects for the generation of atom identifiers are required to take the atom (as a const reference to Chem::Atom) and the containing molecular graph (as a const reference to Chem::MolecularGraph) as arguments and to return the identifier as an integer of type std::uint64_t (see [FUNWRP]).
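Any callable with this call signature can be stored in an AtomIdentifierFunction. The sketch below shows a free function and a function object of the same shape; to stay self-contained it uses the hypothetical stand-in types `ToyAtom` and `ToyGraph` instead of Chem::Atom and Chem::MolecularGraph, and the bit layout of the packed identifier is an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for Chem::Atom and Chem::MolecularGraph, used only to
// illustrate the required call signature.
struct ToyAtom
{
    unsigned int element;  // atomic number
    int          charge;   // formal charge
    std::size_t  index;    // position in the containing graph
};

struct ToyGraph
{
    std::vector<ToyAtom>                  atoms;
    std::vector<std::vector<std::size_t> > adjacency;  // neighbor indices per atom
};

// Same shape as AtomIdentifierFunction, but with the toy types.
typedef std::function<std::uint64_t(const ToyAtom&, const ToyGraph&)> ToyAtomIdentifierFunction;

// Free function: packs element, degree and (shifted) charge into one 64-bit value.
// The graph argument is needed here to look up the atom's degree.
std::uint64_t elementDegreeChargeId(const ToyAtom& atom, const ToyGraph& graph)
{
    std::uint64_t degree = graph.adjacency[atom.index].size();
    // shift the charge into a non-negative range before packing it into the low 16 bits
    std::uint64_t charge = static_cast<std::uint64_t>(static_cast<std::int64_t>(atom.charge) + 128);

    return (std::uint64_t(atom.element) << 32) | (degree << 16) | charge;
}

// Function object: distinguishes elements only; the offset keeps identifiers non-zero.
struct ElementOnlyId
{
    std::uint64_t offset = 1;

    std::uint64_t operator()(const ToyAtom& atom, const ToyGraph&) const
    {
        return atom.element + offset;
    }
};
```

A lambda such as `[](const ToyAtom& a, const ToyGraph&) -> std::uint64_t { return a.element; }` can be stored the same way. Passing a callable of the real signature to setAtomIdentifierFunction() replaces the default DefAtomIdentifierFunctor.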

◆ BondIdentifierFunction

Type of the generic functor class used to store user-defined functions or function objects for the generation of bond identifiers.

Functions or function objects for the generation of bond identifiers are required to take the bond (as a const reference to Chem::Bond) as their argument and to return the identifier as an integer of type std::uint64_t (see [FUNWRP]).
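A bond identifier function receives only the bond. The sketch below, again using a hypothetical stand-in type (`ToyBond`) rather than Chem::Bond, returns a lambda that distinguishes the two properties named by DEF_BOND_PROPERTY_FLAGS, bond order and aromaticity; the code value 100 for aromatic bonds and the name `makeOrderAromaticityId` are arbitrary assumptions.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for Chem::Bond with just the two properties used below.
struct ToyBond
{
    unsigned int order;     // 1, 2 or 3; 0 if unknown
    bool         aromatic;  // membership in an aromatic ring
};

// Same shape as BondIdentifierFunction, but with the toy type.
typedef std::function<std::uint64_t(const ToyBond&)> ToyBondIdentifierFunction;

// Returns a lambda encoding aromaticity and bond order. Aromatic bonds get a dedicated
// code so that they never collide with plain single/double/triple bond codes.
ToyBondIdentifierFunction makeOrderAromaticityId()
{
    return [](const ToyBond& bond) -> std::uint64_t {
        if (bond.aromatic)
            return 100;

        return bond.order;  // an unknown order (0) yields identifier 0
    };
}
```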

Constructor & Destructor Documentation

◆ CircularFingerprintGenerator() [1/2]

CDPL::Descr::CircularFingerprintGenerator::CircularFingerprintGenerator ( )

Constructs the CircularFingerprintGenerator instance.

◆ CircularFingerprintGenerator() [2/2]

CDPL::Descr::CircularFingerprintGenerator::CircularFingerprintGenerator ( const Chem::MolecularGraph & molgraph)

Constructs the CircularFingerprintGenerator instance and generates the atom-centered circular substructure fingerprint of the molecular graph molgraph.

Parameters
molgraph: The molecular graph to process.

Member Function Documentation

◆ setAtomIdentifierFunction()

void CDPL::Descr::CircularFingerprintGenerator::setAtomIdentifierFunction ( const AtomIdentifierFunction & func)

Specifies a customized function for the generation of initial atom identifiers.

Parameters
func: A CircularFingerprintGenerator::AtomIdentifierFunction instance that wraps the target function.
Note
By default, atom identifiers are generated by a CircularFingerprintGenerator::DefAtomIdentifierFunctor instance. If the generated initial identifier for an atom is 0, the atom is regarded as not being present in the processed molecular graph.
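The zero-identifier rule offers a way to exclude atoms from the fingerprint through a custom identifier function. The sketch below is a simplification with a hypothetical `SimpleAtom` type and a one-argument function (the real AtomIdentifierFunction also receives the molecular graph); it returns 0 for hydrogens and lists which atoms would remain under that rule. The names `heavyAtomOnlyId` and `participatingAtoms` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical minimal atom type; only the atomic number matters here.
struct SimpleAtom
{
    unsigned int element;
};

typedef std::function<std::uint64_t(const SimpleAtom&)> SimpleAtomIdFunc;

// Returns 0 for hydrogen (element 1), so such atoms are treated as absent, and the
// atomic number otherwise (an unknown element 0 would also be dropped).
std::uint64_t heavyAtomOnlyId(const SimpleAtom& atom)
{
    return atom.element == 1 ? 0 : atom.element;
}

// Mimics the documented rule: atoms whose initial identifier is 0 do not take part.
std::vector<std::size_t> participatingAtoms(const std::vector<SimpleAtom>& atoms,
                                           const SimpleAtomIdFunc& func)
{
    std::vector<std::size_t> kept;

    for (std::size_t i = 0; i < atoms.size(); ++i)
        if (func(atoms[i]) != 0)
            kept.push_back(i);

    return kept;
}
```

For hydrogens specifically, includeHydrogens() already provides a dedicated switch; the zero-identifier rule is more general and can drop any class of atoms.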

◆ setBondIdentifierFunction()

void CDPL::Descr::CircularFingerprintGenerator::setBondIdentifierFunction ( const BondIdentifierFunction & func)

Specifies a customized function for the generation of initial bond identifiers.

Parameters
func: A CircularFingerprintGenerator::BondIdentifierFunction instance that wraps the target function.
Note
By default, bond identifiers are generated by a CircularFingerprintGenerator::DefBondIdentifierFunctor instance. If the generated initial identifier for a bond is 0, the bond is regarded as not being present in the processed molecular graph.

◆ setNumIterations()

void CDPL::Descr::CircularFingerprintGenerator::setNumIterations ( std::size_t  num_iter)

Specifies the desired number of feature substructure growing iterations.

Parameters
num_iter: The number of iterations.
Note
The default number of iterations is 2.
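Since each iteration extends the covered substructures by one bond in radius, the iteration count relates directly to the diameter used in the common ECFP_n naming (for example ECFP4 for diameter 4). A minimal sketch of that arithmetic, assuming the radius after num_iter iterations equals num_iter bonds; the function names are hypothetical:

```cpp
#include <cstddef>

// Assumes the features grown in iteration k cover all atoms within k bonds of the
// center atom, so the maximum feature diameter (in bonds) is twice the iteration count.
constexpr std::size_t ecfpDiameter(std::size_t num_iter)
{
    return 2 * num_iter;
}

// Inverse mapping for an even ECFP_n diameter n.
constexpr std::size_t iterationsForDiameter(std::size_t diameter)
{
    return diameter / 2;
}

// Under this assumption the default of 2 iterations reaches the ECFP4 diameter.
static_assert(ecfpDiameter(2) == 4, "2 iterations correspond to diameter 4");
```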

◆ getNumIterations()

std::size_t CDPL::Descr::CircularFingerprintGenerator::getNumIterations ( ) const

Returns the number of feature substructure growing iterations.

Returns
The number of iterations.

◆ includeHydrogens()

void CDPL::Descr::CircularFingerprintGenerator::includeHydrogens ( bool  include)

Specifies whether hydrogens shall be included in the generated fingerprint.

Parameters
include: If true, hydrogens are considered as regular atoms during fingerprint generation.

◆ hydrogensIncluded()

bool CDPL::Descr::CircularFingerprintGenerator::hydrogensIncluded ( ) const

Tells whether hydrogens are considered during fingerprint generation.

Returns
true if hydrogens are considered, and false otherwise.

◆ includeChirality()

void CDPL::Descr::CircularFingerprintGenerator::includeChirality ( bool  include)

Specifies whether atom stereo configurations shall be incorporated into atom identifiers.

Parameters
include: If true, atom chirality is considered during fingerprint generation.

◆ chiralityIncluded()

bool CDPL::Descr::CircularFingerprintGenerator::chiralityIncluded ( ) const

Tells whether atom chirality is considered during fingerprint generation.

Returns
true if atom chirality is considered, and false otherwise.

◆ generate()

void CDPL::Descr::CircularFingerprintGenerator::generate ( const Chem::MolecularGraph & molgraph)

Generates the atom-centered circular substructure fingerprint of the molecular graph molgraph.

Parameters
molgraph: The molecular graph to process.

◆ setFeatureBits() [1/2]

void CDPL::Descr::CircularFingerprintGenerator::setFeatureBits ( Util::BitSet & bs, bool reset = true ) const

Maps previously generated feature identifiers to bit indices and sets the corresponding bits of bs.

Parameters
bs: The target bitset.
reset: If true, bs will be cleared before any feature bits are set.
Note
The binary fingerprint size is specified implicitly via the size of bs.
See also
generate()
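The folding step can be sketched as follows. Mapping an identifier to a bit index via the remainder modulo the bitset size is an assumption chosen for illustration (the documentation does not specify CDPL's mapping), std::vector<bool> stands in for Util::BitSet, and `foldFeatures` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Folds 64-bit feature identifiers into a bit vector; its size is the fingerprint size.
// The identifier-modulo-size mapping is an assumed, illustrative choice.
void foldFeatures(const std::vector<std::uint64_t>& feature_ids, std::vector<bool>& bs,
                  bool reset = true)
{
    if (reset)
        bs.assign(bs.size(), false);  // clear all bits but keep the size

    if (bs.empty())
        return;                       // no bit to map onto

    for (std::size_t i = 0; i < feature_ids.size(); ++i)
        bs[feature_ids[i] % bs.size()] = true;
}
```

Under this mapping, distinct identifiers can collide on one bit, for example 5 and 1029 in a 1024-bit vector.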

◆ setFeatureBits() [2/2]

void CDPL::Descr::CircularFingerprintGenerator::setFeatureBits ( std::size_t atom_idx, Util::BitSet & bs, bool reset = true ) const

Maps previously generated identifiers of structural features involving the atom specified by atom_idx to bit indices and sets the corresponding bits of bs.

Parameters
atom_idx: The index of the atom that has to be involved in the structural features.
bs: The target bitset.
reset: If true, bs will be cleared before any feature bits are set.
Note
The binary fingerprint size is specified implicitly via the size of bs.
See also
generate()
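The per-atom variant can be sketched by filtering features through their atom-bit masks before folding. Both the reading of "involving the atom" as "the atom is part of the feature substructure" and the modulo mapping are assumptions; the masks mirror what getFeatureSubstructure() returns, with std::vector<bool> in place of Util::BitSet, and `foldAtomFeatures` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// masks[f][a] is true if atom a belongs to the substructure of feature f.
// Only features containing atom_idx are folded (modulo mapping assumed).
void foldAtomFeatures(std::size_t atom_idx,
                      const std::vector<std::uint64_t>& feature_ids,
                      const std::vector<std::vector<bool> >& masks,
                      std::vector<bool>& bs, bool reset = true)
{
    if (reset)
        bs.assign(bs.size(), false);

    if (bs.empty())
        return;

    for (std::size_t f = 0; f < feature_ids.size() && f < masks.size(); ++f)
        if (atom_idx < masks[f].size() && masks[f][atom_idx])
            bs[feature_ids[f] % bs.size()] = true;
}
```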

◆ getNumFeatures()

std::size_t CDPL::Descr::CircularFingerprintGenerator::getNumFeatures ( ) const

Returns the number of features generated by the most recent call to generate().

Returns
The number of features.

◆ getFeatureIdentifier()

std::uint64_t CDPL::Descr::CircularFingerprintGenerator::getFeatureIdentifier ( std::size_t  ftr_idx) const

Returns the identifier of the feature at index ftr_idx.

Parameters
ftr_idx: The zero-based feature index.
Returns
The feature identifier.

◆ getFeatureSubstructure() [1/2]

const Util::BitSet& CDPL::Descr::CircularFingerprintGenerator::getFeatureSubstructure ( std::size_t  ftr_idx) const

Returns the atom-bit mask describing the substructure covered by the feature at index ftr_idx.

In the returned bitset, the bit at position i is set if the atom with index i is part of the feature substructure.

Parameters
ftr_idx: The zero-based feature index.
Returns
A const reference to the atom-bit mask.
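An atom-bit mask of this kind is straightforward to decode. The sketch below uses std::vector<bool> as a stand-in for Util::BitSet and hypothetical function names; it lists the atom indices of a feature and checks whether one feature substructure is contained in another (for instance a smaller-radius feature inside a larger one).

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of all atoms whose bit is set in the mask.
std::vector<std::size_t> maskToAtomIndices(const std::vector<bool>& mask)
{
    std::vector<std::size_t> atoms;

    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i])
            atoms.push_back(i);

    return atoms;
}

// True if every atom set in 'inner' is also set in 'outer';
// bits beyond the end of 'outer' count as unset.
bool isContainedIn(const std::vector<bool>& inner, const std::vector<bool>& outer)
{
    for (std::size_t i = 0; i < inner.size(); ++i)
        if (inner[i] && (i >= outer.size() || !outer[i]))
            return false;

    return true;
}
```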

◆ getFeatureSubstructure() [2/2]

void CDPL::Descr::CircularFingerprintGenerator::getFeatureSubstructure ( std::size_t ftr_idx, Chem::Fragment & frag, bool clear = true ) const

Extracts the substructure covered by the feature at index ftr_idx into frag.

Parameters
ftr_idx: The zero-based feature index.
frag: The output fragment.
clear: If true, frag is cleared before atoms and bonds are added.

◆ getFeatureSubstructures()

void CDPL::Descr::CircularFingerprintGenerator::getFeatureSubstructures ( std::size_t bit_idx, std::size_t bs_size, Chem::FragmentList & frags, bool clear = true ) const

Extracts the substructures of every feature that, when folded into a bitset of size bs_size, maps to the bit index bit_idx.

Parameters
bit_idx: The target bit index.
bs_size: The bitset size used for the folding.
frags: The output fragment list.
clear: If true, frags is cleared before any fragments are appended.
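Finding the features behind a set bit amounts to inverting the folding. A sketch that returns the indices of all features whose identifier maps to bit_idx, assuming a remainder-modulo-size mapping as a stand-in for CDPL's unspecified folding and using the hypothetical name `featuresForBit`:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of all features whose identifier folds onto bit_idx in a bitset
// of size bs_size (modulo mapping assumed).
std::vector<std::size_t> featuresForBit(std::size_t bit_idx, std::size_t bs_size,
                                        const std::vector<std::uint64_t>& feature_ids)
{
    std::vector<std::size_t> hits;

    if (bs_size == 0)
        return hits;  // an empty bitset has no valid bit index

    for (std::size_t f = 0; f < feature_ids.size(); ++f)
        if (feature_ids[f] % bs_size == bit_idx)
            hits.push_back(f);

    return hits;
}
```

More than one returned index means several features collide on that bit.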

Member Data Documentation

◆ DEF_ATOM_PROPERTY_FLAGS

static constexpr unsigned int CDPL::Descr::CircularFingerprintGenerator::DEF_ATOM_PROPERTY_FLAGS
Initial value: the bitwise OR of the following atom property flags (defined in Chem/AtomPropertyFlag.hpp):
FORMAL_CHARGE: Specifies the formal charge of an atom (Chem/AtomPropertyFlag.hpp:73).
H_COUNT: Specifies the hydrogen count of an atom (Chem/AtomPropertyFlag.hpp:78).
ISOTOPE: Specifies the isotopic mass of an atom (Chem/AtomPropertyFlag.hpp:68).
TOPOLOGY: Specifies the ring/chain topology of an atom (Chem/AtomPropertyFlag.hpp:88).
HEAVY_BOND_COUNT: Specifies the heavy bond count of an atom (Chem/AtomPropertyFlag.hpp:108).
VALENCE: Specifies the valence of an atom (Chem/AtomPropertyFlag.hpp:113).
TYPE: Specifies the generic type or element of an atom (Chem/AtomPropertyFlag.hpp:63).

Specifies the default set of atomic properties considered in the generation of atom identifiers by DefAtomIdentifierFunctor.

◆ DEF_BOND_PROPERTY_FLAGS

static constexpr unsigned int CDPL::Descr::CircularFingerprintGenerator::DEF_BOND_PROPERTY_FLAGS
Initial value: the bitwise OR of the following bond property flags (defined in BondPropertyFlag.hpp):
AROMATICITY: Specifies the membership of a bond in aromatic rings (BondPropertyFlag.hpp:73).
ORDER: Specifies the order of a bond (BondPropertyFlag.hpp:63).

Specifies the default set of bond properties considered in the generation of bond identifiers by DefBondIdentifierFunctor.


The documentation for this class was generated from the following file: