Chemical Data Processing Library Python API - Version 1.4.0
CDPL.Chem.SubstructureHistogramCalculator Class Reference

Counts occurrences of registered substructure patterns in a molecular graph, emitting the per-pattern hit counts into a user-supplied histogram container.

Classes

class  Pattern
 Stores a single substructure query molecular graph, its histogram ID, its priority and match-handling flags.
 

Public Member Functions

None __init__ ()
 Constructs an empty SubstructureHistogramCalculator instance.
 
None __init__ (SubstructureHistogramCalculator calc)
 Initializes a copy of the SubstructureHistogramCalculator instance calc.
 
int getObjectID ()
 Returns the numeric identifier (ID) of the wrapped C++ class instance.
 
None addPattern (MolecularGraph molgraph, int id=0, int priority=0, bool all_matches=True, bool unique_matches=True)
 Registers a new pattern by its query molecular graph and per-pattern settings.
 
None addPattern (Pattern pattern)
 Appends a copy of the pre-built pattern pattern.
 
Pattern getPattern (int idx)
 Returns the registered pattern at index idx.
 
None removePattern (int idx)
 Removes the registered pattern at index idx.
 
None clear ()
 Removes all registered patterns.
 
int getNumPatterns ()
 Returns the number of registered patterns.
 
None calculate (MolecularGraph molgraph, object histo)
 Counts substructure occurrences in molgraph and writes the per-pattern hit counts to histo.
 
SubstructureHistogramCalculator assign (SubstructureHistogramCalculator calc)
 Replaces the current state of self with a copy of the state of the SubstructureHistogramCalculator instance calc.
 

Properties

 objectID = property(getObjectID)
 
 numPatterns = property(getNumPatterns)
 

Detailed Description

Counts occurrences of registered substructure patterns in a molecular graph, emitting the per-pattern hit counts into a user-supplied histogram container.

Patterns are added via addPattern(); each pattern carries a query molecular graph, a numeric histogram ID, a priority and match-handling flags. When calculate() is called, the registered patterns are matched against the input molecular graph in order of descending priority. The atoms and bonds covered by a match are masked so that subsequent lower-priority patterns cannot re-count overlapping substructures. For every accepted match, the histogram is updated via the expression histo[id] += 1, where id is the histogram ID of the matching pattern.
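The priority ordering and masking described above can be illustrated with a toy model that searches for substrings in a string instead of substructures in a molecular graph. This is only a sketch and not the library's implementation: the function count_patterns, the tuple layout of its patterns and the rule that a match is rejected as soon as it touches any masked position are assumptions made for illustration.

```python
from collections import defaultdict


def count_patterns(text, patterns):
    """Toy model: patterns is a list of (substring, bin_id, priority) tuples."""
    masked = set()             # positions already claimed by an accepted match
    histo = defaultdict(int)   # bin ID -> number of accepted matches

    # Higher priority first. In this sketch, ties keep registration order
    # because sorted() is stable.
    for sub, bin_id, _priority in sorted(patterns, key=lambda p: -p[2]):
        start = text.find(sub)
        while start != -1:
            span = set(range(start, start + len(sub)))
            # Assumed rule: reject a match that overlaps any masked position.
            if not (span & masked):
                histo[bin_id] += 1
                masked |= span
            start = text.find(sub, start + 1)

    return dict(histo)


# "C" is registered first but has the lower priority, so the carboxyl-like
# pattern claims its characters before "C" is evaluated.
patterns = [("C", 1, 0), ("C(=O)O", 2, 10)]
counts = count_patterns("CCC(=O)O", patterns)
print(counts)
```

In this run the third "C" is not counted for bin 1 because it was already claimed by the higher-priority pattern, which is the effect the masking step is meant to achieve.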

Constructor & Destructor Documentation

◆ __init__()

None CDPL.Chem.SubstructureHistogramCalculator.__init__ ( SubstructureHistogramCalculator  calc)

Initializes a copy of the SubstructureHistogramCalculator instance calc.

Parameters
calc: The SubstructureHistogramCalculator instance to copy.

Member Function Documentation

◆ getObjectID()

int CDPL.Chem.SubstructureHistogramCalculator.getObjectID ( )

Returns the numeric identifier (ID) of the wrapped C++ class instance.

Different Python SubstructureHistogramCalculator instances may reference the same underlying C++ class instance. The commonly used Python expression a is not b therefore cannot reliably tell whether the two SubstructureHistogramCalculator instances a and b reference different C++ objects. The numeric identifier returned by this method makes it possible to implement such an identity test correctly via the simple expression a.getObjectID() != b.getObjectID().

Returns
The numeric ID of the internally referenced C++ class instance.
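The difference between wrapper identity and wrapped-object identity can be shown without the library. In the sketch below, _NativeObject and Handle are hypothetical stand-ins for a C++ instance and its Python wrapper; only the method name getObjectID() is taken from this page, and using Python's id() as the identifier is an assumption of the sketch.

```python
class _NativeObject:
    """Hypothetical stand-in for a C++ object owned by the library."""


class Handle:
    """Hypothetical stand-in for a Python wrapper around a native object."""

    def __init__(self, native):
        self._native = native

    def getObjectID(self):
        # Identify the wrapped object, not the wrapper itself.
        return id(self._native)


native = _NativeObject()
a = Handle(native)           # two distinct wrappers ...
b = Handle(native)           # ... around the same wrapped object
c = Handle(_NativeObject())  # a wrapper around a different object

wrappers_differ = a is not b                            # True, yet misleading
same_target = a.getObjectID() == b.getObjectID()        # True
different_target = a.getObjectID() != c.getObjectID()   # True
```

Here a is not b evaluates to True even though both wrappers refer to the same object, whereas the comparison of object IDs gives the intended answer.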

◆ addPattern() [1/2]

None CDPL.Chem.SubstructureHistogramCalculator.addPattern ( MolecularGraph  molgraph,
int   id = 0,
int   priority = 0,
bool   all_matches = True,
bool   unique_matches = True 
)

Registers a new pattern by its query molecular graph and per-pattern settings.

Parameters
molgraph: The query molecular graph.
id: The histogram bin ID to which matches of this pattern contribute.
priority: The pattern's priority (higher-priority patterns are evaluated first).
all_matches: If True, every match of the query is processed; otherwise, only the first match is processed.
unique_matches: If True, only one of multiple equivalent substructure mappings is processed per match.

◆ addPattern() [2/2]

None CDPL.Chem.SubstructureHistogramCalculator.addPattern ( Pattern  pattern)

Appends a copy of the pre-built pattern pattern.

Parameters
pattern: The pattern to copy and register.

◆ getPattern()

Pattern CDPL.Chem.SubstructureHistogramCalculator.getPattern ( int  idx)

Returns the registered pattern at index idx.

Parameters
idx: The zero-based pattern index.
Returns
A reference to the pattern.
Exceptions
Base.IndexError: if the number of patterns is zero or idx is not in the range [0, getNumPatterns() - 1].

◆ removePattern()

None CDPL.Chem.SubstructureHistogramCalculator.removePattern ( int  idx)

Removes the registered pattern at index idx.

Parameters
idx: The zero-based pattern index.
Exceptions
Base.IndexError: if the number of patterns is zero or idx is not in the range [0, getNumPatterns() - 1].
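When several patterns are removed by index in one pass, iterating from the highest index downwards keeps the remaining indices valid. The sketch below assumes that removing a pattern shifts all later patterns down by one position, as in a Python list; this page does not state that explicitly. PatternList is a hypothetical stand-in that mimics only getNumPatterns(), getPattern() and removePattern(), and remove_where is an illustrative helper, not part of the API.

```python
class PatternList:
    """Hypothetical stand-in exposing the index-based pattern accessors."""

    def __init__(self, patterns):
        self._patterns = list(patterns)

    def _check(self, idx):
        # Mirrors the documented valid range [0, getNumPatterns() - 1].
        if not 0 <= idx < len(self._patterns):
            raise IndexError("pattern index out of range")

    def getNumPatterns(self):
        return len(self._patterns)

    def getPattern(self, idx):
        self._check(idx)
        return self._patterns[idx]

    def removePattern(self, idx):
        self._check(idx)
        del self._patterns[idx]


def remove_where(calc, predicate):
    """Remove every pattern for which predicate(pattern) is true."""
    removed = 0
    # Walk backwards so that a removal never changes an index still to visit.
    for idx in reversed(range(calc.getNumPatterns())):
        if predicate(calc.getPattern(idx)):
            calc.removePattern(idx)
            removed += 1
    return removed


calc = PatternList(["amide", "ester", "amine", "ether"])
num_removed = remove_where(calc, lambda name: name.startswith("a"))
remaining = [calc.getPattern(i) for i in range(calc.getNumPatterns())]
```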

◆ getNumPatterns()

int CDPL.Chem.SubstructureHistogramCalculator.getNumPatterns ( )

Returns the number of registered patterns.

Returns
The pattern count.

◆ calculate()

None CDPL.Chem.SubstructureHistogramCalculator.calculate ( MolecularGraph  molgraph,
object  histo 
)

Counts substructure occurrences in molgraph and writes the per-pattern hit counts to histo.

For every accepted match, the histogram is updated via histo[id] += 1, where id is the histogram bin ID of the matching pattern.

Parameters
molgraph: The molecular graph to be analyzed.
histo: The histogram receiving the hit counts. It must support the update histo[id] += 1.
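Because the histogram is updated with histo[id] += 1, the container passed as histo has to yield a value for a bin ID that has not been seen before. The following sketch applies that update at the Python level to three candidate containers. The helper record_matches is illustrative, and it is an assumption that the library's update behaves exactly like the Python statement.

```python
from collections import Counter


def record_matches(histo, bin_ids):
    """Apply the documented update once per accepted match."""
    for bin_id in bin_ids:
        histo[bin_id] += 1


# A Counter returns 0 for missing keys, so arbitrary bin IDs work.
counter = Counter()
record_matches(counter, [3, 3, 7])

# A pre-sized list works as long as every bin ID is a valid index.
bins = [0] * 8
record_matches(bins, [3, 3, 7])

# An empty dict fails on the first unseen bin ID.
plain = {}
try:
    record_matches(plain, [3])
    plain_ok = True
except KeyError:
    plain_ok = False
```

A Counter or a collections.defaultdict(int) is convenient when bin IDs are sparse, while a pre-sized list suits a dense, known range of IDs.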

◆ assign()

SubstructureHistogramCalculator CDPL.Chem.SubstructureHistogramCalculator.assign ( SubstructureHistogramCalculator  calc)

Replaces the current state of self with a copy of the state of the SubstructureHistogramCalculator instance calc.

Parameters
calc: The SubstructureHistogramCalculator instance to copy.
Returns
self