Chemical Data Processing Library Python API - Version 1.1.1
CDPL.Math.DMLRModel Class Reference

Performs Multiple Linear Regression [WLIREG] on a set of data points \( (y_i, \vec{X}_i) \).

Public Member Functions

None __init__ ()
 Constructs and initializes a regression model with an empty data set.

None __init__ (DMLRModel model)
 Initializes a copy of the DMLRModel instance model.

int getObjectID ()
 Returns the numeric identifier (ID) of the wrapped C++ class instance.

DMLRModel assign (DMLRModel model)
 Replaces the current state of self with a copy of the state of the DMLRModel instance model.

None resizeDataSet (int num_points, int num_vars)
 Resizes the data set to hold num_points data points with num_vars independent variables.

None clearDataSet ()
 Clears the data set.

None setXYData (int i, ConstFVectorExpression x_vars, float y)
 Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

None setXYData (int i, ConstDVectorExpression x_vars, float y)
 Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

None setXYData (int i, ConstLVectorExpression x_vars, float y)
 Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

None setXYData (int i, ConstULVectorExpression x_vars, float y)
 Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

None addXYData (ConstFVectorExpression x_vars, float y)
 Adds a new data point \( (y, \vec{X}) \) to the current data set.

None addXYData (ConstDVectorExpression x_vars, float y)
 Adds a new data point \( (y, \vec{X}) \) to the current data set.

None addXYData (ConstLVectorExpression x_vars, float y)
 Adds a new data point \( (y, \vec{X}) \) to the current data set.

None addXYData (ConstULVectorExpression x_vars, float y)
 Adds a new data point \( (y, \vec{X}) \) to the current data set.

DMatrix getXMatrix ()
 Returns a read-only matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

DVector getYValues ()
 Returns a read-only vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

None buildModel ()
 Performs linear least squares regression modeling of the set of currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

None calcStatistics ()
 Calculates various statistical parameters describing the built regression model.

float calcYValue (ConstFVectorExpression x_vars)
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

float calcYValue (ConstDVectorExpression x_vars)
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

float calcYValue (ConstLVectorExpression x_vars)
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

float calcYValue (ConstULVectorExpression x_vars)
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

DVector getCoefficients ()
 Returns a read-only vector containing the estimated regression coefficients \( \beta_i \) which were calculated by buildModel().

float getChiSquare ()
 Returns the sum of squared residuals \( \chi^2 \).

float getGoodnessOfFit ()
 Returns the goodness of fit \( Q \).

float getCorrelationCoefficient ()
 Returns the correlation coefficient \( r \).

float getStandardDeviation ()
 Returns the standard deviation of the residuals \( s_r \).

float __call__ (ConstFVectorExpression x_vars)
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

float __call__ (ConstDVectorExpression x_vars)
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

float __call__ (ConstLVectorExpression x_vars)
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

float __call__ (ConstULVectorExpression x_vars)
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
 

Properties

 objectID = property(getObjectID)
 
 xMatrix = property(getXMatrix)
 
 yValues = property(getYValues)
 
 coefficients = property(getCoefficients)
 
 chiSquare = property(getChiSquare)
 
 goodnessOfFit = property(getGoodnessOfFit)
 
 correlationCoefficient = property(getCorrelationCoefficient)
 
 standardDeviation = property(getStandardDeviation)
 

Detailed Description

Performs Multiple Linear Regression [WLIREG] on a set of data points \( (y_i, \vec{X}_i) \).

For each data point, \( y_i \) is the dependent (response) variable and \( \vec{X}_i \) is an \( M \)-dimensional vector containing the independent (explanatory) variables of the modeled function \( y = f(\vec{X}) \). It is assumed that the relationship between the dependent variables \( y_i \) and the independent variables \( \vec{X}_i \) can be modeled by a linear function of \( M \) parameters \( \beta_i, \, i = 1, 2, \ldots, M \) (regression coefficients) plus an error term \( \epsilon_i \):

\[ y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_M x_{iM} + \epsilon_i \]

The parameters \( \beta_i \) are estimated by Least Squares Analysis [WLSQRS] which minimizes the sum of squared residuals \( \chi^2 \)

\[ \chi^2 = \sum_{i=1}^{N} (y_i - f(\vec{X}_i, \vec{\beta}))^2 \]

of the given set of \( N \) data points with respect to the adjustable parameters \( \vec{\beta} \). The parameters \( \beta_i \) are computed using Singular Value Decomposition [WSVD] as implemented in [NRIC]. This method is computationally intensive, but is particularly useful if the \( X \) matrix is ill-conditioned.
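
As a rough illustration of the model above, the following pure-Python sketch estimates \( \vec{\beta} \) by solving the normal equations \( X^T X \vec{\beta} = X^T \vec{y} \) with Gaussian elimination. This is a deliberately simpler technique than the SVD-based solver described above and is not robust for an ill-conditioned \( X \) matrix. The function name fit_normal_equations and the sample data are made up for this example and are not part of the CDPL API.

```python
def fit_normal_equations(x_rows, y_values):
    """Estimate beta minimizing sum_i (y_i - sum_j x_ij * beta_j)^2."""
    m = len(x_rows[0])
    # Build the augmented system [X^T X | X^T y].
    a = [[sum(row[j] * row[k] for row in x_rows) for k in range(m)]
         + [sum(row[j] * y for row, y in zip(x_rows, y_values))]
         for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; an SVD-based solver would be needed")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    beta = [0.0] * m
    for j in reversed(range(m)):
        tail = sum(a[j][k] * beta[k] for k in range(j + 1, m))
        beta[j] = (a[j][m] - tail) / a[j][j]
    return beta


# Noise-free sample data generated from y = 2*x1 + 3*x2.
x_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_values = [2.0, 3.0, 5.0, 7.0]
beta = fit_normal_equations(x_rows, y_values)
```

The model equation contains no separate intercept term, so a constant offset would have to be represented by an extra independent variable that is 1 for every data point.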

Constructor & Destructor Documentation

◆ __init__()

None CDPL.Math.DMLRModel.__init__ ( DMLRModel  model)

Initializes a copy of the DMLRModel instance model.

Parameters
model: The DMLRModel instance to copy.

Member Function Documentation

◆ getObjectID()

int CDPL.Math.DMLRModel.getObjectID ( )

Returns the numeric identifier (ID) of the wrapped C++ class instance.

Different Python DMLRModel instances may reference the same underlying C++ class instance. The commonly used Python expression a is not b therefore cannot reliably tell whether two DMLRModel instances a and b reference different C++ objects. The numeric identifier returned by this method allows such an identity test to be implemented correctly via the simple expression a.getObjectID() != b.getObjectID().

Returns
The numeric ID of the internally referenced C++ class instance.
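
The difference between the two tests can be pictured with a small stand-in that does not use CDPL. The classes CppObject and PyWrapper are hypothetical and only mimic several Python wrappers sharing one underlying object. Deriving the identifier from Python's id() is likewise an assumption of this sketch, not a statement about how CDPL computes it.

```python
class CppObject:
    """Hypothetical stand-in for a wrapped C++ class instance."""


class PyWrapper:
    """Hypothetical stand-in for a Python wrapper such as DMLRModel."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def getObjectID(self):
        # Assumption: the ID identifies the wrapped object, not the wrapper.
        return id(self._wrapped)


shared = CppObject()
a = PyWrapper(shared)
b = PyWrapper(shared)
c = PyWrapper(CppObject())

wrappers_differ = a is not b                        # two distinct Python objects
same_target = a.getObjectID() == b.getObjectID()    # both wrap the same object
other_target = a.getObjectID() != c.getObjectID()   # c wraps a different object
```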

◆ assign()

DMLRModel CDPL.Math.DMLRModel.assign ( DMLRModel  model)

Replaces the current state of self with a copy of the state of the DMLRModel instance model.

Parameters
model: The DMLRModel instance to copy.
Returns
self

◆ resizeDataSet()

None CDPL.Math.DMLRModel.resizeDataSet ( int  num_points,
int  num_vars 
)

Resizes the data set to hold num_points data points with num_vars independent variables.

Parameters
num_points: The number of data points.
num_vars: The number of independent variables per data point.

◆ clearDataSet()

None CDPL.Math.DMLRModel.clearDataSet ( )

Clears the data set.

Equivalent to calling resizeDataSet() with both arguments being zero.

◆ setXYData() [1/4]

None CDPL.Math.DMLRModel.setXYData ( int  i,
ConstFVectorExpression  x_vars,
float  y 
)

Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

If i is greater than or equal to the number of currently stored data points, or if x_vars provides more independent variables than the data set currently holds, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
i: The zero-based index of the data point in the data set.
x_vars: The vector \( \vec{X}_i \) with independent variables.
y: The dependent variable \( y_i \).
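
The resizing rules described above can be mimicked on plain Python lists. The function set_xy below is a hypothetical stand-in, not the CDPL implementation, and it assumes that the dependent variables of gap-filling data points are also set to zero.

```python
def set_xy(x_rows, y_values, i, x_vars, y):
    """Mimic the documented resizing rules on plain lists."""
    num_vars = max(len(x_vars), len(x_rows[0]) if x_rows else 0)
    # Widen existing rows with zeros if x_vars brings more variables.
    for row in x_rows:
        row.extend([0.0] * (num_vars - len(row)))
    # Grow the data set with all-zero points up to index i (assumed y = 0).
    while len(x_rows) <= i:
        x_rows.append([0.0] * num_vars)
        y_values.append(0.0)
    # Missing trailing variables of x_vars are taken as zero.
    x_rows[i] = list(x_vars) + [0.0] * (num_vars - len(x_vars))
    y_values[i] = y


x_rows, y_values = [], []
set_xy(x_rows, y_values, 0, [1.0, 2.0], 5.0)
set_xy(x_rows, y_values, 2, [3.0, 4.0, 6.0], 7.0)  # skips index 1, adds a third variable
set_xy(x_rows, y_values, 1, [9.0], 8.0)            # fewer variables than the data set
```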

◆ setXYData() [2/4]

None CDPL.Math.DMLRModel.setXYData ( int  i,
ConstDVectorExpression  x_vars,
float  y 
)

Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

If i is greater than or equal to the number of currently stored data points, or if x_vars provides more independent variables than the data set currently holds, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
i: The zero-based index of the data point in the data set.
x_vars: The vector \( \vec{X}_i \) with independent variables.
y: The dependent variable \( y_i \).

◆ setXYData() [3/4]

None CDPL.Math.DMLRModel.setXYData ( int  i,
ConstLVectorExpression  x_vars,
float  y 
)

Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

If i is greater than or equal to the number of currently stored data points, or if x_vars provides more independent variables than the data set currently holds, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
i: The zero-based index of the data point in the data set.
x_vars: The vector \( \vec{X}_i \) with independent variables.
y: The dependent variable \( y_i \).

◆ setXYData() [4/4]

None CDPL.Math.DMLRModel.setXYData ( int  i,
ConstULVectorExpression  x_vars,
float  y 
)

Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

If i is greater than or equal to the number of currently stored data points, or if x_vars provides more independent variables than the data set currently holds, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
i: The zero-based index of the data point in the data set.
x_vars: The vector \( \vec{X}_i \) with independent variables.
y: The dependent variable \( y_i \).

◆ addXYData() [1/4]

None CDPL.Math.DMLRModel.addXYData ( ConstFVectorExpression  x_vars,
float  y 
)

Adds a new data point \( (y, \vec{X}) \) to the current data set.

If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the added space is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
x_vars: The vector \( \vec{X} \) with independent variables.
y: The dependent variable \( y \).
Note
If the final size of the data set is known in advance, a call to resizeDataSet() followed by calls to setXYData() for each data point is more efficient than building up the data set through repeated calls to addXYData().
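
The append behavior can be sketched on plain Python lists in the same spirit. The function add_xy is a hypothetical stand-in, not the CDPL implementation. It shows how earlier data points are padded with zeros when a later point brings more independent variables, and how a shorter point is padded itself.

```python
def add_xy(x_rows, y_values, x_vars, y):
    """Append a data point, zero-padding rows so that all have equal length."""
    num_vars = max(len(x_vars), len(x_rows[0]) if x_rows else 0)
    # Widen the existing rows if the new point has more variables.
    for row in x_rows:
        row.extend([0.0] * (num_vars - len(row)))
    # Pad the new point if it has fewer variables than the data set.
    x_rows.append(list(x_vars) + [0.0] * (num_vars - len(x_vars)))
    y_values.append(y)


x_rows, y_values = [], []
add_xy(x_rows, y_values, [1.0], 2.0)
add_xy(x_rows, y_values, [3.0, 4.0], 5.0)  # widens the first row to two variables
add_xy(x_rows, y_values, [6.0], 7.0)       # padded with a trailing zero
```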

◆ addXYData() [2/4]

None CDPL.Math.DMLRModel.addXYData ( ConstDVectorExpression  x_vars,
float  y 
)

Adds a new data point \( (y, \vec{X}) \) to the current data set.

If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the added space is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
x_vars: The vector \( \vec{X} \) with independent variables.
y: The dependent variable \( y \).
Note
If the final size of the data set is known in advance, a call to resizeDataSet() followed by calls to setXYData() for each data point is more efficient than building up the data set through repeated calls to addXYData().

◆ addXYData() [3/4]

None CDPL.Math.DMLRModel.addXYData ( ConstLVectorExpression  x_vars,
float  y 
)

Adds a new data point \( (y, \vec{X}) \) to the current data set.

If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the added space is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
x_vars: The vector \( \vec{X} \) with independent variables.
y: The dependent variable \( y \).
Note
If the final size of the data set is known in advance, a call to resizeDataSet() followed by calls to setXYData() for each data point is more efficient than building up the data set through repeated calls to addXYData().

◆ addXYData() [4/4]

None CDPL.Math.DMLRModel.addXYData ( ConstULVectorExpression  x_vars,
float  y 
)

Adds a new data point \( (y, \vec{X}) \) to the current data set.

If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the added space is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
x_vars: The vector \( \vec{X} \) with independent variables.
y: The dependent variable \( y \).
Note
If the final size of the data set is known in advance, a call to resizeDataSet() followed by calls to setXYData() for each data point is more efficient than building up the data set through repeated calls to addXYData().

◆ getXMatrix()

DMatrix CDPL.Math.DMLRModel.getXMatrix ( )

Returns a read-only matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

Returns
A reference to the matrix with the independent variables \( \vec{X}_i \).

◆ getYValues()

DVector CDPL.Math.DMLRModel.getYValues ( )

Returns a read-only vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

Returns
A reference to the vector with the dependent variables \( y_i \).

◆ buildModel()

None CDPL.Math.DMLRModel.buildModel ( )

Performs linear least squares regression modeling of the set of currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

Exceptions
Base.CalculationFailed: if the data set is empty or the singular value decomposition of the \( X \) matrix failed.

◆ calcStatistics()

None CDPL.Math.DMLRModel.calcStatistics ( )

Calculates various statistical parameters describing the built regression model.

Exceptions
Base.CalculationFailed: if the data set is in an inconsistent state (e.g. the number of estimated regression coefficients does not match the number of independent variables that make up the data points).
See also
buildModel(), getChiSquare(), getGoodnessOfFit(), getCorrelationCoefficient(), getStandardDeviation()

◆ calcYValue() [1/4]

float CDPL.Math.DMLRModel.calcYValue ( ConstFVectorExpression  x_vars)

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Parameters
x_vars: The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base.CalculationFailed: if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.
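
For the linear model described in the class overview, the prediction amounts to the sum of the products of each independent variable with its regression coefficient. A minimal pure-Python sketch of that arithmetic follows. The function calc_y_value is a hypothetical stand-in, and ValueError takes the place of Base.CalculationFailed.

```python
def calc_y_value(coefficients, x_vars):
    """Return sum_j x_j * beta_j for one vector of independent variables."""
    if len(coefficients) != len(x_vars):
        # ValueError stands in for Base.CalculationFailed in this sketch.
        raise ValueError("size of x_vars does not match the number of coefficients")
    return sum(b * x for b, x in zip(coefficients, x_vars))


beta = [2.0, 3.0]
y_pred = calc_y_value(beta, [1.0, 2.0])  # 2*1 + 3*2 = 8.0
```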

◆ calcYValue() [2/4]

float CDPL.Math.DMLRModel.calcYValue ( ConstDVectorExpression  x_vars)

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Parameters
x_vars: The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base.CalculationFailed: if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.

◆ calcYValue() [3/4]

float CDPL.Math.DMLRModel.calcYValue ( ConstLVectorExpression  x_vars)

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Parameters
x_vars: The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base.CalculationFailed: if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.

◆ calcYValue() [4/4]

float CDPL.Math.DMLRModel.calcYValue ( ConstULVectorExpression  x_vars)

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Parameters
x_vars: The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base.CalculationFailed: if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.

◆ getCoefficients()

DVector CDPL.Math.DMLRModel.getCoefficients ( )

Returns a read-only vector containing the estimated regression coefficients \( \beta_i \) which were calculated by buildModel().

Returns
A reference to the vector with the estimated regression coefficients \( \beta_i \).

◆ getChiSquare()

float CDPL.Math.DMLRModel.getChiSquare ( )

Returns the sum of squared residuals \( \chi^2 \).

\( \chi^2 \) is calculated by:

\[ \chi^2 = \sum_{i=1}^{N} (y_i - \sum_{j=1}^{M}(x_{ij} \beta_j))^2 \]

Returns
The sum of squared residuals \( \chi^2 \).
Note
The returned value is only valid if calcStatistics() has been called before.
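
The formula above translates directly into a few lines of pure Python. The function chi_square and the sample numbers are made up for illustration and do not call into CDPL.

```python
def chi_square(x_rows, y_values, beta):
    """Sum of squared residuals for the linear model y_hat_i = sum_j x_ij * beta_j."""
    total = 0.0
    for row, y in zip(x_rows, y_values):
        y_hat = sum(x * b for x, b in zip(row, beta))
        total += (y - y_hat) ** 2
    return total


x_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_values = [2.5, 3.0, 4.0]
beta = [2.0, 3.0]
chi2 = chi_square(x_rows, y_values, beta)  # residuals 0.5, 0.0, -1.0 give 1.25
```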

◆ getGoodnessOfFit()

float CDPL.Math.DMLRModel.getGoodnessOfFit ( )

Returns the goodness of fit \( Q \).

The goodness of fit \( Q \) is given by:

\[ Q = gammaq(\frac{N - 2}{2}, \frac{\chi^2}{2}) \]

where \( gammaq \) is the incomplete gamma function (see [NRIC] for details).

Returns
The goodness of fit \( Q \).
Note
The returned value is only valid if calcStatistics() has been called before.
See also
Math.gammaQ()
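
To make the formula concrete, the sketch below evaluates \( Q \) in pure Python. It assumes that \( gammaq \) denotes the regularized upper incomplete gamma function \( Q(a, x) \). Because the first argument \( (N - 2)/2 \) is always a multiple of 1/2, the sketch uses the closed forms \( Q(1, x) = e^{-x} \) and \( Q(1/2, x) = \mathrm{erfc}(\sqrt{x}) \) together with the recurrence \( Q(a + 1, x) = Q(a, x) + x^a e^{-x} / \Gamma(a + 1) \) instead of a general-purpose algorithm. The function names are hypothetical and Math.gammaQ() is not called.

```python
import math


def gamma_q(a, x):
    """Regularized upper incomplete gamma function Q(a, x) for a = 1/2, 1, 3/2, ..."""
    twice_a = round(2 * a)
    if twice_a < 1 or abs(2 * a - twice_a) > 1e-12 or x < 0:
        raise ValueError("this sketch needs a positive multiple of 1/2 for a and x >= 0")
    if twice_a % 2 == 0:
        base, q = 1.0, math.exp(-x)             # Q(1, x) = exp(-x)
    else:
        base, q = 0.5, math.erfc(math.sqrt(x))  # Q(1/2, x) = erfc(sqrt(x))
    # Recurrence: Q(a + 1, x) = Q(a, x) + x^a * exp(-x) / Gamma(a + 1)
    while base < twice_a / 2:
        q += x ** base * math.exp(-x) / math.gamma(base + 1)
        base += 1.0
    return q


def goodness_of_fit(num_points, chi2):
    """Q = gammaq((N - 2) / 2, chi^2 / 2) as given in the formula above."""
    return gamma_q((num_points - 2) / 2, chi2 / 2)


q = goodness_of_fit(4, 4.0)  # Q(1, 2) = exp(-2)
```

With this formula the first argument is only positive for \( N > 2 \), so the sketch rejects smaller data sets.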

◆ getCorrelationCoefficient()

float CDPL.Math.DMLRModel.getCorrelationCoefficient ( )

Returns the correlation coefficient \( r \).

The correlation coefficient \( r \) is calculated by:

\[ r = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})} {\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2 }} \]

where

\begin{eqnarray*} \hat{y}_i &=& \sum_{j=1}^{M}(x_{ij} \beta_j) \\ \bar{\hat{y}} &=& \frac{\sum_{i=1}^{N} \hat{y}_i}{N} \\ \bar{y} &=& \frac{\sum_{i=1}^{N} y_i}{N} \end{eqnarray*}

Returns
The correlation coefficient \( r \).
Note
The returned value is only valid if calcStatistics() has been called before.
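
A pure-Python sketch of the formula above, correlating the predicted values \( \hat{y}_i \) with the observed values \( y_i \). The function correlation_coefficient and the sample data are illustrative assumptions, not CDPL code.

```python
import math


def correlation_coefficient(x_rows, y_values, beta):
    """Correlation between predictions y_hat_i and observations y_i."""
    y_hat = [sum(x * b for x, b in zip(row, beta)) for row in x_rows]
    n = len(y_values)
    mean_hat = sum(y_hat) / n
    mean_y = sum(y_values) / n
    numerator = sum((p - mean_hat) * (y - mean_y) for p, y in zip(y_hat, y_values))
    denominator = math.sqrt(sum((p - mean_hat) ** 2 for p in y_hat)
                            * sum((y - mean_y) ** 2 for y in y_values))
    return numerator / denominator


x_rows = [[1.0], [2.0], [3.0]]
beta = [2.0]                     # predictions are 2, 4, 6
r = correlation_coefficient(x_rows, [1.0, 3.0, 2.0], beta)  # 2 / sqrt(8 * 2) = 0.5
```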

◆ getStandardDeviation()

float CDPL.Math.DMLRModel.getStandardDeviation ( )

Returns the standard deviation of the residuals \( s_r \).

The standard deviation \( s_r \) is calculated by:

\[ s_r = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \sum_{j=1}^{M} (x_{ij} \beta_j))^2} {N - M}} \]

Returns
The standard deviation of the residuals \( s_r \).
Note
\( s_r \) is only defined if \( N > M \) and calcStatistics() has been called before.
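
The formula and the \( N > M \) restriction can be sketched in pure Python as follows. The function residual_std_dev is a hypothetical helper, and ValueError is only this sketch's way of signaling the undefined case.

```python
import math


def residual_std_dev(x_rows, y_values, beta):
    """Standard deviation of the residuals with N - M degrees of freedom."""
    n, m = len(y_values), len(beta)
    if n <= m:
        raise ValueError("s_r is only defined for N > M")
    squared_sum = sum((y - sum(x * b for x, b in zip(row, beta))) ** 2
                      for row, y in zip(x_rows, y_values))
    return math.sqrt(squared_sum / (n - m))


x_rows = [[1.0], [1.0], [1.0], [1.0], [1.0]]
y_values = [3.0, 1.0, 3.0, 1.0, 2.0]
beta = [2.0]                                    # residuals 1, -1, 1, -1, 0
s_r = residual_std_dev(x_rows, y_values, beta)  # sqrt(4 / (5 - 1)) = 1.0
```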

◆ __call__() [1/4]

float CDPL.Math.DMLRModel.__call__ ( ConstFVectorExpression  x_vars)

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Equivalent to calling calcYValue().

Parameters
x_vars: The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base.CalculationFailed: if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.

◆ __call__() [2/4]

float CDPL.Math.DMLRModel.__call__ ( ConstDVectorExpression  x_vars)

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Equivalent to calling calcYValue().

Parameters
x_vars: The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base.CalculationFailed: if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.

◆ __call__() [3/4]

float CDPL.Math.DMLRModel.__call__ ( ConstLVectorExpression  x_vars)

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Equivalent to calling calcYValue().

Parameters
x_vars: The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base.CalculationFailed: if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.

◆ __call__() [4/4]

float CDPL.Math.DMLRModel.__call__ ( ConstULVectorExpression  x_vars)

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Equivalent to calling calcYValue().

Parameters
x_vars: The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base.CalculationFailed: if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.