Chemical Data Processing Library Python API - Version 1.0.0
Performs Multiple Linear Regression [WLIREG] on a set of data points \( (y_i, \vec{X}_i) \).
Public Member Functions | |
None | __init__ () |
Constructs and initializes a regression model with an empty data set. | |
None | __init__ (DMLRModel model) |
Initializes a copy of the DMLRModel instance model. | |
int | getObjectID () |
Returns the numeric identifier (ID) of the wrapped C++ class instance. | |
DMLRModel | assign (DMLRModel model) |
Replaces the current state of self with a copy of the state of the DMLRModel instance model. | |
None | resizeDataSet (int num_points, int num_vars) |
Resizes the data set to hold num_points data points with num_vars independent variables. | |
None | clearDataSet () |
Clears the data set. | |
None | setXYData (int i, ConstFVectorExpression x_vars, float y) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set. | |
None | setXYData (int i, ConstDVectorExpression x_vars, float y) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set. | |
None | setXYData (int i, ConstLVectorExpression x_vars, float y) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set. | |
None | setXYData (int i, ConstULVectorExpression x_vars, float y) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set. | |
None | addXYData (ConstFVectorExpression x_vars, float y) |
Adds a new data point \( (y, \vec{X}) \) to the current data set. | |
None | addXYData (ConstDVectorExpression x_vars, float y) |
Adds a new data point \( (y, \vec{X}) \) to the current data set. | |
None | addXYData (ConstLVectorExpression x_vars, float y) |
Adds a new data point \( (y, \vec{X}) \) to the current data set. | |
None | addXYData (ConstULVectorExpression x_vars, float y) |
Adds a new data point \( (y, \vec{X}) \) to the current data set. | |
DMatrix | getXMatrix () |
Returns a read-only matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \). | |
DVector | getYValues () |
Returns a read-only vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \). | |
None | buildModel () |
Performs linear least squares regression modeling of the set of currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \). | |
None | calcStatistics () |
Calculates various statistical parameters describing the built regression model. | |
float | calcYValue (ConstFVectorExpression x_vars) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
float | calcYValue (ConstDVectorExpression x_vars) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
float | calcYValue (ConstLVectorExpression x_vars) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
float | calcYValue (ConstULVectorExpression x_vars) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
DVector | getCoefficients () |
Returns a read-only vector containing the estimated regression coefficients \( \beta_i \) which were calculated by buildModel(). | |
float | getChiSquare () |
Returns the sum of squared residuals \( \chi^2 \). | |
float | getGoodnessOfFit () |
Returns the goodness of fit \( Q \). | |
float | getCorrelationCoefficient () |
Returns the correlation coefficient \( r \). | |
float | getStandardDeviation () |
Returns the standard deviation of the residuals \( s_r \). | |
float | __call__ (ConstFVectorExpression x_vars) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
float | __call__ (ConstDVectorExpression x_vars) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
float | __call__ (ConstLVectorExpression x_vars) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
float | __call__ (ConstULVectorExpression x_vars) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
Properties | |
objectID = property(getObjectID) | |
xMatrix = property(getXMatrix) | |
yValues = property(getYValues) | |
coefficients = property(getCoefficients) | |
chiSquare = property(getChiSquare) | |
goodnessOfFit = property(getGoodnessOfFit) | |
correlationCoefficient = property(getCorrelationCoefficient) | |
standardDeviation = property(getStandardDeviation) | |
Performs Multiple Linear Regression [WLIREG] on a set of data points \( (y_i, \vec{X}_i) \).
For each data point, \( y_i \) is the dependent (response) variable and \( \vec{X}_i \) is an \( M \)-dimensional vector containing the independent (explanatory) variables of the modeled function \( y = f(\vec{X}) \). It is assumed that the relationship between the dependent variables \( y_i \) and the independent variables \( \vec{X}_i \) can be modeled by a linear function of \( M \) parameters \( \beta_i, \, i = 1, 2, \ldots, M \) (regression coefficients) plus an error term \( \epsilon_i \):
\[ y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_M x_{iM} + \epsilon_i \]
The parameters \( \beta_i \) are estimated by Least Squares Analysis [WLSQRS] which minimizes the sum of squared residuals \( \chi^2 \)
\[ \chi^2 = \sum_{i=1}^{N} (y_i - f(\vec{X}_i, \vec{\beta}))^2 \]
of the given set of \( N \) data points with respect to the adjustable parameters \( \vec{\beta} \). The parameters \( \beta_i \) are computed using Singular Value Decomposition [WSVD] as implemented in [NRIC]. This method is computationally intensive, but is particularly useful if the \( X \) matrix is ill-conditioned.
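As a rough illustration of the model equation and of the quantity being minimized, the following pure-Python sketch estimates the coefficients by solving the normal equations with Gaussian elimination. This is a simplified stand-in and not the SVD-based procedure described above; the function name fit_least_squares and the sample data are hypothetical and not part of the library.

```python
def fit_least_squares(x_rows, y_values):
    """Estimate beta by solving the normal equations (X^T X) beta = X^T y.

    Illustration only: the documented class uses singular value
    decomposition, which copes better with an ill-conditioned X matrix.
    """
    m = len(x_rows[0])
    # Build the augmented system [X^T X | X^T y].
    a = [[sum(row[j] * row[k] for row in x_rows) for k in range(m)]
         + [sum(row[j] * y for row, y in zip(x_rows, y_values))]
         for j in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; no unique solution")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    beta = [0.0] * m
    for j in range(m - 1, -1, -1):
        s = sum(a[j][k] * beta[k] for k in range(j + 1, m))
        beta[j] = (a[j][m] - s) / a[j][j]
    return beta


# Hypothetical noise-free data generated from y = 2*x1 + 3*x2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
beta = fit_least_squares(X, y)
```

Because the model equation contains no separate intercept term, a constant offset would have to be supplied as an additional independent variable fixed to 1.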
None CDPL.Math.DMLRModel.__init__ | ( | DMLRModel | model | ) |
Initializes a copy of the DMLRModel instance model.
model | The DMLRModel instance to copy. |
int CDPL.Math.DMLRModel.getObjectID | ( | ) |
Returns the numeric identifier (ID) of the wrapped C++ class instance.
Different Python DMLRModel instances may reference the same underlying C++ class instance. The commonly used Python expression a is not b therefore cannot reliably tell whether the two DMLRModel instances a and b reference different C++ objects. The numeric identifier returned by this method makes it possible to implement such an identity test correctly via the simple expression a.getObjectID() != b.getObjectID().
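The difference between the two identity tests can be sketched without the library. The Wrapper class below is a hypothetical stand-in for a Python object that wraps a shared native instance; it only mimics the getObjectID() idea and says nothing about how the real binding computes the ID.

```python
class Wrapper:
    """Hypothetical stand-in for a Python object wrapping a native instance."""

    def __init__(self, native):
        self._native = native  # the shared underlying object

    def getObjectID(self):
        # In this sketch the ID is simply Python's id() of the shared object.
        return id(self._native)


native = object()
a = Wrapper(native)
b = Wrapper(native)

# Two distinct Python objects, so the 'is not' test reports a difference ...
distinct_wrappers = a is not b
# ... although both wrap the same underlying object.
same_underlying = a.getObjectID() == b.getObjectID()
```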
DMLRModel CDPL.Math.DMLRModel.assign | ( | DMLRModel | model | )
Replaces the current state of self with a copy of the state of the DMLRModel instance model.
model | The DMLRModel instance to copy. |
None CDPL.Math.DMLRModel.resizeDataSet | ( | int | num_points, |
int | num_vars | ||
) |
Resizes the data set to hold num_points data points with num_vars independent variables.
num_points | The number of data points. |
num_vars | The number of independent variables per data point. |
None CDPL.Math.DMLRModel.clearDataSet | ( | ) |
Clears the data set.
Equivalent to calling resizeDataSet() with both arguments being zero.
None CDPL.Math.DMLRModel.setXYData | ( | int | i, |
ConstFVectorExpression | x_vars, | ||
float | y | ||
) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.
If i is greater than or equal to the number of currently stored data points, or if x_vars provides more independent variables than the data set currently holds, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
i | The zero-based index of the data point in the data set. |
x_vars | The vector \( \vec{X}_i \) with independent variables. |
y | The dependent variable \( y_i \). |
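The resizing rules described above can be mimicked on plain Python lists. This is only a sketch of the documented behaviour, not the library's implementation; set_xy_data is a hypothetical helper, and it assumes that the y values of skipped data points are zero-filled as well.

```python
def set_xy_data(x_matrix, y_values, i, x_vars, y):
    """Mimic the documented resizing rules on plain lists (illustration only)."""
    num_vars = max(len(x_vars), len(x_matrix[0]) if x_matrix else 0)
    # Widen existing rows with zeros if x_vars brings more variables.
    for row in x_matrix:
        row.extend([0.0] * (num_vars - len(row)))
    # Append zero-filled data points until index i exists.
    while len(x_matrix) <= i:
        x_matrix.append([0.0] * num_vars)
        y_values.append(0.0)
    # Store the point; missing variables are taken to be zero.
    x_matrix[i] = list(x_vars) + [0.0] * (num_vars - len(x_vars))
    y_values[i] = y


xs, ys = [], []
set_xy_data(xs, ys, 0, [1.0, 2.0], 5.0)
set_xy_data(xs, ys, 2, [3.0, 4.0, 6.0], 7.0)  # skips index 1, adds a variable
set_xy_data(xs, ys, 1, [9.0], 8.0)            # fewer variables than stored
```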
None CDPL.Math.DMLRModel.setXYData | ( | int | i, |
ConstDVectorExpression | x_vars, | ||
float | y | ||
) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.
If i is greater than or equal to the number of currently stored data points, or if x_vars provides more independent variables than the data set currently holds, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
i | The zero-based index of the data point in the data set. |
x_vars | The vector \( \vec{X}_i \) with independent variables. |
y | The dependent variable \( y_i \). |
None CDPL.Math.DMLRModel.setXYData | ( | int | i, |
ConstLVectorExpression | x_vars, | ||
float | y | ||
) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.
If i is greater than or equal to the number of currently stored data points, or if x_vars provides more independent variables than the data set currently holds, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
i | The zero-based index of the data point in the data set. |
x_vars | The vector \( \vec{X}_i \) with independent variables. |
y | The dependent variable \( y_i \). |
None CDPL.Math.DMLRModel.setXYData | ( | int | i, |
ConstULVectorExpression | x_vars, | ||
float | y | ||
) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.
If i is greater than or equal to the number of currently stored data points, or if x_vars provides more independent variables than the data set currently holds, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
i | The zero-based index of the data point in the data set. |
x_vars | The vector \( \vec{X}_i \) with independent variables. |
y | The dependent variable \( y_i \). |
None CDPL.Math.DMLRModel.addXYData | ( | ConstFVectorExpression | x_vars, |
float | y | ||
) |
Adds a new data point \( (y, \vec{X}) \) to the current data set.
If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the newly created entries are filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
x_vars | The vector \( \vec{X} \) with independent variables. |
y | The dependent variable \( y \). |
None CDPL.Math.DMLRModel.addXYData | ( | ConstDVectorExpression | x_vars, |
float | y | ||
) |
Adds a new data point \( (y, \vec{X}) \) to the current data set.
If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the newly created entries are filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
x_vars | The vector \( \vec{X} \) with independent variables. |
y | The dependent variable \( y \). |
None CDPL.Math.DMLRModel.addXYData | ( | ConstLVectorExpression | x_vars, |
float | y | ||
) |
Adds a new data point \( (y, \vec{X}) \) to the current data set.
If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the newly created entries are filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
x_vars | The vector \( \vec{X} \) with independent variables. |
y | The dependent variable \( y \). |
None CDPL.Math.DMLRModel.addXYData | ( | ConstULVectorExpression | x_vars, |
float | y | ||
) |
Adds a new data point \( (y, \vec{X}) \) to the current data set.
If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the newly created entries are filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
x_vars | The vector \( \vec{X} \) with independent variables. |
y | The dependent variable \( y \). |
DMatrix CDPL.Math.DMLRModel.getXMatrix | ( | ) |
Returns a read-only matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
DVector CDPL.Math.DMLRModel.getYValues | ( | ) |
Returns a read-only vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
None CDPL.Math.DMLRModel.buildModel | ( | ) |
Performs linear least squares regression modeling of the set of currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
Base.CalculationFailed | if the data set is empty or the singular value decomposition of the \( X \) matrix failed. |
None CDPL.Math.DMLRModel.calcStatistics | ( | ) |
Calculates various statistical parameters describing the built regression model.
Base.CalculationFailed | if the data set is in an inconsistent state (e.g. the number of estimated regression coefficients does not match the number of independent variables that make up the data points). |
float CDPL.Math.DMLRModel.calcYValue | ( | ConstFVectorExpression | x_vars | ) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
x_vars | The vector \( \vec{X} \) of independent variables. |
Base.CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
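Given the linear model, a prediction amounts to the dot product of the coefficient vector and x_vars, with a size check beforehand. The sketch below illustrates this on plain lists; calc_y_value is a hypothetical helper, and ValueError merely stands in for Base.CalculationFailed.

```python
def calc_y_value(coefficients, x_vars):
    """Return sum_j x_j * beta_j for one vector of independent variables."""
    if len(coefficients) != len(x_vars):
        # Stand-in for Base.CalculationFailed in this sketch.
        raise ValueError("size of x_vars does not match the number of coefficients")
    return sum(b * x for b, x in zip(coefficients, x_vars))


beta = [2.0, 3.0]                       # hypothetical regression coefficients
y_pred = calc_y_value(beta, [1.0, 1.0])
```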
float CDPL.Math.DMLRModel.calcYValue | ( | ConstDVectorExpression | x_vars | ) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
x_vars | The vector \( \vec{X} \) of independent variables. |
Base.CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
float CDPL.Math.DMLRModel.calcYValue | ( | ConstLVectorExpression | x_vars | ) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
x_vars | The vector \( \vec{X} \) of independent variables. |
Base.CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
float CDPL.Math.DMLRModel.calcYValue | ( | ConstULVectorExpression | x_vars | ) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
x_vars | The vector \( \vec{X} \) of independent variables. |
Base.CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
DVector CDPL.Math.DMLRModel.getCoefficients | ( | ) |
Returns a read-only vector containing the estimated regression coefficients \( \beta_i \) which were calculated by buildModel().
float CDPL.Math.DMLRModel.getChiSquare | ( | ) |
Returns the sum of squared residuals \( \chi^2 \).
\( \chi^2 \) is calculated by:
\[ \chi^2 = \sum_{i=1}^{N} (y_i - \sum_{j=1}^{M}(x_{ij} \beta_j))^2 \]
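The sum above can be evaluated directly on plain Python lists. The following sketch is for illustration only; chi_square, the data and the coefficients are hypothetical.

```python
def chi_square(x_rows, y_values, beta):
    """Sum of squared residuals for the linear model y_hat = sum_j x_ij * beta_j."""
    total = 0.0
    for row, y in zip(x_rows, y_values):
        y_hat = sum(x * b for x, b in zip(row, beta))
        total += (y - y_hat) ** 2
    return total


# Hypothetical data points and coefficients.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.5, 3.0, 4.5]
chi2 = chi_square(X, y, [2.0, 3.0])  # residuals are 0.5, 0.0 and -0.5
```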
float CDPL.Math.DMLRModel.getGoodnessOfFit | ( | ) |
Returns the goodness of fit \( Q \).
The goodness of fit \( Q \) is given by:
\[ Q = \mathrm{gammaq}\left(\frac{N - 2}{2}, \frac{\chi^2}{2}\right) \]
where \( \mathrm{gammaq} \) is the incomplete gamma function (see [NRIC] for details).
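For readers without access to [NRIC], the sketch below evaluates the formula in pure Python. It assumes that gammaq denotes the regularized upper incomplete gamma function \( Q(a, x) \) and uses the usual series and continued-fraction evaluation; the function names are hypothetical and the code is not taken from the library.

```python
import math


def gammaq(a, x, eps=1e-12, max_iter=500):
    """Regularized upper incomplete gamma function Q(a, x) for a > 0, x >= 0."""
    if x == 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of P(a, x); Q = 1 - P.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(log_prefactor)


def goodness_of_fit(num_points, chi2):
    """Evaluate Q = gammaq((N - 2) / 2, chi^2 / 2) as given in the text."""
    return gammaq((num_points - 2) / 2.0, chi2 / 2.0)


q = goodness_of_fit(4, 4.0)  # a = 1, so Q reduces to exp(-chi^2 / 2)
```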
float CDPL.Math.DMLRModel.getCorrelationCoefficient | ( | ) |
Returns the correlation coefficient \( r \).
The correlation coefficient \( r \) is calculated by:
\[ r = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})} {\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2 }} \]
where
\begin{eqnarray*} \hat{y}_i &=& \sum_{j=1}^{M}(x_{ij} \beta_j) \\ \bar{\hat{y}} &=& \frac{\sum_{i=1}^{N} \hat{y}_i}{N} \\ \bar{y} &=& \frac{\sum_{i=1}^{N} y_i}{N} \end{eqnarray*}
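The definitions above translate directly into a few lines of Python. This is a plain-list sketch for illustration; correlation_coefficient and the sample data are hypothetical.

```python
import math


def correlation_coefficient(x_rows, y_values, beta):
    """Correlation between predicted values y_hat_i and observed values y_i."""
    y_hat = [sum(x * b for x, b in zip(row, beta)) for row in x_rows]
    n = len(y_values)
    mean_hat = sum(y_hat) / n
    mean_y = sum(y_values) / n
    numerator = sum((yh - mean_hat) * (y - mean_y)
                    for yh, y in zip(y_hat, y_values))
    denominator = math.sqrt(sum((yh - mean_hat) ** 2 for yh in y_hat)
                            * sum((y - mean_y) ** 2 for y in y_values))
    return numerator / denominator


# Hypothetical data that the model y = 2*x reproduces exactly.
X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]
r = correlation_coefficient(X, y, [2.0])
```

For a perfect fit such as this one, the predicted and observed values coincide and r evaluates to 1.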
float CDPL.Math.DMLRModel.getStandardDeviation | ( | ) |
Returns the standard deviation of the residuals \( s_r \).
The standard deviation \( s_r \) is calculated by:
\[ s_r = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \sum_{j=1}^{M} (x_{ij} \beta_j))^2} {N - M}} \]
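A minimal plain-list sketch of this formula follows; residual_std_dev and the sample values are hypothetical and serve only to illustrate the \( N - M \) denominator.

```python
import math


def residual_std_dev(x_rows, y_values, beta):
    """Standard deviation of the residuals with N - M degrees of freedom."""
    n, m = len(y_values), len(beta)
    squared_sum = sum(
        (y - sum(x * b for x, b in zip(row, beta))) ** 2
        for row, y in zip(x_rows, y_values)
    )
    return math.sqrt(squared_sum / (n - m))


# Hypothetical data: N = 3 points, M = 2 coefficients, squared residual sum 0.5.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.5, 3.0, 4.5]
s_r = residual_std_dev(X, y, [2.0, 3.0])
```

The expression is undefined for \( N = M \), so the sketch assumes more data points than coefficients.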
float CDPL.Math.DMLRModel.__call__ | ( | ConstFVectorExpression | x_vars | ) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
Equivalent to calling calcYValue().
x_vars | The vector \( \vec{X} \) of independent variables. |
Base.CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
float CDPL.Math.DMLRModel.__call__ | ( | ConstDVectorExpression | x_vars | ) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
Equivalent to calling calcYValue().
x_vars | The vector \( \vec{X} \) of independent variables. |
Base.CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
float CDPL.Math.DMLRModel.__call__ | ( | ConstLVectorExpression | x_vars | ) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
Equivalent to calling calcYValue().
x_vars | The vector \( \vec{X} \) of independent variables. |
Base.CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
float CDPL.Math.DMLRModel.__call__ | ( | ConstULVectorExpression | x_vars | ) |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
Equivalent to calling calcYValue().
x_vars | The vector \( \vec{X} \) of independent variables. |
Base.CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |