Chemical Data Processing Library C++ API - Version 1.0.0
Performs Multiple Linear Regression [WLIREG] on a set of data points \( (y_i, \vec{X}_i) \).
#include <MLRModel.hpp>
Public Types | |
typedef CommonType< typename Vector< T >::SizeType, typename Matrix< T >::SizeType >::Type | SizeType |
typedef T | ValueType |
typedef Matrix< T > | MatrixType |
typedef Vector< T > | VectorType |
Public Member Functions | |
MLRModel () | |
Constructs and initializes a regression model with an empty data set. | |
void | resizeDataSet (SizeType num_points, SizeType num_vars) |
Resizes the data set to hold num_points data points with num_vars independent variables. | |
void | clearDataSet () |
Clears the data set. | |
template<typename V > | |
void | setXYData (SizeType i, const VectorExpression< V > &x_vars, ValueType y) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set. | |
template<typename V > | |
void | addXYData (const VectorExpression< V > &x_vars, ValueType y) |
Adds a new data point \( (y, \vec{X}) \) to the current data set. | |
MatrixType & | getXMatrix () |
Returns a matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \). | |
const MatrixType & | getXMatrix () const |
Returns a read-only matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \). | |
VectorType & | getYValues () |
Returns a vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \). | |
const VectorType & | getYValues () const |
Returns a read-only vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \). | |
void | buildModel () |
Performs linear least squares regression modeling of the set of currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \). | |
template<typename V > | |
ValueType | calcYValue (const VectorExpression< V > &x_vars) const |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
template<typename V > | |
ValueType | operator() (const VectorExpression< V > &x_vars) const |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars. | |
const VectorType & | getCoefficients () const |
Returns a read-only vector containing the estimated regression coefficients \( \beta_i \) which were calculated by buildModel(). | |
ValueType | getChiSquare () const |
Returns the sum of squared residuals \( \chi^2 \). | |
ValueType | getGoodnessOfFit () const |
Returns the goodness of fit \( Q \). | |
ValueType | getCorrelationCoefficient () const |
Returns the correlation coefficient \( r \). | |
ValueType | getStandardDeviation () const |
Returns the standard deviation of the residuals \( s_r \). | |
void | calcStatistics () |
Calculates various statistical parameters describing the built regression model. | |
Performs Multiple Linear Regression [WLIREG] on a set of data points \( (y_i, \vec{X}_i) \).
For each data point, \( y_i \) is the dependent (response) variable and \( \vec{X}_i \) is an \( M \)-dimensional vector containing the independent (explanatory) variables of the modeled function \( y = f(\vec{X}) \). It is assumed that the relationship between the dependent variables \( y_i \) and the independent variables \( \vec{X}_i \) can be modeled by a linear function of \( M \) parameters \( \beta_i, \, i = 1, 2, \ldots, M \) (regression coefficients) plus an error term \( \epsilon_i \):
\[ y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_M x_{iM} + \epsilon_i \]
The parameters \( \beta_i \) are estimated by Least Squares Analysis [WLSQRS] which minimizes the sum of squared residuals \( \chi^2 \)
\[ \chi^2 = \sum_{i=1}^{N} (y_i - f(\vec{X}_i, \vec{\beta}))^2 \]
of the given set of \( N \) data points with respect to the adjustable parameters \( \vec{\beta} \). The parameters \( \beta_i \) are computed using Singular Value Decomposition [WSVD] as implemented in [NRIC]. This method is computationally intensive, but is particularly useful if the \( X \) matrix is ill-conditioned.
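The minimization described above can be illustrated with a small standalone sketch. It does not use the library and it swaps in a simpler technique: instead of Singular Value Decomposition it solves the normal equations \( X^T X \vec{\beta} = X^T \vec{y} \) by Gaussian elimination, which is only reasonable for small, well-conditioned problems. The names `Row` and `fitNormalEquations` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Row = std::vector<double>;

// Hypothetical helper (not part of the library): least squares estimate of beta
// via the normal equations. The class documented here uses SVD instead.
std::vector<double> fitNormalEquations(const std::vector<Row>& X, const std::vector<double>& y)
{
    assert(!X.empty() && X.size() == y.size());
    const std::size_t N = X.size(), M = X[0].size();

    // Build the augmented system [X^T X | X^T y].
    std::vector<Row> A(M, Row(M + 1, 0.0));
    for (std::size_t i = 0; i < N; ++i) {
        assert(X[i].size() == M);
        for (std::size_t j = 0; j < M; ++j) {
            for (std::size_t k = 0; k < M; ++k)
                A[j][k] += X[i][j] * X[i][k];
            A[j][M] += X[i][j] * y[i];
        }
    }

    // Forward elimination with partial pivoting.
    for (std::size_t c = 0; c < M; ++c) {
        std::size_t p = c;
        for (std::size_t r = c + 1; r < M; ++r)
            if (std::fabs(A[r][c]) > std::fabs(A[p][c]))
                p = r;
        if (std::fabs(A[p][c]) < 1e-12)
            throw std::runtime_error("normal equations are singular");
        std::swap(A[c], A[p]);
        for (std::size_t r = c + 1; r < M; ++r) {
            const double f = A[r][c] / A[c][c];
            for (std::size_t k = c; k <= M; ++k)
                A[r][k] -= f * A[c][k];
        }
    }

    // Back substitution.
    std::vector<double> beta(M);
    for (std::size_t c = M; c-- > 0;) {
        double s = A[c][M];
        for (std::size_t k = c + 1; k < M; ++k)
            s -= A[c][k] * beta[k];
        beta[c] = s / A[c][c];
    }
    return beta;
}
```

Because the model equation has no separate intercept term, a constant column of ones in \( X \) is one way to fit an intercept: for the rows `{1, 0}`, `{1, 1}`, `{1, 2}` and responses `1, 3, 5`, the sketch recovers coefficients close to `1` and `2`.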
T | The value type used in calculations and for storage of data points. |
typedef CommonType<typename Vector<T>::SizeType, typename Matrix<T>::SizeType>::Type CDPL::Math::MLRModel< T >::SizeType |
typedef T CDPL::Math::MLRModel< T >::ValueType |
typedef Matrix<T> CDPL::Math::MLRModel< T >::MatrixType |
typedef Vector<T> CDPL::Math::MLRModel< T >::VectorType |
CDPL::Math::MLRModel< T >::MLRModel () | inline |
Constructs and initializes a regression model with an empty data set.
void CDPL::Math::MLRModel< T >::resizeDataSet (SizeType num_points, SizeType num_vars) |
Resizes the data set to hold num_points data points with num_vars independent variables.
num_points | The number of data points. |
num_vars | The number of independent variables per data point. |
void CDPL::Math::MLRModel< T >::clearDataSet |
Clears the data set.
Equivalent to calling resizeDataSet() with both arguments being zero.
void CDPL::Math::MLRModel< T >::setXYData (SizeType i, const VectorExpression< V > &x_vars, ValueType y) |
Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.
If i is greater than or equal to the number of currently stored data points, or if the number of independent variables provided by x_vars is larger than the current maximum, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
i | The zero-based index of the data point in the data set. |
x_vars | The vector \( \vec{X}_i \) with independent variables. |
y | The dependent variable \( y_i \). |
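The resizing and zero-filling rules described above can be sketched with plain standard containers. This is a hypothetical stand-in, not the library's storage or implementation: `DataSetSketch` and its members are invented for illustration and only mirror the documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data set; it only mimics the documented semantics.
struct DataSetSketch {
    std::vector<std::vector<double>> x; // one row of independent variables per data point
    std::vector<double> y;              // one dependent variable per data point
    std::size_t numVars = 0;            // largest number of independent variables seen so far

    void setXYData(std::size_t i, const std::vector<double>& xVars, double yVal)
    {
        numVars = std::max(numVars, xVars.size());
        if (i >= x.size()) {            // grow the data set; new points start out as zeros
            x.resize(i + 1);
            y.resize(i + 1, 0.0);
        }
        for (auto& row : x)             // widen every row, padding with zeros
            row.resize(numVars, 0.0);
        std::fill(x[i].begin(), x[i].end(), 0.0);             // variables not provided become zero
        std::copy(xVars.begin(), xVars.end(), x[i].begin());
        y[i] = yVal;
        assert(x.size() == y.size());
    }
};
```

Setting index 2 on an empty sketch creates three points, with points 0 and 1 zero-filled. A later call with a longer vector widens all existing rows with trailing zeros.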
void CDPL::Math::MLRModel< T >::addXYData (const VectorExpression< V > &x_vars, ValueType y) |
Adds a new data point \( (y, \vec{X}) \) to the current data set.
If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the newly created entries are filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.
x_vars | The vector \( \vec{X} \) with independent variables. |
y | The dependent variable \( y \). |
CDPL::Math::Matrix< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getXMatrix |
Returns a matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
Returns: A non-const reference to the matrix with the independent variables \( \vec{X}_i \).
const CDPL::Math::Matrix< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getXMatrix |
Returns a read-only matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
Returns: A const reference to the matrix with the independent variables \( \vec{X}_i \).
CDPL::Math::Vector< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getYValues |
Returns a vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
Returns: A non-const reference to the vector with the dependent variables \( y_i \).
const CDPL::Math::Vector< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getYValues |
Returns a read-only vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
Returns: A const reference to the vector with the dependent variables \( y_i \).
void CDPL::Math::MLRModel< T >::buildModel |
Performs linear least squares regression modeling of the set of currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
Base::CalculationFailed | if the data set is empty or the singular value decomposition of the \( X \) matrix failed. |
ValueType CDPL::Math::MLRModel< T >::calcYValue (const VectorExpression< V > &x_vars) const |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
x_vars | The vector \( \vec{X} \) of independent variables. |
Base::CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
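As a rough illustration of what the prediction amounts to, the sketch below evaluates \( \sum_{j=1}^{M} x_j \beta_j \) on plain `std::vector` inputs and rejects mismatched sizes. It is not the library's code: `predictY` is a hypothetical name, and `std::invalid_argument` merely stands in for Base::CalculationFailed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of the library): dot product of the
// coefficients with the independent variables.
double predictY(const std::vector<double>& beta, const std::vector<double>& xVars)
{
    // Mirrors the documented failure case; the exception type is an assumption.
    if (beta.size() != xVars.size())
        throw std::invalid_argument("number of coefficients does not match size of x_vars");

    double y = 0.0;
    for (std::size_t j = 0; j < beta.size(); ++j)
        y += xVars[j] * beta[j];
    return y;
}
```

With coefficients `{1, 2}` and variables `{1, 3}` the sketch returns `7`.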
ValueType CDPL::Math::MLRModel< T >::operator() (const VectorExpression< V > &x_vars) const |
Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
Equivalent to calling calcYValue().
x_vars | The vector \( \vec{X} \) of independent variables. |
Base::CalculationFailed | if the number of regression coefficients \( \beta_i \) does not match the size of x_vars. |
const CDPL::Math::Vector< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getCoefficients |
Returns a read-only vector containing the estimated regression coefficients \( \beta_i \) which were calculated by buildModel().
Returns: A const reference to the vector with the estimated regression coefficients \( \beta_i \).
CDPL::Math::MLRModel< T >::ValueType CDPL::Math::MLRModel< T >::getChiSquare |
Returns the sum of squared residuals \( \chi^2 \).
\( \chi^2 \) is calculated by:
\[ \chi^2 = \sum_{i=1}^{N} (y_i - \sum_{j=1}^{M}(x_{ij} \beta_j))^2 \]
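The formula above translates directly into a double loop. The following standalone sketch uses plain standard containers rather than the library's matrix and vector types; `chiSquare` is a hypothetical free function introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): sum of squared residuals
// for N data points (rows of X) and M coefficients.
double chiSquare(const std::vector<std::vector<double>>& X,
                 const std::vector<double>& y,
                 const std::vector<double>& beta)
{
    assert(X.size() == y.size());
    double chi2 = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        assert(X[i].size() == beta.size());
        double yHat = 0.0;                       // predicted value for data point i
        for (std::size_t j = 0; j < beta.size(); ++j)
            yHat += X[i][j] * beta[j];
        const double residual = y[i] - yHat;
        chi2 += residual * residual;
    }
    return chi2;
}
```

For the rows `{1, 0}`, `{1, 1}`, `{1, 2}`, responses `1, 3, 6` and coefficients `{1, 2}`, the residuals are `0, 0, 1`, so the sketch returns `1`.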
CDPL::Math::MLRModel< T >::ValueType CDPL::Math::MLRModel< T >::getGoodnessOfFit |
Returns the goodness of fit \( Q \).
The goodness of fit \( Q \) is given by:
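\[ Q = \operatorname{gammaq}\left(\frac{N - 2}{2}, \frac{\chi^2}{2}\right) \]
where \( \operatorname{gammaq} \) is the regularized upper incomplete gamma function \( Q(a, x) = \Gamma(a, x) / \Gamma(a) \) (see [NRIC] for details).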
CDPL::Math::MLRModel< T >::ValueType CDPL::Math::MLRModel< T >::getCorrelationCoefficient |
Returns the correlation coefficient \( r \).
The correlation coefficient \( r \) is calculated by:
\[ r = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})} {\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2 }} \]
where
\begin{eqnarray*} \hat{y}_i &=& \sum_{j=1}^{M}(x_{ij} \beta_j) \\ \bar{\hat{y}} &=& \frac{\sum_{i=1}^{N} \hat{y}_i}{N} \\ \bar{y} &=& \frac{\sum_{i=1}^{N} y_i}{N} \end{eqnarray*}
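A minimal standalone sketch of this formula follows. It assumes the predicted values \( \hat{y}_i \) have already been computed and takes them as a plain vector. `correlationCoefficient` is a hypothetical free function, not the library's member.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the library): correlation between
// predicted values yHat and observed values y.
double correlationCoefficient(const std::vector<double>& yHat, const std::vector<double>& y)
{
    assert(!y.empty() && yHat.size() == y.size());
    const double n = static_cast<double>(y.size());
    const double meanHat = std::accumulate(yHat.begin(), yHat.end(), 0.0) / n;
    const double meanY = std::accumulate(y.begin(), y.end(), 0.0) / n;

    double cov = 0.0, varHat = 0.0, varY = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double dHat = yHat[i] - meanHat;
        const double dY = y[i] - meanY;
        cov += dHat * dY;
        varHat += dHat * dHat;
        varY += dY * dY;
    }
    // Not finite if either series is constant (zero denominator).
    return cov / std::sqrt(varHat * varY);
}
```

Predictions that are an increasing linear function of the observations give a value close to `1`. A decreasing linear relationship gives a value close to `-1`.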
CDPL::Math::MLRModel< T >::ValueType CDPL::Math::MLRModel< T >::getStandardDeviation |
Returns the standard deviation of the residuals \( s_r \).
The standard deviation \( s_r \) is calculated by:
\[ s_r = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \sum_{j=1}^{M} (x_{ij} \beta_j))^2} {N - M}} \]
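Since the numerator under the square root is the sum of squared residuals \( \chi^2 \), the formula reduces to \( s_r = \sqrt{\chi^2 / (N - M)} \). The sketch below shows that reduction. `residualStdDev` and its parameter names are hypothetical, and it assumes \( N > M \).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of the library): standard deviation of the
// residuals from a precomputed chi-square, N data points and M coefficients.
double residualStdDev(double chiSquare, std::size_t numPoints, std::size_t numCoeffs)
{
    assert(numPoints > numCoeffs);   // N - M degrees of freedom must be positive
    return std::sqrt(chiSquare / static_cast<double>(numPoints - numCoeffs));
}
```

For example, a sum of squared residuals of `8` with four data points and two coefficients gives `sqrt(8 / 2)`, that is `2`.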
void CDPL::Math::MLRModel< T >::calcStatistics |
Calculates various statistical parameters describing the built regression model.
Base::CalculationFailed | if the data set is in an inconsistent state (e.g. the number of estimated regression coefficients does not match the number of independent variables that make up the data points). |