Chemical Data Processing Library C++ API - Version 1.2.1
CDPL::Math::MLRModel< T > Class Template Reference

Performs Multiple Linear Regression [WLIREG] on a set of data points \( (y_i, \vec{X}_i) \).

#include <MLRModel.hpp>

Public Types

typedef CommonType< typename Vector< T >::SizeType, typename Matrix< T >::SizeType >::Type SizeType
 
typedef T ValueType
 
typedef Matrix< T > MatrixType
 
typedef Vector< T > VectorType
 

Public Member Functions

 MLRModel ()
 Constructs and initializes a regression model with an empty data set.
 
void resizeDataSet (SizeType num_points, SizeType num_vars)
 Resizes the data set to hold num_points data points with num_vars independent variables.
 
void clearDataSet ()
 Clears the data set.
 
template<typename V >
void setXYData (SizeType i, const VectorExpression< V > &x_vars, ValueType y)
 Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.
 
template<typename V >
void addXYData (const VectorExpression< V > &x_vars, ValueType y)
 Adds a new data point \( (y, \vec{X}) \) to the current data set.
 
MatrixType & getXMatrix ()
 Returns a matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
 
const MatrixType & getXMatrix () const
 Returns a read-only matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
 
VectorType & getYValues ()
 Returns a vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
 
const VectorType & getYValues () const
 Returns a read-only vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
 
void buildModel ()
 Performs linear least squares regression modeling of the set of currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).
 
template<typename V >
ValueType calcYValue (const VectorExpression< V > &x_vars) const
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
 
template<typename V >
ValueType operator() (const VectorExpression< V > &x_vars) const
 Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.
 
const VectorType & getCoefficients () const
 Returns a read-only vector containing the estimated regression coefficients \( \beta_i \) which were calculated by buildModel().
 
ValueType getChiSquare () const
 Returns the sum of squared residuals \( \chi^2 \).
 
ValueType getGoodnessOfFit () const
 Returns the goodness of fit \( Q \).
 
ValueType getCorrelationCoefficient () const
 Returns the correlation coefficient \( r \).
 
ValueType getStandardDeviation () const
 Returns the standard deviation of the residuals \( s_r \).
 
void calcStatistics ()
 Calculates various statistical parameters describing the built regression model.
 
template<typename V >
CDPL::Math::MLRModel< T >::ValueType calcYValue (const VectorExpression< V > &x) const
 
template<typename V >
CDPL::Math::MLRModel< T >::ValueType operator() (const VectorExpression< V > &x) const
 

Detailed Description

template<typename T = double>
class CDPL::Math::MLRModel< T >

Performs Multiple Linear Regression [WLIREG] on a set of data points \( (y_i, \vec{X}_i) \).

For each data point, \( y_i \) is the dependent (response) variable and \( \vec{X}_i \) is an \( M \)-dimensional vector containing the independent (explanatory) variables of the modeled function \( y = f(\vec{X}) \). It is assumed that the relationship between the dependent variables \( y_i \) and the independent variables \( \vec{X}_i \) can be modeled by a linear function of \( M \) parameters \( \beta_i, \, i = 1, 2, \ldots, M \) (regression coefficients) plus an error term \( \epsilon_i \):

\[ y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_M x_{iM} + \epsilon_i \]

The parameters \( \beta_i \) are estimated by Least Squares Analysis [WLSQRS] which minimizes the sum of squared residuals \( \chi^2 \)

\[ \chi^2 = \sum_{i=1}^{N} (y_i - f(\vec{X}_i, \vec{\beta}))^2 \]

of the given set of \( N \) data points with respect to the adjustable parameters \( \vec{\beta} \). The parameters \( \beta_i \) are computed using Singular Value Decomposition [WSVD] as implemented in [NRIC]. This method is computationally intensive, but is particularly useful if the \( X \) matrix is ill-conditioned.
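
To make the minimization concrete, the following is a minimal, self-contained sketch that does not use CDPL. The function name fitLeastSquares and the use of std::vector are assumptions made for illustration only. Note that it solves the normal equations \( (X^T X) \vec{\beta} = X^T \vec{y} \) by Gaussian elimination, which is a simpler technique than the SVD approach described above and is less robust for ill-conditioned \( X \) matrices.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not part of CDPL): estimates beta by solving the
// normal equations (X^T X) beta = X^T y with Gaussian elimination.
// NOTE: MLRModel is documented to use SVD; this simpler method is only
// reasonable for small, well-conditioned problems.
std::vector<double> fitLeastSquares(const std::vector<std::vector<double>>& X,
                                    const std::vector<double>& y)
{
    const std::size_t n = X.size();
    if (n == 0 || y.size() != n)
        throw std::invalid_argument("empty or inconsistent data set");
    const std::size_t m = X[0].size();

    // Build the augmented system [X^T X | X^T y].
    std::vector<std::vector<double>> a(m, std::vector<double>(m + 1, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        if (X[i].size() != m)
            throw std::invalid_argument("all rows must have the same size");
        for (std::size_t j = 0; j < m; ++j) {
            for (std::size_t k = 0; k < m; ++k)
                a[j][k] += X[i][j] * X[i][k];
            a[j][m] += X[i][j] * y[i];
        }
    }

    // Forward elimination with partial pivoting.
    for (std::size_t c = 0; c < m; ++c) {
        std::size_t p = c;
        for (std::size_t r = c + 1; r < m; ++r)
            if (std::fabs(a[r][c]) > std::fabs(a[p][c]))
                p = r;
        if (std::fabs(a[p][c]) < 1e-12)
            throw std::runtime_error("normal equations are singular");
        std::swap(a[c], a[p]);
        for (std::size_t r = c + 1; r < m; ++r) {
            const double f = a[r][c] / a[c][c];
            for (std::size_t k = c; k <= m; ++k)
                a[r][k] -= f * a[c][k];
        }
    }

    // Back substitution.
    std::vector<double> beta(m, 0.0);
    for (std::size_t c = m; c-- > 0;) {
        double s = a[c][m];
        for (std::size_t k = c + 1; k < m; ++k)
            s -= a[c][k] * beta[k];
        beta[c] = s / a[c][c];
    }
    return beta;
}
```

The model equation contains no separate intercept term, so a common way to fit one is to append a constant column of ones to the data matrix.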

Template Parameters
T  The value type used in calculations and for storage of data points.

Member Typedef Documentation

◆ SizeType

template<typename T = double>
typedef CommonType<typename Vector<T>::SizeType, typename Matrix<T>::SizeType>::Type CDPL::Math::MLRModel< T >::SizeType

◆ ValueType

template<typename T = double>
typedef T CDPL::Math::MLRModel< T >::ValueType

◆ MatrixType

template<typename T = double>
typedef Matrix<T> CDPL::Math::MLRModel< T >::MatrixType

◆ VectorType

template<typename T = double>
typedef Vector<T> CDPL::Math::MLRModel< T >::VectorType

Constructor & Destructor Documentation

◆ MLRModel()

template<typename T = double>
CDPL::Math::MLRModel< T >::MLRModel ( )
inline

Constructs and initializes a regression model with an empty data set.

Member Function Documentation

◆ resizeDataSet()

template<typename T >
void CDPL::Math::MLRModel< T >::resizeDataSet ( SizeType  num_points,
SizeType  num_vars 
)

Resizes the data set to hold num_points data points with num_vars independent variables.

Parameters
num_points  The number of data points.
num_vars  The number of independent variables per data point.

◆ clearDataSet()

template<typename T >
void CDPL::Math::MLRModel< T >::clearDataSet ( )

Clears the data set.

Equivalent to calling resizeDataSet() with both arguments being zero.

◆ setXYData()

template<typename T >
template<typename V >
void CDPL::Math::MLRModel< T >::setXYData ( SizeType  i,
const VectorExpression< V > &  x_vars,
ValueType  y 
)

Sets the i-th data point \( (y_i, \vec{X}_i) \) of the data set.

If i is greater than or equal to the number of currently stored data points, or if the number of independent variables provided by x_vars is larger than the current maximum, the data set is resized accordingly. Any gap between the existing data points and the new data point is filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
i  The zero-based index of the data point in the data set.
x_vars  The vector \( \vec{X}_i \) with independent variables.
y  The dependent variable \( y_i \).
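
The resizing and zero-filling rules described above can be modeled with a small stand-in container. This is only a sketch of the documented behavior under the assumption that rows are stored as std::vector; the type DataSetSketch and its members are hypothetical and are not CDPL code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data set held by a regression model.
struct DataSetSketch {
    std::vector<std::vector<double>> x;  // one row of independent variables per data point
    std::vector<double> y;               // one dependent variable per data point
    std::size_t numVars = 0;             // current number of independent variables

    // Resizes to numPoints rows of vars columns; new entries are zero.
    void resize(std::size_t numPoints, std::size_t vars) {
        numVars = vars;
        x.resize(numPoints);
        for (auto& row : x)
            row.resize(numVars, 0.0);
        y.resize(numPoints, 0.0);
    }

    // Sets data point i, growing the data set if necessary (never shrinking).
    void set(std::size_t i, const std::vector<double>& xVars, double yVal) {
        resize(std::max(i + 1, x.size()), std::max(xVars.size(), numVars));
        std::fill(x[i].begin(), x[i].end(), 0.0);             // missing variables become zero
        std::copy(xVars.begin(), xVars.end(), x[i].begin());  // copy the provided variables
        y[i] = yVal;
    }
};
```

For instance, calling set(2, {1.0, 2.0}, 5.0) on an empty DataSetSketch yields three rows of two columns, with rows 0 and 1 filled with zeros.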

◆ addXYData()

template<typename T >
template<typename V >
void CDPL::Math::MLRModel< T >::addXYData ( const VectorExpression< V > &  x_vars,
ValueType  y 
)

Adds a new data point \( (y, \vec{X}) \) to the current data set.

If x_vars provides more independent variables than the current data set holds, the data set is resized accordingly and the newly created entries are filled with zeros. If x_vars provides fewer independent variables than the current data set holds, the missing independent variables are assumed to be zero.

Parameters
x_vars  The vector \( \vec{X} \) with independent variables.
y  The dependent variable \( y \).
Note
If the final size of the data set is known in advance, a call to resizeDataSet() followed by calls to setXYData() for each data point is more efficient than a build-up of the data set by repeatedly calling addXYData().

◆ getXMatrix() [1/2]

template<typename T >
CDPL::Math::Matrix< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getXMatrix ( )

Returns a matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

Returns
A non-const reference to the matrix with the independent variables \( \vec{X}_i \).

◆ getXMatrix() [2/2]

template<typename T >
const CDPL::Math::Matrix< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getXMatrix ( ) const

Returns a read-only matrix where each row represents the vector \( \vec{X}_i \) with independent variables of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

Returns
A const reference to the matrix with the independent variables \( \vec{X}_i \).

◆ getYValues() [1/2]

template<typename T >
CDPL::Math::Vector< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getYValues ( )

Returns a vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

Returns
A non-const reference to the vector with the dependent variables \( y_i \).

◆ getYValues() [2/2]

template<typename T >
const CDPL::Math::Vector< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getYValues ( ) const

Returns a read-only vector containing the dependent variables \( y_i \) of the currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

Returns
A const reference to the vector with the dependent variables \( y_i \).

◆ buildModel()

template<typename T >
void CDPL::Math::MLRModel< T >::buildModel ( )

Performs linear least squares regression modeling of the set of currently stored data points \( (y_i, \vec{X}_i), \, i = 1, 2, \ldots, N \).

Exceptions
Base::CalculationFailed  if the data set is empty or the singular value decomposition of the \( X \) matrix failed.

◆ calcYValue() [1/2]

template<typename T = double>
template<typename V >
ValueType CDPL::Math::MLRModel< T >::calcYValue ( const VectorExpression< V > &  x_vars) const

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Parameters
x_vars  The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base::CalculationFailed  if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.

◆ operator()() [1/2]

template<typename T = double>
template<typename V >
ValueType CDPL::Math::MLRModel< T >::operator() ( const VectorExpression< V > &  x_vars) const

Predicts the value of the dependent variable \( y \) for a vector \( \vec{X} \) of independent variables given by x_vars.

Equivalent to calling calcYValue().

Parameters
x_vars  The vector \( \vec{X} \) of independent variables.
Returns
The predicted value for \( y \).
Exceptions
Base::CalculationFailed  if the number of regression coefficients \( \beta_i \) does not match the size of x_vars.

◆ getCoefficients()

template<typename T >
const CDPL::Math::Vector< typename CDPL::Math::MLRModel< T >::ValueType > & CDPL::Math::MLRModel< T >::getCoefficients ( ) const

Returns a read-only vector containing the estimated regression coefficients \( \beta_i \) which were calculated by buildModel().

Returns
A const reference to the vector with the estimated regression coefficients \( \beta_i \).

◆ getChiSquare()

template<typename T >
CDPL::Math::MLRModel< T >::ValueType CDPL::Math::MLRModel< T >::getChiSquare ( ) const

Returns the sum of squared residuals \( \chi^2 \).

\( \chi^2 \) is calculated by:

\[ \chi^2 = \sum_{i=1}^{N} (y_i - \sum_{j=1}^{M}(x_{ij} \beta_j))^2 \]

Returns
The sum of squared residuals \( \chi^2 \).
Note
The returned value is only valid if calcStatistics() has been called before.

◆ getGoodnessOfFit()

template<typename T >
CDPL::Math::MLRModel< T >::ValueType CDPL::Math::MLRModel< T >::getGoodnessOfFit ( ) const

Returns the goodness of fit \( Q \).

The goodness of fit \( Q \) is given by:

\[ Q = \mathrm{gammaq}\left(\frac{N - 2}{2}, \frac{\chi^2}{2}\right) \]

where \( \mathrm{gammaq} \) is the regularized upper incomplete gamma function (see [NRIC] for details).

Returns
The goodness of fit \( Q \).
Note
The returned value is only valid if calcStatistics() has been called before.
See also
Math::gammaQ()
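
For an even number of data points \( N \geq 4 \), the first argument \( (N - 2) / 2 \) is a positive integer \( a \), and the regularized upper incomplete gamma function has the closed form \( e^{-x} \sum_{k=0}^{a-1} x^k / k! \). The sketch below uses only this special case to evaluate the formula without CDPL; the function names are hypothetical, and the general case (odd \( N \)) would require a full incomplete gamma implementation such as Math::gammaQ().

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of CDPL): regularized upper incomplete gamma
// function Q(a, x) for a positive INTEGER a, using the closed form
// exp(-x) * sum_{k=0}^{a-1} x^k / k!.
double gammaQForIntegerA(unsigned a, double x)
{
    if (a == 0 || x < 0.0)
        throw std::domain_error("requires a >= 1 and x >= 0");

    double term = 1.0;  // x^k / k!, starting with k = 0
    double sum = 1.0;
    for (unsigned k = 1; k < a; ++k) {
        term *= x / k;
        sum += term;
    }
    return std::exp(-x) * sum;
}

// Goodness of fit Q = gammaq((N - 2) / 2, chi2 / 2), restricted to even N >= 4
// so that the first argument is a positive integer.
double goodnessOfFitEvenN(std::size_t n, double chi2)
{
    if (n < 4 || n % 2 != 0)
        throw std::invalid_argument("this sketch only supports even N >= 4");

    return gammaQForIntegerA(static_cast<unsigned>((n - 2) / 2), chi2 / 2.0);
}
```

For \( N = 4 \) the expression reduces to \( Q = e^{-\chi^2 / 2} \), so \( \chi^2 = 0 \) gives \( Q = 1 \) and larger values of \( \chi^2 \) give smaller \( Q \).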

◆ getCorrelationCoefficient()

template<typename T >
CDPL::Math::MLRModel< T >::ValueType CDPL::Math::MLRModel< T >::getCorrelationCoefficient ( ) const

Returns the correlation coefficient \( r \).

The correlation coefficient \( r \) is calculated by:

\[ r = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})} {\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2 }} \]

where

\begin{eqnarray*} \hat{y}_i &=& \sum_{j=1}^{M}(x_{ij} \beta_j) \\ \bar{\hat{y}} &=& \frac{\sum_{i=1}^{N} \hat{y}_i}{N} \\ \bar{y} &=& \frac{\sum_{i=1}^{N} y_i}{N} \end{eqnarray*}

Returns
The correlation coefficient \( r \).
Note
The returned value is only valid if calcStatistics() has been called before.
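
The formula above is the Pearson correlation between the predicted values \( \hat{y}_i \) and the observed values \( y_i \). A minimal standalone sketch follows; the function name correlationOfFit and the std::vector based interface are assumptions for illustration and do not reflect CDPL internals.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical free function (not part of CDPL): correlation between the
// predicted values yHat_i = sum_j x_ij * beta_j and the observed values y_i.
double correlationOfFit(const std::vector<std::vector<double>>& X,
                        const std::vector<double>& y,
                        const std::vector<double>& beta)
{
    const std::size_t n = X.size();
    if (n == 0 || y.size() != n)
        throw std::invalid_argument("empty or inconsistent data set");

    std::vector<double> yHat(n, 0.0);
    double meanY = 0.0, meanYHat = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (X[i].size() != beta.size())
            throw std::invalid_argument("row size must match the number of coefficients");
        for (std::size_t j = 0; j < beta.size(); ++j)
            yHat[i] += X[i][j] * beta[j];
        meanY += y[i];
        meanYHat += yHat[i];
    }
    meanY /= static_cast<double>(n);
    meanYHat /= static_cast<double>(n);

    double covar = 0.0, varYHat = 0.0, varY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dh = yHat[i] - meanYHat;
        const double dy = y[i] - meanY;
        covar += dh * dy;
        varYHat += dh * dh;
        varY += dy * dy;
    }
    // The result is NaN if either set of values has zero variance.
    return covar / std::sqrt(varYHat * varY);
}
```

A perfect fit, where every \( \hat{y}_i \) equals \( y_i \) and the values are not all identical, gives \( r = 1 \).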

◆ getStandardDeviation()

template<typename T >
CDPL::Math::MLRModel< T >::ValueType CDPL::Math::MLRModel< T >::getStandardDeviation ( ) const

Returns the standard deviation of the residuals \( s_r \).

The standard deviation \( s_r \) is calculated by:

\[ s_r = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \sum_{j=1}^{M} (x_{ij} \beta_j))^2} {N - M}} \]

Returns
The standard deviation of the residuals \( s_r \).
Note
\( s_r \) is only defined if \( N > M \) and calcStatistics() has been called before.
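
The following standalone sketch evaluates the formula for \( s_r \) and makes the \( N > M \) requirement explicit. The function name residualStandardDeviation is hypothetical, and throwing std::domain_error for \( N \leq M \) is a choice made for this sketch, not documented CDPL behavior.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical free function (not part of CDPL): standard deviation of the
// residuals with N - M degrees of freedom.
double residualStandardDeviation(const std::vector<std::vector<double>>& X,
                                 const std::vector<double>& y,
                                 const std::vector<double>& beta)
{
    const std::size_t n = X.size();
    const std::size_t m = beta.size();
    if (y.size() != n)
        throw std::invalid_argument("X and y must have the same number of rows");
    if (n <= m)
        throw std::domain_error("s_r is only defined for N > M");

    double chi2 = 0.0;  // sum of squared residuals
    for (std::size_t i = 0; i < n; ++i) {
        if (X[i].size() != m)
            throw std::invalid_argument("row size must match the number of coefficients");
        double yHat = 0.0;
        for (std::size_t j = 0; j < m; ++j)
            yHat += X[i][j] * beta[j];
        chi2 += (y[i] - yHat) * (y[i] - yHat);
    }
    return std::sqrt(chi2 / static_cast<double>(n - m));
}
```

With three data points, a single coefficient and residuals of -1, 0 and 1, the sum of squares is 2 and \( N - M = 2 \), so \( s_r = 1 \).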

◆ calcStatistics()

template<typename T >
void CDPL::Math::MLRModel< T >::calcStatistics ( )

Calculates various statistical parameters describing the built regression model.

Exceptions
Base::CalculationFailed  if the data set is in an inconsistent state (e.g. the number of estimated regression coefficients does not match the number of independent variables that make up the data points).
See also
buildModel(), getChiSquare(), getGoodnessOfFit(), getCorrelationCoefficient(), getStandardDeviation()

◆ calcYValue() [2/2]

template<typename T = double>
template<typename V >
CDPL::Math::MLRModel<T>::ValueType CDPL::Math::MLRModel< T >::calcYValue ( const VectorExpression< V > &  x) const

◆ operator()() [2/2]

template<typename T = double>
template<typename V >
CDPL::Math::MLRModel<T>::ValueType CDPL::Math::MLRModel< T >::operator() ( const VectorExpression< V > &  x) const
