Generalized Linear Model

Let \left\{\phi_i(\cdot)\right\}_{i=1}^m be a set of basis functions (see Basis). We think of a Generalized Linear Model (GLM) as a parametrization of a subspace of the functions \mathbf{f}:\mathbb{R}^d\rightarrow \mathbb{R}^q:

(1)\mathbf{f}(\mathbf{x}; \mathbf{W}) =
\boldsymbol{\phi}(\mathbf{x})^T\mathbf{W},

where \mathbf{W}\in\mathbb{R}^{m\times q} is the weight matrix, and

(2)\boldsymbol{\phi}(\mathbf{x}) =
\left(\phi_1(\mathbf{x}), \dots, \phi_m(\mathbf{x})\right).

Usually, the weights \mathbf{W} are not fixed; rather, each of its columns has a multivariate Gaussian distribution:

(3)\mathbf{W}_j \sim \mathcal{N}_m\left(\mathbf{W}_j |
\mathbf{M}_j, \boldsymbol{\Sigma}\right),

for j=1,\dots,q, where \mathbf{A}_j denotes the j-th column of a matrix \mathbf{A}, \mathbf{M}_j\in\mathbb{R}^m is the mean of column j, and the semi-positive definite \boldsymbol{\Sigma}\in\mathbb{R}^{m\times m} is the covariance matrix. Notice that we have restricted our attention to a covariance matrix that is independent of the output dimension. This is very restrictive, but in practice there are ways around the problem. Giving a more general definition would make it extremely difficult to store all the required information (we would need a (qm)\times(qm) covariance matrix). In any case, this is the model we use in our RVM paper.

Note

The distribution of the weights is to be thought of as the posterior distribution over the weights that occurs when you attempt to fit the model to some data.
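
To make Eqs. (1)-(3) concrete, here is a minimal numpy sketch using a monomial basis on \mathbb{R} (so d=1 and q=1); the basis, sizes, and variable names are purely illustrative and not part of BEST:

    import numpy as np

    m, q = 4, 1

    def phi(x):
        """The basis vector (phi_1(x), ..., phi_m(x)) = (1, x, x^2, x^3)."""
        return np.array([x ** i for i in range(m)])

    M = np.zeros((m, q))                # mean of the weights, Eq. (3)
    Sigma = 0.1 * np.eye(m)             # covariance shared by all q columns

    # Draw one realization of the random weights and evaluate Eq. (1).
    W = np.column_stack([np.random.multivariate_normal(M[:, j], Sigma)
                         for j in range(q)])
    x = 0.5
    f = phi(x) @ W                      # f(x; W) = phi(x)^T W, a q-vector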

Allowing for the possibility of some Gaussian noise, the predictive distribution for the output \mathbf{y} at the input point \mathbf{x} is given by:

(4)p(\mathbf{y} | \mathbf{x}) =
\mathcal{N}_q\left(\mathbf{y} | \mathbf{m}(\mathbf{x}),
\boldsymbol{\sigma}^2(\mathbf{x})\mathbf{I}_q\right),

where \mathbf{I}_q is the q-dimensional unit matrix, while the mean and the variance at \mathbf{x} are given by:

(5)\mathbf{m}(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^T
\mathbf{M},\;\;
\boldsymbol{\sigma}^2(\mathbf{x}) = \beta^{-1} +
\boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\Sigma}
\boldsymbol{\phi}(\mathbf{x}),

with \beta being the noise precision (i.e., the inverse variance).
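
Continuing the numpy sketch from above (again, all names are illustrative), the predictive mean and variance of Eq. (5) are computed as:

    beta = 100.0                        # noise precision (inverse variance)

    p = phi(x)                          # basis vector at x, shape (m,)
    mean = p @ M                        # m(x) = phi(x)^T M, a q-vector
    var = 1.0 / beta + p @ Sigma @ p    # sigma^2(x), shared by all outputs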

In BEST, we represent the GLM by the best.maps.GeneralizedLinearModel class, which inherits from best.maps.Function. It is essentially a function that evaluates the predictive mean of the model. However, it also offers access to several other useful methods for uncertainty quantification. Here is the definition of best.maps.GeneralizedLinearModel:

class GeneralizedLinearModel
Inherits :best.maps.Function

A class that represents a Generalized Linear Model.

__init__(basis[, weights=None[, sigma_sqrt=None[, beta=None[, name='Generalized Linear Model']]]])

Initialize the object.

Note

Notice that instead of the covariance matrix \boldsymbol{\Sigma}, we initialize the object with its square root. The square root of \boldsymbol{\Sigma} is any matrix \mathbf{R}\in \mathbb{R}^{k\times m} such that:

\boldsymbol{\Sigma} = \mathbf{R}^T\mathbf{R}.

This is useful because it allows for the treatment of a semi-positive definite covariance (i.e., when k < m). It is up to the user to supply the right \mathbf{R}.
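
For a strictly positive definite \boldsymbol{\Sigma}, one valid choice of \mathbf{R} is the transposed Cholesky factor; a rank-deficient \boldsymbol{\Sigma} can instead be factored through its eigendecomposition, yielding k < m. A minimal numpy sketch of the Cholesky case (illustrative names only):

    import numpy as np

    S = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    R = np.linalg.cholesky(S).T         # upper triangular, S = R^T R
    assert np.allclose(R.T @ R, S)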

Parameters:
  • basis (best.maps.Function) – A set of basis functions.
  • weights (2D numpy array of shape m\times q) – The mean weights \mathbf{M}. If None, then it is assumed to be all zeros.
  • sigma_sqrt (2D numpy array of shape k\times m, k\le m) – The square root of the covariance matrix. If None, then it is assumed to be all zeros.
  • beta (float) – The noise precision (inverse variance). If unspecified, it is assumed to be a very big number.
  • name (str) – A name for the object.
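
Here is a hedged construction sketch based on the documented signature. The basis object is assumed to be a best.maps.Function with m outputs built elsewhere (see Basis), and the arrays merely illustrate the documented shapes:

    import numpy as np
    import best.maps

    num_basis, num_out = 10, 1                  # m and q
    mean_w = np.zeros((num_basis, num_out))     # the mean weights M
    sig_rt = 0.1 * np.eye(num_basis)            # square root of Sigma (k = m)
    glm = best.maps.GeneralizedLinearModel(basis, weights=mean_w,
                                           sigma_sqrt=sig_rt, beta=100.0)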
__call__(x[, hyp=None])
Overloads :best.maps.Function.__call__()

Evaluate the mean of the generalized linear model at x.

Essentially, it computes \mathbf{m}(\mathbf{x}).

d(x[, hyp=None])
Overloads :best.maps.Function.d()

Evaluate the Jacobian of the generalized linear model at x.

This is \nabla \mathbf{m}(\mathbf{x}).
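
Since \mathbf{m}(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^T\mathbf{M}, the chain rule gives \nabla\mathbf{m}(\mathbf{x}) = \mathbf{M}^T\nabla\boldsymbol{\phi}(\mathbf{x}). In the monomial sketch from above (the shapes here are assumptions made for illustration, not BEST's conventions):

    # J_phi is the m x d Jacobian of the basis; here d = 1 and
    # phi(x) = (1, x, x^2, x^3).
    J_phi = np.array([[0.0], [1.0], [2 * x], [3 * x ** 2]])
    J_m = M.T @ J_phi                   # q x d Jacobian of the mean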

get_predictive_covariance(x)

Evaluate the predictive covariance at x.

Assume that x represents n input points \left\{\mathbf{x}^{(i)}\right\}_{i=1}^n. Then, this method computes the semi-positive definite matrix \mathbf{C}\in\mathbb{R}^{n\times n}, given by (summation over the repeated indices k and l is implied):

C_{ij} = \phi_k\left(\mathbf{x}^{(i)}\right)
\Sigma_{kl}
\phi_l\left(\mathbf{x}^{(j)}\right).
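
In the numpy sketch from above, stacking the basis vectors as the rows of a matrix \boldsymbol{\Phi} turns this into \mathbf{C} = \boldsymbol{\Phi}\boldsymbol{\Sigma}\boldsymbol{\Phi}^T:

    X = np.array([0.0, 0.5, 1.0])                # n = 3 illustrative inputs
    Phi = np.stack([phi(xi) for xi in X])        # shape (n, m)
    C = Phi @ Sigma @ Phi.T                      # shape (n, n)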

get_predictive_variance(x)

Evaluate the predictive variance at x.

This is the diagonal of \mathbf{C} of best.maps.GeneralizedLinearModel.get_predictive_covariance(). However, it is computed without ever building \mathbf{C}.
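
One way to get the diagonal without forming \mathbf{C}: with \boldsymbol{\Sigma} = \mathbf{R}^T\mathbf{R}, we have C_{ii} = \|\mathbf{R}\boldsymbol{\phi}(\mathbf{x}^{(i)})\|^2, which costs O(nkm) instead of O(n^2 m). A sketch continuing the example above (not necessarily BEST's actual implementation):

    R = np.linalg.cholesky(Sigma).T              # any square root works
    var = np.sum((Phi @ R.T) ** 2, axis=1)       # shape (n,)
    assert np.allclose(var, np.diag(Phi @ Sigma @ Phi.T))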

basis

Get the underlying basis.

weights

Get the weights.

sigma_sqrt

Get the square root of the covariance matrix.

beta

Get the noise precision.
