.. _gpr:
Gaussian Process Regression
===========================
.. module:: best.gp
:synopsis: Everything that is related to Gaussian Process Regression.
:note: The documentation of :mod:`best.gp` is incomplete at the moment \
subject to change in the near future. The package includes lots \
research details that are not fully documented here. The user \
is adviced to use only what is discussed in this section and \
look at the code itself in case he wants to do something more \
excited. The code is at the moment not in a compete state.
.. _mgpr:
Multi-output Separable Gaussian Process Regression
--------------------------------------------------
At the moment, :mod:`best` does not offer a class for performing simple
Gaussian process regression (GPR) with 1D outputs.
Instead it offers a way to do
multi-output GP regression with a separable covariance function. The
simple case is a subcase of this more general case. We will not get
into the mathematical details of GPR. The user is adviced to consult:
* `Gaussian Processes for Machine Learning `_
* `I. Bilionis, N. Zabaras, B. A. Konomi, and G. Lin. Multi-output separable Gaussian process: Towards an efficient, fully Bayesian paradigm for uncertainty quantification. Journal of Computational Physics, 241:212-239, 2013 `_
The main class for performing GPR is
:class:`MultioutputGaussianProcess`. It can be trained either via
:ref:`mcmc` or :ref:`smc`. However, knowledge of these topics is not
required at this point since the object can be trained by itself.
The current version uses internally a :ref:`cov-se` with nuggets.
In future versions, we will let the user supply the :ref:`cov`.
Both the length scales of :ref:`cov-se` and the nuggets have
Exponential prior distributions with parameters that can be specified.
The mean of the model is assumed to be a :ref:`glm`. The user
must provide the design matrix.
.. class:: best.gp.MultioutputGaussianProcess
:inherits: :class:`best.random.MarkovChainMonteCarlo`
.. method:: __init__([mgp=None,[ name='MultioutputGaussianProcess']])
Initialize the object.
:param mgp: If supplied then the object copies all the data \
from here.
:type mgp: :class:`best.gp.MultioutputGaussianProcess`
.. set_data(X, H, Y)
Set the observed data.
:note: To see how the data should be organized for the separable \
case, consult our paper.
:param X: The input points. This must be a tuple of points \
observed for each separable dimension. E.g., if \
we have three separable dimensions, we must provide \
``(X1, X2, X3)``.
:type X: (tuple of) 2D numpy array(s)
:param H: The design matrix.
:type H: (tuple of) 2D numpy array(s)
:param Y: The obverse outputs.
:type Y: 2D numpy array
.. method:: initialize(hyp[, eval=None])
Inialize the object.
:param hyp: The initial hyper-parameters. It must be a 1D \
numpy array ordered so that the first k elements \
correspond to the length scales and the last s to \
the nuggets. The length scales are ordered so that \
the first k[0] correspond to the first group of \
input variables, the following k[1] to the \
second and so on.
:type hyp: 1D numpy array
:param eval_state: A dictionary that contains all the data \
required to start the MCMC algorithm from \
the specified hyper-parameters. If not given \
then these data are initialized from scratch. \
The correct format of ``eval_state`` is the \
one returned by \
:func:`best.gp.MultioutputGaussianProcess.sample()`. \
So, do not try to improvise.
.. method:: sample([x=None[, eval_state=None[, \
return_val_state=False[, steps=1]]])
Take samples from the posterior of the hyper-parameters.
:param x: The initial state. If not specified, we attemp to use \
the previous state processed by this class.
:param eval_state: A dictionary containing the all the data \
required to initialize the object. Such a \
state is actually returned by this \
function if the option ``return_eval_sate`` \
is set to ``True``. If not specified, then \
everything is calculated from scratch.
:param return_eval_state: If specified, then the routine returns \
the ``evaluated_state`` of the sampler, \
which may be used to restart the MCMC \
sampling.
:returns: The current state of the MCMC (the hyper-parameters) \
and (optionally if ``return_eval_state``) is set \
all data required to continue the algorithm.
.. method:: __call__(self, X, H[, Y=None[, C=None[, \
compute_covariance=False]]])
Evaluate the prediction at a given set of points.
The result of this function, is basically the predictive
distribution, encoded in terms of the mean ``Y`` and the
covariance matrix ``C``.
:param X: The input points.
:param H: The design matrices.
:param Y: An array to store the mean. If ``None``, then it is \
returned.
:param C: An array to store the covariance. If ``None``, then
the \
covariance is not computed or it is returned as \
specified by the ``compute_covariance`` option.
:param compute_covariance: If ``C`` is ``None``, and the flag \
is set to ``True``, then the \
covariance is calculated and \
returned. If ``C`` is not ``None``, \
then it is ignored.
.. method:: sample_prediction(self, X, H[, Y=None[, C=None]])
Sample from the predictive distribution of the model.
:param X: The input points.
:param H: The design matrices.
:param Y: An array to store the mean. If ``None``, then it is \
returned.
:param C: An optional array that will store the covariance \
matrix. If not supplied, it will be allocated. \
On the output, the incomplete Cholesky decomposition \
is written on ``C``.
:returns: If ``Y`` is None, then the sample will be returned. \
The trace of the covariance normalized by the number \
of spatial/time inputs and the outputs. This is a \
measure associated with the uncertainty of the given \
input point.
.. method:: add_data(self, X0, H0, Y0):
Add more observations to the data set.
The routine currently only adds observations pertaining to the \
first component. Addition to the other components would ruin \
the Kronecker properties of the matrices.
:param X0: The input variables.
:param H0: The design matrix.
:param Y0: The observations.
.. method:: sample_surrogate(self, X_design, H_design[, \
rel_tol=0.1[, abs_tol=1e-3]])
Sample a surrogate surface.
Samples a surrogate surface that can be evaluated analytically. The
procedure adds the design point with the maximum uncertainty defined
by Eq. (19) of the paper and assuming a uniform input distribution
until:
+ we run of design points,
+ or the uncertainty satisfies a stopping criterion.
The global uncertainty is defined to be the average uncertainty of
all design points. The stopping criterion is implemented as follows:
STOP if ``global uncertainty < rel_tol * init_unc or < abs_tol``,
where init_unc is the initial uncertainty and rel_tol is a relative
reduction and ``abs_tol`` is the absolute uncertainty we are willing to
accept.
:param X_design: The design points to be used. This
should be as dense as is computationally
feasible.
:param rel_tol: We stop if the current uncertainty
is rel_tol times the initial uncertainty.
:param abs_tol: We stop if the current uncertainty is
less than abs_tol.
.. method:: evaluate_sparse(self, X, H[, compute_covariance=False[, \
sp_tol=0.1]])
Evaluate the prediction at a given set of points.
Same as
:func:`best.gp.MultioutputGaussianProcess.__call__()`
but we attemp to use sparse matrices.
.. attribute:: sample_g
Set/See if the nuggets are going to be sampled.
.. attribute:: sample_r
Set/See if the length scales are going to be sampled.
.. attribute:: log_like
The logarithm of the likelihood of the current state.
.. attribute:: cov
Get the covariance function.
.. attribute:: num_mcmc
The number of MCMC steps per Gibbs setp.
.. attribute:: gamma
Get the prior parameters for the length scales.
.. attribute:: delta
Get the prior parameters for the nuggets.
.. attribute:: sigma_r
Get the proposal step for the length scales.
.. attribute:: sigma_g
Get the proposal step for the nuggets.
.. attribute:: g
Get the current nuggets.
.. attribute:: r
Get the current length scales.
.. attribute:: Sigma
Get the output-correlation matrix.
.. attribute:: log_post_lk
Get the logarithm of the posterior likelihood.
.. attribute:: acceptance_rate
Get the MCMC acceptance rate.
A simple 1D example
-------------------
Typically, you would like to pick the hyper-parameters, observe the
convergence of :ref:`mcmc` or even use :ref:`smc` to train the model.
However, here is the simplest possible case we could run that works
just fine with the default parameters::
import numpy as np
import matplotlib.pyplot as plt
from best.gp import MultioutputGaussianProcess
# Number of observations
num_obs = 20
# The noise we will add to the data (std)
noise = 1e-6
# Draw the observed input points randomly
X = -10. + 20. * np.random.rand(num_obs)
X = np.atleast_2d(X).T
# Draw the observations
Y = np.sin(X) / (X + 1e-6) + noise * np.random.randn(*X.shape)
# Construct the design matrix
H = np.ones(X.shape)
# Construct an MGP object
gp = MultioutputGaussianProcess()
gp.set_data(X, H, Y)
# Pick the hyper-parameters (length scales, nuggets)
hyp = np.array([1., 1e-6])
gp.initialize(hyp)
# Run 2000 MCMC steps
gp.sample(steps=2000)
# Get a function object (subject to change in the future)
f = gp
plt.plot(X, Y, '+', markersize=10)
x = np.linspace(-10, 10, 100)
x = np.atleast_2d(x).T
h = np.ones(x.shape)
fx, Cx = f(x, h, compute_covariance=True)
plt.plot(x, fx, 'b', linewidth=2)
plt.plot(x, np.sin(x) / (x + 1e-6), 'r', linewidth=2)
s2 = 2. * np.sqrt(np.diag(Cx)).reshape(fx.shape)
plt.plot(x, fx + s2, 'g')
plt.plot(x, fx - s2, 'g')
plt.show()
You should see something like the following figure:
.. figure:: images/gp_1d.png
:align: center
The crosses are the observed data points. The red line is the
true function from which the data are drawn. The blue line
is the mean of the GPR prediction and the green lines indicated
the 95% confidence intervals.
.. _tgpr:
Treed Gaussian Process Regression
---------------------------------
The class :class:`TreedMultioutputGaussianProcess` implements an
extension of the model we developed in (PAPER REFERENCE). This model
is not trained directly on data, but it requires a
:class:`best.maps.Solver` object. It is used to construct a surrogate
of the solver.
.. class best.gp.TreedMultioutputGaussianProcess
.. method:: __init__(solver[, model=MultioutputGaussianProcess()[, \
mean_model=None[, \
tree=RandomElement(scale_X=True)]]])
Initialize the object.
:param solver: The solver you wish to learn.
:type solver: :class:`best.maps.Solver`
.. method:: initialize()
Initialize the model.
.. method:: train()
Train the model to the solver.
.. __call__(X, H, Y[, V=None])
Evaluate the model at a particular point.
Simple Treed Gaussian Process Regression Example
------------------------------------------------
The following demo can be found in :file:`best/demo/test_treed_gp.py`.
It learns the output of a dynamical system with a discontinuity with
respect to the initial conditions (see :class:`examples.ko.KOSolver`).
It uses active learning (Bayesian Experimental Design) to select
the observed inputs::
if __name__ == '__main__':
import fix_path
from examples.ko import KOSolver
from best.gp import TreedMultioutputGaussianProcess
import numpy as np
import matplotlib.pyplot as plt
if __name__ == '__main__':
# Initialize the solver
solver = KOSolver(k=2, T=[0, 1], n_t=32)
# Initialize the treed GP
tmgp = TreedMultioutputGaussianProcess(solver=solver)
tmgp.num_xi_init = 10
tmgp.num_xi_test = 100
tmgp.num_max = 100
tmgp.num_elm_max = 20
tmgp.verbose = True
tmgp.model.sample_g = True
tmgp.model.num_mcmc = 1
tmgp.model.num_init = 100
# Initialial hyper-parameters
init_hyp = np.array([.1, .1, .1, 1e-1, 1e-1])
tmgp.init_hyp = init_hyp
tmgp.num_mcmc = 100
# Train
tmgp.train()
# Print the tree
print str(tmgp.tree)
# A fine scale solver to test our predictions
fine_solver = KOSolver(k=solver.k_of[0], n_t=50)
# Make predictions
for i in range(10):
xi = np.random.rand(1, solver.k_of[0])
X = [xi] + fine_solver.X_fixed
H = tmgp.mean_model(X)
n = np.prod([x.shape[0] for x in X])
Yp = np.ndarray((n, solver.q), order='F')
Vp = np.ndarray((n, solver.q), order='F')
tmgp(X, H, Yp, Vp)
Y = fine_solver(xi[0, :])
plt.plot(fine_solver.X_fixed[0], Y)
E = 2. * np.sqrt(Vp)
for i in range(solver.q):
plt.errorbar(fine_solver.X_fixed[0], Yp[:, i], yerr=E[:, i])
plt.show()
The plots you will see will look like the following:
.. figure:: images/tgp_ko.png
:align: center
The prediction of the treed Gaussian Process model for the response
of the dynamical system as a function of time with error bars. This
the prediction on a random input sample not used in the training
data. The total number of observations was restricted to 100.