

Students T Processes

Student T Processes (STP) can be viewed as a generalization of Gaussian Processes, in GP models we use the multivariate normal distribution to model noisy observations of an unknown function. Likewise for STP models, we employ the multivariate student t distribution. Formally a student t process is a stochastic process where the finite dimensional distribution is multivariate t.

$\begin{align} \mathbf{y} & \in \mathbb{R}^n \\ \mathbf{y} & \sim MVT_{n}(\nu, \phi, K) \\ p(\mathbf{y}) & = \frac{\Gamma(\frac{\nu + n}{2})}{(\nu \pi)^{n/2} \Gamma(\nu/2)} |K|^{-1/2} \\ & \times (1 + \frac{(\mathbf{y} - \phi)^T K^{-1} (\mathbf{y} - \phi)}{\nu})^{-\frac{\nu + n}{2}} \end{align}$

It is known that as $\nu \rightarrow \infty$ , the $MVT_{n}(\nu, \phi, K)$ tends towards the multivariate normal distribution $\mathcal{N}_{n}(\phi, K)$ .

Regression with Student T Processes¶

The regression formulation for STP models is identical to the GP regression framework, to summarize the posterior predictive distribution takes the following form.

Suppose $\mathbf{t} \sim MVT_{n_{tr} + n_t}(\nu, \mathbf{0}, K)$ is the process producing the data. Let $[\mathbf{f_*}]_{n_{t} \times 1}$ represent the values of the function on the test inputs and $[\mathbf{y}]_{n_{tr} \times 1}$ represent noisy observations made on the training data points.

$\begin{align} & \mathbf{f_*}|X,\mathbf{y},X_* \sim MVT_{\nu + n_{tr}}(\mathbf{\bar{f_*}}, \frac{\nu + \beta - 2}{\nu + n_{tr} - 2} \times cov(\mathbf{f_*})) \label{eq:posterior}\\ & \beta = \mathbf{y}^T K^{-1} \mathbf{y} \\ & \mathbf{\bar{f_*}} \overset{\triangle}{=} \mathbb{E}[\mathbf{f_*}|X,y,X_*] = K(X_*,X)[K(X,X) + \sigma^{2}_n \it{I}]^{-1} \mathbf{y} \label{eq:posterior:mean} \\ & cov(\mathbf{f_*}) = K(X_*,X_*) - K(X_*,X)[K(X,X) + \sigma^{2}_n \it{I}]^{-1}K(X,X_*) \end{align}$

STP models for a single output¶

For univariate GP models (single output), use the StudentTRegressionModel class (an extension of AbstractSTPRegressionModel). To construct a STP regression model you would need:

The degrees of freedom $\nu$
Kernel/covariance instance to model correlation between values of the latent function at each pair of input features.
Kernel instance to model the correlation of the additive noise, generally the DiracKernel (white noise) is used.
Training data

val trainingdata: Stream[(DenseVector[Double], Double)] = ...
val num_features = trainingdata.head._1.length

// Create an implicit vector field for the creation of the stationary
// radial basis function kernel
implicit val field = VectorField(num_features)

val kernel = new RBFKernel(2.5)
val noiseKernel = new DiracKernel(1.5)
val model = new StudentTRegression(1.5, kernel, noiseKernel, trainingData)

STP models for Multiple Outputs¶

You can use the MOStudentTRegression[I] class to create multi-output GP models.

val trainingdata: Stream[(DenseVector[Double], DenseVector[Double])] = ...

val model = new MOStudentTRegression[DenseVector[Double]](
    sos_kernel, sos_noise, trainingdata,
    trainingdata.length, trainingdata.head._2.length)

Tip

Working with multi-output Student T models is similar to multi-output GP models. We need to create a kernel function over the combined index set (DenseVector[Double], Int). This can be done using the sum of separable kernel idea.

val linearK = new PolynomialKernel(2, 1.0)
val tKernel = new TStudentKernel(0.2)
val d = new DiracKernel(0.037)

val mixedEffects = new MixedEffectRegularizer(0.5)
val coRegCauchyMatrix = new CoRegCauchyKernel(10.0)
val coRegDiracMatrix = new CoRegDiracKernel

val sos_kernel: CompositeCovariance[(DenseVector[Double], Int)] =
    (linearK :* mixedEffects)  + (tKernel :* coRegCauchyMatrix)

val sos_noise: CompositeCovariance[(DenseVector[Double], Int)] =
    d :* coRegDiracMatrix

Students T Processes

Regression with Student T Processes¶

STP models for a single output¶

STP models for Multiple Outputs¶

Comments