v1.4.1
Version 1.4.1 of DynaML, released March 26, 2017, implements a number of new models (extended skew GP, Student's T process, generalized least squares, etc.) and features.
Pipes API
Additions
The pipes API has been extended with pipes which encapsulate functions of multiple arguments, leading to the following endpoints.
- `DataPipe2[A, B, C]`: a pipe which takes 2 arguments
- `DataPipe3[A, B, C, D]`: a pipe which takes 3 arguments
- `DataPipe4[A, B, C, D, E]`: a pipe which takes 4 arguments
Furthermore, there is now the ability to create pipes which return pipes, akin to curried functions in functional programming.

- `MetaPipe`: takes an argument, returns a `DataPipe`
- `MetaPipe21`: takes 2 arguments, returns a `DataPipe`
- `MetaPipe12`: takes an argument, returns a `DataPipe2`
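A minimal sketch of these constructs, assuming the companion objects lift ordinary Scala functions the same way `DataPipe` does (`addPipe` and `scalePipe` are illustrative names):

import io.github.mandar2812.dynaml.pipes._

//A pipe of two arguments
val addPipe = DataPipe2((x: Double, y: Double) => x + y)
addPipe(1.0, 2.0) //3.0

//A pipe which returns a pipe, akin to a curried function
val scalePipe = MetaPipe((alpha: Double) => (x: Double) => alpha*x)
val doubler = scalePipe(2.0)
doubler(3.0) //6.0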
A new kind of stream data pipe, `StreamFlatMapPipe`, has been added to represent data pipelines which perform flatMap-like operations on streams.
val mapFunc: (I) => Stream[J] = ...
val streamFMPipe = StreamFlatMapPipe(mapFunc)
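For instance, assuming `StreamFlatMapPipe` lifts the supplied function to a flatMap over the incoming stream, a tokenizing pipeline would look like:

val splitWords = StreamFlatMapPipe((line: String) => line.split(" ").toStream)
splitWords(Stream("hello world", "data pipes"))
//Stream("hello", "world", "data", "pipes")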
- Added Data Pipes API for Apache Spark RDDs.
import org.apache.spark.rdd.RDD

//Parallelize the numbers 1 to 20 on an active SparkContext sc
val num = 20
val numbers = sc.parallelize(1 to num)
//Element-wise transformations lifted over RDDs
val convPipe = RDDPipe((n: Int) => n.toDouble)
val sqPipe = RDDPipe((x: Double) => x*x)
val sqrtPipe = RDDPipe((x: Double) => math.sqrt(x))
//A pipe acting on the RDD as a whole: sum the elements and truncate to Int
val resultPipe = RDDPipe((r: RDD[Double]) => r.reduce(_+_).toInt)
val netPipeline = convPipe > sqPipe > sqrtPipe > resultPipe
netPipeline(numbers)
- Added `UnivariateGaussianScaler` class for Gaussian scaling of univariate data.
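A minimal sketch, assuming the scaler is constructed from a mean and standard deviation and exposes the usual forward transform and inverse `i` of DynaML's reversible scalers (values illustrative):

val scaler = UnivariateGaussianScaler(2.0, 0.5)
//Standardize: (3.0 - 2.0)/0.5 = 2.0
val z = scaler(3.0)
//Invert the scaling to recover the original value
val orig = scaler.i(z)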
Core API
Additions
Package dynaml.models.bayes
This new package houses stochastic prior models. Currently there is support for GP and skew-GP priors; for a starting example, see stochasticPriors.sc in the scripts directory of the DynaML source.
Package dynaml.kernels
- Added `evaluateAt(h)(x,y)` and `gradientAt(h)(x,y)`; `evaluate(x,y)` and `gradient(x,y)` are now expressed in terms of them.
- Added `asPipe` method for covariance functions.
- For backwards compatibility, users are advised to extend `LocalSVMKernel` in their custom kernel implementations in case they do not want to implement the `evaluateAt` API endpoints.
- Added `FeatureMapKernel`, representing kernels which can be explicitly decomposed into feature mappings.
- Added the Matern half-integer kernel `GenericMaternKernel[I]`.
- Added `block(S: String*)` method to block any hyper-parameters of kernels; see the sketch following the example below.
- Added `NeuralNetworkKernel` and `GaussianSpectralKernel`.
- Added `DecomposableCovariance`; usage shown below.
import breeze.linalg.DenseVector
import io.github.mandar2812.dynaml.analysis.VectorField
import io.github.mandar2812.dynaml.DynaMLPipe._
import io.github.mandar2812.dynaml.kernels._

//Implicit algebraic structure over the input space
implicit val ev = VectorField(6)
//Split each input vector into sub-vectors, one per component kernel
implicit val sp = breezeDVSplitEncoder(2)
//Combine component kernel evaluations by summation
implicit val sumR = sumReducer
val kernel = new LaplacianKernel(1.5)
val other_kernel = new PolynomialKernel(1, 0.05)
val decompKernel = new DecomposableCovariance(kernel, other_kernel)(sp, sumReducer)
val other_kernel1 = new FBMKernel(1.0)
val decompKernel1 = new DecomposableCovariance(decompKernel, other_kernel1)(sp, sumReducer)
val veca = DenseVector.tabulate[Double](8)(math.sin(_))
val vecb = DenseVector.tabulate[Double](8)(math.cos(_))
decompKernel1.evaluate(veca, vecb)
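As a further illustration of the `block` method added above, one may fix a kernel's hyper-parameters before optimization; a short sketch, assuming `"beta"` is a key appearing in `kernel.hyper_parameters` for `LaplacianKernel`:

//List the hyper-parameter keys of the kernel
kernel.hyper_parameters
//Exclude the assumed "beta" hyper-parameter from optimization
kernel.block("beta")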
Package dynaml.algebra
Partitioned matrices/vectors and the following operations:

- Addition and subtraction
- Matrix and matrix-vector multiplication
- LU and Cholesky decompositions
- Linear solve: `A\y`, `A\Y`
Added calculation of quadratic forms, namely:

- `quadraticForm`, which calculates $\mathbf{x}^\intercal A^{-1} \mathbf{x}$
- `crossQuadraticForm`, which calculates $\mathbf{y}^\intercal A^{-1} \mathbf{x}$

where $A$ is assumed to be a symmetric positive semi-definite matrix.
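These quantities appear throughout Gaussian process likelihood computations; given a Cholesky factorization $A = LL^\intercal$, both reduce to triangular solves, since $\mathbf{x}^\intercal A^{-1}\mathbf{x} = \|L^{-1}\mathbf{x}\|^2$ and $\mathbf{y}^\intercal A^{-1}\mathbf{x} = (L^{-1}\mathbf{y})^\intercal (L^{-1}\mathbf{x})$.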
Usage:
import io.github.mandar2812.dynaml.algebra._
val x: DenseVector[Double] = ...
val y: DenseVector[Double] = ...
val a: DenseMatrix[Double] = ...
quadraticForm(a,x)
crossQuadraticForm(y, a, x)
Package dynaml.modelpipe
New package created; all classes inheriting from `ModelPipe` have been moved to this package.

Added the following:

- `GLMPipe2`: a pipe taking two arguments and returning a `GeneralizedLinearModel` instance
- `GeneralizedLeastSquaresPipe2`
- `GeneralizedLeastSquaresPipe3`
Package dynaml.models
- Added a new neural networks API: `NeuralNet` and `GenericFFNeuralNet`; for an example refer to `TestNNDelve` in `dynaml-examples`.
- `GeneralizedLeastSquaresModel`: the GLS model.
- `ESGPModel`: an implementation of extended skew-Gaussian process regression.
- Warped Gaussian process models (work in progress).
- Added mean function capability to Gaussian process and Student's T process models.
- Added Apache Spark implementations of generalized linear models; see `SparkGLM`, `SparkLogisticModel`, `SparkProbitGLM`.
Package dynaml.probability
- `MultivariateSkewNormal`, as specified in Azzalini et al.
- `ExtendedMultivariateSkewNormal`
- `UESN` and `MESN`, representing an alternative formulation of the skew-Gaussian family from Adcock and Shutes.
- `TruncatedGaussian`: a truncated version of the Gaussian distribution.
- Matrix normal and matrix T distributions; usage shown further below.
- Added an expectation operator for `RandomVariable` implementations in the `io.github.mandar2812.dynaml.probability` package object; a usage example is given below.
- `SkewGaussian`, `ExtendedSkewGaussian`: breeze implementations of the skew-Gaussian and extended skew-Gaussian distributions respectively.
- `PushforwardMap`, `DifferentiableMap`: `PushforwardMap` enables creating new random variables with a well-defined density from base random variables.
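For a homeomorphism $h$, the pushforward variable $Y = h(X)$ has density $p_Y(y) = p_X(h^{-1}(y))\,\left|\det J_{h^{-1}}(y)\right|$ by the change of variables formula; this is why a `PushforwardMap` is constructed from the forward map, its inverse with derivative (a `DifferentiableMap`), and an implicit determinant implementation for the Jacobian type.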
import io.github.mandar2812.dynaml.DynaMLPipe._
import io.github.mandar2812.dynaml.analysis._
import io.github.mandar2812.dynaml.pipes._
import io.github.mandar2812.dynaml.probability._
import io.github.mandar2812.dynaml.probability.distributions._
val g = GaussianRV(0.0, 0.25)
val sg = RandomVariable(SkewGaussian(1.0, 0.0, 0.25))
//Define a determinant implementation for the Jacobian type (Double in this case)
implicit val detImpl = identityPipe[Double]
//Defines a homeomorphism y = exp(x), x = log(y)
val h: PushforwardMap[Double, Double, Double] = PushforwardMap(
DataPipe((x: Double) => math.exp(x)),
DifferentiableMap(
(x: Double) => math.log(x),
(x: Double) => 1.0/x)
)
//Creates a log-normal random variable
val p = h->g
//Creates a log-skew-gaussian random variable
val q = h->sg
//Calculate expectation of q
println("E[Q] = "+E(q))
- `ContinuousMCMC` and the underlying sampling implementation in `GeneralMetropolisHastings`.
- Added an implementation of Approximate Bayesian Computation (ABC) in the `ApproxBayesComputation` class.
//Usage of the matrix normal distribution
//The mean matrix
val center: DenseMatrix[Double] = ...
//Covariance (positive semi-def) matrix among rows
val sigmaRows: DenseMatrix[Double] = ...
//Covariance (positive semi-def) matrix among columns
val sigmaCols: DenseMatrix[Double] = ...
val matD = MatrixNormal(center, sigmaRows, sigmaCols)
//Usage of the matrix T distribution
//The degrees of freedom (must be > 2.0 for existence of finite moments)
val mu: Double = ...
//The mean
val center: DenseMatrix[Double] = ...
//Covariance (positive semi-def) matrix among rows
val sigmaRows: DenseMatrix[Double] = ...
//Covariance (positive semi-def) matrix among columns
val sigmaCols: DenseMatrix[Double] = ...
val matD = MatrixT(mu, center, sigmaCols, sigmaRows)
Package dynaml.optimization
- Added `ProbGPCommMachine`, which performs grid search or CSA and then, instead of selecting a single hyper-parameter configuration, calculates a weighted Gaussian process committee, where the weights correspond to probabilities or confidence for each model instance (hyper-parameter configuration).
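Schematically, the committee prediction is then a probability-weighted average of the individual GP predictive means, $\hat{y}(x) = \sum_i w_i\, \hat{y}_i(x)$, where $w_i$ is the weight assigned to the model with the $i$-th hyper-parameter configuration.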
Package dynaml.utils
- Added the multivariate log-gamma function `mvlgamma`.

import io.github.mandar2812.dynaml.utils._

//Returns logarithm of the multivariate gamma function
val g = mvlgamma(5, 1.5)
Package dynaml.dataformat
- Added support for reading MATLAB `.mat` files in the `MAT` object.
Improvements/Bug Fixes
Package dynaml.probability
- Removed `ProbabilityModel`, replacing it with `JointProbabilityScheme` and `BayesJointProbabilityScheme`; major refactoring of the `RandomVariable` API.
Package dynaml.optimization
- Improved logging of `CoupledSimulatedAnnealing`.
- Refactored `GPMLOptimizer` to `GradBasedGlobalOptimizer`.
Package dynaml.utils
- Correction to the `utils.getStats` method used for calculating the mean and variance of data sets consisting of `DenseVector[Double]`.
- Fixed `minMaxScalingTrainTest` and `minMaxScaling` in `DynaMLPipe`, which were using `GaussianScaler` instead of `MinMaxScaler` to process features.
Package dynaml.kernels
- Fix to `CoRegCauchyKernel`: corrected a mismatch in the hyper-parameter string.
- Fix to the `SVMKernel` object's matrix gradient computation in the case where kernel dimensions are not multiples of the block size.
- Correction to the gradient calculation in the RBF kernel family.
- Sped up kernel gradient computation: kernel and kernel gradient matrices with respect to the model hyper-parameters are now calculated in a single pass through the data.