State of DynaML 2016

Summarizes some of the pet projects being tackled in DynaML

The past year has seen DynaML grow by leaps and bounds. This post gives an update on what has been achieved and a taste of what is to come.

Completed Features

A short tour of the enhancements which were completed.

January to June

  • Released v1.3.x series with the following new additions

Models

  • Regularized Least Squares
  • Logistic and Probit Regression
  • Feed Forward Neural Nets
  • Gaussian Process (GP) classification and NARX based models
  • Least Squares Support Vector Machines (LSSVM) for classification and regression
  • Meta model API, committee models

Optimization Primitives

  • Regularized Least Squares Solvers
  • Gradient Descent
  • Committee model solvers
  • Linear Solvers for LSSVM
  • Laplace approximation for GPs

Miscellaneous

  • Data Pipes API
  • Migration to Scala version 2.11.8

  • Started work on release 1.4.x series with initial progress

Improvements

  • Migrated from Maven to sbt.
  • Set Ammonite as default REPL.

June to December

  • Released v1.4 with the following features.

Models

The following inference models have been added.

  • LSSVM committees.
  • Multi-output, multi-task Gaussian Process models as reviewed in Lawrence et al.
  • Student-t Processes: single- and multi-output, inspired by Shah, Ghahramani et al.
  • Performance improvements to the computation of the marginal likelihood and posterior predictive distribution in Gaussian Process models.
  • The posterior predictive distribution returned by the AbstractGPRegression base class is now a MultGaussianRV, which has been added to the dynaml.probability package.

Kernels

  • Added StationaryKernel and LocallyStationaryKernel classes to the kernel API; converted RBFKernel, CauchyKernel, RationalQuadraticKernel and LaplacianKernel to subclasses of StationaryKernel.

  • Added MLPKernel which implements the maximum likelihood perceptron kernel as shown here.

  • Added co-regionalization kernels, which are used in Lawrence et al. to formulate kernels for vector-valued functions. The following co-regionalization kernels were implemented:

    • CoRegRBFKernel
    • CoRegCauchyKernel
    • CoRegLaplaceKernel
    • CoRegDiracKernel

  • Improved performance when calculating kernel matrices for composite kernels.

  • Added :* operator to kernels so that one can create separable kernels used in co-regionalization models.
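
To make the idea of a separable kernel concrete: for inputs paired with an output index, the kernel factorises as K((x, i), (y, j)) = k(x, y) * B(i, j). The following is a minimal sketch in plain Scala under that assumption. The names `rbf`, `coreg` and `separable` are hypothetical and stand in for the concept only; they are not DynaML's kernel classes or its `:*` operator.

```scala
object SeparableKernelSketch {
  type Kernel[T] = (T, T) => Double

  // Squared-exponential (RBF) kernel on real vectors with bandwidth sigma.
  def rbf(sigma: Double): Kernel[Seq[Double]] = (x, y) => {
    val sqDist = x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum
    math.exp(-sqDist / (2.0 * sigma * sigma))
  }

  // Kernel over output indices, backed by a fixed co-regionalization matrix.
  def coreg(b: Vector[Vector[Double]]): Kernel[Int] = (i, j) => b(i)(j)

  // Separable kernel: K((x, i), (y, j)) = ka(x, y) * kb(i, j).
  def separable[S, T](ka: Kernel[S], kb: Kernel[T]): Kernel[(S, T)] =
    (p, q) => ka(p._1, q._1) * kb(p._2, q._2)
}
```

Evaluating the product kernel on two (input, output index) pairs then multiplies the input similarity by the corresponding entry of the co-regionalization matrix.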

Optimization

  • Improved performance of CoupledSimulatedAnnealing and enabled four variants of Coupled Simulated Annealing, adding the ability to set the annealing schedule using the so-called variance control scheme outlined in de-Souza, Suykens et al.

Pipes

  • Added Scaler and ReversibleScaler traits to represent transformations whose input and output belong to the same domain; both traits extend DataPipe.

  • Added Discrete Wavelet Transform based on the Haar wavelet.
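
The Haar transform is a natural example of a reversible transformation that maps a domain onto itself. Below is a minimal sketch of a single level of the orthonormal Haar transform and its inverse in plain Scala. It is an illustration only: the object and method names are hypothetical, and DynaML's own implementation may differ (for instance in normalisation or in the number of decomposition levels).

```scala
object HaarSketch {
  private val s = math.sqrt(2.0)

  // One level of the orthonormal Haar transform: returns the approximation
  // coefficients followed by the detail coefficients. Requires even length.
  def forward(x: Vector[Double]): Vector[Double] = {
    require(x.length % 2 == 0, "input length must be even")
    val pairs = x.grouped(2).toVector
    pairs.map(p => (p(0) + p(1)) / s) ++ pairs.map(p => (p(0) - p(1)) / s)
  }

  // Inverse of `forward`: rebuilds each pair from its approximation and detail.
  def inverse(c: Vector[Double]): Vector[Double] = {
    require(c.length % 2 == 0, "input length must be even")
    val (approx, detail) = c.splitAt(c.length / 2)
    approx.zip(detail).flatMap { case (a, d) => Vector((a + d) / s, (a - d) / s) }
  }
}
```

Applying `inverse` to the output of `forward` recovers the original signal up to floating point error.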

  • Started work on v1.4.1 with the following progress

Linear Algebra API

  • Partitioned Matrices/Vectors and the following operations

    1. Addition, Subtraction
    2. Matrix, vector multiplication
    3. LU, Cholesky
    4. A\y, A\Y
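
As an illustration of how operations on partitioned matrices work, the sketch below computes a blocked matrix-vector product, where block i of the result is the sum over j of A(i)(j) * x(j). It uses nested Scala `Vector`s as a stand-in for dense blocks; the type aliases and function names are hypothetical and do not reflect DynaML's linear algebra API.

```scala
object PartitionedSketch {
  type Vec = Vector[Double]
  type Mat = Vector[Vector[Double]] // one dense block, stored row by row

  // Dense matrix-vector product for a single block.
  def matVec(a: Mat, x: Vec): Vec =
    a.map(row => row.zip(x).map { case (p, q) => p * q }.sum)

  // Element-wise vector addition.
  def add(x: Vec, y: Vec): Vec = x.zip(y).map { case (p, q) => p + q }

  // Blocked product: result block i = sum over j of A(i)(j) * x(j).
  def blockMatVec(a: Vector[Vector[Mat]], x: Vector[Vec]): Vector[Vec] =
    a.map { blockRow =>
      blockRow.zip(x).map { case (blk, xj) => matVec(blk, xj) }.reduce(add)
    }
}
```

Because each block product is independent, this layout lets large problems be processed one block at a time instead of materialising the full matrix.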

Probability API

  • Added API end points for representing Measurable Functions of random variables.

Model Evaluation

  • Added Matthews Correlation Coefficient calculation to BinaryClassificationMetrics via the matthewsCCByThreshold method
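
For reference, the Matthews Correlation Coefficient is (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it per threshold in plain Scala. It is independent of DynaML's matthewsCCByThreshold: the names are hypothetical, and both the rule "predict positive when score >= threshold" and the convention of returning 0 for a zero denominator are assumptions.

```scala
object MccSketch {
  // Matthews correlation coefficient from confusion-matrix counts.
  // Returns 0.0 when the denominator is zero (assumed convention).
  def mcc(tp: Long, tn: Long, fp: Long, fn: Long): Double = {
    val denom = math.sqrt((tp + fp).toDouble * (tp + fn) * (tn + fp) * (tn + fn))
    if (denom == 0.0) 0.0 else (tp.toDouble * tn - fp.toDouble * fn) / denom
  }

  // MCC at each threshold, predicting positive when score >= threshold.
  def mccByThreshold(scoresAndLabels: Seq[(Double, Boolean)],
                     thresholds: Seq[Double]): Seq[(Double, Double)] =
    thresholds.map { t =>
      val tp = scoresAndLabels.count { case (s, l) => s >= t && l }
      val fp = scoresAndLabels.count { case (s, l) => s >= t && !l }
      val fn = scoresAndLabels.count { case (s, l) => s < t && l }
      val tn = scoresAndLabels.count { case (s, l) => s < t && !l }
      (t, mcc(tp, tn, fp, fn))
    }
}
```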

Data Pipes API

  • Added Encoder[S,D] traits which are reversible data pipes representing an encoding between types S and D.
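
The idea of a reversible data pipe can be sketched as a pair of functions, one in each direction, that compose in the obvious way. The trait below is a hypothetical illustration written for this post; its member names (`run`, `inverse`, `andThen`) and the `encoder` factory are assumptions and should not be read as the signatures of DynaML's Encoder[S,D].

```scala
object EncoderSketch {
  // A reversible pipe: `run` encodes S into D, `inverse` decodes D back into S.
  trait Encoder[S, D] { self =>
    def run(s: S): D
    def inverse(d: D): S

    // Chain with a second encoder to obtain an encoder from S to E.
    def andThen[E](that: Encoder[D, E]): Encoder[S, E] = new Encoder[S, E] {
      def run(s: S): E = that.run(self.run(s))
      def inverse(e: E): S = self.inverse(that.inverse(e))
    }
  }

  // Build an encoder from a forward function and its inverse.
  def encoder[S, D](f: S => D, g: D => S): Encoder[S, D] = new Encoder[S, D] {
    def run(s: S): D = f(s)
    def inverse(d: D): S = g(d)
  }
}
```

Note that the inverse of a composition applies the individual inverses in reverse order, which is what `andThen` does above.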

Miscellaneous

  1. Updated Ammonite version to 0.8.1
  2. Added support for running basic R code with Renjin. Run R code in the following manner:
     // Load the semicolon-delimited CSV file into an R data frame named dfWine
     val toRDF = csvToRDF("dfWine", ';')
     val wine_quality_red = toRDF("data/winequality-red.csv")

     // Descriptive statistics
     val commands: String = """
     print(summary(dfWine))
     print("\n")
     print(str(dfWine))
     """
     r(commands)

     // Build linear model
     val modelGLM = rdfToGLM("model", "quality", Array("fixed.acidity", "citric.acid", "chlorides"))
     modelGLM("dfWine")

     // Print goodness of fit
     r("print(summary(model))")

Ongoing Work

Some projects currently in progress:

  • Bayesian optimization using Gaussian Process models.
  • Implementation of Neural Networks using the akka actor API.
  • Implementation of kernels which can be decomposed over data dimensions, e.g. k((x_1, x_2), (y_1, y_2)) = k_1(x_1, y_1) + k_2(x_2, y_2)
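
The decomposition in the last item can be sketched directly: given one kernel per block of dimensions, the combined kernel is their sum. This is a plain Scala illustration with hypothetical names (`Kernel`, `sum`), not a preview of the DynaML API under development.

```scala
object DecomposableKernelSketch {
  type Kernel[T] = (T, T) => Double

  // k((x1, x2), (y1, y2)) = k1(x1, y1) + k2(x2, y2)
  def sum[A, B](k1: Kernel[A], k2: Kernel[B]): Kernel[(A, B)] =
    (x, y) => k1(x._1, y._1) + k2(x._2, y._2)
}
```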
