Automatic Feature Extraction

Summary

"Automatic Feature Extraction (AFE) is a technique by which approximate eigen-functions can be extracted from gram matrices constructed via kernel functions. These eigen-functions are features engineered by a particular kernel.""

Some Mathematical Background

Definitions

Let X_k \ \in \ \mathbb{R}^d \ , \ k = 1, \cdots ,n be a random sample drawn from a distribution F(x). Let C \in \mathbb{R}^d be a compact set such that, \mathcal{H} = \mathcal{L}^2(C) be a Hilbert space of functions given by the inner product below.

\begin{equation} <f,g>_{\mathcal{H}} = \int f(x)g(x) dF(x) \end{equation}

Further let M(\mathcal{H}, \mathcal{H}) be a class of linear operators from \mathcal{H} to \mathcal{H}.

Nyström method

Automatic Feature Extraction (AFE) using the Nyström method aims at finding a finite dimensional approximation to the kernel eigenfunction expansion of Mercer kernels, as shown below.

\begin{equation} K(x,t) = \sum_i{\lambda_i \phi(x)\phi(t)} \end{equation}

It is well known that Mercer kernels form a Reproducing Kernel Hilbert Space (RHKS) of functions. Every Mercer kernel defines a unique RHKS of functions as shown by the Moore-Aronszajn theorem. For a more involved treatment of RHKS and their applications the reader may refer to the book written by Bertinet et.al.

Mercer's theorem states that the spectral decomposition of integral operator of K, \mathcal{T} \in M(\mathcal{H},\mathcal{H}) defined below yields the eigenfunctions which span the RHKS generated by K and having an inner product defined as above.

\begin{equation} (\mathcal{T}\phi_i)(t) = \int K(x,t) \phi(x) dF(x) \end{equation}

Equation above is more commonly also known as the Fredholm integral equation of the first kind. Nyström's method method approximates this integral using the quadrature constructed by considering a finite kernel matrix constructed out of a prototype set X_k \ k = 1, \cdots, m and calculating its spectral decomposition consisting of eigenvalues \lambda_k and eigen-vectors u_k. This yields an expression for the approximate non-linear feature map \hat{\phi} : \mathbb{R}^d \longrightarrow \mathbb{R}^m.

\begin{equation} \hat{\phi}_{i}(t) = \frac{\sqrt{m}}{\lambda_i}\sum_{k=1}^{m}K(X_k, t)u_{k,i} \end{equation}

AFE in DynaML Kernels

The SVMKernel[M] contains an implementation of AFE in the method

featureMapping(
  decomposition: (DenseVector[Double], DenseMatrix[Double]))(
    prototypes: List[DenseVector[Double]])(
      data: DenseVector[Double]): DenseVector[Double]

Note

The SVMKernel class is extended by all the implemented library kernels in DynaML thereby enabling the use of AFE in potentially any model employing kernels.

Comments