

Automatic Feature Extraction

Summary

"Automatic Feature Extraction (AFE) is a technique by which approximate eigen-functions can be extracted from gram matrices constructed via kernel functions. These eigen-functions are features engineered by a particular kernel.""

Some Mathematical Background¶

Definitions¶

Let $X_k \ \in \ \mathbb{R}^d \ , \ k = 1, \cdots ,n$ be a random sample drawn from a distribution $F(x)$ . Let $C \in \mathbb{R}^d$ be a compact set such that, $\mathcal{H} = \mathcal{L}^2(C)$ be a Hilbert space of functions given by the inner product below.

$\begin{equation} <f,g>_{\mathcal{H}} = \int f(x)g(x) dF(x) \end{equation}$

Further let $M(\mathcal{H}, \mathcal{H})$ be a class of linear operators from $\mathcal{H}$ to $\mathcal{H}$ .

Nyström method¶

Automatic Feature Extraction (AFE) using the Nyström method aims at finding a finite dimensional approximation to the kernel eigenfunction expansion of Mercer kernels, as shown below.

$\begin{equation} K(x,t) = \sum_i{\lambda_i \phi(x)\phi(t)} \end{equation}$

It is well known that Mercer kernels form a Reproducing Kernel Hilbert Space (RHKS) of functions. Every Mercer kernel defines a unique RHKS of functions as shown by the Moore-Aronszajn theorem. For a more involved treatment of RHKS and their applications the reader may refer to the book written by Bertinet et.al.

Mercer's theorem states that the spectral decomposition of integral operator of $K$ , $\mathcal{T} \in M(\mathcal{H},\mathcal{H})$ defined below yields the eigenfunctions which span the RHKS generated by $K$ and having an inner product defined as above.

$\begin{equation} (\mathcal{T}\phi_i)(t) = \int K(x,t) \phi(x) dF(x) \end{equation}$

Equation above is more commonly also known as the Fredholm integral equation of the first kind. Nyström's method method approximates this integral using the quadrature constructed by considering a finite kernel matrix constructed out of a prototype set $X_k \ k = 1, \cdots, m$ and calculating its spectral decomposition consisting of eigenvalues $\lambda_k$ and eigen-vectors $u_k$ . This yields an expression for the approximate non-linear feature map $\hat{\phi} : \mathbb{R}^d \longrightarrow \mathbb{R}^m$ .

$\begin{equation} \hat{\phi}_{i}(t) = \frac{\sqrt{m}}{\lambda_i}\sum_{k=1}^{m}K(X_k, t)u_{k,i} \end{equation}$

AFE in DynaML Kernels¶

The SVMKernel[M] contains an implementation of AFE in the method

featureMapping(
  decomposition: (DenseVector[Double], DenseMatrix[Double]))(
    prototypes: List[DenseVector[Double]])(
      data: DenseVector[Double]): DenseVector[Double]

Note

The SVMKernel class is extended by all the implemented library kernels in DynaML thereby enabling the use of AFE in potentially any model employing kernels.