Scalers & Encoders
Summary
The pipes API provides a good foundation for constructing data processing pipelines. In this section we show how it is extended for one specific application: attribute scaling and transformation.
Transforming data attributes is an often repeated task. Some examples include re-scaling values to a finite domain [min, max], Gaussian centering, principal component analysis (PCA), the discrete Haar wavelet transform (DWT), etc.
The pipes API contains traits for these tasks. They are abstract skeletons which the user can extend to create arbitrary feature re-scaling transformations.
Encoders
Encoder[I, J] is an extension of the DataPipe[I, J] trait which has an extra value member, i: DataPipe[J, I], representing the inverse transformation.
Note
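Encoder[I, J] implies a reversible, one-to-one transformation of the input. Mathematically, writing g for the forward transformation and h for its inverse, this can be expressed as
$$ g: I \to J, \qquad h: J \to I, \qquad h(g(x)) = x \quad \forall x \in I $$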
An encoder can be created by calling the apply method of the Encoder object.
//Converts a point expressed in cartesian coordinates
//into a point expressed in polar coordinates and vice versa.
val cartesianToPolar = Encoder(
  (pointCart: (Double, Double)) => {
    val (x, y) = pointCart
    val r = math.sqrt(x*x + y*y)
    //atan2 places the angle in the correct quadrant, including x < 0.
    if(r != 0.0) (r, math.atan2(y, x)) else (0.0, 0.0)
  },
  (pointPolar: (Double, Double)) => {
    val (r, theta) = pointPolar
    (r*math.cos(theta), r*math.sin(theta))
  }
)
Note
In the above example, we created a cartesian to polar coordinate converter by specifying the forward and reverse transformations as anonymous Scala functions. We could just as well have passed the forward and reverse transforms as DataPipe instances.
//Placeholders: substitute your own pipes for ???
val forwardTransform: DataPipe[I, J] = ???
val reverseTransform: DataPipe[J, I] = ???
//Still works.
val enc = Encoder(forwardTransform, reverseTransform)
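To make the encoder contract concrete, here is a minimal, self-contained sketch of the round trip property. The DataPipe and Encoder definitions below are simplified stand-ins written for this illustration, not the library's actual definitions, and roundTripError is a hypothetical helper.

```scala
object EncoderSketch {
  // Simplified stand-ins, for illustration only (not the library's definitions).
  trait DataPipe[I, J] {
    def run(data: I): J
    def apply(data: I): J = run(data)
  }

  trait Encoder[I, J] extends DataPipe[I, J] {
    // The inverse transformation, named `i` as in the text.
    def i: DataPipe[J, I]
  }

  object Encoder {
    def apply[I, J](forward: I => J, reverse: J => I): Encoder[I, J] =
      new Encoder[I, J] {
        def run(data: I): J = forward(data)
        val i: DataPipe[J, I] = new DataPipe[J, I] {
          def run(data: J): I = reverse(data)
        }
      }
  }

  // Cartesian <-> polar conversion expressed with the stand-in Encoder.
  val cartesianToPolar: Encoder[(Double, Double), (Double, Double)] = Encoder(
    (p: (Double, Double)) => (math.hypot(p._1, p._2), math.atan2(p._2, p._1)),
    (p: (Double, Double)) => (p._1 * math.cos(p._2), p._1 * math.sin(p._2))
  )

  // Hypothetical helper: largest coordinate-wise deviation after
  // encoding a point and decoding it again.
  def roundTripError(point: (Double, Double)): Double = {
    val (x, y) = cartesianToPolar.i(cartesianToPolar(point))
    math.max(math.abs(x - point._1), math.abs(y - point._2))
  }
}
```

For a well-formed encoder the round trip error should be zero up to floating point precision, including for points with a negative x coordinate.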
Scalers
Scaler[I] is an extension of the DataPipe[I, I] trait. It represents transformations of inputs which don't change their type.
val linTr = Scaler((x: Double) => x*5.0 - 1.5)
Reversible Scalers
ReversibleScaler[I] extends Scaler[I] along with Encoder[I, I]; it represents a reversible re-scaling of inputs.
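As an illustration, the sketch below builds a reversible min-max scaler. The trait definitions are simplified stand-ins assumed for this example rather than the library's own, and minMaxScaler is a hypothetical helper name.

```scala
object ReversibleScalerSketch {
  // Simplified stand-ins, for illustration only (not the library's definitions).
  trait DataPipe[I, J] {
    def run(data: I): J
    def apply(data: I): J = run(data)
  }
  trait Scaler[I] extends DataPipe[I, I]
  trait Encoder[I, J] extends DataPipe[I, J] { def i: DataPipe[J, I] }

  // A scaler whose inverse is itself a scaler on the same type.
  trait ReversibleScaler[I] extends Scaler[I] with Encoder[I, I] {
    def i: Scaler[I]
  }

  // Hypothetical helper: maps [min, max] linearly onto [0, 1].
  def minMaxScaler(min: Double, max: Double): ReversibleScaler[Double] = {
    require(max > min, "max must be strictly greater than min")
    new ReversibleScaler[Double] {
      def run(x: Double): Double = (x - min) / (max - min)
      // Inverse: maps [0, 1] back onto [min, max].
      val i: Scaler[Double] = new Scaler[Double] {
        def run(z: Double): Double = z * (max - min) + min
      }
    }
  }
}
```

Because the inverse is carried along with the forward map, scaled predictions can be mapped back to the original units by calling i.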
The > and * operators for scalers and encoders
Since Encoder[S, D], Scaler[S] and ReversibleScaler[S] inherit the DataPipe trait, they can be composed with any data pipeline as usual, but there are special cases.
If an Encoder[I, J] instance is composed with an Encoder[J, K], the result is of type Encoder[I, K]. Likewise, composing two Scaler[I] instances yields a Scaler[I], and composing two ReversibleScaler[I] instances yields a ReversibleScaler[I].
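One way such type-preserving composition can work is sketched below: the composite encoder's inverse is the two inverses composed in the opposite order. The traits are simplified stand-ins assumed for this illustration, not the library's definitions, and encoder, doubler and toText are hypothetical names.

```scala
object ComposeSketch {
  // Simplified stand-ins, for illustration only (not the library's definitions).
  trait DataPipe[I, J] { self =>
    def run(data: I): J
    def apply(data: I): J = run(data)
    // Generic composition: run this pipe, then `next`.
    def >[K](next: DataPipe[J, K]): DataPipe[I, K] = new DataPipe[I, K] {
      def run(data: I): K = next.run(self.run(data))
    }
  }

  trait Encoder[I, J] extends DataPipe[I, J] { self =>
    def i: DataPipe[J, I]
    // Special case: encoder > encoder is again an encoder,
    // with the inverses composed in reverse order.
    def >[K](next: Encoder[J, K]): Encoder[I, K] = new Encoder[I, K] {
      def run(data: I): K = next.run(self.run(data))
      val i: DataPipe[K, I] = next.i > self.i
    }
  }

  // Hypothetical helper to build an encoder from two functions.
  def encoder[I, J](f: I => J, g: J => I): Encoder[I, J] = new Encoder[I, J] {
    def run(data: I): J = f(data)
    val i: DataPipe[J, I] = new DataPipe[J, I] { def run(data: J): I = g(data) }
  }

  val doubler: Encoder[Int, Int] = encoder[Int, Int](_ * 2, _ / 2)
  val toText: Encoder[Int, String] = encoder[Int, String](_.toString, _.toInt)

  // The result is statically an Encoder[Int, String], not just a DataPipe.
  val composed: Encoder[Int, String] = doubler > toText
}
```

Here composed(21) first doubles and then renders the number as text, while composed.i parses the text and then halves it.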
The * operator can be used to create cartesian products of encoders and scalers.
//Placeholders: substitute your own encoders for ???
val enc1: Encoder[I, J] = ???
val enc2: Encoder[K, L] = ???
val enc3: Encoder[(I, K), (J, L)] = enc1*enc2
Tip
Common attribute transformations like Gaussian centering, min-max scaling, etc. are included in the dynaml.utils package; see that package's documentation for their syntax.