# Feature Processing

### Extract Features and Targets

splitFeaturesAndTargets

• Type: DataPipe[Stream[String], Stream[(DenseVector[Double], Double)]]
• Result: Takes each line, a comma-separated string, and extracts all but the last element into a feature vector, leaving the last element as the "target" value.
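
The transformation described above can be sketched in plain Scala. This is an illustrative stand-in rather than the DynaML source: the object name `SplitSketch` is hypothetical, and `Vector[Double]` replaces Breeze's `DenseVector[Double]` so the snippet has no dependencies. On Scala 2.13 and later `Stream` is deprecated in favour of `LazyList`, but it still compiles.

```scala
// Hypothetical sketch of the behaviour, not DynaML's implementation.
object SplitSketch {
  def splitFeaturesAndTargets(
      lines: Stream[String]): Stream[(Vector[Double], Double)] =
    lines.map { line =>
      // Parse every comma-separated field as a Double.
      val values = line.split(",").map(_.trim.toDouble)
      // All but the last value form the features; the last is the target.
      (values.init.toVector, values.last)
    }
}
```

For example, the line `"1.0,2.0,3.0"` would yield the feature vector `(1.0, 2.0)` paired with the target `3.0`.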

### Extract Specific Columns

extractTrainingFeatures(columns, missingVals)

• Type: DataPipe[Stream[String], Stream[String]]
• Result: Extracts a subset of columns from a stream of comma-separated strings and replaces any missing-value strings with the empty string.
• Usage: DynaMLPipe.extractTrainingFeatures(List(1,2,3), Map(1 -> "N.A.", 2 -> "NA", 3 -> "na"))
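
To make the column selection and missing-value replacement concrete, here is a minimal dependency-free sketch. It is not the DynaML implementation: the names `ExtractSketch` and `extractColumns` are hypothetical, and it assumes that column indices are zero-based and that the selected fields are re-joined with commas, neither of which is stated above.

```scala
// Hypothetical sketch; assumes zero-based column indices.
object ExtractSketch {
  def extractColumns(
      lines: Stream[String],
      columns: List[Int],
      missingVals: Map[Int, String]): Stream[String] =
    lines.map { line =>
      // A limit of -1 keeps trailing empty fields.
      val fields = line.split(",", -1)
      columns.map { c =>
        val field = fields(c)
        // Blank the field if it equals the missing-value marker for column c.
        if (missingVals.get(c).contains(field)) "" else field
      }.mkString(",")
    }
}
```

Under these assumptions, the line `"a,N.A.,3.5,na"` with the arguments from the usage example becomes `",3.5,"`: columns 1 and 3 match their missing-value markers and are blanked, while column 2 is kept.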

### Gaussian Scaling of Data

gaussianScaling

• Result: Performs Gaussian normalization of features and targets on a data stream of the form Stream[(DenseVector[Double], Double)].
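
As a rough illustration of what Gaussian normalization means here, the sketch below standardizes every feature dimension and the target to zero mean and unit standard deviation. It is a stand-in under stated assumptions, not DynaML's code: `GaussianSketch` is a hypothetical name, `Vector[Double]` replaces `DenseVector[Double]`, and the sample standard deviation (divisor n - 1) is assumed.

```scala
// Hypothetical sketch; needs at least two rows and no constant columns.
object GaussianSketch {
  type Point = (Vector[Double], Double)

  // Per-column mean and sample standard deviation of equal-length rows.
  def stats(rows: Vector[Vector[Double]]): (Vector[Double], Vector[Double]) = {
    val n = rows.length.toDouble
    val cols = rows.transpose
    val mean = cols.map(_.sum / n)
    val std = cols.zip(mean).map { case (col, m) =>
      math.sqrt(col.map(v => (v - m) * (v - m)).sum / (n - 1))
    }
    (mean, std)
  }

  def gaussianScaling(data: Stream[Point]): Stream[Point] = {
    // Append the target to the features so both are scaled together.
    val rows = data.map { case (x, y) => x :+ y }.toVector
    val (mean, std) = stats(rows)
    rows.toStream.map { row =>
      val z = row.indices.map(i => (row(i) - mean(i)) / std(i)).toVector
      (z.init, z.last)
    }
  }
}
```

Note that the statistics require a full pass over the data, so the input stream is forced before any scaled record is produced.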

### Gaussian Scaling of Train/Test Splits

gaussianScalingTrainTest

• Result: Performs Gaussian normalization of features and targets on a Tuple2 of the form (Stream(training data), Stream(test data)).
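
The sketch below shows one common way to scale a train/test pair: estimate the mean and standard deviation on the training split only, then apply the same transformation to both splits. The text above does not say which split the statistics come from, so treat that as an assumption. `TrainTestGaussianSketch` is a hypothetical name, and `Vector[Double]` again stands in for `DenseVector[Double]`.

```scala
// Hypothetical sketch; assumes statistics come from the training split only.
object TrainTestGaussianSketch {
  type Point = (Vector[Double], Double)

  def gaussianScalingTrainTest(
      split: (Stream[Point], Stream[Point])): (Stream[Point], Stream[Point]) = {
    val (train, test) = split
    // Append the target to the features and collect the training rows.
    val rows = train.map { case (x, y) => x :+ y }.toVector
    val n = rows.length.toDouble
    val cols = rows.transpose
    val mean = cols.map(_.sum / n)
    // Sample standard deviation (divisor n - 1) per column.
    val std = cols.zip(mean).map { case (col, m) =>
      math.sqrt(col.map(v => (v - m) * (v - m)).sum / (n - 1))
    }
    // The same training-derived transformation is applied to both splits.
    def scale(p: Point): Point = {
      val row = p._1 :+ p._2
      val z = row.indices.map(i => (row(i) - mean(i)) / std(i)).toVector
      (z.init, z.last)
    }
    (train.map(scale), test.map(scale))
  }
}
```

Reusing the training statistics keeps information from the test split out of the preprocessing step, at the cost that scaled test values need not have zero mean or unit variance.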

### Min-Max Scaling of Data

minMaxScaling

• Result: Performs 0-1 scaling of features and targets on a data stream of the form Stream[(DenseVector[Double], Double)].
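
For reference, 0-1 scaling maps each value x in a column to (x - min) / (max - min). The following dependency-free sketch applies that formula to every feature dimension and to the target. It is illustrative only: `MinMaxSketch` is a hypothetical name, and `Vector[Double]` replaces `DenseVector[Double]`.

```scala
// Hypothetical sketch; assumes no column is constant (max > min).
object MinMaxSketch {
  type Point = (Vector[Double], Double)

  def minMaxScaling(data: Stream[Point]): Stream[Point] = {
    // Append the target to the features so both are scaled together.
    val rows = data.map { case (x, y) => x :+ y }.toVector
    val cols = rows.transpose
    val mins = cols.map(_.min)
    val maxs = cols.map(_.max)
    rows.toStream.map { row =>
      val s = row.indices
        .map(i => (row(i) - mins(i)) / (maxs(i) - mins(i)))
        .toVector
      (s.init, s.last)
    }
  }
}
```

After scaling, the smallest value in each column maps to 0 and the largest to 1.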

### Min-Max Scaling of Train/Test Splits

minMaxScalingTrainTest

• Result: Performs 0-1 scaling of features and targets on a Tuple2 of the form (Stream(training data), Stream(test data)).
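
One consequence of scaling a train/test pair is worth seeing in code: if the minimum and range are taken from the training split only (an assumption here, since the text above does not specify it), test values outside the training range land outside [0, 1]. The sketch below is a hypothetical stand-in named `TrainTestMinMaxSketch`, using `Vector[Double]` in place of `DenseVector[Double]`.

```scala
// Hypothetical sketch; assumes min and range come from the training split.
object TrainTestMinMaxSketch {
  type Point = (Vector[Double], Double)

  def minMaxScalingTrainTest(
      split: (Stream[Point], Stream[Point])): (Stream[Point], Stream[Point]) = {
    val (train, test) = split
    // Columns of the training data, with the target appended as the last one.
    val cols = train.map { case (x, y) => x :+ y }.toVector.transpose
    val mins = cols.map(_.min)
    val ranges = cols.map(c => c.max - c.min)
    // Apply the training-derived minimum and range to any point.
    def scale(p: Point): Point = {
      val row = p._1 :+ p._2
      val s = row.indices.map(i => (row(i) - mins(i)) / ranges(i)).toVector
      (s.init, s.last)
    }
    (train.map(scale), test.map(scale))
  }
}
```

With training features 0.0 and 4.0, a test feature of 6.0 scales to 1.5 under this scheme, so downstream code should not assume that test values are bounded by 0 and 1.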