Feature Processing
Extract features and targets
- Type: DataPipe[Stream[String], Stream[(DenseVector[Double], Double)]]
- Result: Takes each line, which is a comma-separated string, extracts all but the last element into a feature vector, and keeps the last element as the "target" value.
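The behaviour described above can be sketched in plain Scala. This is only an illustration, not the DynaML implementation: `Vector[Double]` stands in for `DenseVector[Double]`, `Seq` stands in for `Stream`, and the names `SplitSketch`, `parseLine` and `extract` are hypothetical.

```scala
// Illustrative sketch only; not the DynaML source.
// Assumption: Vector[Double] replaces DenseVector[Double], Seq replaces Stream.
object SplitSketch {
  // Split one comma-separated line into (features, target).
  // Every field is assumed to be numeric; a non-numeric field throws.
  def parseLine(line: String): (Vector[Double], Double) = {
    val values = line.split(",").map(_.trim.toDouble)
    // All but the last element form the features; the last is the target.
    (values.init.toVector, values.last)
  }

  def extract(lines: Seq[String]): Seq[(Vector[Double], Double)] =
    lines.map(parseLine)
}
```

For instance, `SplitSketch.extract(Seq("1.0,2.0,3.0"))` yields a single pair with features `Vector(1.0, 2.0)` and target `3.0`.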
Extract Specific Columns
extractTrainingFeatures(columns, missingVals)
- Type: DataPipe[Stream[String], Stream[String]]
- Result: Extracts a subset of columns from a stream of comma-separated strings and replaces any missing-value strings with the empty string.
- Usage: DynaMLPipe.extractTrainingFeatures(List(1,2,3), Map(1 -> "N.A.", 2 -> "NA", 3 -> "na"))
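To make the column selection and missing-value handling concrete, here is a minimal stand-alone sketch. It assumes zero-based column indices and that `missingVals` maps a column index to the string marking a missing value in that column; neither assumption is confirmed by the text above. The names `ColumnSketch` and `extractColumns` are hypothetical, and `Seq` stands in for `Stream`.

```scala
// Illustrative sketch only; not the DynaML source.
// Assumptions: zero-based column indices; missingVals(col) is the
// marker string for a missing value in column col.
object ColumnSketch {
  def extractColumns(
      lines: Seq[String],
      columns: List[Int],
      missingVals: Map[Int, String]): Seq[String] =
    lines.map { line =>
      // The -1 limit keeps trailing empty fields instead of dropping them.
      val fields = line.split(",", -1)
      columns.map { col =>
        val value = fields(col)
        // Replace the value with "" when it equals this column's marker.
        if (missingVals.get(col).contains(value)) "" else value
      }.mkString(",")
    }
}
```

With the arguments from the usage line, the input line `"a,1.5,NA,7"` becomes `"1.5,,7"`: column 0 is dropped and the `NA` marker in column 2 is replaced by the empty string.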
Gaussian Scaling of Data
- Result: Performs Gaussian normalization (standardization to zero mean and unit variance) of features and targets on a data stream of the form Stream[(DenseVector[Double], Double)].
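As a rough illustration of what Gaussian scaling does to such a stream, the sketch below standardizes every feature column and the target column. It is not the DynaML implementation: `Vector[Double]` and `Seq` stand in for `DenseVector[Double]` and `Stream`, the sample standard deviation (dividing by n - 1) is an assumed choice, and the names `GaussianSketch`, `meanAndStd` and `standardize` are hypothetical.

```scala
// Illustrative sketch only; not the DynaML source.
// Assumptions: sample standard deviation (n - 1), at least two rows,
// and no constant column (a zero standard deviation would give NaN).
object GaussianSketch {
  type Point = (Vector[Double], Double)

  // Column-wise mean and standard deviation of equal-length rows.
  def meanAndStd(rows: Seq[Vector[Double]]): (Vector[Double], Vector[Double]) = {
    val n = rows.length.toDouble
    val cols = rows.transpose
    val mean = cols.map(_.sum / n).toVector
    val std = cols.zip(mean).map { case (col, m) =>
      math.sqrt(col.map(v => (v - m) * (v - m)).sum / (n - 1))
    }.toVector
    (mean, std)
  }

  def standardize(data: Seq[Point]): Seq[Point] = {
    // Treat the target as one extra column so it is scaled the same way.
    val rows = data.map { case (x, y) => x :+ y }
    val (mean, std) = meanAndStd(rows)
    rows.map { row =>
      val z = row.indices.map(i => (row(i) - mean(i)) / std(i)).toVector
      (z.init, z.last)
    }
  }
}
```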
Gaussian Scaling of Train/Test Splits
- Result: Performs Gaussian normalization of features and targets on a Tuple2 of the form (Stream(training_data), Stream(test_data)).
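The text does not say which split supplies the mean and standard deviation. A common convention, assumed here, is to estimate them on the training split only and apply the same transformation to both splits, so that no test-set information leaks into the scaling. The sketch below follows that convention; `TrainTestGaussianSketch`, `Scaler`, `fit` and `scaleSplits` are hypothetical names, and `Vector[Double]` and `Seq` stand in for `DenseVector[Double]` and `Stream`.

```scala
// Illustrative sketch only; not the DynaML source.
// Assumption: statistics come from the training split alone.
object TrainTestGaussianSketch {
  type Point = (Vector[Double], Double)

  // Holds the fitted statistics for the feature columns plus the target.
  final case class Scaler(mean: Vector[Double], std: Vector[Double]) {
    def apply(p: Point): Point = {
      val row = p._1 :+ p._2
      val z = row.indices.map(i => (row(i) - mean(i)) / std(i)).toVector
      (z.init, z.last)
    }
  }

  // Estimate column-wise mean and sample standard deviation (n - 1).
  def fit(train: Seq[Point]): Scaler = {
    val cols = train.map { case (x, y) => x :+ y }.transpose
    val n = train.length.toDouble
    val mean = cols.map(_.sum / n).toVector
    val std = cols.zip(mean).map { case (c, m) =>
      math.sqrt(c.map(v => (v - m) * (v - m)).sum / (n - 1))
    }.toVector
    Scaler(mean, std)
  }

  def scaleSplits(splits: (Seq[Point], Seq[Point])): (Seq[Point], Seq[Point]) = {
    val scaler = fit(splits._1)
    (splits._1.map(p => scaler(p)), splits._2.map(p => scaler(p)))
  }
}
```

Keeping the fitted statistics in a small value such as `Scaler` also makes it possible to rescale later predictions with the same parameters.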
Min-Max Scaling of Data
- Result: Performs 0-1 (min-max) scaling of features and targets on a data stream of the form Stream[(DenseVector[Double], Double)].
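Min-max scaling maps each column linearly so that its smallest value becomes 0 and its largest becomes 1. A minimal sketch under the same stand-in assumptions (`Vector[Double]` for `DenseVector[Double]`, `Seq` for `Stream`, and the hypothetical names `MinMaxSketch` and `minMaxScale`):

```scala
// Illustrative sketch only; not the DynaML source.
// Assumption: no column is constant (max == min would divide by zero).
object MinMaxSketch {
  type Point = (Vector[Double], Double)

  def minMaxScale(data: Seq[Point]): Seq[Point] = {
    // Treat the target as one extra column so it is scaled the same way.
    val rows = data.map { case (x, y) => x :+ y }
    val cols = rows.transpose
    val mins = cols.map(_.min).toVector
    val maxs = cols.map(_.max).toVector
    rows.map { row =>
      // (value - min) / (max - min), computed per column.
      val s = row.indices.map(i => (row(i) - mins(i)) / (maxs(i) - mins(i))).toVector
      (s.init, s.last)
    }
  }
}
```

For the three points `(Vector(1.0), 10.0)`, `(Vector(3.0), 30.0)` and `(Vector(2.0), 20.0)`, the scaled values are 0, 1 and 0.5 in both the feature and the target.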
Min-Max Scaling of Train/Test Splits
- Result: Performs 0-1 (min-max) scaling of features and targets on a Tuple2 of the form (Stream(training_data), Stream(test_data)).