Feature Processing

Extract Features and Targets

splitFeaturesAndTargets
  • Type: DataPipe[Stream[String], Stream[(DenseVector[Double], Double)]]
  • Result: Treat each line as a comma-separated string. All fields except the last form the feature vector, and the last field is kept as the "target" value.
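Below is a minimal sketch of this transformation, written as a plain Scala function rather than a DynaML DataPipe. It assumes every field parses as a Double. It also uses Vector[Double] as a stand-in for Breeze's DenseVector[Double] so the example has no dependencies. Stream mirrors the type above; on Scala 2.13+ it is deprecated in favour of LazyList.

```scala
// Vector[Double] stands in for Breeze's DenseVector[Double] (assumption, keeps the sketch dependency-free).
type Features = Vector[Double]

// All fields but the last form the feature vector; the last field is the target.
// Assumes every field parses as a Double (toDouble throws otherwise).
def splitFeaturesAndTargets(lines: Stream[String]): Stream[(Features, Double)] =
  lines.map { line =>
    val fields = line.split(",").map(_.trim.toDouble)
    (fields.init.toVector, fields.last)
  }

val parsed = splitFeaturesAndTargets(Stream("1.0,2.0,3.0", "4.0, 5.0, 6.0"))
```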

Extract Specific Columns

extractTrainingFeatures(columns, missingVals)
  • Type: DataPipe[Stream[String], Stream[String]]
  • Result: Extract a subset of columns from a stream of comma-separated strings, replacing any missing-value markers with the empty string. missingVals maps a column index to the string that marks a missing value in that column.
  • Usage: DynaMLPipe.extractTrainingFeatures(List(1,2,3), Map(1 -> "N.A.", 2 -> "NA", 3 -> "na"))
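The following sketch shows the column extraction as a plain function, reusing the arguments from the usage line. It assumes column indices are 0-based and that the selected fields are re-joined with commas. The text above states neither detail, so check both against the library before relying on this.

```scala
// Keep only `columns` of each comma-separated line (0-based indices, an assumption),
// replace a column's missing-value marker with "", and re-join the fields with commas.
def extractTrainingFeatures(columns: List[Int], missingVals: Map[Int, String])(
    lines: Stream[String]): Stream[String] =
  lines.map { line =>
    val fields = line.split(",", -1) // limit -1 keeps trailing empty fields
    columns
      .map(col => if (missingVals.get(col).contains(fields(col))) "" else fields(col))
      .mkString(",")
  }

val extracted = extractTrainingFeatures(List(1, 2, 3), Map(1 -> "N.A.", 2 -> "NA", 3 -> "na"))(
  Stream("a,N.A.,5,na", "b,7,NA,9"))
```

In the first line, "N.A." (column 1) and "na" (column 3) match their markers and become empty, which gives ",5,".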

Gaussian Scaling of Data

gaussianScaling
  • Result: Perform Gaussian normalization (zero mean, unit variance) of features and targets on a data stream of the form Stream[(DenseVector[Double], Double)].
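This sketch of Gaussian scaling computes the mean and standard deviation of each feature column and of the target, then maps every value to (x - mean) / std. Several choices here are assumptions of the sketch:

- the population standard deviation,
- the Vector[Double] stand-in for DenseVector[Double],
- mapping constant columns to 0.0,
- returning data of the same shape as the input.

The library's exact estimator and return type may differ.

```scala
// Vector[Double] stands in for Breeze's DenseVector[Double] (assumption).
type Features = Vector[Double]

// Mean and population standard deviation of a non-empty sequence.
def meanStd(xs: Seq[Double]): (Double, Double) = {
  val mean = xs.sum / xs.size
  val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.size)
  (mean, std)
}

// z-score; a constant column (std == 0) is mapped to 0.0 instead of NaN.
def standardize(v: Double, stats: (Double, Double)): Double =
  if (stats._2 > 0) (v - stats._1) / stats._2 else 0.0

// Scale every feature dimension and the target to zero mean and unit variance.
// Assumes a non-empty stream whose feature vectors all have the same length.
def gaussianScaling(data: Stream[(Features, Double)]): Stream[(Features, Double)] = {
  val dims = data.head._1.size
  val featureStats = (0 until dims).map(d => meanStd(data.map(_._1(d))))
  val targetStats = meanStd(data.map(_._2))
  data.map { case (x, y) =>
    (x.indices.map(d => standardize(x(d), featureStats(d))).toVector, standardize(y, targetStats))
  }
}

val scaled = gaussianScaling(Stream((Vector(1.0, 10.0), 2.0), (Vector(3.0, 30.0), 4.0)))
```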

Gaussian Scaling of Train/Test Splits

gaussianScalingTrainTest
  • Result: Perform Gaussian normalization of features and targets on a Tuple2 of data streams of the form (Stream(training data), Stream(test data)).
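With train/test splits, the usual practice is to estimate the scaling statistics on the training split only and then apply the same transformation to both splits. This keeps information from the test data out of training. The sketch below assumes that behaviour, since the text above does not say how the library computes its statistics. The population standard deviation and the Vector[Double] stand-in are also assumptions.

```scala
// Vector[Double] stands in for Breeze's DenseVector[Double] (assumption).
type Features = Vector[Double]
type Data = Stream[(Features, Double)]

// Mean and population standard deviation of a non-empty sequence.
def meanStd(xs: Seq[Double]): (Double, Double) = {
  val mean = xs.sum / xs.size
  val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.size)
  (mean, std)
}

// z-score; a constant column (std == 0) is mapped to 0.0 instead of NaN.
def standardize(v: Double, stats: (Double, Double)): Double =
  if (stats._2 > 0) (v - stats._1) / stats._2 else 0.0

// Statistics come from the training split only and are applied to both splits.
def gaussianScalingTrainTest(splits: (Data, Data)): (Data, Data) = {
  val (train, test) = splits
  val dims = train.head._1.size
  val featureStats = (0 until dims).map(d => meanStd(train.map(_._1(d))))
  val targetStats = meanStd(train.map(_._2))
  def scale(data: Data): Data = data.map { case (x, y) =>
    (x.indices.map(d => standardize(x(d), featureStats(d))).toVector, standardize(y, targetStats))
  }
  (scale(train), scale(test))
}

val (scaledTrain, scaledTest) = gaussianScalingTrainTest(
  (Stream((Vector(1.0, 10.0), 2.0), (Vector(3.0, 30.0), 4.0)), Stream((Vector(5.0, 40.0), 6.0))))
```

The test point is expressed in units of the training statistics. For example, its first feature 5.0 becomes (5.0 - 2.0) / 1.0 = 3.0.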

Min-Max Scaling of Data

minMaxScaling
  • Result: Perform 0-1 (min-max) scaling of features and targets on a data stream of the form Stream[(DenseVector[Double], Double)].
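This sketch of 0-1 scaling maps each feature column and the target to (x - min) / (max - min). Constant columns are mapped to 0.0 to avoid dividing by zero. That choice and the Vector[Double] stand-in for DenseVector[Double] are assumptions of the sketch, not documented library behaviour.

```scala
// Vector[Double] stands in for Breeze's DenseVector[Double] (assumption).
type Features = Vector[Double]

// Map v into [0, 1] given a column's (min, max); a constant column is mapped to 0.0.
def rescale(v: Double, range: (Double, Double)): Double =
  if (range._2 > range._1) (v - range._1) / (range._2 - range._1) else 0.0

// Scale every feature dimension and the target using each column's min and max.
// Assumes a non-empty stream whose feature vectors all have the same length.
def minMaxScaling(data: Stream[(Features, Double)]): Stream[(Features, Double)] = {
  val dims = data.head._1.size
  val featureRanges = (0 until dims).map { d =>
    val col = data.map(_._1(d))
    (col.min, col.max)
  }
  val targets = data.map(_._2)
  val targetRange = (targets.min, targets.max)
  data.map { case (x, y) =>
    (x.indices.map(d => rescale(x(d), featureRanges(d))).toVector, rescale(y, targetRange))
  }
}

val scaled = minMaxScaling(
  Stream((Vector(1.0, 10.0), 2.0), (Vector(2.0, 20.0), 3.0), (Vector(3.0, 30.0), 4.0)))
```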

Min-Max Scaling of Train/Test Splits

minMaxScalingTrainTest
  • Result: Perform 0-1 (min-max) scaling of features and targets on a Tuple2 of data streams of the form (Stream(training data), Stream(test data)).
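The sketch below shows a train/test variant. It assumes the minimum and maximum of each column come from the training split only, which is the usual practice; the text above does not confirm this. Under that assumption, test values outside the training range fall outside [0, 1], as the example shows.

```scala
// Vector[Double] stands in for Breeze's DenseVector[Double] (assumption).
type Features = Vector[Double]
type Data = Stream[(Features, Double)]

// Map v using a column's (min, max); a constant column is mapped to 0.0.
def rescale(v: Double, range: (Double, Double)): Double =
  if (range._2 > range._1) (v - range._1) / (range._2 - range._1) else 0.0

// (min, max) of a non-empty sequence.
def minMax(xs: Seq[Double]): (Double, Double) = (xs.min, xs.max)

// Ranges come from the training split only and are applied to both splits.
def minMaxScalingTrainTest(splits: (Data, Data)): (Data, Data) = {
  val (train, test) = splits
  val dims = train.head._1.size
  val featureRanges = (0 until dims).map(d => minMax(train.map(_._1(d))))
  val targetRange = minMax(train.map(_._2))
  def scale(data: Data): Data = data.map { case (x, y) =>
    (x.indices.map(d => rescale(x(d), featureRanges(d))).toVector, rescale(y, targetRange))
  }
  (scale(train), scale(test))
}

val (scaledTrain, scaledTest) = minMaxScalingTrainTest(
  (Stream((Vector(1.0, 10.0), 2.0), (Vector(3.0, 30.0), 4.0)), Stream((Vector(4.0, 5.0), 1.0))))
```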
