String & File Processing
Summary
A list of data pipes useful for processing the contents of data files.
Data Pre-processing¶
File to Stream of Lines¶
fileToStream
- Type:
DataPipe[String, Stream[String]]
- Result: Converts a text file (given as a file path string) into a
Stream[String]
of its lines.
Write Stream of Lines to File¶
streamToFile(fileName: String)
- Type:
DataPipe[Stream[String], Unit]
- Result: Writes a stream of lines to the file specified by
fileName
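The two file pipes above can be approximated in plain Scala. This is a minimal sketch of the behaviour described, not the library's implementation; the object name FileRoundTrip and the function names readLines and writeLines are hypothetical.

```scala
import java.io.PrintWriter
import scala.io.Source

// Hypothetical stand-ins for the file pipes described above.
object FileRoundTrip {

  // Read a text file into a Stream of lines. The lines are read eagerly
  // so that the file handle can be closed before the stream is returned.
  def readLines(path: String): Stream[String] = {
    val source = Source.fromFile(path)
    try source.getLines().toList.toStream
    finally source.close()
  }

  // Write each line, followed by a line separator, to the given file.
  def writeLines(path: String)(lines: Stream[String]): Unit = {
    val writer = new PrintWriter(path)
    try lines.foreach(line => writer.println(line))
    finally writer.close()
  }
}
```

Reading eagerly trades laziness for safety: a lazily evaluated stream would fail if the file were closed before the lines were consumed.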
Drop first line in Stream¶
dropHead
- Type:
DataPipe[Stream[String], Stream[String]]
- Result: Drop the first element of a
Stream[String]
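Dropping the first line is typically used to discard a header row. A minimal sketch of the same behaviour as a plain function follows; the object name DropHeadSketch is hypothetical, and the use of drop(1) is an assumption about how such a pipe could be written.

```scala
// Hypothetical stand-in for the pipe described above.
object DropHeadSketch {

  // drop(1) removes the first element and returns an empty stream when
  // the input is empty, whereas tail would throw on an empty stream.
  val dropHead: Stream[String] => Stream[String] =
    lines => lines.drop(1)
}
```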
Replace Occurrences in a String¶
replace(original, newString)
- Type:
DataPipe[Stream[String], Stream[String]]
- Result: Replace all occurrences of a regular expression or string in a
Stream[String]
with a specified replacement string.
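The replacement described above can be sketched with String.replaceAll applied to every line. This is an illustration under the assumption that the first argument is interpreted as a regular expression; the object name ReplaceSketch is hypothetical.

```scala
// Hypothetical stand-in for the pipe described above.
object ReplaceSketch {

  // Build a line-wise replacement function. `original` is treated as a
  // regular expression, so characters such as '.' or '|' must be escaped
  // if a literal match is wanted.
  def replace(original: String, newString: String): Stream[String] => Stream[String] =
    lines => lines.map(_.replaceAll(original, newString))
}
```

For a purely literal match, java.util.regex.Pattern.quote can wrap the pattern and java.util.regex.Matcher.quoteReplacement can wrap the replacement.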
Replace White Spaces¶
replaceWhiteSpaces
- Type:
DataPipe[Stream[String], Stream[String]]
- Result: Replace all white space characters in a stream of lines.
Remove Trailing White Spaces¶
- Type:
DataPipe[Stream[String], Stream[String]]
- Result: Trim white spaces from both sides of every line.
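The two whitespace operations above interact, so their order matters. The sketch below is an illustration only: the text does not say what white space is replaced with, so the comma default is an assumption, and the names WhitespaceSketch and trimLines are hypothetical.

```scala
// Hypothetical stand-ins for the whitespace pipes described above.
object WhitespaceSketch {

  // Replace each run of whitespace with `replacement`.
  // The comma default is an assumption, not taken from the text above.
  def replaceWhiteSpaces(replacement: String = ","): Stream[String] => Stream[String] =
    lines => lines.map(_.replaceAll("\\s+", replacement))

  // Trim whitespace from both ends of every line.
  val trimLines: Stream[String] => Stream[String] =
    lines => lines.map(_.trim)
}
```

Trimming first avoids turning leading and trailing white space into stray separators at the ends of each line.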
Remove Missing Records¶
removeMissingLines
- Type:
DataPipe[Stream[String], Stream[String]]
- Result: Remove all lines/records which contain missing values
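The text above does not say how a missing value is recognised. The following sketch assumes comma-separated fields and treats an empty field or the marker NA as missing; both choices, and the object name MissingSketch, are hypothetical.

```scala
// Hypothetical stand-in for the pipe described above.
object MissingSketch {

  // Keep only the lines in which no field is empty or equal to `marker`.
  // The limit of -1 keeps trailing empty fields, so "1,2," counts as
  // having a missing last value.
  def removeMissingLines(marker: String = "NA", sep: String = ","): Stream[String] => Stream[String] =
    lines => lines.filter { line =>
      val fields = line.split(sep, -1).map(_.trim)
      !fields.exists(field => field.isEmpty || field == marker)
    }
}
```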
Create Train/Test splits¶
splitTrainingTest(num_training, num_test)
- Type:
DataPipe[(Stream[(DenseVector[Double], Double)], Stream[(DenseVector[Double], Double)]), (Stream[(DenseVector[Double], Double)], Stream[(DenseVector[Double], Double)])]
- Result: Extract a subset of the data into a
Tuple2
that can be used as a training/test pair for model learning and evaluation.
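The type signature above takes a pair of streams and returns a pair of streams. The sketch below shows one plausible reading: the first num_training records of the first stream and the last num_test records of the second. That selection rule is an assumption, as is the use of Vector[Double] in place of DenseVector[Double] to keep the example free of external dependencies; the object name SplitSketch is hypothetical.

```scala
// Hypothetical stand-in for the pipe described above.
object SplitSketch {

  // A labelled record: (features, target).
  type Record = (Vector[Double], Double)
  type Pair = (Stream[Record], Stream[Record])

  // Assumed rule: take the first `numTraining` records of the first stream
  // and the last `numTest` records of the second stream.
  def splitTrainingTest(numTraining: Int, numTest: Int): Pair => Pair = {
    case (trainingPool, testPool) =>
      (trainingPool.take(numTraining), testPool.takeRight(numTest))
  }
}
```

Passing the same stream in both positions yields non-overlapping subsets only when num_training plus num_test does not exceed the number of records.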