Wine Quality
The wine quality data set is a common benchmark for classification models. Here we use the DynaML Scala machine learning environment to train classifiers that distinguish 'good' wine from 'bad' wine. A short listing of the data attributes (columns) is given below. The UCI archive provides two files in the wine quality data set, namely `winequality-red.csv` and `winequality-white.csv`. We train two separate classification models, one for red wine and one for white.
Attribute Information:
Inputs:
- fixed acidity
- volatile acidity
- citric acid
- residual sugar
- chlorides
- free sulfur dioxide
- total sulfur dioxide
- density
- pH
- sulphates
- alcohol
Output (based on sensory data):
- quality (score between 0 and 10)
Data Output Preprocessing
The wine quality target variable takes integer values from `0` to `10`. We first convert it into a binary class variable: the quality is 'good' (encoded by the value `1`) if the score is greater than `6`, and 'bad' (encoded by the value `0`) otherwise.
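The thresholding rule above can be sketched in plain Scala as follows. The object name `QualityLabels`, the function `toBinaryLabel`, and its `threshold` parameter are illustrative assumptions and are not part of the DynaML API.

```scala
// Hypothetical helper (not from DynaML): encode a 0-10 quality score as a binary label.
object QualityLabels {
  // Scores strictly greater than the threshold become 1.0 ('good'),
  // all other scores become 0.0 ('bad').
  def toBinaryLabel(quality: Int, threshold: Int = 6): Double =
    if (quality > threshold) 1.0 else 0.0

  // A few raw scores and their encoded labels.
  val scores: List[Int] = List(3, 6, 7, 9)
  val labels: List[Double] = scores.map(q => toBinaryLabel(q))
}
```

Note that a score of exactly 6 is labelled 'bad', since the rule uses a strict inequality.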
Model
Below are two classification models for predicting the binary quality label y.
Logit
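In its textbook form, the logit (logistic regression) model passes a linear function of the inputs through the logistic sigmoid. The notation here is an assumption made for illustration (weight vector $w$, offset $b$, input vector $\mathbf{x}$) and may differ from the notation DynaML uses:

$$
P(y = 1 \mid \mathbf{x}) = \sigma\left(w^\top \mathbf{x} + b\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$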
Probit
The probit regression model is an alternative to the logit model; it replaces the logistic link with the cumulative distribution function of the standard normal distribution.
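Written out in its textbook form, with assumed notation (weight vector $w$, offset $b$, input vector $\mathbf{x}$, and $\Phi$ the standard normal cumulative distribution function), the probit model is:

$$
P(y = 1 \mid \mathbf{x}) = \Phi\left(w^\top \mathbf{x} + b\right), \qquad \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-t^{2}/2} \, dt
$$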
Syntax
The `TestLogisticWineQuality` program in the `examples` package trains and tests logit and probit models on the wine quality data.
| Parameter | Type | Default value | Notes |
|-----------|------|---------------|-------|
| training | `Int` | 100 | Number of training samples. |
| test | `Int` | 1000 | Number of test samples. |
| columns | `List[Int]` | 11, 0, ... , 10 | The columns to be selected for analysis (indexed from 0); the first one is the target column. |
| stepSize | `Double` | 0.01 | Step size used for gradient descent. |
| maxIt | `Int` | 30 | Maximum number of iterations for the gradient descent update. |
| mini | `Double` | 1.0 | Fraction of training samples to sample for each batch update. |
| regularization | `Double` | 0.5 | Regularization parameter. |
| wineType | `String` | red | The type of wine: red or white. |
| modelType | `String` | logistic | The type of model: logistic or probit. |
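To give a feel for how `stepSize`, `maxIt`, `mini` and `regularization` interact, here is a minimal, self-contained sketch of mini-batch gradient descent for an L2-regularized logistic model. It is not DynaML's implementation: the object `LogisticSketch`, its functions, the `seed` parameter, the averaging of the batch gradient, and the choice to leave the offset unregularized are all assumptions made for illustration.

```scala
import scala.util.Random

// Illustrative sketch only; this is NOT DynaML's implementation.
object LogisticSketch {
  // Each sample is (features, label) with label in {0.0, 1.0}.
  type Sample = (Vector[Double], Double)

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Mini-batch gradient descent on the L2-regularized logistic loss.
  // Assumption: the last weight is the offset and is not regularized.
  def trainLogistic(
      data: Vector[Sample],
      stepSize: Double = 0.01,
      maxIt: Int = 30,
      mini: Double = 1.0,
      regularization: Double = 0.5,
      seed: Long = 42L): Vector[Double] = {
    val rng = new Random(seed)
    val dim = data.head._1.length + 1 // +1 for the offset term
    var w = Vector.fill(dim)(0.0)
    for (_ <- 1 to maxIt) {
      // Keep each sample with probability `mini` for this update.
      val batch =
        if (mini >= 1.0) data else data.filter(_ => rng.nextDouble() < mini)
      if (batch.nonEmpty) {
        // Average gradient of the negative log-likelihood over the batch.
        val grad = batch
          .map { case (x, y) =>
            val xb = x :+ 1.0
            val err = sigmoid(dot(w, xb)) - y
            xb.map(_ * err)
          }
          .reduce((g1, g2) => g1.zip(g2).map { case (a, b) => a + b })
          .map(_ / batch.size)
        // Gradient of the L2 penalty, skipping the offset.
        val reg = w.init.map(_ * regularization) :+ 0.0
        w = w.zip(grad.zip(reg)).map { case (wi, (gi, ri)) =>
          wi - stepSize * (gi + ri)
        }
      }
    }
    w
  }

  // Predicted probability of the 'good' class for one input.
  def predictProbability(w: Vector[Double], x: Vector[Double]): Double =
    sigmoid(dot(w, x :+ 1.0))
}
```

In this sketch, a larger `regularization` value shrinks the weights towards zero, and `mini = 1.0` uses the full training set for every update.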
Red Wine
```scala
TestLogisticWineQuality(stepSize = 0.2, maxIt = 120,
  mini = 1.0, training = 800,
  test = 800, regularization = 0.2,
  wineType = "red")
```
```
16/04/01 15:21:57 INFO BinaryClassificationMetrics: Classification Model Performance
16/04/01 15:21:57 INFO BinaryClassificationMetrics: ============================
16/04/01 15:21:57 INFO BinaryClassificationMetrics: Accuracy: 0.8475
16/04/01 15:21:57 INFO BinaryClassificationMetrics: Area under ROC: 0.7968417788802267
16/04/01 15:21:57 INFO BinaryClassificationMetrics: Maximum F Measure: 0.7493563745371187
```
White Wine
```scala
TestLogisticWineQuality(stepSize = 0.26, maxIt = 300,
  mini = 1.0, training = 3800,
  test = 1000, regularization = 0.0,
  wineType = "white")
```
```
16/04/01 15:27:17 INFO BinaryClassificationMetrics: Classification Model Performance
16/04/01 15:27:17 INFO BinaryClassificationMetrics: ============================
16/04/01 15:27:17 INFO BinaryClassificationMetrics: Accuracy: 0.829
16/04/01 15:27:17 INFO BinaryClassificationMetrics: Area under ROC: 0.7184782682020251
16/04/01 15:27:17 INFO BinaryClassificationMetrics: Maximum F Measure: 0.7182203962483446
```
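For readers unfamiliar with the reported metrics, the sketch below shows how accuracy and a maximum F-measure can be computed from predicted scores and true labels. It uses the textbook definitions (F1 at a threshold, maximized over the observed scores); how DynaML's `BinaryClassificationMetrics` computes these figures is not shown on this page, so the object `MetricsSketch` and its functions are assumptions for illustration only.

```scala
// Illustrative sketch only; not DynaML's BinaryClassificationMetrics.
object MetricsSketch {
  // Each pair is (predicted score, true label in {0.0, 1.0}).
  // A score at or above the threshold counts as a 'good' prediction.
  def accuracy(scoresAndLabels: Seq[(Double, Double)], threshold: Double): Double = {
    val correct = scoresAndLabels.count { case (s, y) =>
      (if (s >= threshold) 1.0 else 0.0) == y
    }
    correct.toDouble / scoresAndLabels.size
  }

  // F1 score at a given threshold: 2*TP / (2*TP + FP + FN).
  def fMeasure(scoresAndLabels: Seq[(Double, Double)], threshold: Double): Double = {
    val tp = scoresAndLabels.count { case (s, y) => s >= threshold && y == 1.0 }
    val fp = scoresAndLabels.count { case (s, y) => s >= threshold && y == 0.0 }
    val fn = scoresAndLabels.count { case (s, y) => s < threshold && y == 1.0 }
    if (tp == 0) 0.0 else 2.0 * tp / (2.0 * tp + fp + fn)
  }

  // Maximum F1 over candidate thresholds taken from the observed scores.
  // Assumes a non-empty input.
  def maxFMeasure(scoresAndLabels: Seq[(Double, Double)]): Double =
    scoresAndLabels.map(_._1).distinct.map(t => fMeasure(scoresAndLabels, t)).max
}
```

Because both classes are rarely balanced after thresholding quality at 6, accuracy alone can look favourable even for a weak model, which is one reason to read it alongside the area under the ROC curve and the F-measure.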