Welcome to the rtemis vignette for MediBoost.
Let’s load rtemis:
library(rtemis)
## .:rtemis v0.7: Welcome, egenn
## [x86_64-apple-darwin15.6.0: 4 threads available]
In rtemis, you can either provide a feature matrix / data frame, x, and an outcome vector, y, separately, or provide a combined dataset x alone, in which case the last column should be the outcome. For classification, the outcome should be a factor where the first level is the ‘positive’ case.
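For instance, the two conventions would look like this (a minimal sketch using the built-in iris data and a hypothetical two-class outcome; s.MDB is demonstrated properly below):
x <- iris[, 1:4]                              # features only
y <- factor(ifelse(iris$Species == "setosa", "pos", "neg"),
            levels = c("pos", "neg"))         # first level is the 'positive' case
# mod <- s.MDB(x, y)                          # features and outcome separately
# mod <- s.MDB(data.frame(x, Outcome = y))    # combined; outcome is the last column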
Let’s load a dataset from the online UCI ML repository and use the checkData function to examine the dataset:
parkinsons <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data")
parkinsons$Status <- factor(parkinsons$status, levels = c(1, 0))
parkinsons$status <- NULL
parkinsons$name <- NULL
checkData(parkinsons)
## -------------------------------
## Dataset: parkinsons
## -------------------------------
##
## Summary
## -------------------------------
## 195 cases with 23 features:
## * 22 continuous features
## * 0 integer features
## * 1 categorical feature, which is not ordered
## * 0 constant features
## * 0 features include 'NA' values
##
## Recommendations
## -------------------------------
## * Everything looks good
Let’s train a MediBoost (MDB) model on the full sample:
parkinsons.mdb <- s.MDB(parkinsons, gamma = .8, learning.rate = .1)
## [2018-03-02 04:14:19 s.MDB] Hello, egenn
## [2018-03-02 04:14:19 dataPrepare] Imbalanced classes: using Inverse Probability Weighting
## ------------------------------------------------------
## Input Summary
## ------------------------------------------------------
## Training features: 195 x 22
## Training outcome: 195 x 1
## Testing features: Not available
## Testing outcome: Not available
##
## [2018-03-02 04:14:20 s.MDB] Training MDB...
## ------------------------------------------------------
## MDB Classification Training Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 145 1
## 0 2 47
##
## Accuracy : 0.9846
## 95% CI : (0.9557, 0.9968)
## No Information Rate : 0.7538
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9588
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9864
## Specificity : 0.9792
## Pos Pred Value : 0.9932
## Neg Pred Value : 0.9592
## Precision : 0.9932
## Recall : 0.9864
## F1 : 0.9898
## Prevalence : 0.7538
## Detection Rate : 0.7436
## Detection Prevalence : 0.7487
## Balanced Accuracy : 0.9828
##
## 'Positive' Class : 1
##
## [2018-03-02 04:14:23 s.MDB] Run completed in 0.06 minutes (Real: 3.81; User: 3.62; System: 0.13)
MDB trees are saved as data.tree objects. We can plot them using mplot3.mdb, which creates HTML output using graphviz. The first line in each node shows the rule, followed by the number of samples that match the rule, and lastly by the percent of those samples that were outcome-positive. By default, leaf nodes with an estimate of 1 (positive class) are orange, and those with an estimate of 0 are teal. You can mouse over nodes, edges, and the plot background for some popup info:
mplot3.mdb(parkinsons.mdb)
We can also explore the tree in the console without plotting:
parkinsons.mdb$mod$mdb.tree.pruned
## levelName
## 1 All cases
## 2 ¦--PPE < 0.1339935
## 3 °--PPE ≥ 0.1339935
## 4 ¦--Shimmer.APQ5 < 0.012745
## 5 ¦ ¦--MDVP.Fo.Hz. < 117.25
## 6 ¦ ¦ ¦--Shimmer.APQ3 < 0.008825
## 7 ¦ ¦ ¦ ¦--MDVP.Fo.Hz. < 110.723
## 8 ¦ ¦ ¦ °--MDVP.Fo.Hz. ≥ 110.723
## 9 ¦ ¦ °--Shimmer.APQ3 ≥ 0.008825
## 10 ¦ °--MDVP.Fo.Hz. ≥ 117.25
## 11 °--Shimmer.APQ5 ≥ 0.012745
Any attribute can be printed alongside the hierarchical tree structure:
print(parkinsons.mdb$mod$mdb.tree.pruned, "Estimate")
## levelName Estimate
## 1 All cases 1
## 2 ¦--PPE < 0.1339935 0
## 3 °--PPE ≥ 0.1339935 1
## 4 ¦--Shimmer.APQ5 < 0.012745 1
## 5 ¦ ¦--MDVP.Fo.Hz. < 117.25 0
## 6 ¦ ¦ ¦--Shimmer.APQ3 < 0.008825 0
## 7 ¦ ¦ ¦ ¦--MDVP.Fo.Hz. < 110.723 1
## 8 ¦ ¦ ¦ °--MDVP.Fo.Hz. ≥ 110.723 0
## 9 ¦ ¦ °--Shimmer.APQ3 ≥ 0.008825 1
## 10 ¦ °--MDVP.Fo.Hz. ≥ 117.25 1
## 11 °--Shimmer.APQ5 ≥ 0.012745 1
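Since the tree is a data.tree object, standard data.tree operations also apply. For example (a sketch assuming the data.tree API), we can collect an attribute from the leaf nodes only:
# Get the Estimate attribute from leaves only, using data.tree's Get with a filter
parkinsons.mdb$mod$mdb.tree.pruned$Get("Estimate", filterFun = data.tree::isLeaf)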
To get predicted values, use the predict S3 generic with the familiar syntax predict(mod, newdata):
predict(parkinsons.mdb)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0
## [36] 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
## [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [106] 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [141] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0
## Levels: 1 0
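A quick way to compare these fitted values against the true labels is a cross-tabulation (a sketch; parkinsons$Status is the outcome we defined above):
# Cross-tabulate training-set predictions against the true outcome
table(Predicted = predict(parkinsons.mdb), True = parkinsons$Status)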
Let’s use resample to create 10 stratified folds of the data; we will use the first fold’s training indices to split into training and test sets:
train <- resample(parkinsons, n.resamples = 10, resampler = "kfold", verbose = TRUE)
## [2018-03-02 04:14:23 resample] Input contains more than one columns; will stratify on last
## ------------------------------------------------------
## Resampling Parameters
## ------------------------------------------------------
## n.resamples: 10
## resampler: kfold
##
## [2018-03-02 04:14:23 resample] Created 10 independent folds
mplot3.res(train)
parkinsons.train <- parkinsons[train$Fold01, ]
parkinsons.test <- parkinsons[-train$Fold01, ]
parkinsons.mdb <- s.MDB(parkinsons.train, x.test = parkinsons.test,
gamma = .8, learning.rate = .1)
## [2018-03-02 04:14:23 s.MDB] Hello, egenn
## [2018-03-02 04:14:23 dataPrepare] Imbalanced classes: using Inverse Probability Weighting
## ------------------------------------------------------
## Input Summary
## ------------------------------------------------------
## Training features: 176 x 22
## Training outcome: 176 x 1
## Testing features: 19 x 22
## Testing outcome: 19 x 1
##
## [2018-03-02 04:14:23 s.MDB] Training MDB...
## ------------------------------------------------------
## MDB Classification Training Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 124 0
## 0 9 43
##
## Accuracy : 0.9489
## 95% CI : (0.9051, 0.9764)
## No Information Rate : 0.7557
## P-Value [Acc > NIR] : 6.465e-12
##
## Kappa : 0.8707
## Mcnemar's Test P-Value : 0.007661
##
## Sensitivity : 0.9323
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.8269
## Precision : 1.0000
## Recall : 0.9323
## F1 : 0.9650
## Prevalence : 0.7557
## Detection Rate : 0.7045
## Detection Prevalence : 0.7045
## Balanced Accuracy : 0.9662
##
## 'Positive' Class : 1
##
## ------------------------------------------------------
## MDB Classification Testing Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 13 0
## 0 1 5
##
## Accuracy : 0.9474
## 95% CI : (0.7397, 0.9987)
## No Information Rate : 0.7368
## P-Value [Acc > NIR] : 0.02352
##
## Kappa : 0.8725
## Mcnemar's Test P-Value : 1.00000
##
## Sensitivity : 0.9286
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.8333
## Precision : 1.0000
## Recall : 0.9286
## F1 : 0.9630
## Prevalence : 0.7368
## Detection Rate : 0.6842
## Detection Prevalence : 0.6842
## Balanced Accuracy : 0.9643
##
## 'Positive' Class : 1
##
## [2018-03-02 04:14:25 s.MDB] Run completed in 0.03 minutes (Real: 1.75; User: 1.68; System: 0.03)
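As before, the predict generic also works on held-out data (a sketch; we assume newdata should contain features only, so we drop the outcome column):
# Predict on the held-out fold, excluding the outcome (last) column
predicted.test <- predict(parkinsons.mdb, parkinsons.test[, -ncol(parkinsons.test)])
table(Predicted = predicted.test, True = parkinsons.test$Status)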
rtemis supervised learners, like s.MDB, support automatic hyperparameter tuning. When more than a single value is passed to a tunable argument, grid search with internal resampling takes place using all available cores (threads).
parkinsons.mdb.tune <- s.MDB(parkinsons.train, x.test = parkinsons.test,
gamma = seq(.6, .9, .1), learning.rate = .1)
## [2018-03-02 04:14:25 s.MDB] Hello, egenn
## [2018-03-02 04:14:25 dataPrepare] Imbalanced classes: using Inverse Probability Weighting
## ------------------------------------------------------
## Input Summary
## ------------------------------------------------------
## Training features: 176 x 22
## Training outcome: 176 x 1
## Testing features: 19 x 22
## Testing outcome: 19 x 1
##
## [2018-03-02 04:14:25 gridSearchLearn] Hello, egenn
## ------------------------------------------------------
## Resampling Parameters
## ------------------------------------------------------
## n.resamples: 5
## resampler: kfold
##
## [2018-03-02 04:14:25 resample] Created 5 independent folds
##
## ------------------------------------------------------
## Input parameters
## ------------------------------------------------------
## grid.params:
## gamma: 0.6, 0.7, 0.8, 0.9
## max.depth: 30
## learning.rate: 0.1
## min.hessian: 0.001
##
## [2018-03-02 04:14:25 gridSearchLearn] Tuning MediBoost Tree-Structured Boosting by exhaustive grid search:
## [2018-03-02 04:14:25 gridSearchLearn] 5 resamples; 20 models total; running on 4 cores (x86_64-apple-darwin15.6.0)
##
##
## ------------------------------------------------------
## Best parameters to maximize BalancedAccuracy
## ------------------------------------------------------
## best.tune:
## gamma: 0.7
## max.depth: 30
## learning.rate: 0.1
## min.hessian: 0.001
##
## [2018-03-02 04:14:34 gridSearchLearn] Run completed in 0.15 minutes (Real: 9.01; User: 0.07; System: 0.04)
## [2018-03-02 04:14:34 s.MDB] Training MDB...
## ------------------------------------------------------
## MDB Classification Training Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 126 0
## 0 7 43
##
## Accuracy : 0.9602
## 95% CI : (0.9198, 0.9839)
## No Information Rate : 0.7557
## P-Value [Acc > NIR] : 1.502e-13
##
## Kappa : 0.8979
## Mcnemar's Test P-Value : 0.02334
##
## Sensitivity : 0.9474
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.8600
## Precision : 1.0000
## Recall : 0.9474
## F1 : 0.9730
## Prevalence : 0.7557
## Detection Rate : 0.7159
## Detection Prevalence : 0.7159
## Balanced Accuracy : 0.9737
##
## 'Positive' Class : 1
##
## ------------------------------------------------------
## MDB Classification Testing Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 13 1
## 0 1 4
##
## Accuracy : 0.8947
## 95% CI : (0.6686, 0.987)
## No Information Rate : 0.7368
## P-Value [Acc > NIR] : 0.0894
##
## Kappa : 0.7286
## Mcnemar's Test P-Value : 1.0000
##
## Sensitivity : 0.9286
## Specificity : 0.8000
## Pos Pred Value : 0.9286
## Neg Pred Value : 0.8000
## Precision : 0.9286
## Recall : 0.9286
## F1 : 0.9286
## Prevalence : 0.7368
## Detection Rate : 0.6842
## Detection Prevalence : 0.7368
## Balanced Accuracy : 0.8643
##
## 'Positive' Class : 1
##
## [2018-03-02 04:14:35 s.MDB] Run completed in 0.17 minutes (Real: 10.19; User: 1.13; System: 0.08)
We can define the tuning resampling parameters with the grid.resampler.rtSet argument. The rtSet.resampler convenience function helps easily build the list needed by grid.resampler.rtSet, providing auto-completion.
parkinsons.mdb.tune <- s.MDB(parkinsons.train, x.test = parkinsons.test,
gamma = seq(.6, .9, .1), learning.rate = .1,
grid.resampler.rtSet = rtSet.resampler(resampler = 'strat.boot',
n.resamples = 5))
## [2018-03-02 04:14:35 s.MDB] Hello, egenn
## [2018-03-02 04:14:35 dataPrepare] Imbalanced classes: using Inverse Probability Weighting
## ------------------------------------------------------
## Input Summary
## ------------------------------------------------------
## Training features: 176 x 22
## Training outcome: 176 x 1
## Testing features: 19 x 22
## Testing outcome: 19 x 1
##
## [2018-03-02 04:14:35 gridSearchLearn] Hello, egenn
## ------------------------------------------------------
## Resampling Parameters
## ------------------------------------------------------
## n.resamples: 5
## resampler: strat.boot
## stratify.var: y
## cv.p: 0.75
## cv.groups: 4
## target.length: NULL
##
## [2018-03-02 04:14:35 resample] Created 5 stratified bootstraps
##
## ------------------------------------------------------
## Input parameters
## ------------------------------------------------------
## grid.params:
## gamma: 0.6, 0.7, 0.8, 0.9
## max.depth: 30
## learning.rate: 0.1
## min.hessian: 0.001
##
## [2018-03-02 04:14:35 gridSearchLearn] Tuning MediBoost Tree-Structured Boosting by exhaustive grid search:
## [2018-03-02 04:14:35 gridSearchLearn] 5 resamples; 20 models total; running on 4 cores (x86_64-apple-darwin15.6.0)
##
##
## ------------------------------------------------------
## Best parameters to maximize BalancedAccuracy
## ------------------------------------------------------
## best.tune:
## gamma: 0.7
## max.depth: 30
## learning.rate: 0.1
## min.hessian: 0.001
##
## [2018-03-02 04:14:45 gridSearchLearn] Run completed in 0.17 minutes (Real: 10.13; User: 0.12; System: 0.03)
## [2018-03-02 04:14:45 s.MDB] Training MDB...
## ------------------------------------------------------
## MDB Classification Training Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 126 0
## 0 7 43
##
## Accuracy : 0.9602
## 95% CI : (0.9198, 0.9839)
## No Information Rate : 0.7557
## P-Value [Acc > NIR] : 1.502e-13
##
## Kappa : 0.8979
## Mcnemar's Test P-Value : 0.02334
##
## Sensitivity : 0.9474
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.8600
## Precision : 1.0000
## Recall : 0.9474
## F1 : 0.9730
## Prevalence : 0.7557
## Detection Rate : 0.7159
## Detection Prevalence : 0.7159
## Balanced Accuracy : 0.9737
##
## 'Positive' Class : 1
##
## ------------------------------------------------------
## MDB Classification Testing Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 13 1
## 0 1 4
##
## Accuracy : 0.8947
## 95% CI : (0.6686, 0.987)
## No Information Rate : 0.7368
## P-Value [Acc > NIR] : 0.0894
##
## Kappa : 0.7286
## Mcnemar's Test P-Value : 1.0000
##
## Sensitivity : 0.9286
## Specificity : 0.8000
## Pos Pred Value : 0.9286
## Neg Pred Value : 0.8000
## Precision : 0.9286
## Recall : 0.9286
## F1 : 0.9286
## Prevalence : 0.7368
## Detection Rate : 0.6842
## Detection Prevalence : 0.7368
## Balanced Accuracy : 0.8643
##
## 'Positive' Class : 1
##
## [2018-03-02 04:14:46 s.MDB] Run completed in 0.18 minutes (Real: 11.01; User: 0.96; System: 0.06)
Let’s look at the tuning results (this is a small dataset and tuning may not be very accurate):
parkinsons.mdb.tune$extra$gridSearch$tune.results
We now use the core rtemis supervised learning function elevate, which performs nested resampling for cross-validation and hyperparameter tuning:
parkinsons.mdb.10fold <- elevate(parkinsons, mod = 'mdb',
gamma = c(.8, .9),
learning.rate = c(.01, .05),
seed = 2018)
## [2018-03-02 04:14:46 elevate] Hello, egenn
## ------------------------------------------------------
## Classification Input Summary
## ------------------------------------------------------
## Training features: 195 x 22
## Training outcome: 195 x 1
##
## [2018-03-02 04:14:46 elevate] Training MDB on 10 resamples...
## [2018-03-02 04:14:46 resLearn] Hello, egenn: resLearn running...
## ------------------------------------------------------
## Resampling Parameters
## ------------------------------------------------------
## n.resamples: 10
## resampler: kfold
##
## [2018-03-02 04:14:46 resample] Created 10 independent folds
##
## ------------------------------------------------------
## MDB Parameters
## ------------------------------------------------------
## params:
## gamma: 0.8, 0.9
## learning.rate: 0.01, 0.05
##
## [2018-03-02 04:14:46 resLearn] Training MediBoost Tree-Structured Boosting on 10 resamples...
## [2018-03-02 04:17:21 resLearn] Run completed in 2.58 minutes (Real: 154.55; User: 16.81; System: 0.81)
##
## ======================================================
## elevate MDB
## ======================================================
## N repeats = 1
## N resamples = 10
## Resampler = kfold
## Balanced Accuracy of 10 aggregated test sets in each repeat = 0.87
## ======================================================
## [2018-03-02 04:17:21 elevate] Run completed in 2.58 minutes (Real: 154.65; User: 16.91; System: 0.81)
We can get a summary of the cross-validation by printing the elevate object:
parkinsons.mdb.10fold
## ======================================================
## .:rtemis Cross-Validated Model
## ======================================================
## Algorithm: MDB (MediBoost Tree-Structured Boosting)
## Resampling: n = 10, type = kfold
## N of repeats: 1
## Average Balanced Accuracy across repeats = 0.8722364
Let’s grab a dataset from the massive OpenML repository. (We can read the .arff files as CSVs.)
con <- gzcon(url("https://www.openml.org/data/get_csv/53273/sleep.arff"),
text = TRUE)
sleep <- read.csv(con, header = TRUE, na.strings = "?")
checkData(sleep)
## -------------------------------
## Dataset: sleep
## -------------------------------
##
## Summary
## -------------------------------
## 62 cases with 8 features:
## * 4 continuous features
## * 3 integer features
## * 1 categorical feature, which is not ordered
## * 0 constant features
## * 2 features include 'NA' values; 8 'NA' values total
## ** Max percent missing in a feature is 6.45% (max_life_span)
## ** Max percent missing in a case is 25.00% (case #13)
##
## Recommendations
## -------------------------------
## * Consider imputing missing values or use complete cases only
## * Check the 3 integer features and consider if they should be converted to factors
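If we wanted to follow the second recommendation, converting integer columns to factors is straightforward in base R (a hypothetical sketch, not run here; we keep the integers as they are):
# int.idx <- sapply(sleep, is.integer)
# sleep[int.idx] <- lapply(sleep[int.idx], factor)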
We can impute missing data with preproc:
sleep <- preproc(sleep, impute = TRUE)
## [2018-03-02 04:17:22 preproc.default] Imputing missing values...
## [2018-03-02 04:17:22 preproc.default] Done
Train and plot MDB:
sleep.mdb <- s.MDB(sleep, gamma = .8, learning.rate = .1)
## [2018-03-02 04:17:22 s.MDB] Hello, egenn
## [2018-03-02 04:17:22 dataPrepare] Imbalanced classes: using Inverse Probability Weighting
## ------------------------------------------------------
## Input Summary
## ------------------------------------------------------
## Training features: 62 x 7
## Training outcome: 62 x 1
## Testing features: Not available
## Testing outcome: Not available
##
## [2018-03-02 04:17:22 s.MDB] Training MDB...
## ------------------------------------------------------
## MDB Classification Training Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction N P
## N 30 2
## P 3 27
##
## Accuracy : 0.9194
## 95% CI : (0.8217, 0.9733)
## No Information Rate : 0.5323
## P-Value [Acc > NIR] : 3.924e-11
##
## Kappa : 0.8384
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9091
## Specificity : 0.9310
## Pos Pred Value : 0.9375
## Neg Pred Value : 0.9000
## Precision : 0.9375
## Recall : 0.9091
## F1 : 0.9231
## Prevalence : 0.5323
## Detection Rate : 0.4839
## Detection Prevalence : 0.5161
## Balanced Accuracy : 0.9201
##
## 'Positive' Class : N
##
## [2018-03-02 04:17:23 s.MDB] Run completed in 0.01 minutes (Real: 0.64; User: 0.62; System: 0.01)
mplot3.mdb(sleep.mdb)
Let’s load a dataset from the Penn ML Benchmarks GitHub repository. R allows us to read a gzipped file and unzip on the fly using gzcon and read.table:
rzd <- gzcon(url("https://github.com/EpistasisLab/penn-ml-benchmarks/raw/master/datasets/classification/chess/chess.tsv.gz"),
             text = TRUE)
chess <- read.table(rzd, header = TRUE)
chess$target <- factor(chess$target, levels = c(1, 0))
checkData(chess)
## -------------------------------
## Dataset: chess
## -------------------------------
##
## Summary
## -------------------------------
## 3196 cases with 37 features:
## * 0 continuous features
## * 36 integer features
## * 1 categorical feature, which is not ordered
## * 0 constant features
## * 0 features include 'NA' values
##
## Recommendations
## -------------------------------
## * Everything looks good
chess.mdb <- s.MDB(chess, gamma = .8, learning.rate = .1)
## [2018-03-02 04:17:25 s.MDB] Hello, egenn
## [2018-03-02 04:17:25 dataPrepare] Imbalanced classes: using Inverse Probability Weighting
## ------------------------------------------------------
## Input Summary
## ------------------------------------------------------
## Training features: 3196 x 36
## Training outcome: 3196 x 1
## Testing features: Not available
## Testing outcome: Not available
##
## [2018-03-02 04:17:25 s.MDB] Training MDB...
## ------------------------------------------------------
## MDB Classification Training Summary
## ------------------------------------------------------
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 0
## 1 1623 28
## 0 46 1499
##
## Accuracy : 0.9768
## 95% CI : (0.971, 0.9818)
## No Information Rate : 0.5222
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.9536
## Mcnemar's Test P-Value : 0.04813
##
## Sensitivity : 0.9724
## Specificity : 0.9817
## Pos Pred Value : 0.9830
## Neg Pred Value : 0.9702
## Precision : 0.9830
## Recall : 0.9724
## F1 : 0.9777
## Prevalence : 0.5222
## Detection Rate : 0.5078
## Detection Prevalence : 0.5166
## Balanced Accuracy : 0.9771
##
## 'Positive' Class : 1
##
## [2018-03-02 04:17:33 s.MDB] Run completed in 0.13 minutes (Real: 7.72; User: 7.37; System: 0.24)
mplot3.mdb(chess.mdb)