library(sf)
temp_train = read_sf("data/temp_train.gpkg")
plot(temp_train)
III Congreso & XIV Jornadas de Usuarios de R, Sevilla, Spain
2024-11-08
Supervised machine learning
General idea:
Two main types of problems:
Many machine learning techniques:
This is a regression problem
temp = extract(predictors, temp_train, ID = FALSE)
temp_train = cbind(temp_train, temp)
head(temp_train)
Simple feature collection with 6 features and 7 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 825940.4 ymin: 4541533 xmax: 934920.7 ymax: 4630234
Projected CRS: ED50 / UTM zone 30N
temp popdens coast dem ndvi lst_day lst_night
1 17.52610 0.000000 1.1263009 85.905403 0.3656146 24.37792 12.642557
2 16.94795 1.211701 6.7432733 75.001259 0.3990190 28.13341 10.706681
3 17.49233 5.681698 1.7549587 2.556155 0.1987631 25.76198 11.370279
4 15.30838 4752.076660 45.7688789 256.110870 0.3861388 26.97013 8.315234
5 16.56247 1789.268799 6.2198448 303.596924 0.5917153 22.47704 12.101181
6 17.22139 13260.116211 0.7378924 12.070770 0.2349442 24.79462 13.021243
geom
1 POINT (825940.4 4541533)
2 POINT (849548.2 4563427)
3 POINT (924683.3 4583884)
4 POINT (902776.4 4630234)
5 POINT (928394.5 4598097)
6 POINT (934920.7 4595391)
Basic steps:
<TaskRegrST:temp_train> (195 x 7)
* Target: temp
* Properties: -
* Features (6):
- dbl (6): coast, dem, lst_day, lst_night, ndvi, popdens
* Coordinates:
X Y
<num> <num>
1: 825940.4 4541533
2: 849548.2 4563427
3: 924683.3 4583884
4: 902776.4 4630234
5: 928394.5 4598097
---
191: 764532.5 4724981
192: 721314.2 4662824
193: 794727.3 4524892
194: 822024.6 4512558
195: 817662.0 4735035
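The slide shows only the printed task, not the call that created it. A minimal sketch of how a spatial regression task of this kind might be built, assuming the mlr3spatial package and a small synthetic point layer (`pts_sf`, `task_demo` and all values are illustrative stand-ins, not the workshop data):

```r
library(sf)
library(mlr3)
library(mlr3spatial)

# Synthetic stand-in for temp_train: 50 random points with a target and two predictors
set.seed(1)
pts = data.frame(
  x = runif(50, 700000, 950000),
  y = runif(50, 4500000, 4750000),
  temp = rnorm(50, mean = 15, sd = 2),
  dem = runif(50, 0, 1500),
  ndvi = runif(50, 0, 1)
)
# EPSG:23030 is ED50 / UTM zone 30N, the CRS reported for temp_train
pts_sf = st_as_sf(pts, coords = c("x", "y"), crs = 23030)

# Spatial regression task: the geometry is stored as coordinates, not as features
task_demo = mlr3spatial::as_task_regr_st(pts_sf, target = "temp", id = "temp_demo")
task_demo
```

Keeping the coordinates out of the feature set is what later allows spatial resampling methods to partition the observations by location.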
<DictionaryLearner> with 27 stored values
Keys: classif.cv_glmnet, classif.debug, classif.featureless,
classif.glmnet, classif.kknn, classif.lda, classif.log_reg,
classif.multinom, classif.naive_bayes, classif.nnet, classif.qda,
classif.ranger, classif.rpart, classif.svm, classif.xgboost,
regr.cv_glmnet, regr.debug, regr.featureless, regr.glmnet, regr.kknn,
regr.km, regr.lm, regr.nnet, regr.ranger, regr.rpart, regr.svm,
regr.xgboost
<LearnerRegrRpart:regr.rpart>: Regression Tree
* Model: -
* Parameters: xval=0
* Packages: mlr3, rpart
* Predict Types: [response]
* Feature Types: logical, integer, numeric, factor, ordered
* Properties: importance, missings, selected_features, weights
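The learner printed above can be requested from the learner dictionary by its key. A minimal sketch, assuming only mlr3 is loaded (the regression tree learner ships with mlr3 itself); how the slides actually created `learner_rpart` is not shown, so this is one plausible way:

```r
library(mlr3)

# Look up the regression tree learner by its dictionary key
learner_rpart = lrn("regr.rpart")

# Hyperparameters exposed by the learner (for example cp, maxdepth, minsplit)
learner_rpart$param_set$ids()
```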
<DictionaryResampling> with 24 stored values
Keys: bootstrap, custom, custom_cv, cv, holdout, insample, loo,
repeated_cv, repeated_spcv_block, repeated_spcv_coords,
repeated_spcv_disc, repeated_spcv_env, repeated_spcv_knndm,
repeated_spcv_tiles, repeated_sptcv_cstf, spcv_block, spcv_buffer,
spcv_coords, spcv_disc, spcv_env, spcv_knndm, spcv_tiles, sptcv_cstf,
subsampling
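A resampling strategy is picked from this dictionary by key. The sketch below assumes 10 folds repeated 10 times, which is one combination that yields 100 iterations; the actual settings used in the slides are not shown.

```r
library(mlr3)

# Repeated k-fold cross-validation: folds * repeats iterations in total
resampling = rsmp("repeated_cv", folds = 10, repeats = 10)
resampling$iters
```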
set.seed(2024-10-31) # evaluated as arithmetic, so the seed is 1983
rr_cv_rpart = mlr3::resample(task = task,
                             learner = learner_rpart,
                             resampling = resampling)
rr_cv_rpart
<ResampleResult> with 100 resampling iterations
task_id learner_id resampling_id iteration prediction_test warnings errors
temp_train regr.rpart repeated_cv 1 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 2 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 3 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 4 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 5 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 6 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 7 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 8 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 9 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 10 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 11 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 12 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 13 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 14 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 15 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 16 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 17 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 18 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 19 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 20 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 21 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 22 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 23 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 24 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 25 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 26 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 27 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 28 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 29 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 30 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 31 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 32 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 33 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 34 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 35 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 36 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 37 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 38 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 39 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 40 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 41 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 42 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 43 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 44 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 45 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 46 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 47 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 48 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 49 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 50 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 51 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 52 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 53 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 54 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 55 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 56 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 57 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 58 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 59 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 60 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 61 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 62 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 63 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 64 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 65 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 66 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 67 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 68 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 69 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 70 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 71 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 72 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 73 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 74 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 75 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 76 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 77 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 78 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 79 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 80 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 81 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 82 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 83 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 84 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 85 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 86 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 87 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 88 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 89 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 90 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 91 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 92 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 93 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 94 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 95 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 96 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 97 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 98 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 99 <PredictionRegr> 0 0
temp_train regr.rpart repeated_cv 100 <PredictionRegr> 0 0
task_id learner_id resampling_id iteration prediction_test warnings errors
<DictionaryMeasure> with 65 stored values
Keys: aic, bic, classif.acc, classif.auc, classif.bacc, classif.bbrier,
classif.ce, classif.costs, classif.dor, classif.fbeta, classif.fdr,
classif.fn, classif.fnr, classif.fomr, classif.fp, classif.fpr,
classif.logloss, classif.mauc_au1p, classif.mauc_au1u,
classif.mauc_aunp, classif.mauc_aunu, classif.mauc_mu,
classif.mbrier, classif.mcc, classif.npv, classif.ppv, classif.prauc,
classif.precision, classif.recall, classif.sensitivity,
classif.specificity, classif.tn, classif.tnr, classif.tp,
classif.tpr, debug_classif, internal_valid_score, oob_error,
regr.bias, regr.ktau, regr.mae, regr.mape, regr.maxae, regr.medae,
regr.medse, regr.mse, regr.msle, regr.pbias, regr.pinball, regr.rae,
regr.rmse, regr.rmsle, regr.rrse, regr.rse, regr.rsq, regr.sae,
regr.smape, regr.srho, regr.sse, selected_features, sim.jaccard,
sim.phi, time_both, time_predict, time_train
task_id learner_id resampling_id iteration regr.rmse rsq
<char> <char> <char> <int> <num> <num>
1: temp_train regr.rpart repeated_cv 1 0.981040 0.8608386
2: temp_train regr.rpart repeated_cv 2 1.115505 0.8413669
3: temp_train regr.rpart repeated_cv 3 1.115693 0.8336519
4: temp_train regr.rpart repeated_cv 4 1.161596 0.8258320
5: temp_train regr.rpart repeated_cv 5 1.108825 0.8525720
6: temp_train regr.rpart repeated_cv 6 1.115927 0.7920166
Hidden columns: task, learner, resampling, prediction_test
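A table like the one above comes from scoring a resample result with one or more measures. A minimal, self-contained sketch, assuming the built-in `mtcars` task and a 3-fold CV as stand-ins for the temperature data (`task_demo`, `learner_demo` and `rr_demo` are illustrative names):

```r
library(mlr3)

task_demo = tsk("mtcars")
learner_demo = lrn("regr.rpart")

set.seed(1)
rr_demo = resample(task_demo, learner_demo, rsmp("cv", folds = 3))

# RMSE and R squared, looked up by key in the measure dictionary
measures = msrs(c("regr.rmse", "regr.rsq"))

scores = rr_demo$score(measures)   # one row per resampling iteration
agg = rr_demo$aggregate(measures)  # averaged over all iterations
agg
```

`$score()` is useful for inspecting the spread across iterations, while `$aggregate()` gives the single summary value usually reported.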
Source: mlr3spatiotempcv
| Id | Method |
|---|---|
| A | spcv_block |
| B | spcv_coords |
| C | spcv_env |
| D | spcv_disc |
| E | spcv_tiles |
| F | spcv_buffer |
| | spcv_knndm |
Each of these methods also has a repeated_ version (for example, repeated_spcv_coords)
Also: there are additional methods for spatio-temporal data
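Spatial resampling methods are requested the same way as the non-spatial ones once mlr3spatiotempcv is loaded. A minimal sketch, assuming a synthetic point layer and 10 folds repeated 10 times (the fold and repeat counts are an assumption; `pts_sf` and `task_demo` are illustrative names):

```r
library(sf)
library(mlr3)
library(mlr3spatial)
library(mlr3spatiotempcv)

# Synthetic points standing in for the temperature measurements
set.seed(1)
pts = data.frame(
  x = runif(60, 700000, 950000),
  y = runif(60, 4500000, 4750000),
  temp = rnorm(60, mean = 15, sd = 2),
  dem = runif(60, 0, 1500)
)
pts_sf = st_as_sf(pts, coords = c("x", "y"), crs = 23030)
task_demo = mlr3spatial::as_task_regr_st(pts_sf, target = "temp")

# Folds are formed by clustering the coordinates, so test points are spatially grouped
spcv_resampling = rsmp("repeated_spcv_coords", folds = 10, repeats = 10)
spcv_resampling$instantiate(task_demo)
spcv_resampling$iters
```

Because each test fold is a spatially compact cluster, the resulting error estimates tend to be less optimistic than those from random cross-validation.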
rr_spcv_rpart = mlr3::resample(task = task,
                               learner = learner_rpart,
                               resampling = spcv_resampling)
rr_spcv_rpart
<ResampleResult> with 100 resampling iterations
task_id learner_id resampling_id iteration prediction_test warnings errors
temp_train regr.rpart repeated_spcv_coords 1 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 2 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 3 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 4 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 5 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 6 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 7 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 8 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 9 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 10 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 11 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 12 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 13 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 14 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 15 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 16 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 17 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 18 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 19 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 20 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 21 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 22 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 23 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 24 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 25 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 26 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 27 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 28 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 29 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 30 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 31 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 32 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 33 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 34 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 35 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 36 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 37 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 38 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 39 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 40 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 41 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 42 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 43 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 44 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 45 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 46 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 47 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 48 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 49 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 50 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 51 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 52 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 53 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 54 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 55 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 56 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 57 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 58 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 59 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 60 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 61 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 62 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 63 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 64 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 65 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 66 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 67 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 68 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 69 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 70 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 71 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 72 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 73 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 74 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 75 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 76 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 77 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 78 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 79 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 80 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 81 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 82 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 83 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 84 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 85 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 86 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 87 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 88 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 89 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 90 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 91 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 92 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 93 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 94 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 95 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 96 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 97 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 98 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 99 <PredictionRegr> 0 0
temp_train regr.rpart repeated_spcv_coords 100 <PredictionRegr> 0 0
task_id learner_id resampling_id iteration prediction_test warnings errors
task_id learner_id resampling_id iteration regr.rmse rsq
<char> <char> <char> <int> <num> <num>
1: temp_train regr.rpart repeated_spcv_coords 1 1.3158684 0.7673390
2: temp_train regr.rpart repeated_spcv_coords 2 0.9992242 0.8144015
3: temp_train regr.rpart repeated_spcv_coords 3 1.2083355 0.7613023
4: temp_train regr.rpart repeated_spcv_coords 4 1.0921883 0.4605314
5: temp_train regr.rpart repeated_spcv_coords 5 1.2048538 0.2331354
6: temp_train regr.rpart repeated_spcv_coords 6 1.3081681 0.7129704
Hidden columns: task, learner, resampling, prediction_test
Feature engineering:
Hyperparameter tuning and feature selection:
lst_night dem lst_day coast ndvi popdens
1157.36732 891.79205 436.44619 206.08428 182.92553 49.37387
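Importance values like those above are available from a trained regression tree. A minimal sketch, assuming the built-in `mtcars` task as a stand-in for the temperature data (`task_demo` and `learner_demo` are illustrative names):

```r
library(mlr3)

task_demo = tsk("mtcars")
learner_demo = lrn("regr.rpart")

# Importance is only available after the learner has been trained
learner_demo$train(task_demo)

# Named numeric vector, sorted from most to least important feature
imp = learner_demo$importance()
imp
```

The scores are model-specific (here, derived from the splits of the fitted tree), so they are best read as a ranking rather than as absolute effect sizes.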
library(DALEX)
library(DALEXtra)
# Drop the geometry and the first column (the target, temp) so only predictors remain
regr_exp = DALEXtra::explain_mlr3(learner_rpart,
                                  data = st_drop_geometry(temp_train)[-1],
                                  y = temp_train$temp)
Preparation of a new explainer is initiated
-> model label : R6 ( default )
-> data : 195 rows 6 cols
-> target variable : 195 values
-> predict function : yhat.LearnerRegr will be used ( default )
-> predicted values : No value for predict function target column. ( default )
-> model_info : package mlr3 , ver. 0.21.1 , task regression ( default )
-> predicted values : numerical, min = 8.157678 , mean = 15.10157 , max = 18.14362
-> residual function : difference between y and yhat ( default )
-> residuals : numerical, min = -3.09541 , mean = -1.829466e-17 , max = 2.473572
A new explainer has been created!