Machine learning approaches for working with spatial data

Jakub Nowosad, https://jakubnowosad.com/

III Congreso & XIV Jornadas de Usuarios de R, Sevilla, Spain

2024-11-08

Supervised machine learning

General idea:

We have information about a dependent variable (response, outcome, target) at some locations
We want to predict values of the dependent variable at unsampled locations
For that purpose, we need independent variables (predictors, explanatory variables)
We assume that there is a relationship between our dependent variable and independent variables
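
The idea above can be sketched with simulated data. Everything in this block (the variable names, the assumed elevation effect, the use of a linear model) is made up for illustration and is not part of the workshop data:

```r
# Simulated example: all names and values below are made up for illustration
set.seed(1)

# Sampled locations: coordinates, one predictor (elevation), and the response
n = 50
sampled = data.frame(x = runif(n), y = runif(n), elev = runif(n, 0, 1000))
# Assumed relationship: temperature drops with elevation, plus noise
sampled$temp = 20 - 0.006 * sampled$elev + rnorm(n, sd = 0.5)

# Unsampled locations: we know the predictor but not the response
unsampled = data.frame(x = runif(10), y = runif(10), elev = runif(10, 0, 1000))

# Learn the relationship from the sampled locations...
model = lm(temp ~ elev, data = sampled)
# ...and use it to predict the response at the unsampled ones
unsampled$temp_pred = predict(model, newdata = unsampled)
head(unsampled)
```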

Two main types of problems:

Regression
Classification

Many machine learning techniques:

Linear/Logistic regression
Regression/decision trees
Random forest
Gradient boosting
…

Example data

Dependent variable

This is a regression problem

library(sf)
temp_train = read_sf("data/temp_train.gpkg")
plot(temp_train)

Independent variables

library(terra)
predictors = rast("data/predictors.tif")
plot(predictors, axes = FALSE)

Data preparation

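# Extract predictor values at the training point locations (ID = FALSE drops the ID column)
temp = extract(predictors, temp_train, ID = FALSE)
# Attach the extracted values to the point data
temp_train = cbind(temp_train, temp)
head(temp_train)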

Simple feature collection with 6 features and 7 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 825940.4 ymin: 4541533 xmax: 934920.7 ymax: 4630234
Projected CRS: ED50 / UTM zone 30N
      temp      popdens      coast        dem      ndvi  lst_day lst_night
1 17.52610     0.000000  1.1263009  85.905403 0.3656146 24.37792 12.642557
2 16.94795     1.211701  6.7432733  75.001259 0.3990190 28.13341 10.706681
3 17.49233     5.681698  1.7549587   2.556155 0.1987631 25.76198 11.370279
4 15.30838  4752.076660 45.7688789 256.110870 0.3861388 26.97013  8.315234
5 16.56247  1789.268799  6.2198448 303.596924 0.5917153 22.47704 12.101181
6 17.22139 13260.116211  0.7378924  12.070770 0.2349442 24.79462 13.021243
                      geom
1 POINT (825940.4 4541533)
2 POINT (849548.2 4563427)
3 POINT (924683.3 4583884)
4 POINT (902776.4 4630234)
5 POINT (928394.5 4598097)
6 POINT (934920.7 4595391)

Raster template creation

# rast() on an existing SpatRaster returns an empty raster with the same geometry
grid_raster = rast(predictors)
grid_raster

class       : SpatRaster 
dimensions  : 873, 1036, 6  (nrow, ncol, nlyr)
resolution  : 1000, 1000  (x, y)
extent      : -13954.02, 1022046, 3987316, 4860316  (xmin, xmax, ymin, ymax)
coord. ref. : ED50 / UTM zone 30N (EPSG:23030)

Bare ML workflow

Model specification

library(rpart)
rpart_model = rpart(temp ~ ., data = st_drop_geometry(temp_train))

plot(rpart_model, margin = 0.2)
text(rpart_model)
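
A rough sketch of what a regression tree does at a single node: it tries candidate thresholds and keeps the one that minimizes the sum of squared errors when each side is predicted by its mean. The data and the helper `split_sse()` are hypothetical, and rpart adds more on top of this (recursion, stopping rules, pruning):

```r
# Toy data (simulated): the response jumps at x = 0.6
set.seed(1)
x = runif(100)
y = ifelse(x < 0.6, 10, 15) + rnorm(100, sd = 0.5)

# Sum of squared errors when each side of a split is predicted by its mean
split_sse = function(threshold) {
  left = y[x < threshold]
  right = y[x >= threshold]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

# Candidate thresholds: midpoints between consecutive sorted x values
xs = sort(x)
candidates = (head(xs, -1) + tail(xs, -1)) / 2
sse = sapply(candidates, split_sse)

# The best split is the candidate with the lowest total error
best_split = candidates[which.min(sse)]
best_split
```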

Prediction

temp_pred = predict(predictors, rpart_model)
plot(temp_pred)

Basic ML workflow

Basic ML workflow in mlr3

library(mlr3)
library(mlr3learners)
library(mlr3spatiotempcv)
lgr::get_logger("mlr3")$set_threshold("warn")

Basic steps:

Create a task: it includes the input data and the target variable
Specify the learner: it includes the model type and its parameters
Specify the resampling strategy: it includes the validation method and its parameters

Specifying Task

task = mlr3spatiotempcv::as_task_regr_st(temp_train, target = "temp")
task

<TaskRegrST:temp_train> (195 x 7)
* Target: temp
* Properties: -
* Features (6):
  - dbl (6): coast, dem, lst_day, lst_night, ndvi, popdens
* Coordinates:
            X       Y
        <num>   <num>
  1: 825940.4 4541533
  2: 849548.2 4563427
  3: 924683.3 4583884
  4: 902776.4 4630234
  5: 928394.5 4598097
 ---                 
191: 764532.5 4724981
192: 721314.2 4662824
193: 794727.3 4524892
194: 822024.6 4512558
195: 817662.0 4735035

Specifying Learner

mlr3::mlr_learners

<DictionaryLearner> with 27 stored values
Keys: classif.cv_glmnet, classif.debug, classif.featureless,
  classif.glmnet, classif.kknn, classif.lda, classif.log_reg,
  classif.multinom, classif.naive_bayes, classif.nnet, classif.qda,
  classif.ranger, classif.rpart, classif.svm, classif.xgboost,
  regr.cv_glmnet, regr.debug, regr.featureless, regr.glmnet, regr.kknn,
  regr.km, regr.lm, regr.nnet, regr.ranger, regr.rpart, regr.svm,
  regr.xgboost

learner_rpart = mlr3::lrn("regr.rpart")
learner_rpart

<LearnerRegrRpart:regr.rpart>: Regression Tree
* Model: -
* Parameters: xval=0
* Packages: mlr3, rpart
* Predict Types:  [response]
* Feature Types: logical, integer, numeric, factor, ordered
* Properties: importance, missings, selected_features, weights

Specifying Resampling

mlr_resamplings

<DictionaryResampling> with 24 stored values
Keys: bootstrap, custom, custom_cv, cv, holdout, insample, loo,
  repeated_cv, repeated_spcv_block, repeated_spcv_coords,
  repeated_spcv_disc, repeated_spcv_env, repeated_spcv_knndm,
  repeated_spcv_tiles, repeated_sptcv_cstf, spcv_block, spcv_buffer,
  spcv_coords, spcv_disc, spcv_env, spcv_knndm, spcv_tiles, sptcv_cstf,
  subsampling

resampling = mlr3::rsmp("repeated_cv", folds = 5, repeats = 20)
resampling

<ResamplingRepeatedCV>: Repeated Cross-Validation
* Iterations: 100
* Instantiated: FALSE
* Parameters: folds=5, repeats=20
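
To see where the 100 iterations come from, here is a minimal base R sketch of repeated cross-validation. The helper `make_repeated_cv()` is hypothetical and only mimics the idea, not the mlr3 implementation:

```r
# Hypothetical helper: assign n observations to `folds` groups, `repeats` times
make_repeated_cv = function(n, folds = 5, repeats = 20) {
  splits = list()
  for (r in seq_len(repeats)) {
    # Shuffle fold labels so every repeat gives a different partition
    fold_id = sample(rep(seq_len(folds), length.out = n))
    for (f in seq_len(folds)) {
      splits[[length(splits) + 1]] = list(
        train = which(fold_id != f),
        test = which(fold_id == f)
      )
    }
  }
  splits
}

set.seed(1)
splits = make_repeated_cv(n = 195)
length(splits)        # number of train/test iterations (folds x repeats)
lengths(splits[[1]])  # sizes of the first train/test split
```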

Applying the resampling strategy

set.seed(2024-10-31)  # the hyphens are subtraction, so this is set.seed(1983)
rr_cv_rpart = mlr3::resample(task = task,
                             learner = learner_rpart,
                             resampling = resampling)
rr_cv_rpart

<ResampleResult> with 100 resampling iterations
    task_id learner_id resampling_id iteration  prediction_test warnings errors
 temp_train regr.rpart   repeated_cv         1 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv         2 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv         3 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv         4 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv         5 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv         6 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv         7 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv         8 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv         9 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        10 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        11 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        12 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        13 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        14 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        15 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        16 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        17 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        18 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        19 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        20 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        21 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        22 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        23 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        24 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        25 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        26 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        27 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        28 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        29 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        30 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        31 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        32 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        33 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        34 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        35 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        36 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        37 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        38 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        39 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        40 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        41 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        42 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        43 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        44 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        45 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        46 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        47 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        48 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        49 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        50 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        51 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        52 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        53 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        54 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        55 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        56 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        57 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        58 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        59 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        60 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        61 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        62 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        63 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        64 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        65 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        66 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        67 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        68 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        69 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        70 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        71 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        72 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        73 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        74 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        75 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        76 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        77 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        78 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        79 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        80 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        81 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        82 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        83 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        84 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        85 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        86 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        87 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        88 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        89 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        90 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        91 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        92 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        93 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        94 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        95 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        96 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        97 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        98 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv        99 <PredictionRegr>        0      0
 temp_train regr.rpart   repeated_cv       100 <PredictionRegr>        0      0
    task_id learner_id resampling_id iteration  prediction_test warnings errors

Evaluation

mlr_measures

<DictionaryMeasure> with 65 stored values
Keys: aic, bic, classif.acc, classif.auc, classif.bacc, classif.bbrier,
  classif.ce, classif.costs, classif.dor, classif.fbeta, classif.fdr,
  classif.fn, classif.fnr, classif.fomr, classif.fp, classif.fpr,
  classif.logloss, classif.mauc_au1p, classif.mauc_au1u,
  classif.mauc_aunp, classif.mauc_aunu, classif.mauc_mu,
  classif.mbrier, classif.mcc, classif.npv, classif.ppv, classif.prauc,
  classif.precision, classif.recall, classif.sensitivity,
  classif.specificity, classif.tn, classif.tnr, classif.tp,
  classif.tpr, debug_classif, internal_valid_score, oob_error,
  regr.bias, regr.ktau, regr.mae, regr.mape, regr.maxae, regr.medae,
  regr.medse, regr.mse, regr.msle, regr.pbias, regr.pinball, regr.rae,
  regr.rmse, regr.rmsle, regr.rrse, regr.rse, regr.rsq, regr.sae,
  regr.smape, regr.srho, regr.sse, selected_features, sim.jaccard,
  sim.phi, time_both, time_predict, time_train

my_measures = c(mlr3::msr("regr.rmse"), mlr3::msr("regr.rsq"))
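
Both measures are easy to compute by hand, which helps when reading the scores. The values below are made up, and R squared is written here as 1 minus the ratio of the residual to the total sum of squares (so it can be negative for poor predictions):

```r
# Made-up observed and predicted values, for illustration only
truth = c(15.2, 17.1, 12.4, 16.8, 14.0)
response = c(15.0, 16.5, 13.1, 17.2, 14.6)

# Root mean squared error: typical size of the prediction error
rmse = sqrt(mean((truth - response)^2))

# R squared: 1 minus the ratio of residual to total sum of squares
rsq = 1 - sum((truth - response)^2) / sum((truth - mean(truth))^2)

c(rmse = rmse, rsq = rsq)
```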

Evaluation

score_cv_rpart = rr_cv_rpart$score(measures = my_measures)
head(score_cv_rpart)

      task_id learner_id resampling_id iteration regr.rmse       rsq
       <char>     <char>        <char>     <int>     <num>     <num>
1: temp_train regr.rpart   repeated_cv         1  0.981040 0.8608386
2: temp_train regr.rpart   repeated_cv         2  1.115505 0.8413669
3: temp_train regr.rpart   repeated_cv         3  1.115693 0.8336519
4: temp_train regr.rpart   repeated_cv         4  1.161596 0.8258320
5: temp_train regr.rpart   repeated_cv         5  1.108825 0.8525720
6: temp_train regr.rpart   repeated_cv         6  1.115927 0.7920166
Hidden columns: task, learner, resampling, prediction_test

hist(score_cv_rpart$regr.rmse)

mean(score_cv_rpart$regr.rmse)

[1] 1.155105

mean(score_cv_rpart$rsq)

[1] 0.8191018

Prediction

# Fit the final model on the full training set
learner_rpart$train(task)

pred_rpart = terra::predict(predictors, model = learner_rpart, na.rm = TRUE)
plot(pred_rpart)

Spatial resampling (cross-validation)

Source: mlr3spatiotempcv

Id	Method
A	spcv_block
B	spcv_coords
C	spcv_env
D	spcv_disc
E	spcv_tiles
F	spcv_buffer
	spcv_knndm

Most of the methods have a repeated_ version (spcv_buffer does not appear with one in mlr_resamplings)

Also: there are additional methods for spatio-temporal data
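
The idea behind spcv_coords can be sketched in base R: folds are formed by k-means clustering of the coordinates, so each test set is a spatially compact group of points. The coordinates below are simulated, and this is an illustration of the principle rather than the package code:

```r
# Simulated point coordinates (hypothetical, in metres)
set.seed(1)
coords = data.frame(X = runif(195, 700000, 950000),
                    Y = runif(195, 4500000, 4750000))

# Group nearby points: each k-means cluster becomes one fold
folds = 5
fold_id = kmeans(coords, centers = folds, nstart = 5)$cluster

# In iteration f, cluster f is held out for testing; the rest is used for training
splits = lapply(seq_len(folds), function(f) {
  list(train = which(fold_id != f), test = which(fold_id == f))
})

table(fold_id)  # fold sizes are usually unequal, unlike in random CV
```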

Spatial resampling (cross-validation)

spcv_resampling = mlr3::rsmp("repeated_spcv_coords", folds = 5, repeats = 20)

rr_spcv_rpart = mlr3::resample(task = task,
                               learner = learner_rpart,
                               resampling = spcv_resampling)
rr_spcv_rpart

<ResampleResult> with 100 resampling iterations
    task_id learner_id        resampling_id iteration  prediction_test warnings
 temp_train regr.rpart repeated_spcv_coords         1 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords         2 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords         3 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords         4 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords         5 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords         6 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords         7 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords         8 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords         9 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        10 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        11 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        12 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        13 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        14 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        15 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        16 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        17 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        18 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        19 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        20 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        21 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        22 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        23 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        24 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        25 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        26 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        27 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        28 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        29 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        30 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        31 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        32 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        33 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        34 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        35 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        36 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        37 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        38 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        39 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        40 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        41 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        42 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        43 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        44 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        45 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        46 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        47 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        48 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        49 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        50 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        51 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        52 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        53 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        54 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        55 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        56 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        57 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        58 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        59 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        60 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        61 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        62 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        63 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        64 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        65 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        66 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        67 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        68 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        69 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        70 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        71 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        72 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        73 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        74 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        75 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        76 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        77 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        78 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        79 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        80 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        81 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        82 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        83 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        84 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        85 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        86 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        87 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        88 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        89 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        90 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        91 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        92 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        93 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        94 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        95 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        96 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        97 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        98 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords        99 <PredictionRegr>        0
 temp_train regr.rpart repeated_spcv_coords       100 <PredictionRegr>        0
    task_id learner_id        resampling_id iteration  prediction_test warnings
 errors
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
 errors

Spatial resampling (cross-validation)

score_spcv_rpart = rr_spcv_rpart$score(measures = my_measures)
head(score_spcv_rpart)

      task_id learner_id        resampling_id iteration regr.rmse       rsq
       <char>     <char>               <char>     <int>     <num>     <num>
1: temp_train regr.rpart repeated_spcv_coords         1 1.3158684 0.7673390
2: temp_train regr.rpart repeated_spcv_coords         2 0.9992242 0.8144015
3: temp_train regr.rpart repeated_spcv_coords         3 1.2083355 0.7613023
4: temp_train regr.rpart repeated_spcv_coords         4 1.0921883 0.4605314
5: temp_train regr.rpart repeated_spcv_coords         5 1.2048538 0.2331354
6: temp_train regr.rpart repeated_spcv_coords         6 1.3081681 0.7129704
Hidden columns: task, learner, resampling, prediction_test

hist(score_spcv_rpart$regr.rmse)

# non-spatial RMSE: 1.16
mean(score_spcv_rpart$regr.rmse)

[1] 1.175743

# non-spatial R2: 0.82
mean(score_spcv_rpart$rsq)

[1] 0.5751845

Machine learning workflow

Feature engineering; Hyperparameter tuning and feature selection

Feature engineering:

Spectral indices
Terrain indices (land surface parameters)
Texture indices
Directional variables
Coordinates and other proximity variables (spatial proxies, e.g., see Milà et al. (2024), https://doi.org/10.5194/gmd-17-6007-2024)
Aggregates (e.g., road density)
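
As a small example of the first item, a spectral index such as NDVI is computed cell by cell from two bands. The reflectance values below are made up for illustration:

```r
# Made-up surface reflectance values for four pixels
red = c(0.10, 0.08, 0.30, 0.05)
nir = c(0.40, 0.45, 0.35, 0.50)

# NDVI: normalized difference of near-infrared and red reflectance
ndvi = (nir - red) / (nir + red)
round(ndvi, 2)
```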

Hyperparameter tuning and feature selection:

Deciding on the best hyperparameter values (e.g., see Schratz et al. 2019, https://doi.org/10.1016/j.ecolmodel.2019.06.002)
Forward feature selection (ffs) (e.g., see Meyer et al. 2019, https://doi.org/10.1016/j.ecolmodel.2019.108815)
This is also closely related to cross-validation
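
The link between feature selection and cross-validation can be sketched as a generic greedy forward selection, where each candidate set of predictors is scored by its cross-validated RMSE. This is a simplified illustration with simulated data and a linear model, not the exact procedure from the cited paper; `cv_rmse()` is a hypothetical helper:

```r
# Simulated data: x1 and x2 drive the response, x3 is pure noise
set.seed(1)
n = 200
d = data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y = 3 * d$x1 + 1 * d$x2 + rnorm(n)

# Cross-validated RMSE of a linear model using the given predictors
fold_id = sample(rep(1:5, length.out = n))
cv_rmse = function(vars) {
  f = reformulate(vars, response = "y")
  errs = sapply(1:5, function(k) {
    fit = lm(f, data = d[fold_id != k, ])
    pred = predict(fit, newdata = d[fold_id == k, ])
    sqrt(mean((d$y[fold_id == k] - pred)^2))
  })
  mean(errs)
}

# Greedy forward selection: keep adding the predictor that lowers the CV error most
selected = character(0)
remaining = c("x1", "x2", "x3")
best_rmse = Inf
while (length(remaining) > 0) {
  scores = sapply(remaining, function(v) cv_rmse(c(selected, v)))
  if (min(scores) >= best_rmse) break  # no candidate improves the error
  best_rmse = min(scores)
  selected = c(selected, names(which.min(scores)))
  remaining = setdiff(remaining, selected)
}
selected
```

With spatial data, the random `fold_id` would be replaced by spatial folds, which can change which predictors are selected.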

Variable importance

importance = learner_rpart$importance()
importance

 lst_night        dem    lst_day      coast       ndvi    popdens 
1157.36732  891.79205  436.44619  206.08428  182.92553   49.37387

library(ggplot2)
imp_df = data.frame(variable = names(importance), importance = importance)
ggplot(imp_df, aes(x = reorder(variable, importance), y = importance)) +
  geom_col() +
  coord_flip()

Area of applicability

library(CAST)
AOA = aoa(predictors, train = st_drop_geometry(temp_train),
          variables = task$feature_names, weight = data.frame(t(importance)),
          verbose = FALSE)
plot(AOA)
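
The core quantity behind the area of applicability is a dissimilarity index (DI): the distance from a new location to its nearest training point in standardized, importance-weighted predictor space, divided by the mean pairwise distance among the training points. The sketch below is a simplified illustration with simulated data and assumed weights; it leaves out the cross-validation-based threshold that turns the DI into an inside/outside mask:

```r
# Simulated training data and new locations with two predictors (hypothetical)
set.seed(1)
train = data.frame(dem = runif(50, 0, 500), ndvi = runif(50, 0.2, 0.8))
# Second new point lies far outside the training range of dem
new = data.frame(dem = c(250, 2000), ndvi = c(0.5, 0.5))
weights = c(dem = 2, ndvi = 1)  # assumed importance weights

# Standardize with the training mean and sd, then apply the weights
ctr = colMeans(train)
scl = apply(train, 2, sd)
prep = function(df) sweep(scale(df, center = ctr, scale = scl), 2, weights, `*`)
train_s = prep(train)
new_s = prep(new)

# Mean pairwise distance within the training data
mean_train_dist = mean(dist(train_s))

# Distance from each new point to its nearest training point
nearest = apply(new_s, 1, function(p) min(sqrt(colSums((t(train_s) - p)^2))))

# Dissimilarity index: larger values mean "less like the training data"
di = nearest / mean_train_dist
di
```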

Area of applicability

plot(AOA[[2]])

plot(AOA[[3]])

Explainability

library(DALEX)
library(DALEXtra)
regr_exp = DALEXtra::explain_mlr3(learner_rpart,
                                  data = st_drop_geometry(temp_train)[-1],
                                  y = temp_train$temp)

Preparation of a new explainer is initiated
  -> model label       :  R6  (  default  )
  -> data              :  195  rows  6  cols 
  -> target variable   :  195  values 
  -> predict function  :  yhat.LearnerRegr  will be used (  default  )
  -> predicted values  :  No value for predict function target column. (  default  )
  -> model_info        :  package mlr3 , ver. 0.21.1 , task regression (  default  ) 
  -> predicted values  :  numerical, min =  8.157678 , mean =  15.10157 , max =  18.14362  
  -> residual function :  difference between y and yhat (  default  )
  -> residuals         :  numerical, min =  -3.09541 , mean =  -1.829466e-17 , max =  2.473572  
  A new explainer has been created!

Explainability (global)

regr_exp_profiles = model_profile(regr_exp)
plot(regr_exp_profiles)
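
A partial dependence profile, the kind of global profile plotted above, can be computed by hand: fix one variable at a grid value for all observations, predict, and average. The sketch uses simulated data and a linear model as a stand-in for the trained learner:

```r
# Simulated data and a simple model standing in for the trained learner
set.seed(1)
d = data.frame(dem = runif(100, 0, 1000), ndvi = runif(100, 0.1, 0.9))
d$temp = 20 - 0.006 * d$dem + 2 * d$ndvi + rnorm(100, sd = 0.3)
model = lm(temp ~ dem + ndvi, data = d)

# Partial dependence of the prediction on `dem`:
# set dem to a fixed value for ALL observations and average the predictions
grid = seq(0, 1000, by = 100)
pd = sapply(grid, function(v) {
  d_mod = d
  d_mod$dem = v
  mean(predict(model, newdata = d_mod))
})

data.frame(dem = grid, avg_prediction = pd)
```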

Explainability (local)

my_obs = st_drop_geometry(temp_train)[42, ]
plot(predict_parts(regr_exp, new_observation = my_obs))
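
The break-down idea behind this kind of local explanation can be sketched as follows: start from the average prediction, fix the variables one by one at the values of the selected observation, and record how the average prediction changes. The data and model are simulated stand-ins, and the variable order is fixed by hand here, whereas DALEX chooses it for you:

```r
# Simulated data and a simple model standing in for the trained learner
set.seed(1)
d = data.frame(dem = runif(100, 0, 1000), ndvi = runif(100, 0.1, 0.9))
d$temp = 20 - 0.006 * d$dem + 2 * d$ndvi + rnorm(100, sd = 0.3)
model = lm(temp ~ dem + ndvi, data = d)
obs = d[42, ]

# Start from the average prediction over the whole dataset
baseline = mean(predict(model, newdata = d))
contrib = c(intercept = baseline)

# Fix variables one at a time at the observation's values
d_mod = d
prev = baseline
for (v in c("dem", "ndvi")) {
  d_mod[[v]] = obs[[v]]
  cur = mean(predict(model, newdata = d_mod))
  contrib[v] = cur - prev  # change attributed to this variable
  prev = cur
}

contrib
sum(contrib)  # adds up to the model prediction for the selected observation
```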

Explainability (local)

plot(predict_profile(regr_exp, my_obs))

Explainability (spatial)

source("R/predict_parts_spatial.R")
regr_pps = predict_parts_spatial(predictors, regr_exp, maxcell = 2000)
plot(regr_pps)

Completing the draft

Interpolation, extrapolation, or knowledge discovery
Sampling
Understanding the variables
Other types of models
Evaluation measures
Uncertainty
Ensemble models
Spatio-temporal models
Machine learning vs deep learning

Contact

Mastodon: fosstodon.org/@nowosad

Website: https://jakubnowosad.com

Materials

Slides: https://jakubnowosad.com/IIIRqueR_workshop