Jakub Nowosad (Adam Mickiewicz University, Poznań and University of Münster)
Machine Learning for Earth Observation 2026, Exeter, UK
2026-06-22

Reality:
We validate where we have data, but predict where we do not.
Example 1: we have this (the sampled data), and we want to predict here. Our predictor distributions are similar here.

Example 2: we have this (the sampled data), and we want to predict here. Our predictor distributions are a bit different here.




Identify areas where the environment (predictor space) is not well represented by the training data, so that predictions there are less trustworthy: the Area of Applicability (AoA; Meyer and Pebesma, 2021). See also the local point density (LPD; Schumacher et al., 2025).
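A minimal sketch of the dissimilarity index (DI) behind the AoA idea, assuming numeric predictors in NumPy arrays. Predictors are standardized (optionally weighted, e.g. by variable importance), the DI of a new point is its distance to the nearest training point divided by the mean pairwise training distance, and the threshold is the largest training DI that is not a boxplot outlier. Two simplifications are assumptions, not the published method: training DIs are computed leave-one-out rather than across the model's CV folds, and the LPD-style count is taken as the number of training points within the threshold. `aoa_sketch` is a hypothetical helper, not a package API.

```python
import numpy as np


def aoa_sketch(train_x, new_x, weights=None):
    """Hypothetical helper: DI, AoA mask and an LPD-style count (simplified)."""
    train_x = np.asarray(train_x, dtype=float)
    new_x = np.asarray(new_x, dtype=float)

    # Standardize with training statistics; optional per-predictor weights
    mu = train_x.mean(axis=0)
    sd = train_x.std(axis=0)
    sd[sd == 0] = 1.0
    w = np.ones(train_x.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    tr = (train_x - mu) / sd * w
    nw = (new_x - mu) / sd * w

    # Mean pairwise distance among training points normalizes the DI
    d_train = np.linalg.norm(tr[:, None, :] - tr[None, :, :], axis=-1)
    mean_dist = d_train[np.triu_indices(len(tr), k=1)].mean()

    # Training DI: distance to the nearest *other* training point
    # (leave-one-out simplification; the original uses CV folds)
    np.fill_diagonal(d_train, np.inf)
    train_di = d_train.min(axis=1) / mean_dist

    # Threshold: largest training DI within the boxplot upper whisker
    q1, q3 = np.percentile(train_di, [25, 75])
    threshold = train_di[train_di <= q3 + 1.5 * (q3 - q1)].max()

    # DI of new points (normalized distances to every training point)
    d_new = np.linalg.norm(nw[:, None, :] - tr[None, :, :], axis=-1) / mean_dist
    new_di = d_new.min(axis=1)
    in_aoa = new_di <= threshold

    # LPD-style count: training points within the threshold of each new point
    lpd = (d_new <= threshold).sum(axis=1)
    return new_di, in_aoa, lpd, threshold


train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new = np.array([[0.5, 0.5], [10.0, 10.0]])
di, in_aoa, lpd, thr = aoa_sketch(train, new)
print(in_aoa.tolist(), lpd.tolist())  # [True, False] [4, 0]
```

The point in the middle of the training cloud falls inside the AoA and is supported by all four training points; the distant point is outside and supported by none.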



Reality:
The validation strategy should follow the prediction task.






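k-Nearest Neighbor Distance Matching (kNNDM; Linnenbrink et al., 2024) builds cross-validation folds whose nearest-neighbor distances (from each held-out point to the nearest training point) match the distances from prediction locations to their nearest training point, in either geographic or predictor space.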
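A minimal sketch of the distance-matching criterion, assuming 2-D coordinates in NumPy arrays and SciPy for nearest-neighbor search and the 1-D Wasserstein distance. It scores candidate fold assignments by how closely their held-out-to-training nearest-neighbor distances match the prediction-to-sample distances. The published kNNDM algorithm generates and searches many candidate fold assignments; here we only compare two hand-made ones. `pick_folds`, the toy data, and both candidates are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import wasserstein_distance


def nn_dist(from_pts, to_pts):
    """Distance from each point in from_pts to its nearest neighbor in to_pts."""
    d, _ = cKDTree(to_pts).query(from_pts, k=1)
    return d


def cv_nn_dist(train_pts, folds):
    """For each held-out point, distance to the nearest point in the other folds."""
    out = np.empty(len(train_pts))
    for f in np.unique(folds):
        test = folds == f
        out[test] = nn_dist(train_pts[test], train_pts[~test])
    return out


def pick_folds(train_pts, pred_pts, candidates):
    """Hypothetical helper: choose the fold assignment whose CV nearest-neighbor
    distances are closest (Wasserstein) to prediction-to-sample distances."""
    target = nn_dist(pred_pts, train_pts)
    scores = {
        name: wasserstein_distance(cv_nn_dist(train_pts, folds), target)
        for name, folds in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Toy setup: samples cover only a strip (x in [0, 3]) of a 10 x 10 domain
rng = np.random.default_rng(42)
train = np.column_stack([rng.uniform(0, 3, 100), rng.uniform(0, 10, 100)])
gx, gy = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 10, 21))
pred = np.column_stack([gx.ravel(), gy.ravel()])

candidates = {
    "random": rng.integers(0, 2, size=len(train)),
    "blocks": (train[:, 1] >= 5).astype(int),  # two spatial blocks split at y = 5
}
best, scores = pick_folds(train, pred, candidates)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

With random folds, held-out points stay right next to training points, so their distances are much shorter than those from the unsampled part of the domain; the two spatial blocks come much closer to the prediction scenario.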


Reality:
Prediction conditions are not equally common.




| Evaluation approach | Lowland-area weight (%) | Highland-area weight (%) | Overall RMSE |
|---|---|---|---|
| Germany domain (target distribution) | 50 | 50 | 0.667 |
| Preferential sample (unweighted) | 89 | 11 | 0.541 |
| Preferential sample (reweighted) | 50 | 50 | 0.667 |
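Target-Weighted Cross-Validation (TWCV; Brenning and Suesse, 2026) reweights cross-validation errors so that the performance estimate reflects the prediction domain rather than the distribution of the sampled data. In the table above, the preferential sample (89% lowland) gives an overly optimistic RMSE of 0.541, while reweighting to the domain's 50/50 split recovers the domain value of 0.667.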
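A worked sketch of the reweighting arithmetic, assuming (hypothetically) 100 validation points, 89 lowland and 11 highland as in the table, with constant per-region error magnitudes of 0.5 and 0.8. These per-region values are not stated in the table; they are simply values consistent with its three RMSEs. Each point gets an importance weight equal to its region's share in the target domain divided by its share in the sample. This is one simple form of target weighting, not necessarily the exact TWCV estimator.

```python
import numpy as np


def weighted_rmse(errors, weights):
    """Weighted RMSE: sqrt(sum(w_i * e_i^2) / sum(w_i))."""
    errors = np.asarray(errors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(weights * errors**2) / np.sum(weights)))


# Hypothetical preferential sample: 89 lowland (region 0), 11 highland (region 1)
n_low, n_high = 89, 11
errors = np.r_[np.full(n_low, 0.5), np.full(n_high, 0.8)]  # assumed error magnitudes
region = np.r_[np.zeros(n_low, dtype=int), np.ones(n_high, dtype=int)]

# Importance weights: target share / sample share, per region
target_share = np.array([0.5, 0.5])
sample_share = np.bincount(region) / len(region)
w = (target_share / sample_share)[region]

unweighted = weighted_rmse(errors, np.ones_like(errors))
reweighted = weighted_rmse(errors, w)
print(round(unweighted, 3), round(reweighted, 3))  # 0.541 0.667
```

The reweighting gives the 11 highland points the same total weight as the 89 lowland points, which is what moves the estimate from the sample's 0.541 to the domain's 0.667.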

Reality:
We validate where we have data, but predict where we do not.
Possible solution: identify regions of reliable prediction.
Reality:
The validation strategy should follow the prediction task.
Possible solution: use adaptive evaluation strategies to create folds that resemble the prediction scenario.
Reality:
Prediction conditions are not equally common.
Possible solution: weight validation points according to their prevalence in the prediction area.

Area of applicability for different sampling designs


Evaluation results for different validation strategies


Effect of weighting validation points 
Open questions remain, including how to combine these three components.
Also, these are three important components, not a complete theory of spatial ML evaluation.