
Keynote slides: https://jakubnowosad.com/ml4eo2026/
Workshop materials: https://jakubnowosad.com/ml4eo2026workshop/
Machine learning is now deeply embedded[1] in Earth observation workflows, from mapping current environmental conditions to forecasting future change. However, the quality of a spatial prediction map cannot be judged only by how well a model performs on a convenient test sample. In spatial problems, the gap between where we have observations and where we want to make predictions is often a crucial factor in determining whether a model can be trusted.
At the Machine Learning for Earth Observation 2026 conference in Exeter[2], I gave a keynote talk entitled "Rethinking Validation for Spatial Machine Learning" (June 22, 2026). The next day, I showed some practical ways to implement these ideas in a workshop called "Where your models can be trusted: evaluating spatial machine learning reliably" (June 23, 2026). Both focused on the same general question: how can we evaluate spatial machine learning in a way that reflects the actual prediction task?
The keynote was structured around three assumptions that are easy to make, but often unsafe in spatial prediction:
- We can predict everywhere. In practice, we validate where we have data, but predict in places that may be poorly represented by the training sample. Tools such as Area of Applicability (AoA) and Local Point Density (LPD) help identify parts of the prediction domain where environmental conditions are more or less supported by the available data.
- There is one “correct” validation approach. In reality, validation should follow the prediction task. Random cross-validation can be too optimistic when observations are spatially clustered, while spatial cross-validation can be too pessimistic when the intended prediction scenario is closer to interpolation. Adaptive strategies such as k-Nearest Neighbor Distance Matching (kNNDM) try to align validation folds with the distance structure of the prediction domain.
- All validation points are equal. Prediction conditions are not equally common across a map, so a single unweighted average error can misrepresent the expected performance over the full prediction domain. This motivates thinking about how validation samples should be weighted by their prevalence in the places where predictions will be used.
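To make the first assumption more concrete, here is a minimal Python sketch of a dissimilarity-based check in the spirit of AoA and LPD. It is a simplification and not the reference implementation: the predictor values are invented, the threshold rule (the largest leave-one-out dissimilarity within the training data) is an assumption made for illustration, and the published AoA method additionally weights predictors by their importance and derives its threshold from cross-validation.

```python
import math
import statistics
from itertools import combinations

# Hypothetical training predictors: (temperature in deg C, precipitation in mm)
train_raw = [(10.0, 800.0), (11.0, 820.0), (12.0, 790.0),
             (10.5, 850.0), (11.5, 810.0), (12.5, 830.0)]
# Hypothetical prediction locations: one similar to the training data, one not
new_raw = [(11.0, 815.0), (25.0, 200.0)]

# Standardise every predictor with the training mean and standard deviation
columns = list(zip(*train_raw))
means = [statistics.mean(c) for c in columns]
sds = [statistics.stdev(c) for c in columns]

def standardise(row):
    return tuple((v - m) / s for v, m, s in zip(row, means, sds))

train = [standardise(r) for r in train_raw]
new = [standardise(r) for r in new_raw]

# Distances are scaled by the mean pairwise distance among training points
mean_dist = statistics.mean([math.dist(a, b) for a, b in combinations(train, 2)])

def dissimilarity(point, reference):
    """Scaled distance from `point` to its nearest neighbour in `reference`."""
    return min(math.dist(point, r) for r in reference) / mean_dist

# Assumed threshold: the largest leave-one-out dissimilarity in the training data
threshold = max(
    dissimilarity(p, train[:i] + train[i + 1:]) for i, p in enumerate(train)
)

# A new point counts as "inside" when it is no more dissimilar than the threshold
new_di = [dissimilarity(p, train) for p in new]
inside_aoa = [d <= threshold for d in new_di]

# LPD-like count: how many training points lie within the threshold of each point
lpd = [
    sum(math.dist(p, t) / mean_dist <= threshold for t in train)
    for p in new
]

print("inside AoA:", inside_aoa)
print("LPD-like counts:", lpd)
```

The first location resembles the training data and is supported by several training points, while the second one lies far outside the sampled conditions and is supported by none.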
Together, these points lead to the idea of prediction-domain adaptive evaluation: first define the prediction domain, then construct validation folds that reflect it, and finally summarize performance in a way that accounts for how common different prediction conditions are. This is not a complete theory of spatial machine learning evaluation, but it is a useful step away from treating validation as a model-only problem. (To learn more about these ideas, read our preprint: https://arxiv.org/abs/2605.13689.)
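The last step, summarizing performance by prevalence, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the preprint: the two condition classes ("near" and "far" from the training data), the error values, and the helper name `prevalence_weighted_rmse` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def rmse(errors):
    """Plain, unweighted root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def prevalence_weighted_rmse(val_classes, val_errors, domain_classes):
    """RMSE in which each class's mean squared error is weighted by the
    share of the prediction domain that belongs to that class.

    Also returns the share of the domain whose class has no validation point.
    """
    squared = defaultdict(list)
    for cls, err in zip(val_classes, val_errors):
        squared[cls].append(err * err)

    domain_counts = Counter(domain_classes)
    covered = {c: n for c, n in domain_counts.items() if c in squared}
    covered_total = sum(covered.values())
    if covered_total == 0:
        raise ValueError("no validation point matches any domain class")

    mse = sum(
        (n / covered_total) * (sum(squared[c]) / len(squared[c]))
        for c, n in covered.items()
    )
    uncovered_share = 1 - covered_total / len(domain_classes)
    return math.sqrt(mse), uncovered_share

# Hypothetical validation set: mostly "near" conditions with small errors
val_classes = ["near"] * 8 + ["far"] * 2
val_errors = [1, -1, 1, -1, 1, -1, 1, -1, 3, -3]

# Hypothetical prediction domain: "far" conditions dominate the map
domain_classes = ["near"] * 30 + ["far"] * 70

unweighted = rmse(val_errors)
weighted, uncovered = prevalence_weighted_rmse(
    val_classes, val_errors, domain_classes
)
print(f"unweighted RMSE: {unweighted:.2f}")
print(f"prevalence-weighted RMSE: {weighted:.2f} (uncovered share: {uncovered:.2f})")
```

Because the poorly predicted "far" conditions make up most of the map but only a small part of the validation set, the weighted estimate is clearly higher than the unweighted one.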
The workshop turned these ideas into practical R workflows. Using synthetic and real-world-inspired examples, we applied and discussed Area of Applicability and Local Point Density, compared random, spatial, and kNNDM cross-validation, and looked at error profiles. The hands-on materials also include exercises in which participants can compare validation strategies, map areas of applicability, and explore how expected error varies across space.
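The comparison of validation strategies rests on a simple diagnostic: how far are held-out points from their training data, compared with how far prediction locations are from the sample? The Python sketch below illustrates that diagnostic only. It is not the workshop's R code and not the kNNDM algorithm itself, and the clustered sample, the regular grid, and the cluster-wise folds are all simulated assumptions.

```python
import math
import random
import statistics

def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest point in `others`."""
    return min(math.dist(point, other) for other in others)

def cv_distances(samples, folds):
    """For each sample, distance to the nearest sample in a different fold."""
    distances = []
    for i, point in enumerate(samples):
        training = [q for j, q in enumerate(samples) if folds[j] != folds[i]]
        distances.append(nearest_distance(point, training))
    return distances

rng = random.Random(42)
n_clusters, per_cluster = 5, 20

# Simulated clustered sample: tight groups of points around random centres
centres = [(rng.random(), rng.random()) for _ in range(n_clusters)]
samples = [
    (cx + rng.gauss(0, 0.02), cy + rng.gauss(0, 0.02))
    for cx, cy in centres
    for _ in range(per_cluster)
]

# Simulated prediction domain: a regular 20 x 20 grid over the unit square
grid = [((i + 0.5) / 20, (j + 0.5) / 20) for i in range(20) for j in range(20)]

# Distances that matter for the actual prediction task
prediction_to_sample = [nearest_distance(cell, samples) for cell in grid]

# Random 5-fold cross-validation: balanced fold ids in random order
random_folds = [i % n_clusters for i in range(len(samples))]
rng.shuffle(random_folds)
random_cv = cv_distances(samples, random_folds)

# A simple spatial cross-validation: every cluster is its own fold
cluster_folds = [c for c in range(n_clusters) for _ in range(per_cluster)]
spatial_cv = cv_distances(samples, cluster_folds)

median_prediction = statistics.median(prediction_to_sample)
median_random = statistics.median(random_cv)
median_spatial = statistics.median(spatial_cv)

print(f"prediction-to-sample median distance: {median_prediction:.3f}")
print(f"random CV median distance: {median_random:.3f}")
print(f"cluster-wise CV median distance: {median_spatial:.3f}")
```

With a clustered sample, random folds test the model at much shorter distances than the prediction task requires, which is one reason random cross-validation can look too optimistic. Whether cluster-wise folds come closer to the prediction distances depends on the layout of the sample, and that is the gap that distance-matching approaches such as kNNDM try to close.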
The main takeaway is simple: for spatial machine learning, the question is not only "How accurate is the model?" It is also "Where can the model be trusted?"
Footnotes
[1] And embeddings are too, but that’s a story for another day.
[2] Many thanks to the organizers for inviting me to speak and for hosting a great event! The next edition of the conference will be in Exeter again in June 2027, and I highly recommend it to anyone interested in spatial machine learning, broadly defined.
Citation
@online{nowosad2026,
  author = {Nowosad, Jakub},
  title = {Rethinking {Validation} for {Spatial} {Machine} {Learning:} {Takeaways} from the {Talk}},
  date = {2026-07-03},
  url = {https://jakubnowosad.com/posts/2026-07-03-ml4eo/},
  langid = {en}
}