the 28th AGILE conference
2025-06-11
Traditional machine learning models (e.g., SVM, RF, GBM) lack inherent spatial awareness
Ignoring spatial structure can lead to poor predictive performance, biased predictions, or poor generalization
Incorporating spatial information:
Moran’s I is used to assess spatial autocorrelation before and after modeling
\[ I = \frac{n}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}} \times \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x}) (x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Spatial weights define which observations are considered neighbors.
Various types of spatial weights can be used, and this choice affects the value of Moran’s I.
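To make the effect of the weight definition concrete, here is a minimal sketch in plain Python. It is an illustration only, not the code used in the study: the helper names `morans_i` and `grid_weights` and the toy 4 x 4 checkerboard are assumptions introduced here. The same values are scored with rook (4-neighbor) and queen (8-neighbor) contiguity weights.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a dense n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def grid_weights(nrow, ncol, queen):
    """Binary contiguity weights on a regular grid (hypothetical helper).

    queen=False: cells sharing an edge are neighbors (rook).
    queen=True: cells sharing an edge or a corner are neighbors.
    """
    n = nrow * ncol
    w = [[0.0] * n for _ in range(n)]
    for r in range(nrow):
        for c in range(ncol):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    if not queen and dr != 0 and dc != 0:
                        continue  # rook weights skip diagonal cells
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrow and 0 <= cc < ncol:
                        w[r * ncol + c][rr * ncol + cc] = 1.0
    return w


# Toy data (assumption): a 4 x 4 checkerboard of 0s and 1s, stored row by row.
nrow, ncol = 4, 4
values = [float((r + c) % 2) for r in range(nrow) for c in range(ncol)]

i_rook = morans_i(values, grid_weights(nrow, ncol, queen=False))
i_queen = morans_i(values, grid_weights(nrow, ncol, queen=True))
print(f"rook: {i_rook:.3f}, queen: {i_queen:.3f}")
```

With rook weights every neighbor has the opposite value, so the statistic is strongly negative; adding the diagonal neighbors, which share the same value, pulls it toward zero. The data are identical in both cases, only the weights differ.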
Three ranges of spatial autocorrelation (10, 50, 100 units)
\[ Y = X_1 + X_2 \cdot X_3 + X_4 + X_5 \cdot X_6 + \mathcal{E} \]
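The response equation can be sketched in plain Python as follows. This is an assumption-heavy illustration, not the simulation used in the study: the predictors are built by smoothing white noise with a moving window (`smooth_field`, a hypothetical helper whose `radius` only loosely stands in for the autocorrelation range), the grid is a small 20 x 20 toy, and the error term is taken to be independent Gaussian noise.

```python
import random


def smooth_field(nrow, ncol, radius, rng):
    """White noise averaged over a square window (hypothetical stand-in).

    A larger radius gives smoother fields, i.e. longer-range autocorrelation.
    """
    noise = [[rng.gauss(0.0, 1.0) for _ in range(ncol)] for _ in range(nrow)]
    field = []
    for r in range(nrow):
        row = []
        for c in range(ncol):
            cells = [noise[rr][cc]
                     for rr in range(max(0, r - radius), min(nrow, r + radius + 1))
                     for cc in range(max(0, c - radius), min(ncol, c + radius + 1))]
            row.append(sum(cells) / len(cells))
        field.append(row)
    return field


def simulate_response(predictors, noise):
    """Y = X1 + X2 * X3 + X4 + X5 * X6 + E, evaluated cell by cell."""
    x1, x2, x3, x4, x5, x6 = predictors
    return [[x1[r][c] + x2[r][c] * x3[r][c] + x4[r][c]
             + x5[r][c] * x6[r][c] + noise[r][c]
             for c in range(len(x1[0]))]
            for r in range(len(x1))]


rng = random.Random(42)          # fixed seed so the sketch is repeatable
nrow, ncol, radius = 20, 20, 3   # toy settings, not the study's values
predictors = [smooth_field(nrow, ncol, radius, rng) for _ in range(6)]
# Error term: independent Gaussian noise (an assumption of this sketch).
noise = [[rng.gauss(0.0, 0.1) for _ in range(ncol)] for _ in range(nrow)]
y = simulate_response(predictors, noise)
```

The two product terms are what make the relationship non-additive, so a model has to learn interactions between predictors rather than only their separate effects.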
All repeated 100 times.
Random Forest modeling approach:
Total number of models: 2400
\[ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]
\[ I = \frac{n}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}} \times \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x}) (x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Here, we focus on the residuals of the model predictions, and thus:
\[ x_i = y_i - \hat{y}_i \]
The eight closest cells or point samples were used as neighbors when calculating Moran’s I.
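The two evaluation measures can be sketched together in plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: `rmse`, `knn_weights`, and `residual_morans_i` are hypothetical helper names, the weights are binary and not row-standardized, ties in distance are broken by sample order, and the toy data (a 5 x 5 set of test points whose residuals follow a west-to-east gradient) are made up for the example.

```python
import math


def rmse(y, y_hat):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def knn_weights(coords, k=8):
    """Binary weights: w[i][j] = 1 if j is one of the k nearest samples to i."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Sort the other samples by distance; ties fall back to sample order.
        dists = sorted((math.hypot(xi - xj, yi - yj), j)
                       for j, (xj, yj) in enumerate(coords) if j != i)
        for _, j in dists[:k]:
            w[i][j] = 1.0
    return w


def residual_morans_i(coords, y, y_hat, k=8):
    """Moran's I of the residuals x_i = y_i - y_hat_i with k-nearest weights."""
    res = [a - b for a, b in zip(y, y_hat)]
    n = len(res)
    mean = sum(res) / n
    dev = [r - mean for r in res]
    w = knn_weights(coords, k)
    w_sum = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Toy testing set (assumption): 25 points on a 5 x 5 lattice.
coords = [(float(x), float(y)) for x in range(5) for y in range(5)]
observed = [10.0 + x for x, _ in coords]   # values rise from west to east
predicted = [12.0] * len(coords)           # a model that misses the trend

err = rmse(observed, predicted)
i_res = residual_morans_i(coords, observed, predicted, k=8)
print(f"RMSE: {err:.3f}, Moran's I of residuals: {i_res:.3f}")
```

In this toy case the model misses a spatial trend, so neighboring residuals resemble each other and Moran's I is positive. RMSE summarizes the size of the errors, while Moran's I describes how they are arranged in space.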
Model 45: range 100, 500 random training samples, 500 random testing samples
Moran’s I is highly sensitive to spatial weight definitions (e.g., neighborhood choice), so please report which definition was used
In spatial ML, Moran’s I can be useful for assessing the spatial autocorrelation of residuals in the testing set
However, unlike RMSE, Moran’s I for the testing set does not reflect overall prediction performance
Instead, it is influenced by the sampling strategy and sample size (sampling density). It indicates how well the model captures spatial structure in the testing set, typically at a much finer scale than the resolution of the complete raster
Therefore, Moran’s I should not be used to compare performance across different studies. However, it may be useful for comparing models within the same study
Website: https://jakubnowosad.com
Mastodon: fosstodon.org/@nowosad
Slides: https://jakubnowosad.com/agile-gi2025/
Paper: https://doi.org/10.5194/agile-giss-6-40-2025
Code examples: https://github.com/Nowosad/moran-i-spatial-ml-prelim