Exploring spatial autocorrelation and variable importance in machine learning models using patternograms

Jakub Nowosad, https://jakubnowosad.com/

European Conference of Ecological Modelling 2023

2023-09-05

Introduction

How to detect and describe a range of spatial similarity (spatial autocorrelation) for multiple variables?

Nowosad and Stepinski, 2018, https://doi.org/10.1016/j.jag.2018.03.004

Nowosad and Stepinski, 2022, https://doi.org/10.1016/j.jag.2022.102935

Patternogram creation

Main goal: to detect spatial autocorrelation for many data structures.

  • Step 1. (Only for rasters) Extract values from a random sample of locations in the data.

  • Step 2. Calculate the spatial distance between each pair of points/locations.
  • Step 3. Create a distance (dissimilarity) matrix between the values of each pair of points/locations.

  • Step 4. Divide the distances between the pairs of points/locations into ~15 breaks (groups). Then, calculate the average dissimilarity in each break.
  • Step 5. Visualize the relationship between the spatial distance and the average dissimilarity.
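The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation in the patternogram R package: the function name `patternogram`, its arguments, equal-width breaks, and the default Euclidean dissimilarity are all assumptions made for the example. The points are assumed to be already sampled (Step 1), and the result is printed instead of plotted (Step 5).

```python
import math

def patternogram(points, values, n_breaks=15, dissimilarity=None):
    """Average dissimilarity of point pairs, grouped by spatial distance.

    points: list of (x, y) coordinates.
    values: list of numeric tuples, one per point (one or more variables).
    Returns (bin_centre, mean_dissimilarity, n_pairs) for each non-empty bin.
    """
    if dissimilarity is None:
        dissimilarity = math.dist  # assumed default: Euclidean distance between value tuples
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d_space = math.dist(points[i], points[j])      # Step 2: spatial distance
            d_value = dissimilarity(values[i], values[j])  # Step 3: dissimilarity of values
            pairs.append((d_space, d_value))
    width = max(d for d, _ in pairs) / n_breaks            # Step 4: equal-width breaks
    sums = [0.0] * n_breaks
    counts = [0] * n_breaks
    for d_space, d_value in pairs:
        k = min(int(d_space / width), n_breaks - 1)        # the largest distance goes to the last bin
        sums[k] += d_value
        counts[k] += 1
    return [((k + 0.5) * width, sums[k] / counts[k], counts[k])
            for k in range(n_breaks) if counts[k] > 0]

# Hypothetical data: a 10 x 10 grid of locations with a west-east gradient.
pts = [(float(x), float(y)) for x in range(10) for y in range(10)]
vals = [(x / 9.0,) for x, _ in pts]
result = patternogram(pts, vals)
for centre, mean_diss, n_pairs in result:                  # Step 5: inspect (or plot) the relation
    print(f"{centre:6.2f}  {mean_diss:.3f}  {n_pairs}")
```

With a smooth gradient like this one, the average dissimilarity grows with distance, which is the signature of positive spatial autocorrelation.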

Inspired by variogram

Variogram model

  • Type: exponential
  • Sill of the variogram model: 0.025
  • Range: 100000
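For reference, a model with these parameters could be evaluated as in the sketch below. The slides do not state the parameterization, so two assumptions are made here: no nugget, and the form sill * (1 - exp(-h / range)), in which the range is a scale parameter and the curve reaches about 95% of the sill at three times that distance. The function and argument names are illustrative.

```python
import math

def exponential_variogram(h, sill=0.025, range_param=100000.0, nugget=0.0):
    """Semivariance at distance h under an exponential model (assumed form)."""
    if h == 0:
        return 0.0  # semivariance is zero at zero distance by definition
    return nugget + sill * (1.0 - math.exp(-h / range_param))

gamma_at_range = exponential_variogram(100000.0)       # about 63% of the sill
gamma_at_3x_range = exponential_variogram(300000.0)    # about 95% of the sill
```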

[Figure panels: Variogram model, Simulation, Comparison]

Patternograms properly mimic both modeled and empirical relations of values vs distances.

This is not surprising, but it suggests that dissimilarity measures other than semivariance may also be applied.
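The link between the two can be written out explicitly. The first formula is the usual empirical semivariance estimator; the second is a reading of Steps 3 and 4, not a formula taken from the slides. The symbols P(h) (the set of point pairs in distance class h), N(h) (the number of such pairs), and d (any dissimilarity between the value vectors of two points) are notation introduced here.

```latex
\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{(i,j) \in P(h)} (z_i - z_j)^2
\qquad\text{versus}\qquad
\hat{D}(h) = \frac{1}{N(h)} \sum_{(i,j) \in P(h)} d(\mathbf{z}_i, \mathbf{z}_j)
```

Choosing d as half the squared difference of a single variable recovers the semivariance, so under this reading the empirical variogram is one special case of the patternogram.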

Classification

Response variable

Simulated data

Predictors

Selected WorldClim bioclimatic variables

Classification

Accuracy of a random forest model: 0.85

Simulated data

Predictions

Repeated coordinate-based k-means clustering resampling
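As a rough illustration of the idea behind coordinate-based k-means resampling, the sketch below groups locations into spatial folds by clustering their coordinates, so that each fold can be held out as a spatially separate test set. It is an assumption-laden toy: the function name `kmeans_folds` is hypothetical, the starting centres are picked deterministically, and a repeated scheme would rerun the clustering with different random starts. The slides do not say which software performed this step.

```python
import math

def kmeans_folds(coords, k, n_iter=50):
    """Assign each (x, y) location to one of k spatial folds with k-means."""
    # Simple deterministic start (an assumption): k locations spread through the list.
    centres = [coords[i * len(coords) // k] for i in range(k)]
    labels = [0] * len(coords)
    for _ in range(n_iter):
        # Assign every location to its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in coords]
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == c]
            if members:
                new_centres.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centres.append(centres[c])  # keep the centre of an empty cluster
        if new_centres == centres:
            break  # assignments are stable
        centres = new_centres
    return labels

# Hypothetical locations forming two well-separated groups.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
folds = kmeans_folds(coords, k=2)
```

Each distinct label then defines one test fold, with the remaining locations used for training.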

Classification

Variable importance:

bio12 bio13  bio5 bio10  bio7  bio2  bio1 bio11  bio6 
51.85 22.32 21.03 16.14 12.11 11.47  6.97  6.32  5.91 

Two additional steps:

  1. Scale the predictors’ values
  2. Multiply the scaled predictors’ values by the importance of each predictor
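The two steps can be sketched as follows. "Scale" is assumed here to mean standardization (subtract the mean, divide by the sample standard deviation); the slides do not specify the method. The importance values for bio12 and bio13 come from the table above, while the predictor values and the function names are made up for the example.

```python
import statistics

def scale(column):
    """Standardize a list of numbers to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]

def weight_predictors(predictors, importance):
    """Scale each predictor, then multiply it by that predictor's importance."""
    return {name: [importance[name] * v for v in scale(column)]
            for name, column in predictors.items()}

importance = {"bio12": 51.85, "bio13": 22.32}    # values from the slide
predictors = {"bio12": [400.0, 650.0, 900.0],    # hypothetical sample values
              "bio13": [60.0, 80.0, 130.0]}
weighted = weight_predictors(predictors, importance)
```

After weighting, differences in important predictors contribute more to the dissimilarity between two locations than differences in unimportant ones.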

Classification

A patternogram of all of the model’s predictors

Classification

Patternograms of the model’s predictors for different predicted categories

Regression

Patternograms can also be calculated for regression problems.

Other applications of patternogram (#1)

To compare spatial autocorrelation of variables over time.

WorldClim version 2.1 climate data for 1970-2000

CMIP6 downscaled future climate projection for 2061-2080 [model: CNRM-ESM2-1; ssp: “585”]

Minimum temperature (°C)

Other applications of patternogram (#2)

To analyze spatial autocorrelation of categorical spatial patterns.

Co-occurrence matrix spatial signature
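To make the term concrete, the sketch below builds a normalized co-occurrence signature for a small categorical grid by counting how often pairs of classes occur in adjacent cells. The 4-neighbour adjacency, the counting of each adjacency in both directions, and the function name are assumptions for illustration, not details given in the slides.

```python
from collections import Counter

def cooccurrence_signature(grid, classes):
    """Normalized co-occurrence of classes in horizontally/vertically adjacent cells."""
    counts = Counter()
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    a, b = grid[r][c], grid[rr][cc]
                    counts[(a, b)] += 1
                    counts[(b, a)] += 1          # count each adjacency in both directions
    total = sum(counts.values())
    # Flatten the matrix row by row into a vector that sums to 1.
    return [counts[(a, b)] / total for a in classes for b in classes]

# Hypothetical 2 x 3 categorical raster with two classes.
grid = [[1, 1, 2],
        [1, 2, 2]]
signature = cooccurrence_signature(grid, classes=[1, 2])
```

A dissimilarity between two such signatures (for example, a distance between probability vectors) can then play the role of the value dissimilarity in the patternogram.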

Other applications of patternogram (#2)

Patternogram of the co-occurrence matrix spatial signature

Look at the shorter distances

Other applications of patternogram (#3)

Fourcade, Besnard, and Secondi, 2018, https://doi.org/10.1111/geb.12684

Behrens and Viscarra Rossel, 2020, https://doi.org/10.1038/s41598-020-73773-y:

“In spatial modelling with machine learning, using a sufficient number of meaningless (or structurally independent) predictors, with a similar range (or similar scale) to that of the response variable will produce accurate model evaluation statistics.”

Summary

  • The patternogram method: a new approach to quantifying and visualizing spatial autocorrelation
  • It can be used to:
    • explore spatial autocorrelations of predictors in machine learning models
    • detect spatial autocorrelation in various data structures
    • compare the spatial autocorrelation of variables over time
    • investigate spatial autocorrelation of categorical spatial patterns
  • Other dissimilarity measures can already be used; other sampling schemes will be added
  • This is still a work in progress – suggestions are welcome
  • The method is implemented in the R package patternogram, available at https://github.com/Nowosad/patternogram.

This work was supported by grant 038/04/NP/0020, funded by the Initiative of Excellence - Research University project at Adam Mickiewicz University, Poznan.