Describing and comparing spatial patterns

Jakub Nowosad, https://jakubnowosad.com/

EON Summer School 2024

2024-09-05

Hello, I am Jakub

Website: https://jakubnowosad.com/

Computational geographer working at the intersection between geocomputation and the environmental sciences
Associate professor at the Adam Mickiewicz University, Poznan, Poland; Visiting Scientist, University of Münster, Germany
Main scientific interest: spatial pattern analysis, focusing on quantifying and understanding patterns in environmental data
Co-author of a few books and textbooks on geocomputation and spatial analysis with R and Python, including “Geocomputation with R”
Creator and contributor of various R packages for spatial data processing and visualization
Educator, teaching courses on geocomputation, spatial analysis, programming, and data visualization

Acknowledgments

My family

Space Informatics Lab

The r-spatialecology organization

Many open-source contributors and inspirations

Spatial patterns

Discovering and describing patterns is a vital part of many spatial analysis However, spatial data is gathered in many ways and stored in forms, which requires different approaches to understanding spatial patterns

https://www.ft.com/content/a2901ce8-5eb7-4633-b89c-cbdf5b386938

https://climate.copernicus.eu/copernicus-2020-warmest-year-record-europe-globally-2020-ties-2016-warmest-year-recorded

Spatial patterns of categorical rasters

Discovering and describing spatial patterns is an important part of many geographical studies, and spatial patterns are linked to natural and social processes.

Evaluation of the susceptibility of forest landscapes to agricultural expansion

Bourgoin et al., 2020, 10.1016/j.jag.2019.101958

Quantification of diversity and racial segregation

Dmowska et at., 2020, j.apgeog.2020.102239

Reinterpretation of histological images as categorized rasters and their use for disease classification (e.g., liver cancer)

Kendall et al., 2020, 10.1038/s41598-020-74691-9

Quantification of categorical spatial patterns

Spatial patterns can be quantified using landscape metrics (O’Neill et al. 1988; Turner and Gardner 1991; Li and Reynolds 1993; He et al. 2000; Jaeger 2000; Kot i in. 2006; McGarigal 2014).

Software such as FRAGSTATS, GuidosToolbox, or landscapemetrics has proven useful in many scientific studies (> 12,000 citations).

There is a relationship between an area’s pattern composition and configuration and ecosystem characteristics, such as vegetation diversity, animal distributions, and water quality within this area (Hunsaker i Levine, 1995; Fahrig i Nuttle, 2005; Klingbeil i Willig, 2009; Holzschuh et al., 2010; Fahrig et al., 2011; Carrara et al., 2015; Arroyo-Rodŕıguez et al. 2016; Duflot et al., 2017, many others..)

Example data

Land cover data for the year 2016 from the CCI-LC project
Simplified into nine main categories

Example data

Land cover data for the year 2016 from the CCI-LC project
Simplified into nine main categories
Partitioned into 30 x 30 kilometers square blocks
13,909 categorical rasters (100x100 cells)

Example data

I randomely selected 16 rasters with different proportions of forest (green) areas:

Landscape metrics

In the last 40 or so years, several hundred different landscape metrics were developed
They quantify the composition and configuration of spatial patterns of categorical rasters
General assumption is that the spatial pattern of a landscape influences the processes that occur within it
Landscape metrics can be calculated for three different levels: patch, class, and landscape (here we focus on the landscape level)
They can be divided into several groups: (1) area and edge metrics, (2) shape metrics, (3) core metrics, (4) aggregation metrics, (5) diversity metrics, (6) complexity metrics

Important considerations:

Scale: the size of the area over which the metrics are calculated
Extent: the borders of the study area
Spatial resolution: the size of the raster cells
Categorization: the categories used in the analysis
Redundancy: many metrics are highly correlated

Helpful resources:

Landscape metrics

SHDI - Shannon’s diversity index - takes both the number of classes and the abundance of each class into account; larger values indicate higher diversity
AI - Aggregation index - from 0 for maximally disaggregated to 100 for maximally aggregated classes

SHDI:

AI:

Code examples

Chapter “The landscapemetrics and motif packages for measuring landscape patterns and processes”

library(landscapemetrics)
library(terra)
r9 = rast("exdata/r9.tif")
r1 = rast("exdata/r1.tif")
plot(r1); plot(r9)

lsm_l_shdi(r9)

# A tibble: 1 × 6
  layer level     class    id metric value
  <int> <chr>     <int> <int> <chr>  <dbl>
1     1 landscape    NA    NA shdi    1.06

lsm_l_ai(r9)

# A tibble: 1 × 6
  layer level     class    id metric value
  <int> <chr>     <int> <int> <chr>  <dbl>
1     1 landscape    NA    NA ai      82.1

calculate_lsm(r9, what = c("lsm_l_shdi", "lsm_l_ai"))

# A tibble: 2 × 6
  layer level     class    id metric value
  <int> <chr>     <int> <int> <chr>  <dbl>
1     1 landscape    NA    NA ai     82.1 
2     1 landscape    NA    NA shdi    1.06

two_r = list(r1, r9)
calculate_lsm(two_r, what = c("lsm_l_shdi", "lsm_l_ai"))

# A tibble: 4 × 6
  layer level     class    id metric   value
  <int> <chr>     <int> <int> <chr>    <dbl>
1     1 landscape    NA    NA ai     98.7   
2     1 landscape    NA    NA shdi    0.0811
3     2 landscape    NA    NA ai     82.1   
4     2 landscape    NA    NA shdi    1.06

Code examples

mat_window = matrix(1, nrow = 11, ncol = 11)
w_result = window_lsm(r9, window = mat_window, what = "lsm_l_ai")
plot(r9); plot(w_result$layer_1$lsm_l_ai)

https://r-spatialecology.github.io/landscapemetrics/

# A tibble: 133 × 5
   metric name                                type           level function_name
   <chr>  <chr>                               <chr>          <chr> <chr>        
 1 area   patch area                          area and edge… patch lsm_p_area   
 2 cai    core area index                     core area met… patch lsm_p_cai    
 3 circle related circumscribing circle       shape metric   patch lsm_p_circle 
 4 contig contiguity index                    shape metric   patch lsm_p_contig 
 5 core   core area                           core area met… patch lsm_p_core   
 6 enn    euclidean nearest neighbor distance aggregation m… patch lsm_p_enn    
 7 frac   fractal dimension index             shape metric   patch lsm_p_frac   
 8 gyrate radius of gyration                  area and edge… patch lsm_p_gyrate 
 9 ncore  number of core areas                core area met… patch lsm_p_ncore  
10 para   perimeter-area ratio                shape metric   patch lsm_p_para   
# ℹ 123 more rows

Sampling around points of interest
Moving window calculations
Calculating landscape metrics for irregular areas
Visualizations
More…

Exercises

Read the data from exdata/lc_small.tif and visualize it. What is the location of the data? What are the extent of the data and its spatial resolution? How many categories it contains?
Calculate Aggregation Index (AI) for the raster. Interpret the results.
Calculate Total Edge (TE) for the raster. Interpret the results. Next, read the data from exdata/lc_small2.tif, calculate AI and TE for this raster, and compare the results with the previous raster.
Calculate Total Edge (TE) for the raster, but this time in a moving window of 9 by 9 cells. Visualize the results.
(Extra) Using the read_sf() function from the sf package read the exdata/points.gpkg file. Next, calculate SHDI and AI of an area of 3000 meters from each sampling point (see the sample_lsm() function).

Landscape metrics

Problem no. 1: which of the hundreds of spatial metrics should we choose?
Problem no. 2: many landscape metrics are highly correlated…

PCA of landscape metrics

We performed a principal component analysis (PCA) using 17 landscape-level metrics:

Type	Landscape-level metrics
Shape	PAFRAG; CONTIG AM; CONTIG RA
Aggregation	AI; CONTAG; IJI; PLATJ; PD; DIVISION; LPI
Connectivity	COHESION
Diversity	SHDI; SIDI; MSIDI; SHEI; SIEI; MSIEI

First two principal components explained ~71% of variability

PC1:

PC2:

PCA of landscape metrics

The result allows to distinguish between:

simple and complex rasters (left<->right)
fragmented and consolidated rasters (bottom<->top)

However, there are still some problems here…

PCA of landscape metrics

We performed a second PCA using data from the United Kingdom only
Next, we predicted the results on the data for the whole Europe

PC1:

PC2:

PCA of landscape metrics

Issues with the PCA approach:

Each new dataset requires recalculation of both, landscape metrics and principal components analysis (PCA)
Highly correlated landscape metrics are used
PCA results interpretation is not straightforward

IT metrics

Five information theory metrics based on a co-occurrence matrix exist (Nowosad and Stepinski, 2019, https://doi.org/10.1007/s10980-019-00830-x)
Marginal entropy [H(x)] - diversity (composition) of spatial categories - from monothematic patterns to multithematic patterns
Relative mutual information [U] - clumpiness (configuration) of spatial categories from fragmented patterns to consolidated patterns)
H(x) and U are uncorrelated

Entropy:

Relative mutual information:

IT metrics

2D parametrization of categorical rasters’ configurations based on two weakly correlated IT metrics groups similar patterns into distinct regions of the parameters space

IT metrics - final results

Land cover data:

Parametrization using two IT metrics:

Exercises

The marginal entropy and relative mutual information can be calculated using the landscapemetrics package’s functions: lsm_l_ent() and lsm_l_relmutinf(). Calculate both of these metrics for the exdata/lc_small.tif raster.
Read the exdata/lc_europe.tif raster using rast() from the terra package and the exdata/polygons.gpkg vector data using the read_sf() function from the sf package. Calculate the marginal entropy and relative mutual information for each polygon using the sample_lsm() function.
Join the calculated values with the polygons (see https://r-spatialecology.github.io/landscapemetrics/articles/irregular_areas.html for more details).
Calculate SHDI and AI for the polygons. Compare the values of SHDI and AI with the marginal entropy and relative mutual information (e.g., using a scatterplot or by calculating the correlation coefficient). Are the results similar?
(Extra) Create your own polygonal grid using st_make_grid() function from the sf package for the area from the exdata/polygons.gpkg file. Calculate the marginal entropy and relative mutual information for each square using the sample_lsm() function. Visualize the results.

IT metrics

These metrics still leave some questions open…

Relative mutual information is a result of dividing mutual information by entropy. What to do when the entropy is zero?
How to incorporate the meaning of categories into the analysis?

Parametrization using two IT metrics:

Pattern-based spatial analysis

In recent years, the ideas of analyzing spatial patterns have been extended through an approach called pattern-based spatial analysis (Long in in. 2010; Cardille in in. 2010; Cardille in in. 2012; Jasiewicz i in. 2013; Jasiewicz i in. 2015).

The fundamental idea is to divide data into a large number of smaller areas (local landscapes).

Next, represent each area using a statistical description of the spatial pattern - a spatial signature.

Spatial signatures can be compared using a large number of existing distance or dissimilarity measures (Lin 1991; Cha 2007).

This approach enables spatial analyses such as searching, change detection, clustering, or segmentation.

Spatial signatures

Most landscape metrics are single numbers representing specific features of a local landscape.

Spatial signatures, on the other hand, are multi-element representations of landscape composition and configuration.

The basic signature is the co-occurrence matrix:

	agriculture	forest	grassland	water
agriculture	272	218	4	0
forest	218	38778	32	12
grassland	4	32	16	0
water	0	12	0	2

Spatial signatures

A spatial signature should allow simplification to the form of a normalized vector.

Co-occurrence vector (cove):


272	218	4	0	218	38778	32	12	4	32	16	0	0	12	0	2

Co-occurrence vector (cove):


136	218	19389	4	32	8	0	12	0	1

Normalized co-occurrence vector (cove):


0.0069	0.011	0.9792	0.0002	0.0016	0.0004	0	0.0006	0	0.0001

Dissimilarity measures

Measuring the distance between two signatures in the form of normalized vectors allows determining dissimilarity between spatial structures.


0.0069	0.011	0.9792	0.0002	0.0016	0.0004	0	0.0006	0	0.0001


0.1282	0.0609	0.8105	0.0002	0.0002	0.0001	0	0	0	0

\[JSD(A, B) = H(\frac{A + B}{2}) - \frac{1}{2}[H(A) + H(B)]\]

Jensen-Shannon distance between the above rasters: 0.0684

Dissimilarity measures

Measuring the distance between two signatures in the form of normalized vectors allows determining dissimilarity between spatial structures.


0.0069	0.011	0.9792	0.0002	0.0016	0.0004	0	0	0	0	0	0.0006	0	0	0.0001


0.2033	0.1335	0.2944	0.1747	0.0562	0.1307	0.0035	0.0002	0.0004	0.0015	0.0007	0.0005	0	0	0.0005

\[JSD(A, B) = H(\frac{A + B}{2}) - \frac{1}{2}[H(A) + H(B)]\]

Jensen-Shannon distance between the above rasters: 0.444

Pattern-based spatial analysis

Knowing the distance between spatial signatures can be used in several contexts (Nowosad, 2021, 10.1007/s10980-020-01135-0):

one-to-many

finding similar spatial structures

one-to-one

quantitative assessment of changes in spatial structures

many-to-many

clustering similar spatial structures

One-to-many

Finding areas with similar topography to the Suwalski Landscape Park.

One-to-one

The map above shows that many areas in the Amazon have undergone significant land cover changes between 1992 and 2018.

The challenge now is to determine which areas have changed the most.

One-to-one

Areas with the greatest change have the highest dissimilarity values.

Importantly, changes in both category and spatial configuration are measured.

Many-to-many

Areas in Africa with similar spatial structures for two themes have been identified - land cover and landforms.

Many-to-many

The quality of each cluster can be assessed using metrics:

Intra-cluster heterogeneity: determines distances between all landscapes within a group
Inter-cluster isolation: determines distances between a given group and all others

Code examples

library(terra)
library(motif)
r9 = rast("exdata/r9.tif")
r9_sign_coma = lsp_signature(r9, type = "coma")
r9_sign_coma

# A tibble: 1 × 3
     id na_prop signature    
* <int>   <dbl> <list>       
1     1       0 <int [7 × 7]>

r9_sign_coma$signature

[[1]]
     1     2    3   4  5 6   7
1 7162  1176  998 129  4 5  30
2 1176 21110  984 104 12 2  63
3  998   984 3198  79  0 1  88
4  129   104   79 234  0 0   0
5    4    12    0   0 12 0   0
6    5     2    1   0  0 0   0
7   30    63   88   0  0 0 534

r9_sign = lsp_signature(r9, type = "cove")
r9_sign

# A tibble: 1 × 3
     id na_prop signature     
* <int>   <dbl> <list>        
1     1       0 <dbl [1 × 28]>

r9_sign$signature

[[1]]
          [,1]       [,2]      [,3]       [,4]       [,5]       [,6]
[1,] 0.1808586 0.05939394 0.5330808 0.05040404 0.04969697 0.08075758
            [,7]        [,8]        [,9]       [,10]        [,11]        [,12]
[1,] 0.006515152 0.005252525 0.003989899 0.005909091 0.0002020202 0.0006060606
     [,13] [,14]        [,15]        [,16]        [,17]         [,18] [,19]
[1,]     0     0 0.0003030303 0.0002525253 0.0001010101 0.00005050505     0
     [,20] [,21]       [,22]       [,23]       [,24] [,25] [,26] [,27]
[1,]     0     0 0.001515152 0.003181818 0.004444444     0     0     0
          [,28]
[1,] 0.01348485

Code examples

landcover = rast(system.file("raster/landcover2015s.tif", package = "motif"))
ecoregions = read_sf(system.file("vector/ecoregionss.gpkg", package = "motif"))
ecoregion1 = ecoregions[1, ]
landcover1 = crop(landcover, ecoregion1, mask = TRUE)
plot(landcover);plot(landcover1)

search_result = lsp_search(landcover1, landcover, 
           type = "cove", dist_fun = "jensen-shannon", window = 25,
           output = "sf")
plot(search_result["dist"])

search_result$id[which.min(search_result$dist)]

[1] 75

search_results_75 = lsp_extract(landcover, window = 25, id = 75)
plot(search_results_75)

Exercises

Read the polygon with the study area from the exdata/harz_borders.gpkg file using the read_sf() function from the sf package.
Read the raster data with the land cover for Europe from the exdata/lc_europe.tif file using the rast() function from the terra package. Visualize both datasets.
Crop and mask the raster to the borders of the polygon. Visualize the results.
Calculate a spatial signature for the study area. Can you understand its meaning?
Find which areas from the Europe raster are the most similar to the study area (it make take a minute or so). Try a few different window sizes (e.g., 200 or 500).

Examples of applications

Using spatial signatures to compare and evaluate digital soil maps (Rossiter et al., 2022, 10.5194/soil-8-559-2022)

Changes in spatial patterns resulting from the inclusion of small woody elements in land use maps (Golicz et al., 2021, 10.3390/land10101028)

Questions and challenges

Depending on the problem:

Is the categorical raster the type of data we should use?
Do raster cells represent objects, or do they are the result of some classification/aggregation?
How to incorporate the meaning of categories into the analysis?
How to decide on the extent of the study area?
What is the optimal data resolution?
What is the scale of the process we want to study? How to decide which scale is valid?
…

Challenges:

How to use these ideas for supervised learning?
To what extent can modern interpolation/machine learning methods preserve spatial patterns?
Other signatures, their selection, and application: signatures for multiple spatial scales or signatures for multiple information layers (e.g., Remmel and Csillag, 2006, 10.1080/01431160500419303, Dmowska et al., 2020, 10.1016/j.apgeog.2020.102239, Nowosad and Stepinski, 2021, 10.1080/13658816.2021.1893324)
Selection of appropriate distance measures (for the given problem, data, and used signature)
Development of methods and tools for working with continuous data (e.g., Netzel et al., 2018, 10.5281/zenodo.3907385 https://zenodo.org/record/1291123, Nowosad and Stepinski, 2022, 10.1016/j.jag.2022.102935, Nowosad, 2022, https://github.com/Nowosad/spquery)

Spatial patterns of non-categorical rasters

Non-categorical rasters

library(terra); library(sf)
past = rast("data/past.tif")
cmip_tmin_pl = rast("data/cmip_tmin_pl.tif")

WorldClim version 2.1 climate data
for 1970-2000

CMIP6 downscaled future climate projection for 2061-2080 [model: CNRM-ESM2-1; ssp: “585”]

Minimum temperature (°C)

{spquery}

https://jakubnowosad.com/spquery/

How to find and compare areas with similar spatial patterns in non-categorical rasters (e.g., raster time-series)?

library(spquery)

pzn = st_sf(st_sfc(st_point(c(16.934167, 52.408333)), crs = "EPSG:4326"))
vec = as.numeric(extract(past, pzn, ID = FALSE))
search_tmin = spq_search(vec, cmip_tmin_pl, dist_fun = "euclidean")

Compare

compare_tmin = spq_compare(past, cmip_tmin_pl, dist_fun = "euclidean")

{patternogram}

https://jakubnowosad.com/patternogram/, https://jakubnowosad.com/ecem-2023

How to detect and describe a range of spatial similarity (spatial autocorrelation) for multiple variables?

It can be used to:

explore spatial autocorrelations of predictors in machine learning models
detect spatial autocorrelation in various data structures
compare the spatial autocorrelation of variables over time
investigate spatial autocorrelation of categorical spatial patterns

{patternogram}

library(patternogram)
pg1 = patternogram(past, sample_size = 1000)
pg2 = patternogram(cmip_tmin_pl, sample_size = 1000)

{supercells}

https://jakubnowosad.com/supercells/, https://jakubnowosad.com/foss4g-2022/

supercells: an extension of SLIC (Simple Linear Iterative Clustering; Achanta et al. (2012), doi:10.1109/TPAMI.2012.120) that can be applied to non-imagery geospatial rasters that carry:

pattern information (co-occurrence matrices)
compositional information (histograms)
time-series information (ordered sequences)
other forms of information for which the use of Euclidean distance may not be justified

Segmentation/regionalization: partitioning space into smaller segments while minimizing internal inhomogeneity and maximizing external isolation

A way to improve the output and reduce the cost of segmentation.

{supercells}

Great Britain. WorldClim gridded climate data was normalized to be between 0 and 1.

The goal: to regionalize Great Britain’s climates

{supercells}

Extended SLIC workflow uses the dynamic time warping (DTW) distance function rather than the Euclidean distance.

{supercells}

Extended SLIC: a more homogeneous regionalization.

Original SLIC: more isolated regions.

SLIC	Inhomogeneity	Isolation
extended	0.30	0.59
original	0.37	0.75

The raster of time series compressed from 24 dimensions to three principal components preserving 99% of variability.

Code examples

library(terra)
library(supercells)
# Version 1
mintemp_zones = supercells(cmip_tmin_pl, k = 150, compactness = 4)
plot(cmip_tmin_pl[[1]]); plot(mintemp_zones, add = TRUE, col = NA)
# Version 2
mintemp_zones = supercells(cmip_tmin_pl, k = 50, compactness = 4)
plot(cmip_tmin_pl[[1]]); plot(mintemp_zones, add = TRUE, col = NA)
# Version 3
mintemp_zones = supercells(cmip_tmin_pl, k = 150, compactness = 1)
plot(cmip_tmin_pl[[1]]); plot(mintemp_zones, add = TRUE, col = NA)

Summary

Landscape metrics are still a useful tool for quantification of spatial patterns in categorical raster data
Pattern-based spatial analysis, on the other hand, enables searching, change detection, clustering, and segmentation of spatial patterns in categorical raster data
Spatial patterns of non-categorical rasters (e.g., time-series) can be analyzed using the ideas of pattern-based spatial analysis, and also using new methods
Many methods are already implemented in open-source software
Methods and tools are made to solve scientific problems: what are we missing? What still needs to be done?

Contact

Mastodon: fosstodon.org/@nowosad

Website: https://jakubnowosad.com

Materials

Software: R packages landscapemetrics, motif, spquery, patternogram, supercells
Blog posts: https://jakubnowosad.com/posts.html#category=motif, https://jakubnowosad.com/posts.html#category=geopat, https://jakubnowosad.com/posts.html#category=segmentation, https://jakubnowosad.com/posts.html#category=landscape-ecology