
Methods for comparing spatial patterns in raster data

This is the fourth part of a blog post series on comparing spatial patterns in raster data. More information about the whole series can be found in part one.

This blog post focuses on the comparison of spatial patterns in categorical raster data for overlapping regions. In other words, here we have two rasters with the same number of rows and columns, and we want to compare their spatial patterns.

For this blog post, we use two categorical raster datasets: the CORINE Land Cover (CLC) datasets for Tartu (Estonia) for the years 2000 and 2018.

```
library(terra)
clc2000_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/clc2000_tartu.tif")
clc2018_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/clc2018_tartu.tif")
plot(clc2000_tartu, main = "Tartu (2000)")
plot(clc2018_tartu, main = "Tartu (2018)")
```

In short, the red areas in the rasters represent urban areas, the green areas represent forests, the blue areas represent water bodies, and the yellow areas represent agricultural land.

The simplest way to compare two categorical rasters is to create a binary raster that indicates whether the values of the two rasters are the same or different. Here, the output highlights the areas where the values of the two rasters differ (yellow) and where they are the same (purple).

```
clc_diff = clc2018_tartu != clc2000_tartu
plot(clc_diff)
```

To get a more detailed comparison, we can calculate the confusion matrix (also known as a contingency table) of the two rasters. It shows the number of pixels that have the same value in both rasters (diagonal) and the number of pixels with different values (off-diagonal).

```
cm = table(values(clc2000_tartu), values(clc2018_tartu))
cm
```

```
      1     2     3    4    6    7
1  4821   357     2   10    0    0
2  1342 67915   684  389    0    0
3    59   415 33670 2896    9   22
4   122   638  1435 7040  229   24
6     0     3    13   20 2177   37
7     0     0     0    0    3 1995
```

For example, the table shows that there are 4821 pixels with the value 1 in both rasters, and 357 pixels with the value 1 in the first raster and the value 2 in the second raster.

The confusion matrix is also a building block for many other statistics, including accuracy, that can be calculated to compare two sets of categorical data.
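
As an illustration, the overall accuracy can be derived from such a table as the share of pixels on the diagonal. A minimal base-R sketch, using a made-up two-class confusion matrix:

```r
# hypothetical confusion matrix: rows = classes in raster 1, cols = raster 2
cm_toy = matrix(c(40, 5,
                  10, 45), nrow = 2, byrow = TRUE)
# overall accuracy: share of pixels that keep their class (the diagonal)
accuracy = sum(diag(cm_toy)) / sum(cm_toy)
accuracy
# [1] 0.85
```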

A binary difference raster can also be used to calculate the proportion of changed pixels by dividing the number of changed pixels by the total number of non-NA pixels.

```
clc_diff = clc2018_tartu != clc2000_tartu
changed_pixels = freq(clc_diff)$count[2]
na_pixels = freq(clc_diff, value = NA)$count
total_pixels = ncell(clc_diff) - na_pixels
proportion_changed = changed_pixels / total_pixels
proportion_changed
```

`[1] 0.06894013`

This outcome shows that about 7% of the pixels have changed between the two rasters.

The overall comparison of the two rasters can be done by calculating the difference between the frequencies of the values of the two rasters (Pontius 2002).

```
clc2000_tartu_freq = freq(clc2000_tartu)
clc2018_tartu_freq = freq(clc2018_tartu)
freq = merge(clc2000_tartu_freq, clc2018_tartu_freq, by = "value", all = TRUE)
freq$diff = abs(freq$count.x - freq$count.y)
sum_diff = sum(freq$diff)
na_pixels = freq(clc_diff, value = NA)$count
total_pixels = ncell(clc_diff) - na_pixels
1 - sum_diff / total_pixels
```

`[1] 0.9640774`

The overall comparison gives the proportion of pixels that have the same class in both rasters (it does not, however, consider whether the pixels are in the same location).

More statistics of the differences between the values of the two rasters can be calculated using the **diffeR** package (Pontius Jr. and Santacruz 2023). These statistics are based on the confusion matrix and include the overall allocation disagreement, overall difference, overall exchange disagreement, overall quantity disagreement, and overall shift disagreement.^{1}

```
library(diffeR)
clc_ct = crosstabm(clc2000_tartu, clc2018_tartu)
diffeR_df = data.frame(
overallAllocD = overallAllocD(clc_ct),
overallDiff = overallDiff(clc_ct),
overallExchangeD = overallExchangeD(clc_ct),
overallQtyD = overallQtyD(clc_ct),
overallShiftD = overallShiftD(clc_ct)
)
diffeR_df
```

```
overallAllocD overallDiff overallExchangeD overallQtyD overallShiftD
1 6440 8709 5280 2269 1160
```
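
These values are connected: the overall difference splits into a quantity component (mismatch of class totals) and an allocation component (mismatch of locations). As a base-R sketch of this Pontius-style decomposition (not the **diffeR** internals), the numbers above can be reproduced from the confusion matrix shown earlier:

```r
# confusion matrix from the table() output above
cm = matrix(c(4821,   357,     2,   10,    0,    0,
              1342, 67915,   684,  389,    0,    0,
                59,   415, 33670, 2896,    9,   22,
               122,   638,  1435, 7040,  229,   24,
                 0,     3,    13,   20, 2177,   37,
                 0,     0,     0,    0,    3, 1995), nrow = 6, byrow = TRUE)
overall_diff = sum(cm) - sum(diag(cm))                # all off-diagonal pixels
quantity_d = sum(abs(rowSums(cm) - colSums(cm))) / 2  # difference in class totals
alloc_d = overall_diff - quantity_d                   # remaining (location) part
c(overall_diff, quantity_d, alloc_d)
# [1] 8709 2269 6440
```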

To include the spatial context in the comparison, we can calculate the difference between a focal measure of two rasters. An example of such a measure is the relative mutual information (`relmutinf`) metric, which quantifies the clumpiness of the landscape – the larger the value, the more clumped the landscape is (see a separate blog post about this metric).

The below code chunk uses the **landscapemetrics** package (Hesselbarth et al. 2019) to specify a moving window of 5 by 5 and calculate the `relmutinf` metric for the two rasters. Next, it calculates the absolute difference between the two rasters and plots the result.

```
library(landscapemetrics)
window = matrix(1, nrow = 5, ncol = 5)
clc2000_tartu_relmutinf_mw = window_lsm(clc2000_tartu, window = window,
what = "lsm_l_relmutinf")
clc2018_tartu_relmutinf_mw = window_lsm(clc2018_tartu, window = window,
what = "lsm_l_relmutinf")
clc2018_tartu_relmutinf_diff = abs(clc2018_tartu_relmutinf_mw[[1]][[1]] -
clc2000_tartu_relmutinf_mw[[1]][[1]])
plot(clc2018_tartu_relmutinf_diff)
```

The largest values in the output indicate the areas where the clumpiness of the landscape has changed the most between the two rasters.

Alternatively, if we calculate the regular difference, the output will show the areas where the clumpiness of the landscape has increased (positive values) and decreased (negative values).

```
plot_div = function(r, ...){
r_range = range(values(r), na.rm = TRUE, finite = TRUE)
max_abs = max(abs(r_range))
new_range = c(-max_abs, max_abs)
plot(r, col = hcl.colors(100, palette = "prgn"), range = new_range, ...)
}
clc2018_tartu_relmutinf_diff2 = clc2018_tartu_relmutinf_mw[[1]][[1]] -
clc2000_tartu_relmutinf_mw[[1]][[1]]
plot_div(clc2018_tartu_relmutinf_diff2)
```

The moving window approach is also used in the `raster.change()` function from the **spatialEco** package (Evans and Murphy 2023). Its first two arguments are the two rasters to compare, the `s` argument specifies the size of the moving window, and the `stat` argument specifies the statistic to calculate. For example, `stat = "cross-entropy"` calculates the cross-entropy loss function, where the larger the value, the more different the two rasters are.

```
library(spatialEco)
clc_ce = raster.change(clc2000_tartu, clc2018_tartu, s = 5,
stat = "cross-entropy")
plot(clc_ce)
```

*Note that the above calculation may take a few minutes to complete.*

Various statistics from categorical data can be calculated at multiple scales with the **waywiser** package (Mahoney 2023). Here, the `ww_multi_scale()` function calculates the accuracy of the two rasters at different scales from 500 to 3000 meters (map units).

```
library(waywiser)
cell_sizes = seq(500, 3000, by = 500)
clc_multi_scale = ww_multi_scale(truth = as.factor(clc2000_tartu),
estimate = as.factor(clc2018_tartu),
metrics = list(yardstick::accuracy),
cellsize = cell_sizes,
progress = FALSE)
clc_multi_scale
```

```
# A tibble: 6 × 6
  .metric  .estimator .estimate .grid_args       .grid            .notes  
  <chr>    <chr>          <dbl> <list>           <list>           <list>  
1 accuracy multiclass     0.781 <tibble [1 × 1]> <sf [6,400 × 5]> <tibble>
2 accuracy multiclass     0.607 <tibble [1 × 1]> <sf [1,600 × 5]> <tibble>
3 accuracy multiclass     0.453 <tibble [1 × 1]> <sf [729 × 5]>   <tibble>
4 accuracy multiclass     0.358 <tibble [1 × 1]> <sf [400 × 5]>   <tibble>
5 accuracy multiclass     0.246 <tibble [1 × 1]> <sf [256 × 5]>   <tibble>
6 accuracy multiclass     0.270 <tibble [1 × 1]> <sf [196 × 5]>   <tibble>
```

The output shows the accuracy of the two rasters at each scale: the largest value is at the scale of 500 meters, and the smallest value is at the scale of 2500 meters. It shows that, in general, as the scale increases, the agreement between the two rasters decreases – both rasters are similar at the local scale, but they differ at the regional one.

The **sabre** package (Nowosad and Stepinski 2018) provides a function to calculate a few measures of spatial association between two categorical maps.^{2}

```
library(sabre)
clc_sabre = vmeasure_calc(clc2000_tartu, clc2018_tartu)
clc_sabre
```

```
The SABRE results:
V-measure: 0.77
Homogeneity: 0.78
Completeness: 0.76
The spatial objects can be retrieved with:
$map1 - the first map
$map2 - the second map
```

Its output returns three values: the homogeneity, the completeness, and the V-measure. The homogeneity measures how well regions from the first map fit inside regions from the second map, the completeness measures how well regions from the second map fit inside regions from the first map, and the V-measure is the weighted harmonic mean of homogeneity and completeness. All of them range from 0 to 1, where larger values indicate better spatial agreement.
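
The V-measure can be checked against the two other values. A quick base-R sanity check using the (rounded) homogeneity and completeness printed above:

```r
homogeneity = 0.78
completeness = 0.76
# harmonic mean with equal weights
v_measure = 2 * homogeneity * completeness / (homogeneity + completeness)
round(v_measure, 2)
# [1] 0.77
```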

Additionally, the output contains two sets of maps of regions’ inhomogeneities (rih): the first set shows how the regions from the first map are inhomogeneous with regard to the regions from the second map, and the second set shows how the regions from the second map are inhomogeneous with regard to the regions from the first map.

```
plot(clc_sabre$map1[[2]])
plot(clc_sabre$map2[[2]])
```

Evans, Jeffrey S., and Melanie A. Murphy. 2023. *spatialEco*. Manual.

Hesselbarth, Maximilian H. K., Marco Sciaini, Kimberly A. With, Kerstin Wiegand, and Jakub Nowosad. 2019. “*Landscapemetrics* : An Open-Source *R* Tool to Calculate Landscape Metrics.” *Ecography* 42 (10): 1648–57. https://doi.org/gf4n9j.

Mahoney, Michael J. 2023. “Waywiser: Ergonomic Methods for Assessing Spatial Models.” https://doi.org/10.48550/arXiv.2303.11312.

Nowosad, J., and T. F. Stepinski. 2018. “Spatial Association Between Regionalizations Using the Information-Theoretical *V* -Measure.” *International Journal of Geographical Information Science* 32 (12): 2386–2401. https://doi.org/gf283f.

Pontius Jr., Robert Gilmore, and Ali Santacruz. 2023. *diffeR: Metrics of Difference for Comparing Pairs of Maps or Pairs of Variables*. Manual.

Pontius, R Gil. 2002. “Statistical Methods to Partition Effects of Quantity and Location During Comparison of Categorical Maps at Multiple Resolutions.” *Photogrammetric Engineering*.

^{1} And more, as can be found in the package documentation – see `?diffeR::overallAllocD`.

^{2} Its output actually gives a few values and two maps, but the most important one is the V-measure.

BibTeX citation:

```
@online{nowosad2024,
author = {Nowosad, Jakub},
title = {Comparison of Spatial Patterns in Categorical Raster Data for
Overlapping Regions Using {R}},
date = {2024-11-03},
url = {https://jakubnowosad.com/posts/2024-11-03-spatcomp-bp4/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Comparison of Spatial Patterns in
Categorical Raster Data for Overlapping Regions Using R.”
November 3, 2024. https://jakubnowosad.com/posts/2024-11-03-spatcomp-bp4/.

Methods for comparing spatial patterns in raster data

This is the third part of a blog post series on comparing spatial patterns in raster data. More information about the whole series can be found in part one.

This blog post focuses on the comparison of spatial patterns in continuous raster data for arbitrary regions. Thus, the shown methods require two continuous rasters, which may have different extents, resolutions, etc. The outcome of such comparisons is, most often, a single value, which indicates the difference/similarity between the spatial patterns of the two rasters.

Three continuous raster datasets are used in this blog post: the Normalized Difference Vegetation Index (NDVI) datasets for Tartu (Estonia) for the years 2018 and 2023, and Poznań (Poland) for the year 2023.

```
library(terra)
ndvi2018_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/ndvi2018_tartu.tif")
ndvi2023_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/ndvi2023_tartu.tif")
ndvi2023_poznan = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/ndvi2023_poznan.tif")
plot(ndvi2018_tartu, main = "Tartu (2018)")
plot(ndvi2023_tartu, main = "Tartu (2023)")
plot(ndvi2023_poznan, main = "Poznan (2023)")
```

Rasters consist of values, and thus, it seems possible to compare the distributions of these values. However, it may not be straightforward as they may have different lengths, ranges, etc. There are many possible ways to create distributions from rasters, but here I show just one:

- Extract the non-missing values from the rasters.
- Rescale the values to the range of 0 to 1.
- Bin the values to create histograms.
- Normalize the histogram counts to get probability distributions.

Importantly, the above approach involves many decisions, for example, should we use the minimum and maximum values of both rasters or each separately; how many bins should we use; etc.

```
# 1. Extract the non-missing values from the rasters
values1 = na.omit(values(ndvi2023_tartu)[, 1])
values2 = na.omit(values(ndvi2023_poznan)[, 1])
# 2. Scale the values to the range of 0 to 1
values1_rescaled = (values1 - min(values1)) / (max(values1) - min(values1))
values2_rescaled = (values2 - min(values2)) / (max(values2) - min(values2))
# 3. Bin the values to create histograms
bin_edges = seq(0, 1, length.out = 33)
hist1 = hist(values1_rescaled, breaks = bin_edges, plot = FALSE)
hist2 = hist(values2_rescaled, breaks = bin_edges, plot = FALSE)
# 4. Normalize the histogram counts to get probability distributions
prob1 = hist1$counts / sum(hist1$counts)
prob2 = hist2$counts / sum(hist2$counts)
```

Next, we can calculate the dissimilarity between the two distributions, for example, using the Kullback-Leibler divergence implemented in the **philentropy** package (HG 2018).^{1}

`philentropy::distance(rbind(prob1, prob2), method = "kullback-leibler")`

```
kullback-leibler
0.1033112
```

Lower values of the Kullback-Leibler divergence suggest that the distributions are more similar.
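
For reference, the Kullback-Leibler divergence itself is short to write out. A base-R sketch on two toy probability distributions (natural logarithm here – **philentropy** lets you choose the logarithm unit – and zero-probability bins would need special handling):

```r
# KL(p || q) = sum(p * log(p / q)); note that the measure is asymmetric
kl_div = function(p, q) sum(p * log(p / q))
p = c(0.5, 0.5)
q = c(0.25, 0.75)
kl_div(p, q)  # about 0.144
```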

The above approach can be generalized with just one modification – the minimum and maximum values are treated as external parameters.

```
get_min_max = function(rast_list){
min_v = min(vapply(rast_list,
FUN = function(r) min(na.omit(values(r)[, 1])),
FUN.VALUE = numeric(1)))
max_v = max(vapply(rast_list,
FUN = function(r) max(na.omit(values(r)[, 1])),
FUN.VALUE = numeric(1)))
return(c(min_v, max_v))
}
prepare_hist = function(r, min_v, max_v){
values_r = na.omit(values(r)[, 1])
values_r_rescaled = (values_r - min_v) / (max_v - min_v)
bin_edges = seq(0, 1, length.out = 33) # 32 bins
hist_r = hist(values_r_rescaled, breaks = bin_edges, plot = FALSE)
prob_r = hist_r$counts / sum(hist_r$counts)
return(prob_r)
}
min_max = get_min_max(list(ndvi2018_tartu, ndvi2023_tartu, ndvi2023_poznan))
tartu2018_hist = prepare_hist(ndvi2018_tartu, min_max[1], min_max[2])
tartu2023_hist = prepare_hist(ndvi2023_tartu, min_max[1], min_max[2])
poznan2023_hist = prepare_hist(ndvi2023_poznan, min_max[1], min_max[2])
philentropy::distance(rbind(tartu2018_hist, tartu2023_hist, poznan2023_hist),
method = "kullback-leibler")
```

```
v1 v2 v3
v1 0.000000 2.48398816 2.65464473
v2 2.483988 0.00000000 0.08389761
v3 2.654645 0.08389761 0.00000000
```

The results suggest that the distributions of NDVI values for Tartu and Poznan in 2023 are more similar to each other than to the distribution of NDVI values for Tartu in 2018.

As a bonus, we can visualize the histograms of the NDVI values for the three rasters.

```
df_hist = data.frame(
values = c(tartu2018_hist, tartu2023_hist, poznan2023_hist),
group = rep(c("Tartu 2018", "Tartu 2023", "Poznan 2023"), each = 32),
bin = rep(1:32, 3))
library(ggplot2)
ggplot(df_hist, aes(x = bin, y = values, color = group)) +
geom_line() +
theme_minimal()
```

To include the spatial context in the comparison of continuous raster data, we can use focal measures.

The **geodiv** package (Smith et al. 2023) provides more than a dozen functions to calculate surface metrics for continuous rasters. One of them is the surface roughness function (`sa()`), which calculates the average absolute deviation of raster values from the mean value.

```
library(geodiv)
ndvi2018_tartu_sa = sa(ndvi2018_tartu) # 0.219
ndvi2023_tartu_sa = sa(ndvi2023_tartu) # 0.150
ndvi2023_poznan_sa = sa(ndvi2023_poznan) # 0.141
abs(ndvi2023_tartu_sa - ndvi2018_tartu_sa)
```

`[1] 0.06886273`

`abs(ndvi2023_poznan_sa - ndvi2018_tartu_sa)`

`[1] 0.07695052`

`abs(ndvi2023_poznan_sa - ndvi2023_tartu_sa)`

`[1] 0.008087792`

The absolute differences show that the variability of the NDVI values for Tartu in 2023 is much more similar to that for Poznan in 2023 than to that for Tartu in 2018. Calculating the signed differences instead would reveal the direction of the change, e.g., whether the NDVI values for Tartu in 2023 are more or less variable than in 2018.
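
Following the description above, the surface roughness reduces to the mean absolute deviation from the mean. A base-R sketch on a toy vector of raster values (an illustration of the formula only, not the **geodiv** implementation):

```r
sa_manual = function(x) mean(abs(x - mean(x, na.rm = TRUE)), na.rm = TRUE)
# mean is 3; absolute deviations are 2, 1, 0, 3
sa_manual(c(1, 2, 3, 6))
# [1] 1.5
```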

Another example of calculating and then comparing focal measures is to compute one of the GLCM texture metrics and compare the outcome values. Here, we compute the average of the homogeneity metric, whose values are higher for more homogeneous textures, using the **GLCMTextures** package (Ilich 2020). Next, we calculate the absolute differences between the mean homogeneity values for the three rasters.

```
library(GLCMTextures)
ndvi2018_tartu_hom = glcm_textures(ndvi2018_tartu, n_levels = 16, shift = c(1, 0),
metric = "glcm_homogeneity", quantization = "equal prob")
ndvi2023_tartu_hom = glcm_textures(ndvi2023_tartu, n_levels = 16, shift = c(1, 0),
metric = "glcm_homogeneity", quantization = "equal prob")
ndvi2023_poznan_hom = glcm_textures(ndvi2023_poznan, n_levels = 16, shift = c(1, 0),
metric = "glcm_homogeneity", quantization = "equal prob")
ndvi2018_tartu_homv = global(ndvi2018_tartu_hom, "mean", na.rm = TRUE)
ndvi2023_tartu_homv = global(ndvi2023_tartu_hom, "mean", na.rm = TRUE)
ndvi2023_poznan_homv = global(ndvi2023_poznan_hom, "mean", na.rm = TRUE)
abs(ndvi2023_tartu_homv - ndvi2018_tartu_homv)
```

```
mean
glcm_homogeneity 0.01857073
```

`abs(ndvi2023_poznan_homv - ndvi2018_tartu_homv)`

```
mean
glcm_homogeneity 0.02247696
```

`abs(ndvi2023_poznan_homv - ndvi2023_tartu_homv)`

```
mean
glcm_homogeneity 0.003906225
```

The results show that the NDVI rasters for Tartu in 2023 and Poznan in 2023 have more similar homogeneity than the NDVI raster for Tartu in 2018.

Both the surface roughness and the homogeneity metrics represent a single aspect of the spatial pattern of continuous raster data. Alternatively, we may want to consider the complexity of the spatial pattern as a whole. This may be done using Gao’s entropy metric, which is based on aggregating the values of the input raster and then calculating the number of possible ways to disaggregate the new raster into the original one (Gao and Li 2019).

The below example uses the **bespatial** package (Nowosad 2024).

```
library(bespatial)
ndvi2018_tartu_bes = bes_g_gao(ndvi2018_tartu, method = "hierarchy", relative = TRUE)
ndvi2023_tartu_bes = bes_g_gao(ndvi2023_tartu, method = "hierarchy", relative = TRUE)
ndvi2023_poznan_bes = bes_g_gao(ndvi2023_poznan, method = "hierarchy", relative = TRUE)
abs(ndvi2023_tartu_bes$value - ndvi2018_tartu_bes$value)
```

`[1] 15402.11`

`abs(ndvi2023_poznan_bes$value - ndvi2018_tartu_bes$value)`

`[1] 26013.8`

`abs(ndvi2023_tartu_bes$value - ndvi2023_poznan_bes$value)`

`[1] 10611.69`

The results show that the NDVI rasters for Tartu in 2023 and Poznan in 2023 have more similar Gao entropy than the NDVI raster for Tartu in 2018.

Fairly recent advances in deep learning have enabled the extraction of feature maps from pre-trained models. These feature maps can be used to compare the spatial patterns of continuous raster data (Malik and Robertson 2021).

This example uses the **keras3** (Kalinowski, Allaire, and Chollet 2024) and **philentropy** (HG 2018) packages. First, the NDVI rasters are normalized to the range of 0 to 1 and converted to a matrix format. Then, they are reshaped to the format required by the VGG16 model: a 3D array with three channels. The VGG16 model is loaded in the next step, and the feature maps are extracted. Finally, the first feature maps are reshaped to a vector format and compared using the Euclidean distance.

```
library(keras3)
library(philentropy)
# keras3::install_keras(backend = "tensorflow")
normalize_raster = function(r) {
min_val = terra::global(r, "min", na.rm = TRUE)[[1]]
max_val = terra::global(r, "max", na.rm = TRUE)[[1]]
r = terra::app(r, fun = function(x) (x - min_val) / (max_val - min_val))
return(r)
}
ndvi2023n_tartu = normalize_raster(ndvi2023_tartu)
ndvi2023n_poznan = normalize_raster(ndvi2023_poznan)
ndvi2023_tartu_mat = as.matrix(ndvi2023n_tartu, wide = TRUE)
ndvi2023_poznan_mat = as.matrix(ndvi2023n_poznan, wide = TRUE)
ndvi2023_tartu_mat = array(rep(ndvi2023_tartu_mat, 3),
dim = c(nrow(ndvi2023_tartu_mat), ncol(ndvi2023_tartu_mat), 3))
ndvi2023_poznan_mat = array(rep(ndvi2023_poznan_mat, 3),
dim = c(nrow(ndvi2023_poznan_mat), ncol(ndvi2023_poznan_mat), 3))
model = keras3::application_vgg16(weights = "imagenet", include_top = FALSE,
input_shape = c(nrow(ndvi2023_tartu_mat),
ncol(ndvi2023_tartu_mat), 3))
ndvi2023_tartu_mat = keras3::array_reshape(ndvi2023_tartu_mat,
c(1, dim(ndvi2023_tartu_mat)))
ndvi2023_poznan_mat = keras3::array_reshape(ndvi2023_poznan_mat,
c(1, dim(ndvi2023_poznan_mat)))
features2023_tartu = predict(model, ndvi2023_tartu_mat, verbose = 0)
features2023_poznan = predict(model, ndvi2023_poznan_mat, verbose = 0)
# [1, height, width, layer]
feature_map_tartu_1 = as.vector(features2023_tartu[1, , , 1])
feature_map_poznan_1 = as.vector(features2023_poznan[1, , , 1])
distance(rbind(feature_map_tartu_1, feature_map_poznan_1))
```

```
euclidean
7.007962
```

The **SpatialPack** package (Vallejos, Osorio, and Bevilacqua 2020) provides the `CQ()` function, which calculates a similarity index based on the co-dispersion coefficient. However, this function (1) requires the input data to be in the matrix format, and (2) does not work with missing values. Thus, the code chunk below has not been evaluated, but it shows how to use the `CQ()` function for data fulfilling these requirements.

```
library(SpatialPack)
ndvi2023_tartu_mat = as.matrix(ndvi2023_tartu, wide = TRUE)
ndvi2023_poznan_mat = as.matrix(ndvi2023_poznan, wide = TRUE)
ndvi_CQ = CQ(ndvi2023_tartu_mat, ndvi2023_poznan_mat)
ndvi_CQ$CQ
```

Gao, Peichao, and Zhilin Li. 2019. “Aggregation-Based Method for Computing Absolute Boltzmann Entropy of Landscape Gradient with Full Thermodynamic Consistency.” *Landscape Ecology* 34 (8): 1837–47. https://doi.org/gkb47k.

HG, Drost. 2018. “Philentropy: Information Theory and Distance Quantification with R.” *Journal of Open Source Software* 3 (26): 765.

Ilich, Alexander R. 2020. “GLCMTextures.” https://doi.org/10.5281/zenodo.4310186.

Kalinowski, Tomasz, JJ Allaire, and François Chollet. 2024. *Keras3: R Interface to ’Keras’*. Manual.

Malik, Karim, and Colin Robertson. 2021. “Landscape Similarity Analysis Using Texture Encoded Deep-Learning Features on Unclassified Remote Sensing Imagery.” *Remote Sensing* 13 (3): 492. https://doi.org/10.3390/rs13030492.

Nowosad, Jakub. 2024. *Bespatial: Boltzmann Entropy for Spatial Data*. Manual.

Smith, Annie C., Phoebe Zarnetske, Kyla Dahlin, Adam Wilson, and Andrew Latimer. 2023. *Geodiv: Methods for Calculating Gradient Surface Metrics*. Manual.

Vallejos, Ronny, Felipe Osorio, and Moreno Bevilacqua. 2020. *Spatial Relationships Between Two Georeferenced Variables: With Applications in R*. Springer Nature.

^{1} This is another example of a decision – which dissimilarity measure to use.

^{2} Any suggestions on how to improve this example are welcome – please let me know!

BibTeX citation:

```
@online{nowosad2024,
author = {Nowosad, Jakub},
title = {Comparison of Spatial Patterns in Continuous Raster Data for
Arbitrary Regions Using {R}},
date = {2024-10-27},
url = {https://jakubnowosad.com/posts/2024-10-27-spatcomp-bp3/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Comparison of Spatial Patterns in
Continuous Raster Data for Arbitrary Regions Using R.” October
27, 2024. https://jakubnowosad.com/posts/2024-10-27-spatcomp-bp3/.

Methods for comparing spatial patterns in raster data

This is the second part of a blog post series on comparing spatial patterns in raster data. More information about the whole series can be found in part one.

This blog post shows various methods for comparing spatial patterns in continuous raster data for overlapping regions, i.e., how to compare two rasters for the same region, but from different moments in time (or, in some cases, with different variables)^{1}, using the R programming language.

Two continuous raster datasets are used in this blog post: the Normalized Difference Vegetation Index (NDVI) for Tartu (Estonia) for the years 2018 and 2023.

```
library(terra)
ndvi2018_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/ndvi2018_tartu.tif")
ndvi2023_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/ndvi2023_tartu.tif")
plot(ndvi2018_tartu, main = "Tartu (2018)")
plot(ndvi2023_tartu, main = "Tartu (2023)")
```

The most basic approach to compare two continuous rasters is to calculate the difference between their values for each cell.

```
ndvi_diff = ndvi2023_tartu - ndvi2018_tartu
plot(ndvi_diff)
```

The output is a raster with positive values indicating that the values of the first raster are higher than the values of the second raster, while negative values indicate the opposite.^{2}

The differences may be even more highlighted by using a diverging color palette. The below function `plot_div()` visualizes a raster with a diverging color palette, where the middle color represents zero.

```
plot_div = function(r, ...){
r_range = range(values(r), na.rm = TRUE, finite = TRUE)
max_abs = max(abs(r_range))
new_range = c(-max_abs, max_abs)
plot(r, col = hcl.colors(100, palette = "prgn"), range = new_range, ...)
}
plot_div(ndvi_diff)
```

It shows that most of the area has negative values, indicating that the NDVI values were lower in 2023 than in 2018. A histogram of the differences confirms this observation.

```
ndvi_diff = ndvi2023_tartu - ndvi2018_tartu
hist(ndvi_diff)
```

A similar approach, also based on the values of each cell independently, is to calculate the statistics of the differences between the two rasters’ values (i.e., error metrics).

One of the most common error metrics is the Root Mean Square Error (RMSE), which quantifies the average difference between the values of the two rasters. Lower values indicate better agreement between the two rasters. RMSE is implemented in many R packages, and here we use the version from the **yardstick** package (Kuhn, Vaughan, and Hvitfeldt 2024).^{4}

In general, error metrics require two sets of values; thus, we use the `values()` function to extract the values of the rasters before calculating the RMSE.

```
library(yardstick)
ndvi_rmse = rmse_vec(values(ndvi2023_tartu)[, 1], values(ndvi2018_tartu)[, 1])
ndvi_rmse
```

`[1] 0.2191853`

The **diffeR** package (Pontius Jr. and Santacruz 2023), on the other hand, is directly aimed at comparing rasters. One of its functions, `MAD()`, calculates the Mean Absolute Difference (MAD) between two rasters.

```
library(diffeR)
ndvi_mad = MAD(ndvi2023_tartu, ndvi2018_tartu)
```

`[1] "The mean of grid1 is less than the mean of grid2"`

`ndvi_mad$Total`

`[1] 0.184306`

This measure has a similar interpretation to RMSE, but it is less sensitive to outliers.
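
The different sensitivity to outliers is easy to see with both metrics written out in base R on toy values; the single large difference (3) pulls the RMSE up much more than the MAD:

```r
a = c(0, 0, 0, 0)
b = c(1, 1, 1, 3)
rmse_manual = sqrt(mean((a - b)^2))  # sqrt((1 + 1 + 1 + 9) / 4) = sqrt(3)
mad_manual = mean(abs(a - b))        # (1 + 1 + 1 + 3) / 4
c(rmse_manual, mad_manual)
# [1] 1.732051 1.500000
```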

The main general approach to incorporating spatial context into the comparison of rasters is to perform calculations in a moving window. This way, we consider not only the values of each cell independently, but also the values of the surrounding cells.^{5}

The `focalPairs()` function from the **terra** package (Hijmans 2024) uses a moving window to extract values from the focal regions of two rasters and calculates the correlation coefficient between them. It requires two rasters with the same number of rows and columns, the size of the moving window, and the function to calculate the correlation coefficient.^{6}

```
ndvi_cor = focalPairs(c(ndvi2023_tartu, ndvi2018_tartu), w = 5,
fun = "pearson", na.rm = TRUE)
plot_div(ndvi_cor)
```

The values of the output raster range from -1 to 1, where 1 indicates a perfect positive correlation in the focal regions, 0 indicates no correlation (the focal values have no relationship to each other), and -1 indicates a perfect negative correlation.
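
What is computed in each window is the plain Pearson correlation coefficient, which can be reproduced manually on a toy pair of focal-value vectors (base-R sketch):

```r
x = c(0.2, 0.4, 0.5, 0.7, 0.9)
y = c(0.1, 0.3, 0.6, 0.6, 1.0)
mx = mean(x); my = mean(y)
# covariance term over the product of the standard-deviation terms
r_manual = sum((x - mx) * (y - my)) /
  sqrt(sum((x - mx)^2) * sum((y - my)^2))
all.equal(r_manual, cor(x, y))
# [1] TRUE
```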

Similarly to the correlation coefficient, other statistics can be calculated in a moving window. This option is offered by several R packages, including **geodiv** (Smith et al. 2023), **GLCMTextures** (Ilich 2020), and **rasterdiv** (Rocchini et al. 2021).

The **geodiv** package provides a function `focal_metrics()` that calculates more than 20 different focal statistics. In the below example, we calculate the average surface roughness (SA) in a moving window separately for the two rasters and then calculate the difference between them.

```
library(geodiv)
window = matrix(1, nrow = 5, ncol = 5)
ndvi2018_tartu_sa_mw = focal_metrics(ndvi2018_tartu, window = window,
metric = "sa", progress = FALSE)
ndvi2023_tartu_sa_mw = focal_metrics(ndvi2023_tartu, window = window,
metric = "sa", progress = FALSE)
ndvi_diff_sa_mw = ndvi2023_tartu_sa_mw$sa - ndvi2018_tartu_sa_mw$sa
plot_div(ndvi_diff_sa_mw)
```

This outcome shows the difference in the average surface roughness between the two rasters in a moving window. The positive values indicate that the surface roughness (in this case, the heterogeneity of NDVI) was higher in 2023 than in 2018, while the negative values indicate the opposite. Interestingly, the most extreme (negative) values are located in just a few small areas.

We can also compare various GLCM (Gray-Level Co-occurrence Matrix) texture metrics (Haralick, Shanmugam, and Dinstein 1973) in a moving window; such metrics characterize many aspects of the spatial structure of the rasters. This example uses the **GLCMTextures** package (Ilich 2020): we calculate the homogeneity of the NDVI values in a moving window for both rasters and then the difference between them.
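
The code for this step is not shown here; a sketch of how it could look, reusing the `glcm_textures()` call pattern shown elsewhere in this series (the 5-by-5 `w` window argument is an assumption, and the `plot_div()` helper defined earlier is reused):

```r
library(GLCMTextures)
# homogeneity in a 5-by-5 moving window for each raster
ndvi2018_tartu_hom_mw = glcm_textures(ndvi2018_tartu, w = c(5, 5), n_levels = 16,
                                      shift = c(1, 0), metric = "glcm_homogeneity",
                                      quantization = "equal prob")
ndvi2023_tartu_hom_mw = glcm_textures(ndvi2023_tartu, w = c(5, 5), n_levels = 16,
                                      shift = c(1, 0), metric = "glcm_homogeneity",
                                      quantization = "equal prob")
# difference of the focal homogeneity values between the two dates
ndvi_hom_diff = ndvi2023_tartu_hom_mw - ndvi2018_tartu_hom_mw
plot_div(ndvi_hom_diff)
```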

The positive values in the resulting map indicate that the homogeneity of the NDVI values was higher in 2023 than in 2018, while the negative values indicate the opposite.

Rao’s quadratic entropy is a measure of diversity that takes into account not only the abundance of the classes (values), but also the dissimilarity between them (Rao 1982).

This measure can be calculated using the `paRao()` function from the **rasterdiv** package (Rocchini et al. 2021). The function requires a raster with integer values (thus we multiply the NDVI values by 100) and the size of the moving window.

```
library(rasterdiv)
ndvi2018_tartu_int = as.int(ndvi2018_tartu * 100)
ndvi2023_tartu_int = as.int(ndvi2023_tartu * 100)
ndvi2018_tartu_rao = paRao(ndvi2018_tartu_int, window = 5, progBar = FALSE)
ndvi2023_tartu_rao = paRao(ndvi2023_tartu_int, window = 5, progBar = FALSE)
ndvi_rao_diff = ndvi2023_tartu_rao[[1]][[1]] - ndvi2018_tartu_rao[[1]][[1]]
plot_div(ndvi_rao_diff)
```

*Note that the above calculations may take a few minutes to complete.*

The positive values in the resulting map indicate that the diversity (in this case, the heterogeneity of NDVI) was higher in 2023 than in 2018, while the negative values indicate the opposite.

An alternative to computing a metric for each raster and then differencing the results is to first calculate the difference between the two rasters and then compute the spatial autocorrelation of the differences (Cliff 1970).

The **terra** package provides the `autocor()` function, which calculates the spatial autocorrelation of a raster using a selected method. By default, it uses a three-by-three moving window (excluding the focal cell) to calculate Moran’s I. The `global = FALSE` argument specifies that the spatial autocorrelation should be calculated for each cell separately, and thus the output is a raster.

```
ndvi_diff = ndvi2023_tartu - ndvi2018_tartu
ndvi_diff_autocor = autocor(ndvi_diff, method = "moran", global = FALSE)
plot_div(ndvi_diff, main = "Difference")
plot_div(ndvi_diff_autocor, main = "Moran's I of the difference")
```

The resulting map shows the spatial autocorrelation of the differences between the two rasters. High values indicate that the differences are spatially clustered (positive changes are close to positive changes, and negative changes are close to negative changes), values around zero indicate that the differences are spatially random (independent), and negative values indicate that neighboring differences tend to be dissimilar.

Structural similarity index (SSIM) (Wang et al. 2004) is a measure used to compare the similarity between two images (no surprises here). It is based on the comparison of three aspects of the images: luminance, contrast, and structure, and it is designed to mimic the human perception of similarity.

The **SSIMmap** package (Ha and Long 2023) provides the `ssim_raster()` function, which calculates the SSIM between two rasters.^{7}

```
library(SSIMmap)
ndvi_ssim = ssim_raster(ndvi2018_tartu, ndvi2023_tartu, global = FALSE, w = 5)
plot_div(ndvi_ssim[[1]])
```

The interpretation of the SSIM values is similar to the correlation coefficient: 1 indicates a perfect similarity, 0 indicates no similarity, and -1 indicates a perfect dissimilarity.

All of the single-value outcomes can also be calculated at multiple scales, i.e., instead of using each original cell value, the values are aggregated in windows of different sizes.

We have already seen the use of RMSE to compare two rasters based on the values of each cell independently. Now, using the **waywiser** package (Mahoney 2023), we can calculate the RMSE at multiple scales.

Its `ww_multi_scale()` function requires the two rasters, the metrics to calculate, and the sizes of the windows (in cells) used to aggregate the values. Here, we calculate the RMSE at six different scales, from 50 to 300 meters (map units).

```
library(waywiser)
cell_sizes = seq(50, 300, by = 50)
ndvi_multi_scale = ww_multi_scale(truth = ndvi2018_tartu, estimate = ndvi2023_tartu,
                                  metrics = list(yardstick::rmse),
                                  cellsize = cell_sizes,
                                  progress = FALSE)
ndvi_multi_scale
```

```
# A tibble: 6 × 6
.metric .estimator .estimate .grid_args .grid .notes
<chr> <chr> <dbl> <list> <list> <list>
1 rmse standard 0.205 <tibble [1 × 1]> <sf [10,000 × 5]> <tibble>
2 rmse standard 0.198 <tibble [1 × 1]> <sf [2,500 × 5]> <tibble>
3 rmse standard 0.195 <tibble [1 × 1]> <sf [1,156 × 5]> <tibble>
4 rmse standard 0.193 <tibble [1 × 1]> <sf [625 × 5]> <tibble>
5 rmse standard 0.193 <tibble [1 × 1]> <sf [400 × 5]> <tibble>
6 rmse standard 0.193 <tibble [1 × 1]> <sf [289 × 5]> <tibble>
```

This result (note the `.estimate` column) shows the RMSE values at each scale: the largest value is at the scale of 50 meters, and the smallest values are at the scales of 200, 250, and 300 meters. This indicates that the agreement between the two rasters increases as the scale increases.
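To see this trend at a glance, the `.estimate` column can be plotted against the window size. This is a minimal sketch, assuming the `ndvi_multi_scale` tibble and `cell_sizes` vector created above:

```
# RMSE decreases and then levels off as the aggregation scale grows
plot(cell_sizes, ndvi_multi_scale$.estimate, type = "b",
     xlab = "Cell size (map units)", ylab = "RMSE")
```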

A similar approach can be found in the `MAD()` function from the **diffeR** package, which calculates the Mean Absolute Difference (MAD) between two rasters at multiple scales. Here, these scales start at the original resolution of the rasters and increase by a factor of two.

```
library(diffeR)
ndvi_mad = MAD(ndvi2023_tartu, ndvi2018_tartu, eval = "multiple")
```

`[1] "The mean of grid1 is less than the mean of grid2"`

`ndvi_mad`

```
Multiples Resolution Quantity Strata Element Total
1 1 10 0.1783152 0 0.0059907211 0.1843060
2 2 20 0.1783152 0 0.0038461843 0.1821614
3 4 40 0.1783152 0 0.0016917875 0.1800070
4 8 80 0.1783152 0 0.0003588876 0.1786741
5 16 160 0.1783152 0 0.0001143444 0.1784296
6 32 320 0.1783152 0 0.0001143444 0.1784296
7 64 640 0.1783152 0 0.0000000000 0.1783152
8 128 1280 0.1783152 0 0.0000000000 0.1783152
9 256 2560 0.1783152 0 0.0000000000 0.1783152
10 512 5120 0.1783152 0 0.0000000000 0.1783152
```

Note the last column of the result, which shows the MAD values at each scale. It is similar to the RMSE result, with the largest value at the original resolution of the rasters and the smallest value at the largest scale.
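Assuming the object returned by `MAD()` is a data frame with the columns printed above (this sketch is not part of the original workflow), the same relationship can be visualized directly:

```
# total MAD against resolution; log scale on x highlights the leveling off
plot(ndvi_mad$Resolution, ndvi_mad$Total, type = "b", log = "x",
     xlab = "Resolution (map units)", ylab = "MAD (Total)")
```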

The average (global) SSIM value can be calculated by averaging the SSIM values of all cells in the raster. It is a single value that indicates the overall similarity between the two rasters (Wang et al. 2004; Robertson et al. 2014).

It can be calculated using the `ssim_raster()` function from the **SSIMmap** package (Ha and Long 2023).

```
library(SSIMmap)
ssim_raster(ndvi2018_tartu, ndvi2023_tartu, global = TRUE)
```

`SSIM: 0.63845 SIM: 0.90432 SIV: 0.90778 SIP: 0.75671 `

The result is a single value that ranges from -1 to 1, where 1 indicates a perfect similarity, 0 indicates no similarity, and -1 indicates a perfect dissimilarity.

Cliff, Andrew D. 1970. “Computing the Spatial Correspondence Between Geographical Patterns.” *Transactions of the Institute of British Geographers*, no. 50 (July): 143. https://doi.org/10.2307/621351.

Ha, Hui Jeong (Hailyee), and Jed Long. 2023. *SSIMmap: The Structural Similarity Index Measure for Maps*. Manual.

Haralick, Robert M, Karthikeyan Shanmugam, and Its’ Hak Dinstein. 1973. “Textural Features for Image Classification.” *IEEE Transactions on Systems, Man, and Cybernetics*, no. 6: 610–21.

Hijmans, Robert J. 2024. *Terra: Spatial Data Analysis*. Manual.

Ilich, Alexander R. 2020. “GLCMTextures.” https://doi.org/10.5281/zenodo.4310186.

Kuhn, Max, Davis Vaughan, and Emil Hvitfeldt. 2024. *Yardstick: Tidy Characterizations of Model Performance*. Manual.

Mahoney, Michael J. 2023. “Waywiser: Ergonomic Methods for Assessing Spatial Models.” https://doi.org/10.48550/arXiv.2303.11312.

Pontius Jr., Robert Gilmore, and Ali Santacruz. 2023. *diffeR: Metrics of Difference for Comparing Pairs of Maps or Pairs of Variables*. Manual.

Rao, C. Radhakrishna. 1982. “Diversity and Dissimilarity Coefficients: A Unified Approach.” *Theoretical Population Biology* 21 (1): 24–43. https://doi.org/10.1016/0040-5809(82)90004-1.

Robertson, Colin, Jed A. Long, Farouk S. Nathoo, Trisalyn A. Nelson, and Cameron C. F. Plouffe. 2014. “Assessing Quality of Spatial Models Using the Structural Similarity Index and Posterior Predictive Checks.” *Geographical Analysis* 46 (1): 53–74. https://doi.org/10.1111/gean.12028.

Rocchini, Duccio, Elisa Thouverai, Matteo Marcantonio, Martina Iannacito, Daniele Da Re, Michele Torresani, Giovanni Bacaro, et al. 2021. “Rasterdiv - An Information Theory Tailored R Package for Measuring Ecosystem Heterogeneity from Space: To the Origin and Back.” *Methods in Ecology and Evolution* 12 (6): 2195. https://doi.org/10.1111/2041-210X.13583.

Smith, Annie C., Phoebe Zarnetske, Kyla Dahlin, Adam Wilson, and Andrew Latimer. 2023. *Geodiv: Methods for Calculating Gradient Surface Metrics*. Manual.

Wang, Z., A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. “Image Quality Assessment: From Error Visibility to Structural Similarity.” *IEEE Transactions on Image Processing* 13 (4): 600–612. https://doi.org/10.1109/TIP.2003.819861.

In fact, they may also be used for comparing rasters for other regions, but they still should have the same number of rows and columns.↩︎

Alternatively, the absolute difference can be calculated to show the difference in the magnitude of the values.↩︎

Read the comment on this by Mathieu Gravey.↩︎

Implementation of RMSE is also straightforward: `rmse = function(x, y) sqrt(mean((x - y)^2))`.↩︎

This leaves a tough question of how to choose the size of the moving window.↩︎

The “pearson” method is the only one available by name, but other methods can be used by providing a custom function.↩︎

This function also returns three other rasters: SIM, SIV, and SIP that relate to the luminance, contrast, and structure, respectively.↩︎

BibTeX citation:

```
@online{nowosad2024,
author = {Nowosad, Jakub},
title = {Comparison of Spatial Patterns in Continuous Raster Data for
Overlapping Regions Using {R}},
date = {2024-10-20},
url = {https://jakubnowosad.com/posts/2024-10-20-spatcomp-bp2/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Comparison of Spatial Patterns in
Continuous Raster Data for Overlapping Regions Using R.” October
20, 2024. https://jakubnowosad.com/posts/2024-10-20-spatcomp-bp2/.

Methods for comparing spatial patterns in raster data

This is the first part of a blog post series on comparing spatial patterns in raster data.

Comparison of spatial patterns in raster data is a part of many types of spatial analysis. With this task, we want to know how the physical arrangement of observations in one raster differs from the physical arrangement of observations in another raster.

This blog post series will explain the motivation for comparing spatial patterns in raster data, the general considerations when selecting a method for comparison, and the inventory of methods for comparing spatial patterns in raster data. Next, it will show how to use R to compare spatial patterns in continuous and categorical raster data. Lastly, it will discuss the methods’ properties, their applicability, and how they can be extended.

In general, four main reasons for comparing spatial patterns in raster data were identified by Long and Robertson (2018, https://doi.org/10.1111/gec3.12356):

- Study of change (i.e., how the spatial pattern of a landscape has changed over time)
- Study of similarity (i.e., how similar the spatial pattern of two landscapes is)
- Study of association (i.e., how the spatial pattern of one theme is associated with the spatial pattern of another theme)
- Spatial model assessment (i.e., how well a model output matches the spatial pattern of the reference data)

For now, I want to focus on the situation in which only two rasters are compared at a time.^{1} Then, we can think about comparing spatial patterns in raster data as the type of operation that is performed:

- Comparison of the same variable for two different areas
- Comparison of two different variables (or e.g., sensors) for the same area
- Comparison of the same variable for the same area at two different times

To think about this topic, let’s consider the examples of the Corine Land Cover (CLC) data for Tartu, Estonia in 2000 and 2018, and for Poznan, Poland in 2018:

We can start comparing the spatial patterns in these rasters just by looking at them: the land cover for Tartu in 2000 and 2018 looks similar, but it is different for Poznan in 2018 (many more urban areas, fewer forests). A comparison of Tartu in 2000 and 2018 suggests that urban areas have expanded into areas that were previously mostly covered by agricultural land. Visual inspection is a good starting point, as the human eye can detect multiple patterns that are not easily quantifiable. At the same time, it is subjective and may not be suitable for large datasets.

Alternatively, we can use quantitative methods to compare the spatial patterns in these rasters. Figure 1 shows general considerations when thinking about the properties of the methods for comparing spatial patterns in raster data.

**The first main aspect to consider when comparing spatial patterns in raster data is whether or not we are dealing with arbitrary regions.** Working on overlapping (i.e., non-arbitrary) regions, e.g., CLC in Tartu in 2000 and 2018, allows for different approaches than working on arbitrary regions, e.g., CLC in Tartu in 2018 and Poznan in 2018. With non-arbitrary regions, each cell in one raster (or each cell in a moving window) can be compared to a corresponding cell in another raster. Thus, one possible outcome of the comparison is another raster, which highlights where the spatial patterns are similar or different. This is not possible with arbitrary regions, and the comparison usually includes spatial patterns of whole rasters.

**The second main aspect to consider when comparing spatial patterns in raster data is whether the method used allows the integration of the spatial context of the analysis.** A difference between the values of two cells is straightforward to calculate and interpret, but it does not consider the other local values. Alternatively, some methods use the spatial context of the analysis, e.g., by comparing the values in a moving window or a local neighborhood.

**It is also worth noting that the comparison of spatial patterns in raster data can result in different types of data.** For overlapping regions, the outcome can be a single value, multiple values, or a raster, and for arbitrary regions, it is usually a single value (multiple values are also possible, but often as a collection of single values’ results).

The above considerations can be applied to both continuous and categorical raster data, but the methods used for comparing spatial patterns in these two types of data are different.

An inventory of methods for comparing spatial patterns in raster data is presented in the following tables.^{2} Methods for comparing two layers of spatial continuous raster data are shown in Table 1, and methods for spatial categorical raster data are in Table 2.

| Arbitrary regions | Context | Outcome | Examples of methods |
|---|---|---|---|
| no | non-spatial | raster | The difference between values of two rasters for each cell |
| no | spatial | raster | Correlation coefficient between focal regions of two rasters; The difference between a focal measure of two rasters (e.g., based on a GLCM-based texture measure (Haralick et al., 1973)); Spatial autocorrelation analysis of the differences (Cliff, 1970); Structural Similarity Index (Robertson et al., 2014; Wang et al., 2004) |
| no | non-spatial | single value | Statistics of the differences between rasters’ values (e.g., RMSE) |
| yes | non-spatial | single value | Dissimilarity between the distributions of two rasters’ values |
| no | spatial | single value | Average of Structural Similarity Index (Robertson et al., 2014; Wang et al., 2004); Complex Wavelet Structural Similarity (Malik and Robertson, 2020; Sampat et al., 2009); Comparison of deep learning-based feature maps using a dissimilarity measure (Malik and Robertson, 2021) |
| yes | spatial | single value | Comparison of deep learning-based feature maps using a dissimilarity measure (Malik and Robertson, 2021) |
| no | non-spatial | multiple values | The distribution of the difference between values of two rasters; Statistics of the differences between rasters’ values (e.g., RMSE) calculated at many scales |

| Arbitrary regions | Context | Outcome | Examples of methods |
|---|---|---|---|
| no | non-spatial | raster | The binary difference between two rasters |
| no | spatial | raster | The difference between a focal measure of two rasters (e.g., selected landscape metric); Dissimilarity between spatial signatures of focal regions of two rasters |
| no | non-spatial | single value | The proportion of changed pixels; The overall comparison (Pontius, 2002); A statistic of the differences between rasters’ values |
| yes | non-spatial | single value | Comparison of the values of a non-spatial landscape metric |
| no | spatial | single value | Multiple resolution procedure (Costanza, 1989); Expanding window approach (Kuhnert et al., 2005); Fuzzy Kappa (Hagen-Zanker, 2009); Spatial association between regionalizations using V-measure (Nowosad and Stepinski, 2018) |
| yes | spatial | single value | Comparison of the values of a landscape metric (Turner et al., 1989) or fractal dimensions (Batty and Longley, 1994); Dissimilarity of a spatial signature between two rasters (Jasiewicz and Stepinski, 2013) |
| no | non-spatial | multiple values | The confusion matrix |
| no | spatial | multiple values | Comparison of mutual information spectra (Remmel and Csillag, 2006) |

Both tables are adapted from the “Comparing spatial patterns in raster data using R” paper published in the ISPRS Archives (Nowosad, 2024, https://doi.org/10.5194/isprs-archives-XLVIII-4-W12-2024-127-2024).^{3} There you can find the complete list of the references for the methods presented in the tables.

In the next blog posts, I will focus on the practical aspects of comparing spatial patterns: how to use R to compare spatial patterns in continuous and categorical raster data. It will include the interpretation of the results and the discussion of the characteristics of the methods. Stay tuned!

I may expand this to the comparison of multiple rasters in a future blog post.↩︎

The tables omit the visual inspection of the data, which is often the first step in the comparison of spatial patterns in raster data.↩︎

However, I decided to use the term “arbitrary regions” instead of “disjoint areas”.↩︎

BibTeX citation:

```
@online{nowosad2024,
author = {Nowosad, Jakub},
title = {Inventory of Methods for Comparing Spatial Patterns in Raster
Data},
date = {2024-10-13},
url = {https://jakubnowosad.com/posts/2024-10-13-spatcomp-bp1/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Inventory of Methods for Comparing Spatial
Patterns in Raster Data.” October 13, 2024. https://jakubnowosad.com/posts/2024-10-13-spatcomp-bp1/.

I am excited about the project and looking forward to the next two years, but this blog post is not about the project itself. Instead, I want to share some thoughts on the process of applying for the MSCA-PF grant. I hope this post will be helpful for others applying for this grant in the future, but also for grant providers and reviewers who might be interested in improving the process.^{2}

In short, the MSCA-PF grant is a competitive grant aimed at early career researchers (up to 8 years after the PhD).^{3} The European Postdoctoral Fellowship allows researchers to work on a project of their choice for up to two years in a host institution in Europe or associated countries. The aim of this grant is not only to support the research project but also to develop new skills and promote knowledge transfer between the researcher and the host institution. The MSCA-PF grant website also lists the following benefits of the grant: a living allowance, a mobility allowance (plus possibly also family, long-term leave, and special needs allowances) and funding for research, training, and networking activities, and management and indirect costs.

The grant proposal consists of **Part A** (the administrative part filled in the online portal) and **Part B** (the scientific part uploaded as a PDF). **Part B** is divided into two subparts: **B1** and **B2**.

**Part B1** is the core part of the proposal and should contain details of the proposed research and training activities along with the practical arrangements proposed to implement them, etc. It is strictly restricted to 10 A4-sized pages, with font size and margin limitations. **Part B2** has no page limit and contains the researcher’s CV, the capacity of the participating organization(s), and other related information.

The grant process has extensive documentation, which includes the main website, Q&A Blog, and many other documents, such as “The guide for Applicants”, “PF Handbook”, “Evaluation Form”, “How to complete your ethics self-assessment”, and many more. The extensive documentation, while comprehensive, can be overwhelming for applicants, as it far exceeds the length of the proposal itself.

After reading the documentation, the application process seems straightforward: you just fill out the online form, write **Part B**, and submit everything. I initially thought that my main issue would be the page limit of **Part B1**: how to fit all the ideas about the project into just 10 pages? The reality turned out to be quite different: the main issue was understanding what was actually expected in the proposal.

**Part B1** has three main sections: (1) “Excellence”, (2) “Impact”, and (3) “Quality and Efficiency of the Implementation”. Then, each section has a number of subsections. For example, the “Excellence” section has the following subsections:

- “Quality and pertinence of the project’s research and innovation objectives (and the extent to which they are ambitious, and go beyond the state of the art).”
- “Soundness of the proposed methodology (including interdisciplinary approaches, consideration of the gender dimension and other diversity aspects if relevant for the research project, and the quality of open science practices).”
- “Quality of the supervision, training and of the two-way transfer of knowledge between the researcher and the host.”
- “Quality and appropriateness of the researcher’s professional experience, competences and skills.”

Then, each subsection has several bullet points that you are supposed to address in your proposal. For example, the first subsection has two bullet points, and the second subsection has five bullet points, etc. *Spoiler alert: not addressing these bullet points may result in a lower reviewer’s score.*

Moreover, throughout the grant proposal template, you may encounter various new terms and concepts that you need to understand and address. Some are probably well-known by seasoned grant writers but not by all early-career researchers. For example, you need to know the differences between dissemination, exploitation, and communication of the results; what’s “Data Management Plan (DMP)” or “Career Development Plan (CDP)”; what’s “Mobility declaration” or “Evaluation questionnaire”; how to address “Gender dimension and other diversity aspects”, “Environmental considerations in light of the MSCA Green Charter”, etc. These are defined in various documents available online, but it takes time to understand them and it is required to address them in the proposal.

To make things more complicated, the grant documentation also contains several hidden expectations. I got great help by talking to a previous MSCA-PF grant holder, local university advisors, and a KoWi advisor.^{4} Interestingly, they all pointed out different aspects (and hidden expectations) of the grant proposal. Thus, the process can feel like navigating a complex puzzle, with some elements not immediately apparent.

My rough estimation is that about 7 out of 10 pages of **Part B1** relate to the expected information: you must address the bullet points, explain the concepts, and meet the hidden expectations. The remaining three pages, scattered throughout the proposal, are for the actual ideas behind the project (and some references).

The proposal’s structure may inadvertently encourage applicants to focus more on meeting specific criteria than on fully elaborating their research ideas. For example, there should be a part about “Open science practices” within the second subsection of **Part B1**. It does not matter if you are actually doing (or thinking about doing) open science^{5} – you need to write about it to get points. I think it teaches the wrong behavior to young researchers.

Let’s say you wrote a first draft of the text and are happy with it. Now, you need to format it to meet all formal expectations, such as font size, margins, page limits, and more. You may spend many hours moving the text around, changing the font sizes, and so on. Thus, I suggest leaving the formatting as one of the last steps of the proposal writing process.

In total, I spent about three months of part-time work on the grant proposal.^{6} I also suspect that the cumulative time investment across all applicants (of this single-person grant) is substantial, highlighting an opportunity to explore ways to streamline the application process.^{7}

After you submit the grant proposal (the deadline is usually in September), it goes through a reviewing process and the results are announced the following March. You can find the evaluation form at https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/horizon/temp-form/ef/ef_he-msca_en.pdf. The form mainly focuses on the expected information. On the one hand, it makes sense: the reviewers need to try to be objective and evaluate the proposals based on the same criteria. On the other hand, there’s a chance that proposals that best meet the formal criteria score highly, which may not always align perfectly with identifying the most innovative research ideas.^{8}

Now, let’s say that you got the grant. It is time for ~~celebration~~ filling out the grant agreement. This brings me to the topic of the online portals related to the MSCA-PF grant.^{9} Yes, portal**s**. This is because there are several portals that you need to use during the application and project management phases. These portals are not very user-friendly, and each of them uses a different technology and visual style. It takes a lot of time to get used to them, understand which portal you need to use for which task, and how to navigate them. There seems to be room for improving the user experience and integration of these systems.

Let’s end up with one important piece of information for those who are considering applying for the MSCA-PF grant. The grant promotes itself with a list of benefits: a living allowance, a mobility allowance (plus possibly also family, long-term leave, and special needs allowances), funding for research, training, networking activities, and management and indirect costs. The living allowance (that is supposed to cover your salary) depends on the country where you will be working, based on a correction factor for the cost of living, while the rest of the allowances are fixed. For example, in the 2023 edition of the grant, the living allowance for Germany was about 5,000 EUR per month, plus the mobility allowance of 600 EUR per month, and the family allowance of 660 EUR per month (plus some money for the research, training, and networking activities, and management and indirect costs).

What is not directly mentioned, however, is that every country (and even institutions in one country) has different rules about salary, taxes, and other benefits. For example, some universities in Germany will just hire you as a regular employee, and you will get a salary based on the pay scale. Thus, you won’t get the mobility and family allowances.^{10} Moreover, the grant funding can be treated in some (?) institutions as gross gross (*brutto brutto*), which means that the money will be first used to cover the employer’s costs, and then you will need to pay the taxes on the remaining money. Thus, the actual amount the researcher receives (netto) is significantly less than the total amount granted.

Given that the way the grant is treated may vary between countries and institutions, I think it is essential to ask about the details before applying for the grant. The best way is probably to contact a previous MSCA-PF grant holder from the institution where you plan to work and ask about all these details.

This may not be obvious after reading this blog post, but I am very happy I got the grant and excited about the project. If I could go back in time knowing all of the above, I would still apply for the MSCA-PF grant. This grant format is an excellent opportunity for researchers to move to a different environment, learn new skills, and develop new ideas.

That being said, I think the grant proposal process could be greatly improved. The complexity of the application process (evidenced by the extensive documentation required for a relatively short proposal) suggests that there may be opportunities to improve the procedure. Currently, it puts a lot of pressure and a high time burden on the applicants and may lead to a situation where the best proposals are not funded.^{11} This, combined with many hidden expectations, document formatting, and user-unfriendly online portals, makes the whole process even more time-consuming: it could require thousands of hours of young researchers’ work.

I think the grant proposal process should be much simplified and streamlined. While I understand the need for evaluation criteria and objectivity, I think the current system is not the best way to achieve this. In my opinion, the focus of the proposal should be on the research ideas, the potential of the applicants, and the transfer of knowledge, not on the ability to fill in the expected information. A good example of a grant proposal process that I think is much better^{12} is the Humboldt Research Fellowship, which has one simple online portal and fairly straightforward expectations – the focus is on the research ideas and the potential of the applicants, and the whole application process is less surprising and much less time-consuming.

I hope you found this post helpful – if you have any questions or comments, feel free to email me. And now, it’s time to pack my bags.

I also plan to write a few blog posts about the project, so stay tuned!↩︎

It will be useful for me to clear my thoughts and reflect on the process.↩︎

The acceptance rate in 2024 was about 15%.↩︎

Thank you!↩︎

I strongly encourage it.↩︎

Of course, I was also working on several other projects, teaching, etc., at the same time.↩︎

About 8,000 applicants multiplied by 3 months is 2,000 years of part-time work of a highly educated person.↩︎

As pointed out by a peer, these types of reviews are still very good to filter out bad proposals. I agree with this statement. At the same time, I also think that science is a strong link problem, and we should focus our efforts on finding the best ideas.↩︎

And, I assume, to other EU grants as well.↩︎

This creates a situation where a person with a family gets the same salary as a person without a family.↩︎

And possibly not even written when the applicants do not have enough privilege to spend three months on a single-person grant proposal.↩︎

But still not perfect.↩︎

BibTeX citation:

```
@online{nowosad2024,
author = {Nowosad, Jakub},
title = {Navigating the Maze: {Reflections} on Applying for the
{Marie} {Skłodowska-Curie} {Actions} {Postdoctoral} {Fellowship}},
date = {2024-07-22},
url = {https://jakubnowosad.com/posts/2024-07-22-msca-bp1/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Navigating the Maze: Reflections on
Applying for the Marie Skłodowska-Curie Actions Postdoctoral
Fellowship.” July 22, 2024. https://jakubnowosad.com/posts/2024-07-22-msca-bp1/.

`B` and `J`, which control the external pressure and the local autocorrelation tendency, respectively. Both of them have a strong effect on the results of the spatial kinetic Ising model. Thus, the question is how to find the best values of these parameters for a given situation.

This blog post shows how to use the simulated annealing algorithm to find the values of `B` and `J` that minimize the difference between the metrics of the expected map and the metrics of the last simulation.

To reproduce the calculations in the following post, you need to attach the following packages:

```
library(terra)
library(spatialising)
library(ggplot2)
```

We will also use and process the same input data as in the previous blog post.

```
twomaps = rast("/vsicurl/https://github.com/Nowosad/bp-data/raw/main/spatialising-bp/twomaps.tif")
rcl = matrix(c(1, -1, 2, 1), byrow = TRUE, ncol = 2)
twomaps = classify(twomaps, rcl)
map1 = twomaps[[1]]
map2 = twomaps[[2]]
plot(c(map1, map2))
```

In this case, we have two maps: the first map (`map1`) is the initial map, and the second map (`map2`) is the expected map. Our main goal is to find the values of `B` and `J` that, given the initial map (`map1`), will result in a simulated map that is as similar as possible to the expected map (`map2`).

One possible approach to find these best values of `B` and `J` is to use an optimization algorithm. This algorithm’s goal is to minimize the difference between the metrics of the expected map (e.g., `map2`) and the metrics of the last simulation.

For that purpose, we can use the `optim_sa()` function from the **optimization** package that implements the simulated annealing algorithm. Firstly, we need to define a function that takes one variable (`x`) and returns a single numeric value. Here, we created the `optimize_model()` function that takes the `x` variable, which is a vector of two values: `B` and `J`. The function then runs the spatial kinetic Ising model with the given values of `B` and `J`, calculates the composition and texture indexes for the expected map (`map2`) and the last simulation, and returns the distance between the metrics of the expected map (`map2`) and the metrics of the last simulation.^{1}

```
optimize_model = function(x){
  sim = spatialising::kinetic_ising(x = map1, B = x[1], J = x[2],
                                    updates = 4)
  map2_metrics = c(composition_index(map2), texture_index(map2))
  sim_metrics = c(composition_index(sim[[4]]), texture_index(sim[[4]]))
  dist(rbind(map2_metrics, sim_metrics))[[1]]
}
```

Secondly, we need to use the `optim_sa()` function, which takes the `optimize_model()` function, the initial values of `B` and `J`, and the lower and upper bounds for `B` and `J`. Here, we set the bounds for `B` to be between `0` and `0.9`, as we know that the proportion of the `1` (forest) values should increase over time. This operation may take about one minute in this case.

```
optim_params = optimization::optim_sa(fun = optimize_model,
                                      start = c(0.4, 0.4),
                                      lower = c(0, 0),
                                      upper = c(0.9, 0.9))
```
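To build intuition for what a simulated annealing search does, here is a minimal hand-rolled sketch on a toy two-parameter function. This is an illustration only: the toy objective, the step size, and the cooling schedule are my own assumptions, not what `optim_sa()` implements internally.

```r
# Minimal simulated annealing sketch on a toy objective (illustrative
# only; not the actual optim_sa() internals).
set.seed(42)
toy_obj = function(x) (x[1] - 0.78)^2 + (x[2] - 0.51)^2  # hypothetical objective
par = c(0.4, 0.4)   # starting values, as in the optim_sa() call above
cur_val = toy_obj(par)
temp = 1
for (i in 1:5000) {
  cand = pmin(pmax(par + rnorm(2, sd = 0.05), 0), 0.9)  # bounded random step
  cand_val = toy_obj(cand)
  # accept better candidates always; worse ones with a probability
  # that shrinks as the temperature cools
  if (cand_val < cur_val || runif(1) < exp((cur_val - cand_val) / temp)) {
    par = cand
    cur_val = cand_val
  }
  temp = temp * 0.999  # cooling schedule
}
round(par, 2)  # close to the toy minimum at (0.78, 0.51)
```

The occasional acceptance of worse candidates is what lets the search escape local minima, which matters here because the objective based on stochastic Ising simulations is noisy.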

The output of the `optim_sa()` function is a list with several elements, including the best values of `B` and `J` in the `par` element.

`optim_params$par`

`[1] 0.78 0.51`

Here, our optimal values of `B` and `J` are 0.78 and 0.51, respectively. Now, we can use the newly derived values to simulate the spatial kinetic Ising model similar to the second map (`map2`).

```
sim_optim = kinetic_ising(map1,
                          B = optim_params$par[1], J = optim_params$par[2],
                          updates = 4)
plot(c(map2, sim_optim[[4]]))
```

The map on the left is the expected, true map, and the map on the right is our simulation. While both maps are not identical, they have a similar spatial pattern with a dominance of forest (`1`) (especially in the northeast part of the map) and a similar configuration of the patches.

In addition to looking at just the final map, we can also retrace the entire simulation process and its effect on the metrics of spatial patterns. Firstly, we can plot all of the simulated rasters:

```
names(sim_optim) = paste0("sim_year", 2:5)
plot(sim_optim, nr = 1)
```

Secondly, we can calculate their metrics of spatial patterns and compare their changes over time.

```
ci_df = data.frame(year = 1:5, metric = "composition index",
                   value = composition_index(c(map1, sim_optim)))
ti_df = data.frame(year = 1:5, metric = "texture index",
                   value = texture_index(c(map1, sim_optim)))
pred_df = rbind(ci_df, ti_df)
ggplot(pred_df, aes(year, value)) +
  geom_line() +
  facet_wrap(~metric, scales = "free_y")
```

As expected, the composition index increases over time, from negative values indicating a dominance of the `-1` values to positive values indicating a dominance of the `1` values. The texture index, on the other hand, decreases for the first two simulations and then increases for the last two simulations. There is a good (and well-known) reason for that: the composition of values has an impact on the spatial texture. The larger the dominance of one category, the more clustered the values are, and thus the higher the texture index is.
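This relationship can be illustrated with a quick back-of-the-envelope simulation (a one-dimensional toy example, not part of the original analysis): in a random sequence of fixed length, the share of identical neighboring pairs grows as one value starts to dominate.

```r
# Share of identical neighboring pairs in a random -1/1 sequence,
# as a function of the proportion p of the value 1 (toy illustration).
set.seed(1)
same_pair_share = function(p) {
  x = sample(c(-1, 1), 1e5, replace = TRUE, prob = c(1 - p, p))
  mean(x[-1] == x[-length(x)])
}
same_pair_share(0.5)  # balanced composition: about 0.50
same_pair_share(0.9)  # strong dominance: about 0.82 (= 0.9^2 + 0.1^2)
```

Even without any explicit clustering process, dominance alone pushes neighboring values to agree, which is why composition and texture cannot be interpreted fully independently.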

Given the assumption that the external pressure and the local autocorrelation tendency will remain the same, we can use the `kinetic_ising()` function to predict future spatial patterns.

```
map2_pred = kinetic_ising(map2,
                          B = optim_params$par[1], J = optim_params$par[2],
                          updates = 4)
names(map2_pred) = paste0("sim_year", 6:9)
plot(map2_pred, nr = 1)
```

The above maps show the predicted spatial patterns for the years 6, 7, 8, and 9.

This blog post showed how to use the simulated annealing algorithm to find the best values of `B` and `J` that minimize the difference between the metrics of the expected map and the metrics of the last simulation. It allows not only the simulation of spatial patterns similar to the expected map but also the analysis of the simulation process and prediction of future spatial patterns.

Of course, there are many caveats and limitations of this approach. For example, the spatial kinetic Ising model assumes that the external pressure and the local autocorrelation tendency are constant over time over the entire area. Moreover, the model is not able to simulate patterns that create or modify linear features (e.g., rivers or roads).

To learn more about the spatial kinetic Ising model, its background, possible applications, and limitations, I encourage you to read Stepinski (2023) and Stepinski and Nowosad (2023).

Stepinski, Tomasz F. 2023. “Spatially Explicit Simulation of Deforestation Using the Ising-Like Neutral Model.” *Environmental Research: Ecology* 2 (2): 025003. https://doi.org/10.1088/2752-664x/acdbd2.

Stepinski, Tomasz F., and Jakub Nowosad. 2023. “The Kinetic Ising Model Encapsulates Essential Dynamics of Land Pattern Change.” *Royal Society Open Science* 10 (10): 231005. https://doi.org/10.1098/rsos.231005.

The optimization is also explained at https://jakubnowosad.com/spatialising/articles/Optimizing_spatialising_parameters.html.↩︎

BibTeX citation:

```
@online{nowosad2024,
author = {Nowosad, Jakub},
title = {Optimizing the Parameters of the Spatial Kinetic {Ising}
Model to Simulate Spatial Patterns},
date = {2024-01-07},
url = {https://jakubnowosad.com/posts/2024-01-07-spatialising-bp2/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Optimizing the Parameters of the Spatial
Kinetic Ising Model to Simulate Spatial Patterns.” January 7,
2024. https://jakubnowosad.com/posts/2024-01-07-spatialising-bp2/.

The above idea can be, in principle, applied to other two-dimensional systems, such as geographical (*spatial*) data. For example, we can think of a binary spatial raster as a two-dimensional system, where each cell can have one of two states: `-1` or `1`. These numbers can represent, for example, two land cover categories: forest and non-forest. Next, we can introduce two parameters influencing the state of each cell: `B` and `J`. `B` is an external pressure: it tries to align cells’ values with its sign: are we more likely to have more `-1` or `1` values? `J` is a strength of the local autocorrelation tendency; it tries to align signs of neighboring cells: are we more likely to have more cells of the same value in the neighborhood? The spatial kinetic Ising model is a simulation of such a system, where each cell is given *an opportunity* to flip its value (`1` to `-1` or `-1` to `1`). The probability of a flip depends on the value of the cell, the values of its four neighbors (top, left, bottom, and right), and the values of `B` and `J`.

This blog post shows how to use the spatial kinetic Ising model to simulate the change in the spatial pattern of a binary raster using the **spatialising** R package. The package is available on GitHub at https://github.com/Nowosad/spatialising/.

To reproduce the calculations in the following post, you need to attach the following packages:

```
library(terra)
library(spatialising)
library(ggplot2)
```

The `twomaps.tif` file contains two maps of the same area, but for different years. The first map represents the year 1, and the second map represents the year 5. Both of them have only two values: `1` for non-forest and `2` for forest:

```
twomaps = rast("/vsicurl/https://github.com/Nowosad/bp-data/raw/main/spatialising-bp/twomaps.tif")
map1 = twomaps[[1]]
map2 = twomaps[[2]]
plot(c(map1, map2))
```

The spatial kinetic Ising model requires binary raster data with just two values: `-1` and `1`. Thus, the first step in our case is to reclassify the original binary (`1`, `2`) map into new (`-1`, `1`) values. This can be done with the `classify()` function from the **terra** package.^{1} Here, we replace the `1` value with `-1` and the `2` value with `1`:

```
rcl = matrix(c(1, -1, 2, 1), byrow = TRUE, ncol = 2)
map1 = classify(map1, rcl)
map2 = classify(map2, rcl)
plot(c(map1, map2))
```

Now, our data is ready for use in the **spatialising** package. Its main function is `kinetic_ising()`, which simulates the spatial kinetic Ising model. It requires the input raster (`x`), the strength of the external pressure (`B`), the strength of the local autocorrelation tendency (`J`), and also has some optional arguments, such as the number of updates.

Our goal is to simulate the change in the spatial pattern of the first map (`map1`) to make it similar to the second map (`map2`). The code below simulates the spatial kinetic Ising model for the first map (`map1`) with the value of `B` of 0.3 (meaning that the external pressure is toward increased forest cover) and the value of `J` of 0.7 (meaning that the local autocorrelation tendency is strong). We also set the number of updates to 4, which means that it will create four simulations (rasters), each of which will be based on the previous one.^{2}

```
sim1 = kinetic_ising(map1, B = 0.3, J = 0.7, updates = 4)
plot(sim1, nr = 1)
```

The result consists of four simulated rasters, which are stored in the `sim1` object. Each of them represents successive simulations of the spatial kinetic Ising model.

We can also compare the final simulation (`sim1[[4]]`) with the first (`map1`) and the second map (`map2`).

`plot(c(map1, map2, sim1[[4]]), nr = 1)`

Compared to the first map, the last simulation has slightly more forest cover, which is in line with the provided external pressure (`B`) toward increased forest cover. The forest category also tends to be spatially clustered, which is in line with the set local autocorrelation tendency (`J`). On the other hand, the simulation is still quite different from the second map (`map2`), possibly indicating that we should increase the value of `B` to make the simulation more like the second map.

The **spatialising** package also provides two functions to calculate metrics of spatial patterns of binary rasters: `composition_index()` and `texture_index()`. The composition imbalance index (`composition_index()`) is the sum of the cells’ values over the entire site divided by the number of cells in the site. It has a range from -1 (site completely dominated by the -1 values) to 1 (site completely dominated by the 1 values). The value of 0 indicates that the site is equally divided between the two values.
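The definition can be verified with a quick toy computation on a plain vector standing in for raster values (an illustration of the formula, not the package’s implementation):

```r
# Composition imbalance index: sum of cell values divided by cell count.
vals = c(-1, -1, -1, 1)   # three -1 cells, one 1 cell
sum(vals) / length(vals)  # -0.5: the site is dominated by -1
```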

`composition_index(c(map1, map2, sim1[[4]]))`

`[1] -0.5000 0.5000 -0.4184`

In our case, `map1` has a dominance of the `-1` values, `map2` has a dominance of the `1` values, and `sim1[[4]]` has a dominance of the `-1` values, but it is less pronounced than in `map1`.

The texture index (`texture_index()`) is a measure of the spatial autocorrelation of the values of a raster. Its value is between 0 (fine texture) and 1 (coarse texture).

`texture_index(c(map1, map2, sim1[[4]]))`

`[1] 0.6477551 0.6216327 0.8387755`

In our examples, `map1` and `map2` have a rather similar texture, while `sim1[[4]]` has a slightly coarser texture (its values have stronger spatial autocorrelation).

How does the spatial kinetic Ising model work? The simulation starts with the input binary (`-1`, `1`) raster and proceeds with one randomly selected cell at a time. The selected cell is given *an opportunity* to flip its value (`1` to `-1` or `-1` to `1`). The probability of a flip depends on the value of the cell and the values of its four neighbors (top, left, bottom, and right). It also depends on the values of `B` and `J`. `B` (positive or negative) is an external pressure: it tries to align cells’ values with its sign. `J` (always positive) is a strength of the local autocorrelation tendency: it tries to align signs of neighboring cells.
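As a rough sketch of this mechanism, the flip probability under the Glauber rule can be written down directly. This is my own minimal formulation of the textbook kinetic Ising dynamics; the actual **spatialising** internals may differ, for example in how temperature is handled.

```r
# Energy change of flipping a cell s given its four neighbors, and the
# Glauber acceptance probability (textbook form; illustrative only).
glauber_flip_prob = function(s, neighbors, B, J) {
  delta_E = 2 * s * (J * sum(neighbors) + B)  # energy cost of the flip
  1 / (1 + exp(delta_E))                      # Glauber rule
}
# A -1 cell surrounded by four -1 neighbors rarely flips when J is strong:
glauber_flip_prob(-1, c(-1, -1, -1, -1), B = 0.3, J = 0.7)  # about 0.007
```

Note how the two parameters pull in the directions described above: a positive `B` raises the flip probability of `-1` cells, while a large `J` suppresses flips that would break local agreement.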

We can also control the model using a few additional arguments. The `iter` argument controls the number of iterations, that is, how many times the flip of a cell value is attempted before a new simulated raster is returned. By default, its value equals the number of cells in the input raster. Next, `updates` controls the number of simulated rasters returned, each of which is based on the previous one. The `inertia` parameter (`0` by default), when positive, makes it less likely for a cell of `-1` to change its value to `1` when surrounded by other `-1` cells. As a result, it minimizes the possibility of a “salt and pepper” effect, where cells of different values are mixed together. The last important argument is `rule`, which controls how the probability of a flip is calculated: either using the `"glauber"` (default) or `"metropolis"` rule.

The code below compares the results of the spatial kinetic Ising model for different values of `B` and `J`.

```
sim2 = kinetic_ising(map1, B = -0.7, J = 0.1, updates = 4, inertia = 1)
sim3 = kinetic_ising(map1, B = -0.7, J = 0.7, updates = 4, inertia = 1)
sim4 = kinetic_ising(map1, B = 0, J = 0.1, updates = 4, inertia = 1)
sim5 = kinetic_ising(map1, B = 0, J = 0.7, updates = 4, inertia = 1)
sim6 = kinetic_ising(map1, B = 0.7, J = 0.1, updates = 4, inertia = 1)
sim7 = kinetic_ising(map1, B = 0.7, J = 0.7, updates = 4, inertia = 1)
all_sims = c(sim2[[4]], sim3[[4]], sim4[[4]], sim5[[4]], sim6[[4]], sim7[[4]])
names(all_sims) = c("B: -0.7, J: 0.1", "B: -0.7, J: 0.7", "B: 0, J: 0.1",
                    "B: 0, J: 0.7", "B: 0.7, J: 0.1", "B: 0.7, J: 0.7")
plot(all_sims, nc = 2)
```

The top row shows the results for `B` values equal to `-0.7`, the middle row shows the results for `B` values equal to `0`, and the bottom row shows the results for `B` values equal to `0.7`. The left column shows the results of the spatial kinetic Ising model for values of `J` equal to `0.1`, while the right column shows the results for values of `J` equal to `0.7`.

A quick visual comparison underlines that both parameters have a strong effect on the results of the spatial kinetic Ising model. Negative values of `B` tend to decrease the forest cover, while positive values of `B` tend to increase the forest cover. The effect of `J` is, on the other hand, more related to the configuration of the values, with lower values of `J` leading to more dispersed values, and higher values of `J` leading to more clustered values.

This blog post showed how to use the **spatialising** package to simulate the spatial kinetic Ising model, how selected parameters influence the results, and how to calculate metrics of spatial patterns. However, it leaves one important question unanswered: how do you find the best values of `B` and `J` to make the simulation more like the second map? That is the topic of the next blog post.

To learn more about the spatial kinetic Ising model, I encourage you to read Stepinski (2023) and Stepinski and Nowosad (2023).

Stepinski, Tomasz F. 2023. “Spatially Explicit Simulation of Deforestation Using the Ising-Like Neutral Model.” *Environmental Research: Ecology* 2 (2): 025003. https://doi.org/10.1088/2752-664x/acdbd2.

Stepinski, Tomasz F., and Jakub Nowosad. 2023. “The Kinetic Ising Model Encapsulates Essential Dynamics of Land Pattern Change.” *Royal Society Open Science* 10 (10): 231005. https://doi.org/10.1098/rsos.231005.

This function can also be used to binarize continuous data or data with many categories.↩︎

Here, we can think of each simulation as a year, and the number of updates as the number of years.↩︎

BibTeX citation:

```
@online{nowosad2023,
author = {Nowosad, Jakub},
title = {Simulating Spatial Patterns with the Spatial Kinetic {Ising}
Model},
date = {2023-12-17},
url = {https://jakubnowosad.com/posts/2023-12-17-spatialising-bp1/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Simulating Spatial Patterns with the
Spatial Kinetic Ising Model.” December 17, 2023. https://jakubnowosad.com/posts/2023-12-17-spatialising-bp1/.

The previous posts in this series presented the `lsp_search()` and `lsp_compare()` functions of the **motif** package. At the same time, it is possible to create other, more customized workflows. Here, I will show how to compare spatial patterns of two different areas and find the most unique land cover spatial pattern in the process.

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

```
library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") |>
  osf_ls_files(n_max = Inf) |>
  osf_download(path = "data",
               conflicts = "skip")
```

You should also attach the following packages:

```
library(sf)
library(terra)
library(motif)
library(dplyr)
library(readr)
library(cluster)
```

The `data/land_cover.tif` file contains land cover data for Africa. It is a categorical raster with a 300-meter resolution that can be read into R using the `rast()` function.

`lc = rast("data/land_cover.tif")`

Additionally, the `data/lc_palette.csv` file contains information about the labels and colors of each land cover category.

`lc_palette_df = read.csv("data/lc_palette.csv")`

We will use this file to integrate labels and colors into the raster object:

```
levels(lc) = lc_palette_df[c("value", "label")]
coltab(lc) = lc_palette_df[c("value", "color")]
plot(lc)
```

First, we need to define the areas for which we want to compare spatial patterns. For example purposes, we use two African countries: Cameroon and Congo. We can download their areas using the **rnaturalearth** package and use them to crop the `lc` raster object to their borders:

```
library(rnaturalearth)
# download
cameroon = ne_countries(country = "Cameroon", returnclass = "sf") |>
  select(name) |>
  st_transform(crs = st_crs(lc))
congo = ne_countries(country = "Republic of the Congo", returnclass = "sf") |>
  select(name) |>
  st_transform(crs = st_crs(lc))
# crop
lc_cameroon = crop(lc, cameroon, mask = TRUE)
lc_congo = crop(lc, congo, mask = TRUE)
# plot
plot(lc_cameroon)
plot(lc_congo)
```

Both countries have similar shares of land cover categories, with the domination of forests and some agricultural and grassland areas, as we can see by calculating their `"composition"` signatures.

```
lc_cameroon_composition = lsp_signature(lc_cameroon, type = "composition", classes = 1:9)
lc_congo_composition = lsp_signature(lc_congo, type = "composition", classes = 1:9)
round(lc_cameroon_composition$signature[[1]], 2)
```

```
1 2 3 4 5 6 7 8 9
[1,] 0.15 0.77 0.03 0 0 0.04 0 0 0.01
```

`round(lc_congo_composition$signature[[1]], 2)`

```
1 2 3 4 5 6 7 8 9
[1,] 0.1 0.82 0.04 0 0 0.03 0 0 0
```

We can also look at their spatial patterns (both composition and configuration) by calculating the `"cove"` signature.

```
lc_cameroon_cove = lsp_signature(lc_cameroon, type = "cove", classes = 1:9)
lc_congo_cove = lsp_signature(lc_congo, type = "cove", classes = 1:9)
```

Next, these signatures can be compared using dissimilarity measures. The **philentropy** package provides a wide range of such measures, including the Jensen-Shannon divergence. Here, we use this measure to calculate the dissimilarity between the spatial patterns (as represented with `"cove"`) of Cameroon and Congo.

```
library(philentropy)
dist_cove = dist_one_one(lc_cameroon_cove$signature[[1]],
                         lc_congo_cove$signature[[1]],
                         method = "jensen-shannon")
dist_cove
```

`[1] 0.008919291`

This value is small (approximately 0.009), which means that, in general, Cameroon’s and Congo’s spatial patterns are fairly similar.
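To demystify the number, the Jensen-Shannon divergence can be computed from its definition for two toy probability vectors (a hand-rolled sketch; the **philentropy** implementation handles edge cases and unit choices more carefully):

```r
# Jensen-Shannon divergence (natural-log units) for two probability vectors.
jsd = function(p, q) {
  m = (p + q) / 2
  kl = function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))  # KL divergence
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}
jsd(c(0.5, 0.5), c(0.5, 0.5))  # identical distributions: 0
jsd(c(1, 0), c(0, 1))          # disjoint distributions: log(2), about 0.693
```

In these units, 0 means identical distributions and log(2) ≈ 0.693 is the maximum possible value, which helps put the 0.009 result above in context.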

We can also look at the local spatial patterns of Cameroon and Congo, here on a scale of 100 by 100 cells (i.e., 30 by 30 km):

```
lc_cameroon_cove100 = lsp_signature(lc_cameroon, type = "cove",
                                    window = 100, classes = 1:9)
lc_congo_cove100 = lsp_signature(lc_congo, type = "cove",
                                 window = 100, classes = 1:9)
```

To compare these signatures, we can calculate the Jensen-Shannon divergence for each pair of signatures in both datasets. This can be done using the `dist_many_many()` function from the **philentropy** package, which expects two matrices as input.

```
lc_cameroon_cove100_mat = do.call(rbind, lc_cameroon_cove100$signature)
lc_congo_cove100_mat = do.call(rbind, lc_congo_cove100$signature)
dist_cove_100 = dist_many_many(lc_cameroon_cove100_mat,
                               lc_congo_cove100_mat,
                               method = "jensen-shannon")
```

The result is a matrix with the Jensen-Shannon divergence between each pair of areas in both countries, in which rows represent areas in Cameroon and columns represent areas in Congo. Lower values indicate more similar spatial patterns, while higher values indicate more dissimilar spatial patterns. This matrix shows that there are some areas with similar spatial patterns in both countries, and some are even identical (given the source data scale/resolution and scope/number and variety of categories):

`summary(c(dist_cove_100))`

```
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.03228 0.08722 0.16375 0.22835 0.69315
```

Identifiers of the identical areas can be found using the `which()` function. For example, area `341` in Cameroon and area `4` in Congo have the same spatial pattern:

`head(which(dist_cove_100 == 0, arr.ind = TRUE))`

```
row col
[1,] 341 4
[2,] 418 4
[3,] 423 4
[4,] 440 4
[5,] 462 4
[6,] 477 4
```
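The `arr.ind = TRUE` behavior is easy to see on a toy matrix (illustrative values, not the actual distance matrix):

```r
# which() with arr.ind = TRUE returns (row, column) positions of matches.
m = matrix(c(0.2, 0.0, 0.5,
             0.0, 0.3, 0.1), nrow = 2, byrow = TRUE)
which(m == 0, arr.ind = TRUE)
#      row col
# [1,]   2   1
# [2,]   1   2
```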

We can add spatial information to the `lc_cameroon_cove100` and `lc_congo_cove100` objects using the `lsp_add_sf()` function. Then, we are able to visualize these areas by cropping the land cover data using the new objects. In this case, both areas are fully covered by forest (although the second one is located at the border, and thus contains some NA values).

```
lc_cameroon_cove100_sf = lsp_add_sf(lc_cameroon_cove100)
lc_congo_cove100_sf = lsp_add_sf(lc_congo_cove100)
plot(crop(lc_cameroon, lc_cameroon_cove100_sf[341, ]), main = "Cameroon")
plot(crop(lc_congo, lc_congo_cove100_sf[4, ]), main = "Congo")
```

We can group areas with similar spatial patterns of land cover using the `pam()` function from the **cluster** package. For this example, we will divide the areas into six groups.

`my_pam = pam(rbind(lc_cameroon_cove100_mat, lc_congo_cove100_mat), 6)`

Next, we can add the clustering results to the spatial object by naming both existing `sf` objects, combining them into one, and adding the clustering results as a new column.

```
lc_cameroon_cove100_sf$name = "Cameroon"
lc_congo_cove100_sf$name = "Congo"
lc_cove100_sf = rbind(lc_cameroon_cove100_sf, lc_congo_cove100_sf)
lc_cove100_sf$k = as.factor(my_pam$clustering)
```

Visualization of the results is shown below:

```
plot(subset(lc_cove100_sf, name == "Cameroon")["k"], pal = palette.colors, main = "Cameroon")
plot(subset(lc_cove100_sf, name == "Congo")["k"], pal = palette.colors, main = "Congo")
```

You may quickly notice that the sixth and fifth clusters exist prominently in both countries. On the other hand, cluster 2 only exists in Cameroon.

We can look at each cluster representative by subsetting the `lc_cove100_sf` object using the `id.med` column from the `my_pam` object.

```
lc_cove100_sf_subset = lc_cove100_sf[my_pam$id.med, ]
for (i in seq_len(nrow(lc_cove100_sf_subset))){
  plot(crop(lc, lc_cove100_sf_subset[i, ]), main = i)
}
```

Cluster 6 represents forest areas, and cluster 4 consists of areas predominantly covered by forests with some agricultural and grassland areas. Cluster 5 is represented by forest, but with a substantial share of agriculture and grasslands, and cluster 1 is a mix of highly aggregated agriculture and forest. Cluster 3 comprises areas with a large share of shrublands, agricultural areas, and forests. Finally, cluster 2, which only exists in Cameroon, represents large (30 by 30 km) areas of agriculture.

The `dist_cove_100` object contains the Jensen-Shannon divergence between each pair of areas in both countries, where rows represent areas in Cameroon and columns represent areas in Congo. Usually, it may be used to find the most similar areas (areas with the smallest divergence), but here, we will look for the most unique areas.

This can be done in two steps. First, we need to calculate the smallest value in each row and column, which can be done using the `apply()` function. This allows us to find the smallest divergence between each area in Cameroon and Congo; in other words, how dissimilar an area in one country is to the most similar area in the other country.

```
lc_cameroon_cove100_sf$min_dist = apply(dist_cove_100, 1, min)
plot(lc_cameroon_cove100_sf["min_dist"], main = "Cameroon")
lc_congo_cove100_sf$min_dist = apply(dist_cove_100, 2, min)
plot(lc_congo_cove100_sf["min_dist"], main = "Congo")
```

Second, we can find the area with the largest value in the `lc_cameroon_cove100_sf$min_dist` column, which is the most unique area in Cameroon, and the area with the largest value in `lc_congo_cove100_sf$min_dist`, which is the most unique area in Congo. In other words, these areas are the most dissimilar to any area in the other country.

```
most_unique_cameroon = lc_cameroon_cove100_sf[which.max(lc_cameroon_cove100_sf$min_dist), ]
plot(crop(lc_cameroon, most_unique_cameroon), main = "Cameroon")
most_unique_congo = lc_congo_cove100_sf[which.max(lc_congo_cove100_sf$min_dist), ]
plot(crop(lc_congo, most_unique_congo), main = "Congo")
```

In the case of Cameroon, such an area is a mosaic of agriculture and grasslands; for Congo, it is a complex area with grasslands, agriculture, forest, and some shrublands. Interestingly, both areas are located at the border of the countries.^{1}

This post showed how to compare spatial patterns of land cover in two different areas and how to find the most unique land cover spatial pattern (try to find the most unique area in your country as compared to the rest of the world!). This approach can be used to find areas with unique spatial patterns in land cover or any other categorical raster. To learn more about the **motif** package, see the other blog posts in the “motif” category.

You could change the `threshold` parameter in `lsp_signature()` to 0 to only include areas completely inside the countries’ borders.↩︎

BibTeX citation:

```
@online{nowosad2023,
author = {Nowosad, Jakub},
title = {Finding the Most Unique Land Cover Spatial Pattern},
date = {2023-12-03},
url = {https://jakubnowosad.com/posts/2023-12-03-motif-bp8/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Finding the Most Unique Land Cover Spatial
Pattern.” December 3, 2023. https://jakubnowosad.com/posts/2023-12-03-motif-bp8/.

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

```
library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") |>
  osf_ls_files(n_max = Inf) |>
  osf_download(path = "data",
               conflicts = "skip")
```

You should also attach the following packages:

```
library(sf)
library(terra)
library(motif)
library(dplyr)
library(readr)
library(cluster)
library(ggplot2)
```

The `data/land_cover.tif` file contains land cover data for Africa. It is a categorical raster with a 300-meter resolution that can be read into R using the `rast()` function.

`lc = rast("data/land_cover.tif")`

Additionally, the `data/lc_palette.csv` file contains information about the labels and colors of each land cover category.

`lc_palette_df = read.csv("data/lc_palette.csv")`

We will use this file to integrate labels and colors into the raster object:

```
levels(lc) = lc_palette_df[c("value", "label")]
coltab(lc) = lc_palette_df[c("value", "color")]
plot(lc)
```

As already shown in the previous blog posts about **motif**, the `lsp_signature()` function can be used to extract spatial signatures from a categorical raster object that can be used to describe spatial patterns of land cover. The most fundamental signature is the *co*-occurrence *ma*trix (*coma*), which is a matrix of co-occurrence frequencies of each pair of land cover categories. The `lsp_signature()` function can be used to extract the coma signature in 300 by 300 cells non-overlapping windows (i.e., 90 by 90 km) as follows:

`lc_coma = lsp_signature(lc, type = "coma", window = 300)`

The output is a data frame with 3,843 rows and 3 columns. The most important one is the `signature` column, which contains *coma* signatures in each window.

The co-occurrence matrix can be thought of as a compression of information about the composition and configuration of land cover categories in a given window. However, as it consists of many numbers (here, 81), it is not easy to directly analyze, visualize, or interpret. Fortunately, we can further extract information from this signature using metrics from information theory, such as marginal entropy and relative mutual information (for more details, see Nowosad and Stepinski (2019) and the “Information theory provides a consistent framework for the analysis of spatial patterns” blog post).

The `it_metric()` function from the **comat** package can be used to calculate these metrics for each *coma* signature. Here, we calculate marginal entropy (`"ent"`) and relative mutual information (`"relmutinf"`), and add them to the `lc_coma` data frame.

```
lc_coma$ent = vapply(lc_coma$signature, comat::it_metric,
                     FUN.VALUE = numeric(1), metric = "ent")
lc_coma$relmutinf = vapply(lc_coma$signature, comat::it_metric,
                           FUN.VALUE = numeric(1), metric = "relmutinf")
```

In short, the marginal entropy is a measure of diversity (thematic complexity, composition) of spatial categories: the larger the entropy, the more diverse the categories in the window. The relative mutual information is a measure of spatial autocorrelation (configuration) of spatial categories: the larger the relative mutual information, the more autocorrelated the categories in the window are.
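The two metrics can be sketched from first principles on a toy co-occurrence matrix. This is a minimal reimplementation of the standard information-theoretical definitions for illustration; the **comat** package is the authoritative source for how `"ent"` and `"relmutinf"` are actually computed.

```r
# Toy 2-class co-occurrence matrix: classes mostly adjacent to themselves.
coma = matrix(c(10, 2,
                2, 10), nrow = 2, byrow = TRUE)
p  = coma / sum(coma)  # joint probabilities of (focal, neighbor) class pairs
px = rowSums(p)        # marginal distribution of the focal class
py = colSums(p)        # marginal distribution of the neighbor class
ent    = -sum(px * log2(px))                   # marginal entropy
joint  = -sum(p[p > 0] * log2(p[p > 0]))       # joint entropy
mutinf = ent + (-sum(py * log2(py))) - joint   # mutual information
relmutinf = mutinf / ent                       # relative mutual information
c(ent = ent, relmutinf = relmutinf)            # about: ent = 1, relmutinf = 0.35
```

With two equally frequent classes the marginal entropy is exactly 1 bit, and the positive relative mutual information reflects that classes tend to neighbor themselves.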

Importantly, both metrics are uncorrelated, which means that they describe different aspects of spatial patterns of land cover:

`plot(lc_coma$ent, lc_coma$relmutinf)`

We can visualize the spatial distribution of these metrics’ values by removing the `signature` column and converting the `lc_coma` object to an `sf` class:

```
lc_coma$signature = NULL
lc_coma_sf = lsp_add_sf(lc_coma)
plot(lc_coma_sf["ent"], border = NA)
plot(lc_coma_sf["relmutinf"], border = NA)
```

We are also able to look at some examples of areas with representative values of these metrics. For that purpose, we use the `pam()` method (*Partitioning Around Medoids*) to cluster the `lc_coma` data frame into six groups based on the scaled values of `ent` and `relmutinf`.

`pam = pam(scale(lc_coma[, c("ent", "relmutinf")]), 6)`

We can see all of the groups on the map by adding a new column with cluster labels to the `lc_coma_sf` object and plotting it:

```
lc_coma_sf$cluster = pam$clustering
plot(lc_coma_sf["cluster"], border = NA, pal = palette.colors(6))
```

Then, we can select one representative from each cluster in a loop using the `crop()` function and visualize it using `plot()`.

```
lc_coma_sf_subset = lc_coma_sf[pam$id.med, ]
for (i in seq_len(nrow(lc_coma_sf_subset))){
  ent_sel = round(lc_coma_sf_subset[i, "ent", drop = TRUE], 2)
  relmutinf_sel = round(lc_coma_sf_subset[i, "relmutinf", drop = TRUE], 2)
  plot(crop(lc, lc_coma_sf_subset[i, ]),
       main = paste0(i, " ent: ", ent_sel, " relmutinf: ", relmutinf_sel))
}
```

As you can see above, the `ent` and `relmutinf` metrics can be used to describe various spatial patterns of land cover. The last group, `6`, is the simplest one, with just one land cover category. Next, groups `4` and `5` represent areas with a low diversity of land cover categories but with different spatial autocorrelation: group `4` has lower spatial autocorrelation (is more fragmented), while group `5` has higher spatial autocorrelation (is less fragmented). Groups `2` and `3` are areas with a medium diversity of land cover categories, but with different spatial autocorrelation: group `2` has higher spatial autocorrelation, while group `3` has lower spatial autocorrelation. Finally, group `1` is an area with a high diversity of land cover categories and a medium spatial autocorrelation.

Importantly, these metrics do not provide any information about the actual land cover categories. Thus, to look at the results in more depth, we can add information about land cover shares in each window to the `lc_coma_sf` data frame and use it in further analysis.

Here, we can use the `"composition"` type of signature to extract information about land cover shares in each window, restructure it from a list column to a set of columns, and add it to the `lc_coma_sf` data frame.

```
lc_composition = lsp_signature(lc, type = "composition", window = 300)
lc_composition = lsp_restructure(lc_composition)
lc_coma_sf = left_join(lc_coma_sf, lc_composition)
```

Now, you are able to subset your data frame based on the land cover shares and analyze the spatial patterns for various types of areas. You can also repeat the above calculations for two time periods or two areas and compare the results.

This blog post shows how to extract information about the composition and configuration of spatial patterns, visualize it on a map, and look at representative examples. For more details about the information theory-based metrics, see the “Information theory provides a consistent framework for the analysis of spatial patterns” blog post. To learn more about the **motif** package, see the other blog posts in the “motif” category.

Nowosad, Jakub, and Tomasz F. Stepinski. 2019. “Information Theory as a Consistent Framework for Quantification and Classification of Landscape Patterns.” *Landscape Ecology* 34 (9): 2091–101. https://doi.org/10.1007/s10980-019-00830-x.

BibTeX citation:

```
@online{nowosad2023,
author = {Nowosad, Jakub},
title = {Extracting Information about Spatial Patterns from Spatial
Signatures},
date = {2023-11-18},
url = {https://jakubnowosad.com/posts/2023-11-18-motif-bp7/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Extracting Information about Spatial
Patterns from Spatial Signatures.” November 18, 2023. https://jakubnowosad.com/posts/2023-11-18-motif-bp7/.

The main idea of supercells is to create groupings of adjacent cells that share common characteristics. This process often results in an over-segmentation – a situation when supercells are internally homogeneous, but not very different from their neighbors. Thus, we need to find the best way to merge similar adjacent supercells into larger entities (regions). It can be done with various approaches, both supervised and unsupervised. Here, we will focus on two examples of unsupervised approaches. The first one, k-means, is a general clustering method, while the second one, SKATER, is a spatial clustering method. The main goals of this blog post are to show how the merging of supercells can be performed and how to evaluate the obtained results.

Let’s start by attaching the packages and reading the input data. We will use the same packages as in the first blog post with one addition – **dplyr**.

```
library(supercells)
library(terra)
library(sf)
library(regional)
library(tmap)
library(dplyr)
```

Here, we will also use the same dataset cropped to an area of 50 by 100 cells.

```
flc = rast("/vsicurl/https://github.com/Nowosad/supercells-examples/blob/main/raw-data/all_ned.tif?raw=true")
flc1 = project(flc, "EPSG:3035", method = "near")
flc1 = flc1[101:150, 1:100, drop = FALSE]
```

Our next preparation step will be to create a set of supercells representing areas with homogeneous arrangements of fractional land cover values (Figure 1).

```
flc_sc = supercells(x = flc1,
step = 12, compactness = 0.1,
dist_fun = "jensen-shannon")
```

```
tm_shape(flc1[[c(1, 5)]]) +
tm_raster(style = "cont", palette = "cividis", title = "Fraction:") +
tm_shape(flc_sc) +
tm_borders(col = "red")
```

As a final preparation step, we will also calculate supercell areas and keep them in the `area_km2` column.

`flc_sc$area_km2 = as.numeric(st_area(flc_sc)) / 1000000`

K-means clustering partitions observations into a selected number (*k*) of clusters. In our case, each observation is a supercell, and thus, each k-means cluster will represent a set of supercells. Notably, the k-means algorithm is unaware of spatial relationships in our data – it does not know that some supercells are adjacent to others. Therefore, clusters returned by the k-means algorithm may consist of many supercells located in different places in our area of interest.

To use the k-means algorithm, we need to extract only variables (columns) with fractional land cover values:

```
vars = c("Forest", "Shrubland", "Grassland", "Bare.Sparse.vegatation",
"Cropland", "Built.up", "Seasonal.inland.water",
"Permanent.inland.water")
flc_sc_df = st_drop_geometry(flc_sc)[vars]
```

Then, we can apply the clustering by making one major decision – what is the expected number of clusters (*k*, the `centers` argument). For this example, I created four clusters:^{1}

```
set.seed(2022-11-10)
km_results = kmeans(flc_sc_df, centers = 4)
```

Next, we need to add a new column to our data, which represents our clusters.

```
flc_sc_kmeans = flc_sc |>
mutate(kmeans = km_results$cluster)
```

Importantly, this operation does not change our geometries, and so our polygons are still the same as the input supercells.^{2} However, some clusters lie next to each other and can be merged. This can be done with a combination of `group_by()` and `summarise()`:^{3}

```
flc_sc_kmeans = flc_sc_kmeans |>
group_by(kmeans) |>
summarise() |>
st_make_valid()
```

Now, the `flc_sc_kmeans` object has only four rows – one for each cluster – but, as we mentioned before, they may consist of polygons located in different places in our area of interest. Thus, another step is needed here – to move each separate polygon into its own feature (row):

```
flc_sc_kmeans2 = st_cast(flc_sc_kmeans, "POLYGON")
flc_sc_kmeans2$kmeans = seq_along(flc_sc_kmeans2$kmeans)
```

We also replaced our `kmeans` column with a unique identifier for each row. Now, we have eleven polygons representing groupings of adjacent cells that share common characteristics. We may also calculate the average^{4} fraction of each land cover in each polygon. This can be done by extracting values from our original raster and then summarizing them:

```
names(flc1) = vars
flc_sc_kmeans2_vals = extract(flc1, flc_sc_kmeans2, weights = TRUE) |>
group_by(ID) |>
summarise(across(all_of(vars), \(x) weighted.mean(x, weight)))
flc_sc_kmeans2 = cbind(flc_sc_kmeans2, flc_sc_kmeans2_vals)
```

Figure 2 summarizes the results. Firstly, we created four clusters using the k-means algorithm (the left panels). You can notice that the first cluster relates to regions with large fractions of forests, while the third cluster has regions with large fractions of croplands. The second and fourth clusters have dominant shares of forests and croplands, respectively. Then, each cluster is split into a few regions:

```
tm1 = tm_shape(flc1[[c(1, 5)]]) +
tm_raster(legend.show = FALSE) +
tm_facets(ncol = 1) +
tm_shape(flc_sc_kmeans) +
tm_polygons(col = "kmeans", style = "cat", palette = "Set1", title = "k:")
tm2 = tm_shape(flc1[[c(1, 5)]]) +
tm_raster(legend.show = FALSE) +
tm_facets(ncol = 1) +
tm_shape(flc_sc_kmeans2) +
tm_polygons(col = "kmeans", style = "cat", palette = "Set1", title = "ID:")
tm3 = tm_shape(flc1[[c(1, 5)]]) +
tm_raster(style = "cont", palette = "cividis", title = "Fraction:") +
tm_facets(ncol = 1) +
tm_shape(flc_sc_kmeans2) +
tm_borders(col = "red")
tmap_arrange(tm1, tm2, tm3)
```

In this case, an alternative group of methods exists. Instead of using clustering methods, such as k-means, we can use some regionalization methods (they are sometimes referred to as spatial clustering methods).^{5} SKATER (Spatial ’K’luster Analysis by Tree Edge Removal) is a procedure based on a graph representation of the input data. It prunes the graph to its minimum spanning tree (MST) and then iteratively partitions the graph by identifying edges (connections between neighbors) whose removal increases the objective function (between-group dissimilarity) the most. This iterative process stops when a specified number of regions is obtained.
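To make the procedure more tangible, here is a toy base-R sketch of the core SKATER idea: build a minimum spanning tree (via a naive Prim's algorithm) over five one-variable "supercells" assumed to be fully connected, then remove the heaviest edge to split them into two regions. This illustrates the concept only – the **rgeoda** implementation works on a contiguity graph and optimizes between-group dissimilarity:

```r
# Five toy supercells described by a single variable
vals = c(1.0, 1.2, 1.1, 5.0, 5.3)
d = abs(outer(vals, vals, "-"))            # pairwise dissimilarities
n = length(vals)
in_tree = c(TRUE, rep(FALSE, n - 1))       # grow the MST starting from node 1
edges = matrix(nrow = 0, ncol = 3)         # (from, to, weight)
while (sum(in_tree) < n) {
  w = d
  w[!in_tree, ] = Inf                      # keep only edges leaving the tree...
  w[, in_tree] = Inf                       # ...that end outside the tree
  idx = which(w == min(w), arr.ind = TRUE)[1, ]
  edges = rbind(edges, c(idx[1], idx[2], d[idx[1], idx[2]]))
  in_tree[idx[2]] = TRUE
}
# Removing the heaviest MST edge (nodes 2-4, weight 3.8) splits the graph
# into two regions: {1, 2, 3} and {4, 5}
heaviest = which.max(edges[, 3])
edges[heaviest, ]
```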

Here, we will use the SKATER procedure as implemented in `skater()` from the **rgeoda** package, but we need to start by preparing a few input objects.

The first one is a data frame with the variables of interest for each supercell:

`library(rgeoda)`

`Loading required package: digest`

```
vars = c("Forest", "Shrubland", "Grassland", "Bare.Sparse.vegatation",
"Cropland", "Built.up", "Seasonal.inland.water",
"Permanent.inland.water")
flc_sc_df = st_drop_geometry(flc_sc)[vars]
```

The second one is an object of class `Weight` that contains information about neighbors in our data:

`queen_w = queen_weights(flc_sc)`

The third (optional) object is a vector of distances between values of our supercells. In this case, we will calculate the Jensen-Shannon distance (the same that we used to create our supercells):

```
weight_dist = philentropy::distance(flc_sc_df,
method = "jensen-shannon",
as.dist.obj = TRUE)
weight_dist = as.vector(weight_dist)
```

Now, we can use the `skater()` function by providing the expected number of regions (eleven in this case, to match the k-means result) and the previously created objects:

```
skater_results = skater(11, w = queen_w,
df = flc_sc_df, rdist = weight_dist)
```

Next, we need to add a new column to our data that represents our regions:

```
flc_sc_skater = flc_sc |>
mutate(skater = skater_results$Clusters)
```

Similarly to the k-means example, we will:

- merge supercells belonging to the same regions:

```
#1
flc_sc_skater = flc_sc_skater |>
group_by(skater) |>
summarise() |>
st_make_valid()
```

- add columns with average fractions of each land cover for each region:

```
#2
names(flc1) = vars
flc_sc_skater_vals = extract(flc1, flc_sc_skater, weights = TRUE) |>
group_by(ID) |>
summarise(across(all_of(vars), \(x) weighted.mean(x, weight)))
flc_sc_skater = cbind(flc_sc_skater, flc_sc_skater_vals)
```

Figure 3 shows the SKATER results.

```
tm1s = tm_shape(flc1[[c(1, 5)]]) +
tm_raster(legend.show = FALSE) +
tm_facets(ncol = 1) +
tm_shape(flc_sc_skater) +
tm_polygons(col = "skater", style = "cat", palette = "Set1")
tm2s = tm_shape(flc1[[c(1, 5)]]) +
tm_raster(style = "cont", palette = "cividis", title = "Fraction:") +
tm_facets(ncol = 1) +
tm_shape(flc_sc_skater) +
tm_borders(col = "red")
tmap_arrange(tm1s, tm2s)
```

The quality of the regionalization results can be evaluated in two main ways: external and internal. The external evaluation compares the obtained regions with, for example, other existing regionalizations. This can be done in R using the **sabre** package (J. Nowosad and Stepinski 2018).

Here, we will focus on internal evaluation. Internal evaluation checks how well the regions encapsulate similar values and differ from their neighbors. This evaluation can be performed with the two functions of the **regional** package:

- `reg_inhomogeneity()`, which calculates the internal inhomogeneity of each region (the lower the value, the better)
- `reg_isolation()`, which calculates the isolation of each region from its neighbors (the higher the value, the better)

Let’s start by calculating the inhomogeneity of each region for our two approaches:

```
vars = c("Forest", "Shrubland", "Grassland", "Bare.Sparse.vegatation",
"Cropland", "Built.up", "Seasonal.inland.water",
"Permanent.inland.water")
flc_sc_kmeans2$inh = reg_inhomogeneity(flc_sc_kmeans2[vars], flc1,
dist_fun = "jensen-shannon",
sample_size = 200)
flc_sc_skater$inh = reg_inhomogeneity(flc_sc_skater[vars], flc1,
dist_fun = "jensen-shannon",
sample_size = 200)
```

The visual comparison shows that, in general, k-means-based regions have slightly smaller inhomogeneity values.

```
my_breaks = seq(0,
max(flc_sc_kmeans2$inh, flc_sc_skater$inh) + 0.05,
by = 0.05)
tm_inh2 = tm_shape(flc_sc_kmeans2) +
tm_polygons(col = "inh", breaks = my_breaks, style = "cont", palette = "-viridis") +
tm_layout(title = "k-means2")
tm_inh3 = tm_shape(flc_sc_skater) +
tm_polygons(col = "inh", breaks = my_breaks, style = "cont", palette = "-viridis") +
tm_layout(title = "SKATER")
tmap_arrange(tm_inh2, tm_inh3, nrow = 1)
```

This can also be confirmed by calculating area-weighted inhomogeneity, which is a bit smaller for k-means-based regionalization.

```
area_weighted_inhomogeneity = function(x){
x$area_km2 = as.numeric(st_area(x)) / 1000000
round(weighted.mean(x$inh, x$area_km2), 3)
}
area_weighted_inhomogeneity(flc_sc_kmeans2)
```

`[1] 0.08`

`area_weighted_inhomogeneity(flc_sc_skater)`

`[1] 0.089`

We can also compare how different our regions are from their neighbors with `reg_isolation()`:

```
flc_sc_kmeans2$iso = reg_isolation(flc_sc_kmeans2[vars], flc1,
dist_fun = "jensen-shannon",
sample_size = 200)
flc_sc_skater$iso = reg_isolation(flc_sc_skater[vars], flc1,
dist_fun = "jensen-shannon",
sample_size = 200)
```

In this case, the maps suggest that SKATER has slightly larger isolation values.

```
my_breaks2 = seq(0,
max(flc_sc_kmeans2$iso, flc_sc_skater$iso) + 0.05,
by = 0.05)
tm_iso2 = tm_shape(flc_sc_kmeans2) +
tm_polygons(col = "iso", breaks = my_breaks2, style = "cont", palette = "-magma") +
tm_layout(title = "k-means2")
tm_iso3 = tm_shape(flc_sc_skater) +
tm_polygons(col = "iso", breaks = my_breaks2, style = "cont", palette = "-magma") +
tm_layout(title = "SKATER")
tmap_arrange(tm_iso2, tm_iso3, nrow = 1)
```

This is confirmed by calculating the average isolation for both approaches, which is larger (better) for SKATER:

`round(mean(flc_sc_kmeans2$iso), 3)`

`[1] 0.219`

`round(mean(flc_sc_skater$iso), 3)`

`[1] 0.222`

This blog post showed two examples of how to process supercells to create spatially homogeneous regions: one approach used the k-means algorithm, while the other applied the SKATER procedure. It is worth mentioning, however, that many more clustering techniques exist that could be used for this purpose; these include non-spatial algorithms (e.g., hierarchical clustering) and spatial algorithms (e.g., REDCAP).

The results obtained in this blog post suggested that the k-means approach resulted in (slightly) more homogeneous regions, while regions from the SKATER approach were (also slightly) more isolated from their neighbors. However, you need to remember that the example area was very small and consisted of just a handful of supercells. Our study for a larger area (J. Nowosad, Stepinski, and Iwicki 2022) suggests that SKATER gives better results in terms of both inhomogeneity and isolation. SKATER (and other spatial clustering methods) should, in general, be preferred over k-means (and other non-spatial clustering methods): they not only result in better regionalizations, but also allow specifying the number of output regions directly. That being said, the k-means algorithm could also be useful in some cases, especially for large data with hundreds of thousands of supercells (polygons).

If you want to learn more about **supercells** and regionalizations based on them, we encourage you to try a few entry points. One is the Jakub Nowosad and Stepinski (2022) article that explains the main ideas and applies them to three examples of non-imagery data. You can also see slides from a talk entitled “A method for universal superpixels-based regionalization” that I gave during the FOSS4G 2022 conference at https://jakubnowosad.com/foss4g-2022/. Finally, the package has extensive documentation, including several vignettes, that can be found at https://jakubnowosad.com/supercells/.

Aydin, Orhun, Mark. V. Janikas, Renato Martins Assunção, and Ting-Hwan Lee. 2021. “A Quantitative Comparison of Regionalization Methods.” *International Journal of Geographical Information Science* 35 (11): 2287–2315. https://doi.org/gnk4b4.

Nowosad, Jakub, and Tomasz F. Stepinski. 2022. “Extended SLIC Superpixels Algorithm for Applications to Non-Imagery Geospatial Rasters.” *International Journal of Applied Earth Observation and Geoinformation* 112 (August): 102935. https://doi.org/10.1016/j.jag.2022.102935.

Nowosad, J., and T. F. Stepinski. 2018. “Spatial Association Between Regionalizations Using the Information-Theoretical *V* -Measure.” *International Journal of Geographical Information Science* 32 (12): 2386–2401. https://doi.org/gf283f.

Nowosad, J., T. F. Stepinski, and M. Iwicki. 2022. “A method for universal supercells-based regionalization (preliminary results).” *The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences* XLVIII-4/W1-2022 (August): 337–44. https://doi.org/10.5194/isprs-archives-XLVIII-4-W1-2022-337-2022.

There are several approaches for deciding the number of clusters. Some are based on prior knowledge, and some use various methods to suggest an optimal *k* number (e.g., the Elbow method).↩︎

Try `plot(flc_sc_kmeans["kmeans"])`.↩︎

Additionally, we used `st_make_valid()` to make sure that our result has valid geometries.↩︎

Area-weighted mean, to be precise.↩︎

Read Aydin et al. (2021) to see a comparison of regionalization methods.↩︎

BibTeX citation:

```
@online{nowosad2023,
author = {Nowosad, Jakub},
title = {Spatial Regionalization Using Universal Superpixels
Algorithm},
date = {2023-05-15},
url = {https://jakubnowosad.com/posts/2023-05-15-supercells-bp2/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Spatial Regionalization Using Universal
Superpixels Algorithm.” May 15, 2023. https://jakubnowosad.com/posts/2023-05-15-supercells-bp2/.

Segmentation is a process of partitioning space into smaller segments. For example, imagine looking at your family photo and trying to distinguish individual people. Similarly, we can look at a satellite image (in RGB colors) with the goal of delineating where the buildings, fields, roads, etc. are. In geography, segmentation can also be associated with regionalization. Here, our goal is not to detect objects (e.g., people or buildings) but rather areas with similar properties.

Segmentation/regionalization methods should minimize internal inhomogeneity and maximize external isolation. First, each segment (you can think of them as polygons) should contain similar values.^{1} For example, let’s consider two segments representing two different roofs. One roof is entirely covered by red tiles, while the roof of the second one looks like a chessboard with red and brown tiles. Both segments are homogeneous; however, the latter is more complex than the former. The second segmentation property is the maximization of external isolation. This means that any given segment is as different from its neighbors as possible.

Segmentation is an optimization problem – trying and testing all possible segment borders and their properties may take a very long time, even for relatively small data. For that reason, some heuristics need to be used. One way to improve the output and reduce the time/processing cost of segmentation is to perform a preprocessing stage with superpixels.

The main idea of superpixels is to create groupings of adjacent cells that share common characteristics. This process often results in an over-segmentation – a situation in which our segments (superpixels) are internally homogeneous, but not always very different from their neighbors.

Superpixels are used for two main reasons:

- Pixels are not natural entities. They are rather a consequence of the discrete representation of data. For example, depending on the data resolution, our roofs from the previous example can consist of 20 pixels or be just a fraction of one pixel.
- Superpixels, as groupings of adjacent cells, reduce the dimensionality of the data making further segmentation tasks easier. For example, we may end up with 5,000 superpixels, instead of 150,000 original pixels.

Many superpixel algorithms have been developed; see Stutz, Hermans, and Leibe (2018) for a review. The SLIC algorithm (Achanta et al. 2012) is one of the most often used superpixel algorithms due to its simplicity, accuracy, and low computational cost. It starts with cluster centers spaced by the interval of $S$. Each cell is assigned to the nearest cluster center, and the distance is calculated between the cluster centers and the cells in their local regions. Afterward, new cluster centers (centroids) are updated for the new superpixels, and their color values are the average of all the cells belonging to the given superpixel. The SLIC algorithm works iteratively, repeating the above process until it reaches the expected number of iterations.

The distance $D$ is calculated as:

$$D = \sqrt{d_c^2 + \left(\frac{d_s}{S}\right)^2 m^2}$$

where $d_c$ is the color (spectral) distance, $m$ is the compactness parameter, $d_s$ is the spatial (Euclidean) distance, and $S$ is the interval between the initial cluster centers.
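As a quick illustration, the combined distance can be written as a small R function. The variable names below mirror the symbols in the text and are illustrative only, not part of any package API:

```r
# Combined SLIC distance: spectral distance d_c plus spatial distance d_s,
# normalized by the interval S and scaled by the compactness m
slic_dist = function(d_c, d_s, S, m) sqrt(d_c^2 + (d_s / S)^2 * m^2)
# With a low compactness (m = 0.1), the spectral distance dominates:
slic_dist(d_c = 0.5, d_s = 6, S = 12, m = 0.1)  # ~0.5025
```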

A typical workflow for the original SLIC algorithm is to convert an RGB image into the LAB color space and then use it to create superpixels. In that case, the distance $D$ depends on (a) the Euclidean spatial distance between a cell and a superpixel centroid and (b) the Euclidean color distance between a cell’s LAB values and a superpixel centroid’s average LAB values. As originally implemented by its authors, the SLIC algorithm has the RGB image hard-wired as input data. Thus, its geospatial applications remain restricted to images: RGB, multispectral, or hyperspectral.

In Nowosad and Stepinski (2022), we propose an extension of SLIC that can be applied to non-imagery geospatial rasters. This includes rasters that carry:

- Pattern information (co-occurrence matrices)
- Compositional information (histograms)
- Time-series information (ordered sequences)
- Other forms of information for which the use of Euclidean distance may not be justified^{2}

The extended SLIC allows using any distance measure to calculate the semantic distance – the color distance $d_c$ can be replaced with any distance/dissimilarity measure. We implemented the above idea as an R package, {supercells}. Note that we decided to use the super**cells** name (instead of super**pixels**) to highlight that the method can be applied to various spatial raster data. The package installation instructions can be found at https://jakubnowosad.com/supercells/.

This blog post presents a short example of using spatial raster data with compositional information (histograms). For the study site located in the eastern Netherlands, we downloaded fractions of a pixel’s area covered by different land cover classes (source: Copernicus Global Land Service: 2019 Land Cover 100m-resolution data). Our goal is to create superpixels with similar fractions of land cover classes (Figure 1).

Let’s start by attaching the packages and reading the input data:

```
library(supercells)
library(terra)
library(sf)
library(regional)
library(tmap)
flc = rast("/vsicurl/https://github.com/Nowosad/supercells-examples/blob/main/raw-data/all_ned.tif?raw=true")
```

The input data, `flc`, is a raster of 507 by 1105 cells and eight layers (fractions of different land cover classes). We will reproject our raster into a projected CRS and, for simplicity and to see our results more easily, crop it to a 50 by 100 cells area:

```
flc1 = project(flc, "EPSG:3035", method = "near")
flc1 = flc1[101:150, 1:100, drop = FALSE]
```

The `flc1` object represents an area mostly covered by croplands, with some forests in its south-eastern parts, and smaller fractions of grassland and built-up classes:

```
tm_shape(flc1) +
tm_raster(style = "cont", palette = "cividis", title = "Fraction:")
```

Now, we are able to create supercells using the **supercells** package and its `supercells()` function.^{3} This function is very flexible, and its results can be customized extensively.^{4} Here, we will just use its basic arguments:

- `x`: our input raster with one or more layers; `flc1` in our case
- `step`: the interval between initial cluster centers; here we use the value of `12` (cells). Decreasing this value will give us more supercells, and increasing it results in fewer supercells
- `compactness`: the compactness parameter; here we use the value of `0.1` – the lower the value, the more impact the value distance has on the result
- `dist_fun`: the distance function used; here we use the Jensen-Shannon distance (`"jensen-shannon"`), which is suitable for measuring the dissimilarity between histograms

```
flc_sc = supercells(x = flc1,
step = 12, compactness = 0.1,
dist_fun = "jensen-shannon")
```

The `flc_sc` result is an `sf` (spatial vector) object with 28 polygons. We can visualize them on top of the two most prominent land cover classes for this area, forest and cropland (Figure 2):

```
tm_shape(flc1[[c(1, 5)]]) +
tm_raster(style = "cont", palette = "cividis", title = "Fraction:") +
tm_shape(flc_sc) +
tm_borders(col = "red")
```

This visual inspection allows us to see that supercells serve their purpose: they delineate areas with homogeneous arrangements of fractional land cover values. Areas with dominating fractions of forests are encapsulated in different polygons than those with dominating fractions of croplands, or some mixes of land cover classes. At the same time, some supercells are more homogeneous than others. This is due to: (a) the chosen interval value (a lower value would result in a larger number of more homogeneous supercells), and (b) the fact that supercells are not designed specifically for the detection of roads (or other linear features).

The quality of our result can also be determined numerically: we can calculate “inhomogeneity” of our supercells. The inhomogeneity metric represents an average distance between cells belonging to the same supercell. This value is small when all cells have similar values (land cover classes’ fractions, in our case), and large when cells’ values are very different.
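As a rough base-R illustration of this metric, assume inhomogeneity is the mean pairwise Jensen-Shannon distance between cells' composition vectors (a simplification of what `reg_inhomogeneity()` computes, with made-up fractions):

```r
# Jensen-Shannon distance between two probability vectors (base-2 logarithm)
js_dist = function(p, q) {
  m = (p + q) / 2
  kl = function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  sqrt(kl(p, m) / 2 + kl(q, m) / 2)
}
# Three cells of one toy supercell: fractions of three land cover classes
cells = rbind(c(0.8, 0.2, 0.0), c(0.7, 0.3, 0.0), c(0.1, 0.1, 0.8))
pairs = combn(nrow(cells), 2)  # all pairs of cells
inh_toy = mean(apply(pairs, 2, function(i) js_dist(cells[i[1], ], cells[i[2], ])))
inh_toy  # the dissimilar third cell inflates the average distance
```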

Inhomogeneity can be calculated using the **regional** package’s `reg_inhomogeneity()` function. We just need to provide our “regions” (supercells), a raster with values, and a distance function. Comparing the values of many cells may take a lot of time; thus, it is usually more efficient to use a subset of them for this comparison. We can specify the subset size with `sample_size`.

```
vars = c("Forest", "Shrubland", "Grassland", "Bare.Sparse.vegatation",
"Cropland", "Built.up", "Seasonal.inland.water", "Permanent.inland.water")
flc_sc$inh = reg_inhomogeneity(flc_sc[vars], flc1,
dist_fun = "jensen-shannon", sample_size = 100)
```

The resulting inhomogeneity values can also be visualized, signaling the most and the least consistent supercells (Figure 3):

```
tm_shape(flc_sc) +
tm_polygons("inh", title = "Inhomogeneity:", style = "cont") +
tm_layout(legend.outside = TRUE)
```

Additionally, we can calculate an area-weighted inhomogeneity as a general metric of all the supercells:

```
flc_sc$area_km2 = as.numeric(st_area(flc_sc)) / 1000000
weighted.mean(flc_sc$inh, flc_sc$area_km2)
```

`[1] 0.08516878`
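The weighting matters here: a small but very inhomogeneous supercell should not dominate the summary. A toy check of the mechanics (numbers made up, not from the data above):

```r
# One small supercell with high inhomogeneity (0.10, area 1 km2) and one
# large, homogeneous one (0.02, area 9 km2); the large one dominates
weighted.mean(c(0.10, 0.02), w = c(1, 9))  # (0.10*1 + 0.02*9) / 10 = 0.028
```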

Finally, you may notice that several adjacent supercells are very similar, and thus should be merged. Several approaches to merging supercells into larger segments/regions exist. I will discuss them in the next blog post.

We propose the SLIC algorithm extension to work with non-imagery data structures without data reduction or conversion to a false-color image. It allows for using the distance measure most appropriate to a particular data structure and a custom function for averaging the values of cluster centers. If you want to learn more about **supercells**, we encourage you to try a few entry points. One is the Nowosad and Stepinski (2022) article that explains the whole idea in more detail and compares our extension with the original SLIC algorithm on three examples of non-imagery data. Code related to these examples is available at https://github.com/Nowosad/supercells-examples. You can also see slides from a talk entitled “A method for universal superpixels-based regionalization” that I gave during the FOSS4G 2022 conference at https://jakubnowosad.com/foss4g-2022/. Finally, the package has extensive documentation, including several vignettes, that can be found at https://jakubnowosad.com/supercells/.

Achanta, R., A. Shaji, K. Smith, A. Lucchi, P. Fua, and Sabine Süsstrunk. 2012. “SLIC Superpixels Compared to State-of-the-Art Superpixel Methods.” *IEEE Transactions on Pattern Analysis and Machine Intelligence* 34 (11): 2274–82. https://doi.org/f39g5f.

Nowosad, Jakub, and Tomasz F. Stepinski. 2022. “Extended SLIC Superpixels Algorithm for Applications to Non-Imagery Geospatial Rasters.” *International Journal of Applied Earth Observation and Geoinformation* 112 (August): 102935. https://doi.org/10.1016/j.jag.2022.102935.

Stutz, David, Alexander Hermans, and Bastian Leibe. 2018. “Superpixels: An Evaluation of the State-of-the-Art.” *Computer Vision and Image Understanding* 166 (January): 1–27. https://doi.org/gcvnsc.

On a side note: homogeneity does not always imply simplicity.↩︎

Let me know (email/twitter) if you have any examples of such data!↩︎

Supercells!↩︎

Read the “The supercells() function” vignette for more details.↩︎

BibTeX citation:

```
@online{nowosad2023,
author = {Nowosad, Jakub},
title = {Supercells: Universal Superpixels Algorithm for Applications
to Geospatial Data},
date = {2023-04-30},
url = {https://jakubnowosad.com/posts/2023-04-30-supercells-bp1/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Supercells: Universal Superpixels Algorithm
for Applications to Geospatial Data.” April 30, 2023. https://jakubnowosad.com/posts/2023-04-30-supercells-bp1/.

`ent`). This output is easy to visualize on a map – each polygon can be colored according to its value.

There are, however, two more levels of landscape metrics – the class and patch levels. The calculation of class-level metrics returns as many values as there are unique classes in a local landscape, while patch-level calculations result in as many values as there are patches^{1} in each polygon. This makes simple visualizations of class- and patch-level metrics not as straightforward as those of landscape-level metrics. The main goal of this blog post is to present different approaches for the visualization of class- and patch-level metrics.

To reproduce the following results on your own computer, install and attach the packages:

```
library(landscapemetrics) # landscape metrics calculation
library(raster) # spatial raster data reading and handling
library(sf) # spatial vector data reading and handling
library(dplyr) # data manipulation
library(tidyr) # data manipulation
library(tmap) # spatial viz
library(geofacet) # arranging plots in a spatial grid
library(ggplot2) # plotting
```

The first step is to read the input data. Here, we are going to use example data that is already included in the **landscapemetrics** package (Hesselbarth et al. 2019).

```
data("augusta_nlcd")
my_raster = augusta_nlcd
```

It is also possible to read any spatial raster file with the `raster()` function, for example, `my_raster = raster("path_to_my_file.tif")`.^{2} However, the input file should fulfill two requirements: (1) contain only integer values that represent categories, and (2) be in a projected coordinate reference system. You can check if your file meets the requirements using the `check_landscape()` function, and learn more about coordinate reference systems in the Geocomputation with R book (Lovelace, Nowosad, and Muenchow 2019).

Our example data looks like this:

`plot(my_raster)`

The next step is to create borders of local landscapes using the `st_make_grid()` function. This function accepts an `sf` object as the first argument, therefore we need to create a new object based on the bounding box of the input raster. Next, we also need to provide a second argument, either `cellsize` or `n`:

- `cellsize` - a vector of length 1 or 2 - the side length of each grid cell in map units (usually meters)
- `n` - a vector of length 1 or 2 - the number of grid cells in a row/column
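The relationship between `cellsize` and `n` comes down to simple arithmetic: for a bounding box of a given width and height, one implies the other. A short base-R sketch (the extent values below are made up for illustration, not taken from the example raster):

```r
# For a bounding box of a given width and height (in map units),
# a chosen cellsize implies the number of grid cells, and vice versa.
# The extent values below are hypothetical.
bbox_width  = 30000   # meters
bbox_height = 21000   # meters
cellsize    = 1500    # side length of each grid cell

n_cols = ceiling(bbox_width  / cellsize)
n_rows = ceiling(bbox_height / cellsize)
c(n_cols, n_rows)     # 20 columns and 14 rows
```

Note that when the extent is not an exact multiple of `cellsize`, the last row/column of cells extends beyond the data.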

```
my_grid_geom = st_make_grid(st_as_sfc(st_bbox(my_raster)), cellsize = 1500)
my_grid_template = st_sf(geom = my_grid_geom)
```

We should also add a unique identification number (`plot_id`) to each grid cell (local landscape).

`my_grid_template$plot_id = seq_len(nrow(my_grid_template))`

Next, we can overlay the newly created grid on top of our input raster:

```
plot(my_raster)
plot(st_geometry(my_grid_template), add = TRUE)
```

Note that some cells cover smaller areas with data than others.

The calculation of landscape metrics for each cell can be done with the `sample_lsm()` function. It requires an input raster as the first argument and a grid as the second one^{3}. The function calculates the selected landscape metrics independently for each cell. Next, we can specify which landscape metrics we want to calculate. For this example, we calculate the aggregation index (*ai*) on a class level (the complete list of implemented metrics can be obtained with the `list_lsm()` function - let us know if you are missing some metrics).

```
my_metric1 = sample_lsm(my_raster, my_grid_template,
                        level = "class", metric = "ai")
my_metric1
```

```
# A tibble: 1,313 × 8
layer level class id metric value plot_id percentage_inside
<int> <chr> <int> <int> <chr> <dbl> <int> <dbl>
1 1 class 11 NA ai 83.3 1 100
2 1 class 21 NA ai 34.5 1 100
3 1 class 22 NA ai 23.0 1 100
4 1 class 31 NA ai 86.8 1 100
5 1 class 41 NA ai 63.9 1 100
6 1 class 42 NA ai 73.2 1 100
7 1 class 43 NA ai 39.5 1 100
8 1 class 52 NA ai 43.1 1 100
9 1 class 71 NA ai 66.9 1 100
10 1 class 81 NA ai 85.8 1 100
# … with 1,303 more rows
```

Each row in the `my_metric1` object corresponds to one calculated value of *ai*, while the `plot_id` column specifies to which grid cell the results are related^{4}. Because there are several classes in each cell, there are also several *ai* values present for each cell. Next, we can connect the spatial grid (`my_grid_template`) with the calculation results (`my_metric1`) using the `left_join()` function:

`my_grid1 = left_join(my_grid_template, my_metric1, by = "plot_id")`

For the class-level results, each local landscape has as many values as there are unique classes belonging to it. This prevents us from creating a single comprehensive map but, on the other hand, allows for some other visualizations.

The most basic approach is to create a map just for a selected class, which can be done with the `subset()` function:

```
my_grid1_class1 = subset(my_grid1, class == 11)
plot(my_grid1_class1["value"])
```

The above plot shows the distribution of *ai* values for class 11.

It is also possible to quickly visualize all of the subsets at the same time with the **tmap** package (Tennekes 2018).

```
tm_shape(my_grid1) +
  tm_polygons("value", style = "cont", title = "ai") +
  tm_facets(by = "class", free.coords = FALSE)
```

The result contains a separate panel for each unique class in our dataset, while colors represent different *ai* values.

As our local landscapes are made of a regular grid, we can also test some less traditional visualizations. One possibility is to use the **geofacet** package (Hafen 2020) - it allows creating many regular plots (such as histograms, scatterplots, boxplots, etc.) and arranging them spatially. Visit the package website at https://github.com/hafen/geofacet to find more examples.

The first step here is to create a plotting grid with `grid_auto()`:

`grd = grid_auto(my_grid_template, names = "plot_id")`

Next, we can create a plot with the **ggplot2** syntax (Wickham 2016), adding `facet_geo(~plot_id, grid = grd)` to it:

```
ggplot(my_grid1, aes(as.factor(class), value, fill = as.factor(class))) +
  geom_col() +
  facet_geo(~plot_id, grid = grd) +
  labs(x = NULL, y = "ai", fill = "Class") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        strip.background = element_blank(),
        strip.text.x = element_blank())
```

Here, we can see a visualization that consists of many separate bar plots, where each bar plot represents the *ai* values for one local landscape.

The calculation of patch-level metrics can also be done with the `sample_lsm()` function. In this example, we use the Euclidean nearest-neighbor distance (*enn*) - a metric that can only be calculated on a patch level.

```
my_metric2 = sample_lsm(my_raster, my_grid_template,
                        level = "patch", metric = "enn")
my_metric2
```

```
# A tibble: 10,479 × 8
layer level class id metric value plot_id percentage_inside
<int> <chr> <int> <int> <chr> <dbl> <int> <dbl>
1 1 patch 11 1 enn 201. 1 100
2 1 patch 11 2 enn 201. 1 100
3 1 patch 11 3 enn 495. 1 100
4 1 patch 21 4 enn 433. 1 100
5 1 patch 21 5 enn 67.1 1 100
6 1 patch 21 6 enn 67.1 1 100
7 1 patch 21 7 enn 67.1 1 100
8 1 patch 21 8 enn 67.1 1 100
9 1 patch 21 9 enn 60 1 100
10 1 patch 21 10 enn 60 1 100
# … with 10,469 more rows
```

The result contains a large number of rows, where each row is related to a unique combination of a patch, a class, and a grid cell. We can connect the spatial grid (`my_grid_template`) with the calculation results (`my_metric2`), again using the `left_join()` function:

Visualization of the patch-level results could be the most challenging to create. Here, each local landscape can have as few as one value (a single large patch) or as many values as it has cells. Patches can also be related to just a few or to many different classes.

The most straightforward approach here is to use the `spatialize_lsm()` function, which calculates a selected patch-level metric for a given raster and returns a new raster with the metric values.

`my_metric_r_all = spatialize_lsm(my_raster, level = "patch", metric = "enn")`

The output of `spatialize_lsm()` is a nested list. Each list element represents a different input layer (e.g., land cover data for different years), while each sub-list element relates to a calculated metric (it is possible to calculate many metrics at the same time).

`plot(my_metric_r_all$layer_1$lsm_p_enn)`

The `spatialize_lsm()` function is fine when we need to present patch-level values for a whole raster. However, if we have many local landscapes (or sub-rasters), then we need to repeat the `spatialize_lsm()` calculation for each area.

This can be done with the help of the following code.

```
patch_raster = function(my_raster, my_grid){
  result = vector(mode = "list", length = nrow(my_grid))
  for (i in seq_len(nrow(my_grid))){
    my_small_raster = crop(my_raster, my_grid[i, ])
    result[[i]] = spatialize_lsm(my_small_raster,
                                 level = "patch", metric = "enn")
  }
  return(result)
}
my_metric_r = patch_raster(my_raster, my_grid_template)
my_metric_r = unlist(my_metric_r)
names(my_metric_r) = NULL
my_metric_r = do.call(merge, my_metric_r)
```

In it, we go through each grid cell, crop the raster to its extent, and calculate our metric of interest. The result is a large list consisting of many separate small rasters, which we can combine into a full-size raster with `do.call()` and `merge()`.

```
plot(my_metric_r)
plot(st_geometry(my_grid_template), add = TRUE)
```

The resulting visualization looks quite different from the previous one. In it, each local landscape is treated independently; therefore, a patch belonging to many grid cells is split into many patches.

The **landscapemetrics** package, together with many other open-source R packages, allows for spatial visualizations of class- and patch-level metrics. The class-level metrics can be presented independently for each class or as a *geofacet* plot, while patch-level metrics might be visualized in combination with the `spatialize_lsm()` function. It is also worth mentioning that patch-level metrics can be used with the **geofacet** package as well; however, this works best when the number of local landscapes is relatively small. Otherwise, the resulting plot could be hard to read. To learn more about landscape metrics and the **landscapemetrics** package, visit https://r-spatialecology.github.io/landscapemetrics/ and http://dx.doi.org/10.1111/ecog.04617.

Many thanks to Maximilian H.K. Hesselbarth for reading and improving a draft of this blog post.

Hafen, Ryan. 2020. *Geofacet: ’Ggplot2’ Faceting Utilities for Geographical Data*. https://CRAN.R-project.org/package=geofacet.

Hesselbarth, Maximilian H. K., Marco Sciaini, Kimberly A. With, Kerstin Wiegand, and Jakub Nowosad. 2019. “Landscapemetrics: An Open-Source r Tool to Calculate Landscape Metrics.” *Ecography* 42: 1648–57.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. *Geocomputation with R*. CRC Press.

Tennekes, Martijn. 2018. “tmap: Thematic Maps in R.” *Journal of Statistical Software* 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.

Wickham, Hadley. 2016. *Ggplot2: Elegant Graphics for Data Analysis*. Springer-Verlag New York. https://ggplot2.tidyverse.org.

A patch is a group of adjacent cells of the same category.↩︎

Currently, **landscapemetrics** also accepts objects from the **terra** and **stars** packages.↩︎

This function also allows for many more possibilities, including specifying a 2-column matrix with coordinates, SpatialPoints, SpatialLines, SpatialPolygons, sf points, or sf polygons as the second argument. You can learn all of the valid options using `?sample_lsm`.↩︎

To learn more about the structure of the output, read the Efficient landscape metrics calculations for buffers around sampling points blog post.↩︎

BibTeX citation:

```
@online{nowosad2022,
author = {Nowosad, Jakub},
title = {How to Visualize Landscape Metrics for Local Landscapes?},
date = {2022-02-17},
url = {https://jakubnowosad.com/posts/2022-02-17-lsm-bp3/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2022. “How to Visualize Landscape Metrics for
Local Landscapes?” February 17, 2022. https://jakubnowosad.com/posts/2022-02-17-lsm-bp3/.

This is the last blog post in a series about **motif** - an R package for pattern-based spatial analysis. It sums up the previous posts, but also underlines potential considerations when working with spatial patterns. Finally, it lists underexplored topics and future ideas related to pattern-based spatial analysis.

The first blog post in this series introduces the basic concept of spatial patterns in categorical data, and explains why commonly used landscape metrics are not best suited for finding areas with similar spatial patterns. A better approach is to derive a spatial signature - a multi-number description that compactly stores information about the composition and configuration of a spatial pattern.

The second blog post presents some basic spatial signatures, including *coma* (co-occurrence matrix) for single-variable categorical rasters, *wecoma* (weighted co-occurrence matrix) for single-variable categorical rasters that have another continuous raster representing the intensity of categories, and *incoma* (integrated co-occurrence matrix) for categorical rasters with two or more variables. All of the mentioned spatial signatures can be converted into a 1D vector - a probability function, and the similarity between probability functions can be calculated using one of many distance measures (e.g., the Jensen-Shannon distance). Now, having spatial signatures for two areas, we can find out how similar (or dissimilar) they are. This allows us to find the most similar rasters, describe changes between rasters, or group (cluster) rasters based on their spatial patterns.

The third blog post shows how we can search for areas with similar spatial patterns to a query region based on an example of finding areas of similar topography to the area of Suwalski Landscape Park. In the search process, spatial signatures are derived for the query region and many sub-areas of the search space, and distances between them are calculated. Next, sub-areas with the smallest distances from the query region are assumed to be the most similar to it.

The fourth blog post focuses on finding areas with the largest change of land cover patterns in the Amazon between 1992 and 2018. The land cover data for the Amazon in 1992 and 2018 were subdivided into areas of 90 by 90 kilometers, and a spatial signature was calculated for each sub-area in each year. Then, a distance between the spatial signatures for each sub-area was derived, with large distance values indicating a large change of spatial patterns.

The fifth blog post showcases clustering of joint spatial patterns of land cover and landforms in Africa. In this process, Africa was divided into many sub-areas and spatial signatures were derived for each sub-area. Distances between the signatures for each sub-area were calculated and stored in a distance matrix, which was used as the basis for the creation of clusters of similar spatial patterns. The quality of the clusters was assessed visually using a pattern mosaic and with dedicated quality metrics.

The role of the presented examples is to highlight the universality and extensibility of the pattern-based methods. They could be used in a wide range of local, regional, and global studies of global environmental changes, land management, sustainable development, environmental protection, forest cover change, urban growth monitoring, or agriculture expansion studies. Some example research ideas include:

- studying global environmental changes by analyzing changes in patterns of different environmental features, such as land cover,
- delineating ecoregions - regionalization of land into homogeneous units of similar ecological and physiographic features (land cover, landform, soils, climate),
- clustering forest patterns, the results of which could be used for conservation, planning, and management,
- identifying spatial patterns of cropland usage,
- inventorying landscape patterns and analyzing landscape spatial configuration.

Additionally, the pattern-based spatial analysis methods and tools could be useful in various other disciplines that use categorical images, for example, medical science, astronomy, or social studies.

However, whether we analyze patterns in an environmental raster, a demographic map, or a categorized microscope image, we need to consider several questions.

How should we preprocess the input data? For example, do we need all 18 categories in our data, or is it better to reduce the number of categories to improve the analysis and streamline the interpretation of the results? When we are interested in forest fragmentation, do we really need several other land cover classes, or can we merge them into one or two categories? Additionally, preprocessing can be applied to derive new categories from the data. An example of this was shown in the third blog post, where elevation data was first converted into geomorphons before applying any other steps. Reprojecting the input data may also be important in some cases. In pattern-based spatial analysis, each cell is treated equally, which means that we usually want to use data in some equal-area projection.

What is the scale of the process we want to study? Are we interested in investigating patterns in 10 by 10 cell windows or maybe 100 by 100 cell windows? If we do not have any prior information or expectation about spatial scale, then there are two general approaches that could help. Firstly, we could apply the same analysis steps a few times using different sizes of a local window, and decide on a proper spatial scale afterward. Secondly, we could use the smallest meaningful windows we can think of^{1}, for example, 10 by 10 cells, and then apply the clustering process. After merging similar areas into larger regions, we can decide the spatial scale of homogeneous spatial patterns.
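The link between window size and analysis scale is simple arithmetic: the scale equals the window size in cells multiplied by the cell resolution. A quick base-R sketch, assuming a raster with 300-m resolution (as in the data used elsewhere in this series):

```r
# Spatial scale of the analysis = window size (in cells) x cell resolution.
# A 300-m resolution is assumed here for illustration.
resolution = 300                  # meters
windows = c(10, 50, 100, 300)     # candidate window sizes in cells
scales_km = windows * resolution / 1000
scales_km                         # 3, 15, 30, 90 km
```

Comparing a few such candidate scales side by side is one practical way to decide on a window size when no prior expectation exists.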

Which signature should we apply? The *coma* representation was developed for single-variable categorical rasters, *wecoma* for single-variable categorical rasters that have another continuous raster representing the intensity of categories, and *incoma* for categorical rasters with two or more variables. Which of the above representations suits your problem best? Or maybe you need to create a new signature focused on the specifics of your case?

Which distance measure should we use? A few dozen distance/dissimilarity measures exist^{2}. Our previous experience showed that the Jensen-Shannon distance is suitable for describing relations between spatial patterns of land cover data. However, there is no free lunch in selecting a distance measure, and I would usually recommend trying out a few measures before deciding on one of them.
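As an illustration, the Jensen-Shannon distance between two probability vectors (such as normalized spatial signatures) can be written in a few lines of base R. This is only a sketch of the measure itself, not the implementation used in **motif**:

```r
# Jensen-Shannon distance between two probability vectors p and q:
# the square root of the Jensen-Shannon divergence (log base 2,
# so the result is bounded between 0 and 1).
jsd = function(p, q) {
  m = (p + q) / 2
  kl = function(a, b) sum(ifelse(a == 0, 0, a * log2(a / b)))
  sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
}

p = c(0.7, 0.2, 0.1)
q = c(0.1, 0.3, 0.6)
jsd(p, p)   # identical distributions -> 0
jsd(p, q)   # dissimilar distributions -> a value between 0 and 1
```

Other measures (e.g., Euclidean, Manhattan, Hellinger) can be swapped in the same way, which makes trying out a few of them straightforward.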

There are also general considerations that would benefit from establishing a consistent methodology. For example, how do we decide which scale is valid? What types of signatures are still missing and should be developed? How can categorical and continuous spatial patterns be integrated in one analysis? What are the advantages and disadvantages of using different distance measures? What workflows are missing and could be added to the pattern-based spatial analysis?

I encourage everyone to submit their issues or enhancement requests to the **motif** package, which will help me to prioritize my work. Furthermore, if you have any questions or ideas related to the pattern-based spatial analysis, please email me at nowosad.jakub@gmail.com.

This depends on the number of categories, their spatial arrangements, etc.↩︎

Read https://users.uom.gr/~kouiruki/sung.pdf for a comprehensive review of distance measures.↩︎

BibTeX citation:

```
@online{nowosad2021,
author = {Nowosad, Jakub},
title = {Considerations for the Pattern-Based Spatial Analysis},
date = {2021-03-10},
url = {https://jakubnowosad.com/posts/2021-03-10-motif-bp6/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Considerations for the Pattern-Based
Spatial Analysis.” March 10, 2021. https://jakubnowosad.com/posts/2021-03-10-motif-bp6/.

Clustering similar spatial patterns requires one or more raster datasets for the same area. Input data is divided into many sub-areas, and spatial signatures are derived for each sub-area. Next, distances between signatures for each sub-area are calculated and stored in a distance matrix. The distance matrix can be used to create clusters of similar spatial patterns. Quality of clusters can be assessed visually using a pattern mosaic or with dedicated quality metrics.

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

```
library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") %>%
  osf_ls_files(n_max = Inf) %>%
  osf_download(path = "data",
               conflicts = "overwrite")
```

You should also attach the following packages:

```
library(sf)
library(stars)
library(motif)
library(tmap)
library(dplyr)
library(readr)
```

The `data/land_cover.tif` file contains land cover data and `data/landform.tif` contains landform data for Africa. Both are single categorical rasters of the same extent and the same resolution (300 meters) that can be read into R using the `read_stars()` function.

```
lc = read_stars("data/land_cover.tif")
lf = read_stars("data/landform.tif")
```

Additionally, the `data/lc_palette.csv` file contains information about the colors and labels of each land cover category, and `data/lf_palette.csv` stores information about the colors and labels of each landform class.

```
lc_palette_df = read_csv("data/lc_palette.csv")
lf_palette_df = read_csv("data/lf_palette.csv")
names(lc_palette_df$color) = lc_palette_df$value
names(lf_palette_df$color) = lf_palette_df$value
```

Both datasets can be visualized with **tmap**.

```
tm_lc = tm_shape(lc) +
  tm_raster(style = "cat",
            palette = lc_palette_df$color,
            labels = lc_palette_df$label,
            title = "Land cover:") +
  tm_layout(legend.position = c("LEFT", "BOTTOM"))
tm_lc
```

```
tm_lf = tm_shape(lf) +
  tm_raster(style = "cat",
            palette = lf_palette_df$color,
            labels = lf_palette_df$label,
            title = "Landform:") +
  tm_layout(legend.outside = TRUE)
tm_lf
```

We can combine these two datasets together with the `c()` function.

`eco_data = c(lc, lf)`

The problem now is how to find clusters of similar spatial patterns of both land cover categories and landform classes.

The basic step in clustering spatial patterns is to calculate a proper signature for each spatial window using the `lsp_signature()` function. Here, we use the *in*tegrated *co*-occurrence *ve*ctor (`type = "incove"`) representation. In this example, we use a window of 300 cells by 300 cells (`window = 300`). This means that our search scale will be 90 km (300 cells x data resolution) - resulting in dividing the whole area into about 7,500 regular rectangles of 90 by 90 kilometers.

This operation could take a few minutes.

```
eco_signature = lsp_signature(eco_data,
                              type = "incove",
                              window = 300)
```

The output, `eco_signature`, contains a numerical representation of each 90 by 90 km area. Notice that it has 3,838 rows (not about 7,500) - this is due to the removal of areas with a large number of missing values before the calculations^{1}.
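The filtering idea behind this removal can be sketched in base R: a window is kept only when its share of missing cells stays below a threshold (the toy values below are made up for illustration):

```r
# A toy 3x3 window with some missing (NA) cells
window_vals = c(1, 2, NA, 2, 1, NA, NA, 1, 2)
na_prop = mean(is.na(window_vals))
na_prop           # 1/3 of the cells are missing

# With a 0.5 threshold the window would be kept,
# with a 0.1 threshold it would be dropped
na_prop <= 0.5    # TRUE
na_prop <= 0.1    # FALSE
```

In **motif**, this behavior is controlled by the `threshold` argument mentioned in the footnote.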

Next, we can calculate the distance (dissimilarity) between the patterns of each area. This can be done with the `lsp_to_dist()` function, where we must provide the output of `lsp_signature()` and the distance measure to use (`dist_fun = "jensen-shannon"`). This operation could also take a few minutes.

`eco_dist = lsp_to_dist(eco_signature, dist_fun = "jensen-shannon")`

The output, `eco_dist`, is of the `dist` class, where small values indicate that two areas have a similar joint spatial pattern of land cover categories and landform classes.

`class(eco_dist)`

`[1] "dist"`

Objects of class `dist` can be used by many existing R functions for clustering. These include different approaches to hierarchical clustering (`hclust()`, `cluster::agnes()`, `cluster::diana()`) and fuzzy clustering (`cluster::fanny()`). In the example below, we use hierarchical clustering with `hclust()`, which expects a distance matrix as the first argument and a linkage method as the second one. Here, we use Ward's minimum variance method (`method = "ward.D2"`), which minimizes the total within-cluster variance.

```
eco_hclust = hclust(eco_dist, method = "ward.D2")
plot(eco_hclust)
```

A graphical representation of hierarchical clustering is called a dendrogram. Based on the obtained dendrogram, we can divide our local landscapes into a specified number of groups using `cutree()`. In this example, we use eight classes (`k = 8`) to create a fairly small number of clusters to showcase the presented methodology.

`clusters = cutree(eco_hclust, k = 8)`

However, a decision about the number of clusters in real-life cases should be based on the goal of the research.

The `lsp_add_clusters()` function adds a `clust` column with a cluster number to each area, and converts the result to an `sf` object.

```
eco_grid_sf = lsp_add_clusters(eco_signature, clusters)
```

The clustering results can be further visualized using **tmap**.

```
tm_clu = tm_shape(eco_grid_sf) +
  tm_polygons("clust", style = "cat", palette = "Set2", title = "Cluster:") +
  tm_layout(legend.position = c("LEFT", "BOTTOM"))
tm_clu
```

Most clusters form continuous regions, so we could merge areas of the same clusters into larger polygons.

```
eco_grid_sf2 = eco_grid_sf %>%
  dplyr::group_by(clust) %>%
  dplyr::summarize()
```

The output polygons can then be superimposed on maps of land cover categories and landform classes.

```
tm_shape(eco_data) +
  tm_raster(style = "cat",
            palette = list(lc_palette_df$color, lf_palette_df$color)) +
  tm_facets(ncol = 2) +
  tm_shape(eco_grid_sf2) +
  tm_borders(col = "black") +
  tm_layout(legend.show = FALSE,
            title.position = c("LEFT", "TOP"))
```

We can see that many of the borders (black lines) delimit areas with land cover or landform patterns distinct from their neighbors. Some clusters are also distinct for only one variable (e.g., look at the Sahara on the land cover map).

We can also calculate the quality of the clusters with the `lsp_add_quality()` function. It requires an output of `lsp_add_clusters()` and an output of `lsp_to_dist()`, and adds three new variables: `inhomogeneity`, `distinction`, and `quality`.

`eco_grid_sfq = lsp_add_quality(eco_grid_sf, eco_dist, type = "cluster")`

Inhomogeneity (`inhomogeneity`) measures the degree of mutual distance between all objects in a cluster. This value is between 0 and 1, where a small value indicates that all objects in the cluster represent consistent patterns, so the cluster is pattern-homogeneous. Distinction (`distinction`) is the average distance between the focus cluster and all the other clusters. This value is between 0 and 1, where a large value indicates that the cluster stands out from the rest of the clusters. Overall quality (`quality`) is calculated as `1 - (inhomogeneity / distinction)`. This value is also between 0 and 1, where larger values indicate a better quality of clustering.

We can create a summary of each cluster's quality using the code below.

```
eco_grid_sfq2 = eco_grid_sfq %>%
  group_by(clust) %>%
  summarise(inhomogeneity = mean(inhomogeneity),
            distinction = mean(distinction),
            quality = mean(quality))
```

| clust | inhomogeneity | distinction | quality |
|---|---|---|---|
| 1 | 0.5064706 | 0.7724361 | 0.3443204 |
| 2 | 0.4038704 | 0.7023297 | 0.4249561 |
| 3 | 0.3377875 | 0.7065250 | 0.5219029 |
| 4 | 0.1161293 | 0.7921515 | 0.8534002 |
| 5 | 0.3043422 | 0.7366735 | 0.5868696 |
| 6 | 0.2774136 | 0.6849140 | 0.5949657 |
| 7 | 0.2926504 | 0.7149212 | 0.5906537 |
| 8 | 0.3486704 | 0.7579511 | 0.5399830 |

The created clusters show different degrees of quality. The fourth cluster has the lowest inhomogeneity and the largest distinction, and therefore the best quality. The first cluster has the most inhomogeneous patterns, and while its distinction from the other clusters is relatively large, its overall quality is the worst.

```
tm_inh = tm_shape(eco_grid_sfq2) +
  tm_polygons("inhomogeneity", style = "cont", palette = "magma")
tm_iso = tm_shape(eco_grid_sfq2) +
  tm_polygons("distinction", style = "cont", palette = "-inferno")
tm_qua = tm_shape(eco_grid_sfq2) +
  tm_polygons("quality", style = "cont", palette = "Greens")
tm_cluster3 = tmap_arrange(tm_clu, tm_qua, tm_inh, tm_iso, ncol = 2)
tm_cluster3
```

Inhomogeneity can also be assessed visually with a pattern mosaic. A pattern mosaic is an artificial rearrangement of a subset of randomly selected areas belonging to a given cluster.

Using the code below, we randomly select 100 areas for each cluster. This could take a few minutes.

```
eco_grid_sample = eco_grid_sf %>%
  filter(na_prop == 0) %>%
  group_by(clust) %>%
  slice_sample(n = 100)
```

Next, we can extract a raster for each selected area with the `lsp_add_examples()` function.

`eco_grid_examples = lsp_add_examples(eco_grid_sample, eco_data)`

Finally, we can use the `lsp_mosaic()` function, which creates raster mosaics by rearranging the spatial data for the sample areas. Note that this function is still experimental and can change in the future.

`eco_mosaic = lsp_mosaic(eco_grid_examples)`

The output is a `stars` object with a third dimension (`clust`) representing clusters, from which we can use `slice()` to extract a raster mosaic for a selected cluster. For example, the raster mosaic for the fourth cluster looks like this:

```
eco_mosaic_c4 = slice(eco_mosaic, clust, 4)
tm_shape(eco_mosaic_c4) +
  tm_raster(style = "cat",
            palette = list(lc_palette_df$color, lf_palette_df$color)) +
  tm_facets(ncol = 2) +
  tm_layout(legend.show = FALSE)
```

We can see that the land cover patterns for this cluster are very simple and homogeneous. The landform patterns are slightly more complex and less homogeneous.

And the raster mosaic for the first cluster is:

```
eco_mosaic_c1 = slice(eco_mosaic, clust, 1)
tm_shape(eco_mosaic_c1) +
  tm_raster(style = "cat",
            palette = list(lc_palette_df$color, lf_palette_df$color)) +
  tm_facets(ncol = 2) +
  tm_layout(legend.show = FALSE)
```

Patterns of both variables in this cluster are more complex and heterogeneous. This result suggests that additional clusters could be necessary to distinguish some spatial patterns.

The pattern-based clustering allows for grouping areas with similar spatial patterns. The above example shows clustering based on two-variable raster data (land cover and landform), but, by using a different spatial signature, it can be performed on a single-variable raster as well. R code for the pattern-based clustering can be found here, with other examples described in the Spatial patterns' clustering vignette.

See the `threshold` argument for more details.↩︎

BibTeX citation:

```
@online{nowosad2021,
author = {Nowosad, Jakub},
title = {Clustering Similar Spatial Patterns},
date = {2021-03-03},
url = {https://jakubnowosad.com/posts/2021-03-03-motif-bp5/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Clustering Similar Spatial
Patterns.” March 3, 2021. https://jakubnowosad.com/posts/2021-03-03-motif-bp5/.

Quantifying changes of spatial patterns requires two datasets for the same variable in the same area. Both datasets are divided into many sub-areas, and spatial signatures are derived for each sub-area for each dataset. Next, distances for each pair of areas are calculated. Sub-areas with the largest distances represent the largest change.

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

```
library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") %>%
  osf_ls_files(n_max = Inf) %>%
  osf_download(path = "data",
               conflicts = "overwrite")
```

You should also attach the following packages:

```
library(stars)
library(motif)
library(tmap)
library(readr)
```

A standard approach for detecting changes between two rasters is to calculate the change for each cell independently. This allows quantifying how many cells changed their values from A to B, and how many from B to A. However, this approach does not tell us if the spatial pattern actually changed or stayed the same. For example, consider a regular checkerboard and a checkerboard with all colors reversed. While every cell changed its value, we still have the same classes, and their spatial arrangement is the same.

Here, we are interested in changes of spatial patterns; therefore, instead of looking at pixel-by-pixel change, we focus on pattern-by-pattern change^{1}.
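The checkerboard example can be made concrete with a small base-R sketch: every single cell changes its value, yet both the class composition and the cell adjacency structure (the kind of information co-occurrence signatures capture) stay the same.

```r
# Two 8x8 checkerboards; the second one has all colors reversed
a = outer(1:8, 1:8, function(i, j) (i + j) %% 2)
b = 1 - a

mean(a != b)   # 1: a pixel-by-pixel comparison says everything changed

# Class composition (cell counts per class) is identical
comp = function(m) as.vector(sort(table(m)))
identical(comp(a), comp(b))   # TRUE

# Horizontal/vertical adjacency counts (a simple co-occurrence
# summary) are identical as well
adj = function(m) {
  h = table(m[, -ncol(m)], m[, -1])   # horizontal neighbor pairs
  v = table(m[-nrow(m), ], m[-1, ])   # vertical neighbor pairs
  unname(as.matrix(h + v))
}
identical(adj(a), adj(b))     # TRUE: the spatial pattern is unchanged
```

A pattern-based comparison of these two rasters would therefore report no change, which is exactly the behavior we want here.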

The `data/lc_am_1992.tif` file contains land cover data for the year 1992 and `data/lc_am_2018.tif` for the year 2018. Both are single categorical rasters of the same extent (the Amazon) and the same resolution (300 meters) that can be read into R using the `read_stars()` function.

```
lc92 = read_stars("data/lc_am_1992.tif")
lc18 = read_stars("data/lc_am_2018.tif")
```

Additionally, the `data/lc_palette.csv` file contains information about the colors and labels of each land cover category.

```
lc_palette_df = read_csv("data/lc_palette.csv")
names(lc_palette_df$color) = lc_palette_df$value
```

Both land cover datasets can be visualized with **tmap**. The `lc_palette_df` object is used to set the color palette and the legend's labels.

```
tm_compare1 = tm_shape(c(lc92, lc18)) +
  tm_raster(style = "cat",
            palette = lc_palette_df$color,
            labels = lc_palette_df$label,
            title = "Land cover:") +
  tm_layout(legend.outside = TRUE,
            panel.labels = c(1992, 2018))
tm_compare1
```

The above map clearly shows that there has been a large land cover change in many areas of the Amazon between 1992 and 2018. The problem now is to find out which areas changed the most.

This could be solved with `lsp_compare()`. The `lsp_compare()` function expects two `stars` objects with the same extent and resolution. We also need to specify the spatial scale of comparison (`window`), signature (`type`), and distance method (`dist_fun`)^{2}.

In this example, we use a window of 300 cells by 300 cells (`window = 300`). This means that our search scale will be 90 km (300 cells × the 300 m data resolution), which divides the whole area into about 1,500 regular rectangles of 90 by 90 kilometers. We also use the `"cove"` signature and the `"jensen-shannon"` distance here.

```
lc_am_compare = lsp_compare(lc92, lc18,
window = 300,
type = "cove",
dist_fun = "jensen-shannon")
```

By default, the output is a `stars` object with four attributes: (1) `id` - an id of each window, (2) `na_prop_x` - the share (between 0 and 1) of NA cells for each window in the first `stars` object, (3) `na_prop_y` - the share of NA cells for each window in the second `stars` object, and (4) `dist` - the derived distance between the pattern in the first object and the pattern in the second object for each window.

`lc_am_compare`

```
stars object with 2 dimensions and 4 attributes
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu.
id 1 3.637500e+02 7.265000e+02 726.50000000 1.089250e+03
na_prop_x 0 0.000000e+00 0.000000e+00 0.02093060 0.000000e+00
na_prop_y 0 0.000000e+00 0.000000e+00 0.02112938 0.000000e+00
dist 0 1.504208e-03 3.550033e-03 0.01713996 1.100637e-02
Max. NA's
id 1452.0000000 0
na_prop_x 0.4933778 620
na_prop_y 0.4933778 620
dist 0.2469178 620
dimension(s):
from to offset delta refsys point values x/y
x 1 44 -8834600 90000 Interrupted_Goode_Homolosine NA NULL [x]
y 1 33 964250 -90000 Interrupted_Goode_Homolosine NA NULL [y]
```

We can visualize the result the same way as a regular `stars` object, for example using the **tmap** package:

```
tm_compare2 = tm_shape(lc_am_compare) +
tm_raster("dist",
palette = "viridis",
style = "cont",
title = "Distance (JSD):") +
tm_layout(legend.outside = TRUE)
tm_compare2
```

The yellow color represents areas of the largest change. They are mostly located in the southern and south-eastern parts of the Amazon.

A comparison result can also be easily converted into an `sf` object with `st_as_sf()` for subsetting and analyzing the outcomes.

`lc_am_compare_sf = st_as_sf(lc_am_compare)`

In the previous blog post, we were interested in finding the areas most similar to the query region - the smallest distance. Here, we are looking for the areas with the largest change, which is expressed by the largest `dist` values.

We can use `slice_max()` to subset the obtained result to a selected number of areas with the largest change between 1992 and 2018. The code below selects the nine areas with the largest distance between the spatial patterns in 1992 and 2018.

```
library(dplyr)
lc_am_compare_sel = slice_max(lc_am_compare_sf, dist, n = 9)
```

If we want to look closer at the result, then we can extract each of the above regions with the `lsp_add_examples()` function. It adds a `region` column with a `stars` object to each row.

`lc_am_compare_ex = lsp_add_examples(x = lc_am_compare_sel, y = c(lc92, lc18))`

It allows us to visualize the area with the largest change:

```
tm_shape(lc_am_compare_ex$region[[1]]) +
tm_raster(style = "cat",
palette = lc_palette_df$color,
labels = lc_palette_df$label,
title = "Land cover:") +
tm_layout(legend.show = FALSE,
panel.labels = c(1992, 2018))
```

Here, we can see an area mostly covered by forest in 1992, large parts of which were transformed into agriculture by 2018.

This approach can also be extended to plot all nine areas. We just need to create a visualization function (`create_map2()`) and use it iteratively on each region in `lc_am_compare_ex`. The output of this process, `map_list`, is a list of `tmap` objects that can be plotted with `tmap_arrange()`:

```
library(purrr)
create_map2 = function(x){
tm_shape(x) +
tm_raster(style = "cat",
palette = lc_palette_df$color,
labels = lc_palette_df$label,
title = "Land cover:") +
tm_facets(ncol = 2) +
tm_layout(legend.show = FALSE,
panel.labels = c(1992, 2018))
}
map_list = map(lc_am_compare_ex$region, create_map2)
tmap_arrange(map_list)
```

It shows that the majority of changes in the Amazon are related to forest being removed for agricultural purposes.

The pattern-based comparison allows for finding areas with the largest change in spatial patterns. The above example shows a comparison based on single-variable raster data (land cover), but by using a different spatial signature, it can be performed on rasters with two or more variables (think of multi-variable change). R code for the pattern-based comparison can be found here, with other examples described in the Spatial patterns' comparison vignette.

For a more detailed explanation of spatial patterns’ changes, visit my older blog post.↩︎

If you want more explanation about these arguments, please read the previous posts in this series.↩︎

BibTeX citation:

```
@online{nowosad2021,
author = {Nowosad, Jakub},
title = {Quantifying Changes of Spatial Patterns},
date = {2021-02-24},
url = {https://jakubnowosad.com/posts/2021-02-24-motif-bp4/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Quantifying Changes of Spatial
Patterns.” February 24, 2021. https://jakubnowosad.com/posts/2021-02-24-motif-bp4/.

Finding similar spatial patterns requires data for a query region and a search space. Spatial signatures are derived for the query region and many sub-areas of the search space, and distances between them are calculated. Sub-areas with the smallest distances from the query region are the most similar to it.

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

```
library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") %>%
osf_ls_files(n_max = Inf) %>%
osf_download(path = "data",
conflicts = "overwrite")
```

You should also attach the following packages:

```
library(sf)
library(stars)
library(tmap)
```

Spatial pattern search allows for quantifying similarity between the query region and the search space and finally finding regions that are the most similar to the query one. Here, we were interested in finding areas of similar topography to the area of Suwalski Landscape Park. Suwalski Landscape Park is a protected area in north-eastern Poland with a post-glacial landscape consisting of young morainic hills.

One possible approach to classify the topography of a given region is to use geomorphons. Geomorphons categorize cells in this area into one of ten forms: flat, summit, ridge, shoulder, spur, slope, hollow, footslope, valley, and depression ^{1}.

The `"data/geomorphons_pol.tif"` file contains a raster with geomorphons calculated for Poland's area, while `"data/suw_lp.gpkg"` is a vector polygon with the Suwalski Landscape Park borders. Let's start by reading these two files into R.

```
gm = read_stars("data/geomorphons_pol.tif")
suw_lp = read_sf("data/suw_lp.gpkg")
```

Now, we can visualize the geomorphons and the location of Suwalski Landscape Park with the **tmap** package.

```
tm_gm = tm_shape(gm) +
tm_raster(title = "Geomorphons:") +
tm_shape(suw_lp) +
tm_symbols(col = "black", shape = 6) +
tm_layout(legend.outside = TRUE, frame = FALSE)
tm_gm
```

The geomorphon data for Poland is our search space. Now, we also need a second raster object with a query region. The query region is an area to which we want to find other similar areas.

There are two main ways to create a query region:

- By cropping spatial data of a large area to the extent or borders of a query region.
- By reading an external file. In this case, the values in the external file should match the values in the search space.

Here, we use the former approach: we read the Suwalski Landscape Park borders and then use them to crop the whole-country raster.

```
suw_lp = read_sf("data/suw_lp.gpkg")
gm_suw = st_crop(gm, suw_lp)
```

The query area has irregular spatial patterns represented by slopes and a limited number of flat areas.

```
tm_gm_suw = tm_shape(gm_suw) +
tm_raster() +
tm_shape(suw_lp) +
tm_borders(col = "black") +
tm_layout(legend.show = FALSE, frame = FALSE)
tm_gm_suw
```

The searching process consists of:

- Selecting a query region and a search space. In our case, the query region is `gm_suw`, while the search space is the `gm` object.
- Dividing the search space using regular (non-overlapping) squares or using polygons.
- Creating a numerical representation (called a signature) for the query region and for each part of the search space.
- Comparing the signature of the query region with the signatures of each part of the search space using a distance measure.
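The steps above can be sketched in base R with a toy, composition-only signature; the real **motif** signatures also capture configuration, and all objects here (`search_space`, `query`, `signature`) are illustrative:

```r
set.seed(1)
# toy search space (8 x 8) and query region (4 x 4) with categories 1-3
search_space <- matrix(sample(1:3, 64, replace = TRUE), nrow = 8)
query <- matrix(sample(1:3, 16, replace = TRUE), nrow = 4)

# a toy signature: the share of each category (composition only)
signature <- function(x) tabulate(as.vector(x), nbins = 3) / length(x)

# divide the search space into four non-overlapping 4 x 4 windows
windows <- list(search_space[1:4, 1:4], search_space[1:4, 5:8],
                search_space[5:8, 1:4], search_space[5:8, 5:8])

# distance between the query signature and each window signature
dists <- sapply(windows, function(w) sqrt(sum((signature(w) - signature(query))^2)))
which.min(dists)  # index of the window most similar to the query
```

`lsp_search()` automates exactly this workflow, but with pattern-aware signatures and a much larger choice of distance measures.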

The first important consideration is the search scale - what is the size of areas we want to find? This is not an easy question and largely depends on the research problem. The current version of **motif** accepts either regular (non-overlapping) squares or polygons.

The second consideration is the search signature. We are able to describe the above area in words; however, how do we translate the spatial pattern properties to a computer and, at the same time, make them more objective? Again, it is a complex question, and the answer largely depends on the type of input data.

In our case, we have a single categorical raster, and for this type of data, we found that the *cove* signature works well. *Cove* stands for *c*o-*o*ccurrence *ve*ctor - it is a 1D vector in which each value represents the share of adjacencies between cells of a given pair of categories^{2}. More information about *cove* can be found in the previous blog post.

We can calculate *cove* for our query region using `lsp_signature()` with the `type` argument set to `"cove"`.

```
library(motif)
gm_sum_sig = lsp_signature(gm_suw, type = "cove")
gm_sum_sig
```

```
# A tibble: 1 × 3
id na_prop signature
* <int> <dbl> <list>
1 1 0.381 <dbl [1 × 100]>
```

The output object contains a `signature` column, which stores the *cove* signature for our region. We can see this signature with `gm_sum_sig$signature[[1]]`.

The third consideration is a distance measure. Many distance measures have been developed for different types of data, and each of them has different properties. The **motif** package allows using any distance measure implemented in the **philentropy** package, which includes more than 40 different measures^{3}.

The `lsp_search()` function performs the spatial pattern-based search. It expects two `stars` objects: a query region (`gm_suw`) and a search space (`gm`). Next, we need to specify the search scale (`window`), signature (`type`), and distance method (`dist_fun`).

In this example, we use a window of 100 cells by 100 cells (`window = 100`). This means that our search scale will be 2500 meters (100 cells × the 25 m data resolution), which divides the search space into about 70,000 regular rectangles of 2500 by 2500 meters. We also use the `"cove"` signature and the `"jensen-shannon"` distance here.

```
gm_search = lsp_search(gm_suw, gm,
window = 100,
type = "cove",
dist_fun = "jensen-shannon")
```

The above calculation could take several minutes on a modern computer.

By default, the output of the search is a `stars` object with three attributes:

- `id` - an id of each window,
- `na_prop` - the share (between 0 and 1) of NA cells for each window in the search space,
- `dist` - the derived distance between the query region and each window in the search space.

`gm_search`

```
stars object with 2 dimensions and 3 attributes
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu.
id 1.000000000 1.764075e+04 3.528050e+04 3.528050e+04 5.292025e+04
na_prop 0.000000000 0.000000e+00 0.000000e+00 3.111193e-03 0.000000e+00
dist 0.001737701 4.483980e-02 1.051218e-01 1.461936e-01 2.080158e-01
Max. NA's
id 7.056000e+04 0
na_prop 4.997000e-01 20485
dist 6.451746e-01 20485
dimension(s):
from to offset delta refsys point values x/y
x 1 288 4595300 2500 +proj=laea +lat_0=52 +lon... NA NULL [x]
y 1 245 3556600 -2500 +proj=laea +lat_0=52 +lon... NA NULL [y]
```

We can visualize the result in the same fashion as a regular `stars` object (see the final map at the end of the post):

```
tm_search2 = tm_shape(gm_search) +
tm_raster("dist",
style = "log10",
palette = "BrBG",
title = "Distance (JSD):",
legend.is.portrait = FALSE)
```

A search result can also be easily converted into an `sf` object with `st_as_sf()`. This allows for straightforward analysis and subsetting of the search results.

`gm_search_sf = st_as_sf(gm_search)`

Spatial pattern-based search is similar to a search using internet search engines - we do not care about the most dissimilar areas. We just want to locate the ones most similar to the query region. Therefore, we should select only the areas with the smallest distance values - these are the most similar to the query region.

We can achieve this, for example, using the `slice_min()` function. The code below selects the nine areas with the smallest distance from the query region.

```
library(dplyr)
gm_search_sel = slice_min(gm_search_sf, dist, n = 9)
```

If we want to look closer at the result, then we can extract each of the above regions with the `lsp_add_examples()` function. It adds a `region` column with a `stars` object to each row.

`gm_search_ex = lsp_add_examples(x = gm_search_sel, y = gm)`

It allows us to visualize any of the most similar areas.

```
tm_shape(gm_search_ex$region[[1]]) +
tm_raster() +
tm_layout(legend.show = FALSE)
```

This approach can also be extended to plot all nine of the most similar areas. We just need to create a visualization function (`create_map()`) and use it iteratively on each region in `gm_search_ex`. The output of this process, `map_list`, is a list of `tmap` objects that can be plotted with `tmap_arrange()`:

```
library(purrr)
create_map = function(x, y){
tm_shape(x) +
tm_raster() +
tm_layout(legend.show = FALSE,
title = y)
}
map_list = map2(gm_search_ex$region, gm_search_ex$id, create_map)
tmap_arrange(map_list)
```

Nine examples of the areas with patterns of geomorphons most similar to the Suwalski Landscape Park are presented above. They are also similar to each other, suggesting a high-quality result.

The final map consists of two parts: (a) a distance raster and (b) symbols representing the nine most similar areas.

```
tm_search2 +
tm_shape(gm_search_sel) +
tm_symbols(shape = 2, col = "black") +
tm_text("id", auto.placement = TRUE)
```

The brown color on the above map represents the areas with patterns of geomorphons most similar to the Suwalski Landscape Park. The majority of similar areas are located in northern Poland and form a belt with homogeneous topography.

The pattern-based search allows for finding areas with similar spatial patterns. The above example shows a search based on single-variable raster data (geomorphons), but by using a different spatial signature, it can be performed on rasters with two or more variables. Additionally, the search space can be divided not only into regular areas but also into irregular ones - see an example. R code for the pattern-based search can be found here.

Learn more about geomorphons by reading the dedicated paper or its preprint. You can also calculate geomorphons for your own data using a GRASS GIS module r.geomorphon.↩︎

In other words, it is a vector containing a normalized form of the co-occurrence matrix.↩︎

You can check all of them using `philentropy::getDistMethods()`.↩︎

BibTeX citation:

```
@online{nowosad2021,
author = {Nowosad, Jakub},
title = {Finding Similar Spatial Patterns},
date = {2021-02-17},
url = {https://jakubnowosad.com/posts/2021-02-17-motif-bp3/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Finding Similar Spatial Patterns.”
February 17, 2021. https://jakubnowosad.com/posts/2021-02-17-motif-bp3/.

Spatial signatures are multi-value representations of the patterns that compress information about spatial composition and configuration. Spatial signatures can be directly compared using various distance measures.

A categorical raster shown below represents land cover data for some area. This area is mainly covered by forest, with some small patches of agriculture, grasslands, and water.

If we want to describe this area, we could start by measuring the areas of different land cover categories. Then, we would know that forest covers about 98.6% of the area and agriculture about 1.3%. We could also use landscape metrics to put a number on some property of this raster. Then, we would know that the entropy is 0.116, and the relative mutual information is 0.331^{1}.

This approach can be applied to many categorical rasters, as you can see below.

| id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ent | 0.12 | 0.45 | 0.53 | 0.63 | 0.75 | 1.16 | 1.16 | 1.28 | 1.53 | 1.2 | 1.65 | 1.6 | 1.74 | 1.72 | 1.6 | 2.02 |
| relmutinf | 0.33 | 0.39 | 0.34 | 0.52 | 0.44 | 0.51 | 0.39 | 0.33 | 0.42 | 0.36 | 0.5 | 0.58 | 0.43 | 0.34 | 0.2 | 0.38 |
| forest | 0.99 | 0.93 | 0.9 | 0.84 | 0.83 | 0.76 | 0.69 | 0.68 | 0.59 | 0.57 | 0.53 | 0.5 | 0.44 | 0.4 | 0.39 | 0.36 |
| agriculture | 0.01 | 0.05 | 0.08 | 0.16 | 0.16 | 0.12 | 0.25 | 0.23 | 0.24 | 0.39 | 0.3 | 0.36 | 0.36 | 0.39 | 0.36 | 0.3 |

Now, each raster's spatial properties are expressed by a vector of numbers representing its categories and selected landscape metrics.

As I mentioned in my previous blog posts, we could represent categorical rasters with a large number of landscape metrics. However, many landscape metrics are highly correlated, and some of them depend on the resolution of the input data and the size of the study area.

An alternative approach is to derive a multi-value representation of the raster that compresses information about its spatial composition and configuration. One such representation is a co-occurrence matrix (*coma*).

The *coma* representation is calculated by moving through each cell, looking at its value, and counting how many neighbors of each class the central cell has. For example, the co-occurrence matrix below shows that cells of the forest category are 38,778 times adjacent to other cells of this category, 218 times to cells of the agriculture category, 32 times to cells of the grassland category, and so on.

| | agriculture | forest | grassland | water |
|---|---|---|---|---|
| agriculture | 272 | 218 | 4 | 0 |
| forest | 218 | 38778 | 32 | 12 |
| grassland | 4 | 32 | 16 | 0 |
| water | 0 | 12 | 0 | 2 |
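The counting procedure can be sketched in base R for a tiny matrix (a minimal rook-adjacency version; each adjacency is counted in both directions, which makes the matrix symmetric and doubles same-class entries — the `coma()` function here is an illustrative reimplementation, not the one from **motif**):

```r
coma <- function(x) {
  cats <- sort(unique(as.vector(x)))
  out <- matrix(0, length(cats), length(cats), dimnames = list(cats, cats))
  nr <- nrow(x); nc <- ncol(x)
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    # look at the right and bottom neighbor of each cell,
    # and count the pair in both directions
    for (d in list(c(0, 1), c(1, 0))) {
      i2 <- i + d[1]; j2 <- j + d[2]
      if (i2 <= nr && j2 <= nc) {
        a <- as.character(x[i, j]); b <- as.character(x[i2, j2])
        out[a, b] <- out[a, b] + 1
        out[b, a] <- out[b, a] + 1
      }
    }
  }
  out
}

m <- matrix(c(1, 1, 2,
              1, 2, 2), nrow = 2, byrow = TRUE)
coma(m)  # a symmetric 2x2 matrix of adjacency counts
```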

Importantly, this signature contains information about the categories and their shares (composition), and also the spatial relation between categories (configuration).

The co-occurrence matrix (*coma*) representation is two-dimensional, with the values of categories in rows and columns. It can be converted into a one-dimensional representation called a co-occurrence vector (*cove*).

272 | 218 | 4 | 0 | 218 | 38778 | 32 | 12 | 4 | 32 | 16 | 0 | 0 | 12 | 0 | 2 |

As you can see, some elements of this vector represent the same relation. For example, the first value of `4` shows the relation between agriculture and grassland, and the second value of `4` represents the relation between grassland and agriculture. We can simplify the above vector by counting each relation only once^{2}:

136 | 218 | 19389 | 4 | 32 | 8 | 0 | 12 | 0 | 1 |

This vector can be further transformed so that its values sum up to one. The output vector is called the normalized co-occurrence vector.

0.0069 | 0.011 | 0.9792 | 0.0002 | 0.0016 | 0.0004 | 0 | 0.0006 | 0 | 0.0001 |

The role of normalization is to create a probability function, and thus be able to compare categorical rasters of different sizes using mathematical distance measures.
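Both steps (halving the diagonal so each same-class relation is counted once, keeping each pair of categories once, and normalizing) can be reproduced in base R from the co-occurrence matrix above:

```r
cm <- matrix(c(272,   218,  4,  0,
               218, 38778, 32, 12,
                 4,    32, 16,  0,
                 0,    12,  0,  2), nrow = 4, byrow = TRUE)

# halve the diagonal, then keep the lower triangle row by row,
# so that each relation is counted only once
cm2 <- cm
diag(cm2) <- diag(cm2) / 2
cove <- t(cm2)[upper.tri(cm2, diag = TRUE)]
cove  # 136 218 19389 4 32 8 0 12 0 1

# normalize so the values sum to one
round(cove / sum(cove), 4)  # matches the normalized vector shown above
```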

Let’s consider two rasters below. We want to know how similar they are to each other.

To answer this question, we need to perform three steps:

- calculate a normalized co-occurrence vector for the first raster,
- calculate a normalized co-occurrence vector for the second raster,
- calculate a numerical distance between these two signatures.

Normalized co-occurrence vector for the first raster is:

0.0069 | 0.011 | 0.9792 | 0.0002 | 0.0016 | 0.0004 | 0 | 0.0006 | 0 | 0.0001 |

Normalized co-occurrence vector for the second raster is:

0.1282 | 0.0609 | 0.8105 | 0.0002 | 0.0002 | 0.0001 | 0 | 0 | 0 | 0 |

A large number of possible distance measures between probability functions exist^{3}. In this example, we use the Jensen-Shannon distance.

`$$ JSD(A, B) = H(\frac{A + B}{2}) - \frac{1}{2}[H(A) + H(B)] $$`

It takes two probability functions (spatial signatures in our case), A and B, and calculates entropy values (H). The Jensen-Shannon distance is a value between 0 and 1, where 0 means that the two probability functions are identical, and 1 means that they have nothing in common.
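A direct base-R implementation of this formula (using base-2 logarithms, which keeps the value in the 0-1 range; an illustrative sketch rather than the philentropy implementation):

```r
# Shannon entropy; zero probabilities contribute nothing
H <- function(p) -sum(ifelse(p > 0, p * log2(p), 0))

# Jensen-Shannon distance between two probability vectors
jsd <- function(a, b) H((a + b) / 2) - (H(a) + H(b)) / 2

# the two normalized co-occurrence vectors from the text
sig1 <- c(0.0069, 0.011, 0.9792, 0.0002, 0.0016, 0.0004, 0, 0.0006, 0, 0.0001)
sig2 <- c(0.1282, 0.0609, 0.8105, 0.0002, 0.0002, 0.0001, 0, 0, 0, 0)
jsd(sig1, sig2)  # about 0.068
```

For identical signatures the distance is exactly 0, and the measure is symmetric in its two arguments.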

The Jensen-Shannon distance between our two rasters is 0.068, suggesting that their spatial composition and configuration are fairly similar, but not identical. Now, let's consider two rasters that are visually very different. One is covered mostly by forest, while the second one is mostly a mosaic of forest, agricultural areas, and grasslands.

Normalized co-occurrence vector for the first raster is:

0.0069 | 0.011 | 0.9792 | 0.0002 | 0.0016 | 0.0004 | 0 | 0 | 0 | 0 | 0 | 0.0006 | 0 | 0 | 0.0001 |

Normalized co-occurrence vector for the second raster is:

0.2033 | 0.1335 | 0.2944 | 0.1747 | 0.0562 | 0.1307 | 0.0035 | 0.0002 | 0.0004 | 0.0015 | 0.0007 | 0.0005 | 0 | 0 | 0.0005 |

The **Jensen-Shannon distance** between this pair of rasters is 0.444, indicating that the two rasters are fairly different^{4}.

Calculating spatial signatures for many areas allows us to find the most similar rasters, describe changes between rasters, or group (cluster) rasters with similar spatial patterns.

The co-occurrence matrix (*coma*) is suitable for representing the pattern of a single categorical variable. There are, however, other spatial signatures aimed at describing spatial patterns in multi-variable cases.

Let's consider a situation in which we have two rasters: one with categories, and one with weights. Now we have not only a category for each cell, but also its intensity.

A regular co-occurrence matrix (*coma*) based just on the categorical raster looks as follows:

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 4 | 5 | 0 | 1 |
| 2 | 4 | 1652 | 493 | 86 | 316 |
| 3 | 5 | 493 | 1148 | 38 | 509 |
| 4 | 0 | 86 | 38 | 6 | 14 |
| 5 | 1 | 316 | 509 | 14 | 818 |

It represents the spatial pattern of the categories; however, it completely omits the secondary information about the weight of each raster cell. To utilize this secondary information, a weighted co-occurrence matrix (*wecoma*) was developed. It is a modification of the co-occurrence matrix in which each adjacency contributes to the output based on the values from the weight raster. The contributed value is calculated as the average of the weights in the two adjacent cells.

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.00 | 7.08 | 15.42 | 0.00 | 2.18 |
| 2 | 7.08 | 3513.53 | 1723.24 | 92.45 | 923.97 |
| 3 | 15.42 | 1723.24 | 4524.03 | 113.75 | 2029.07 |
| 4 | 0.00 | 92.45 | 113.75 | 3.72 | 36.37 |
| 5 | 2.18 | 923.97 | 2029.07 | 36.37 | 1574.14 |

As you can see above, the weighted co-occurrence matrix differs from regular *coma*.
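This weighting scheme can be sketched in base R (rook adjacency, both directions counted; `wecoma()` here is an illustrative reimplementation operating on two small hypothetical matrices):

```r
wecoma <- function(x, w) {
  cats <- sort(unique(as.vector(x)))
  out <- matrix(0, length(cats), length(cats), dimnames = list(cats, cats))
  nr <- nrow(x); nc <- ncol(x)
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    for (d in list(c(0, 1), c(1, 0))) {  # right and bottom neighbors
      i2 <- i + d[1]; j2 <- j + d[2]
      if (i2 <= nr && j2 <= nc) {
        wt <- (w[i, j] + w[i2, j2]) / 2  # average weight of the two cells
        a <- as.character(x[i, j]); b <- as.character(x[i2, j2])
        out[a, b] <- out[a, b] + wt
        out[b, a] <- out[b, a] + wt
      }
    }
  }
  out
}

x <- matrix(c(1, 1, 2, 2), nrow = 2)  # categories
w <- matrix(c(1, 3, 2, 4), nrow = 2)  # weights (intensities)
wecoma(x, w)
```

With all weights equal to one, the same function reduces to the regular *coma* counting.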

Similarly to the previous case, we can also convert *wecoma* into a one-dimensional normalized representation, called a weighted co-occurrence vector (*wecove*):

0 | 0.0007 | 0.1802 | 0.0016 | 0.1767 | 0.232 | 0 | 0.0095 | 0.0117 | 0.0002 | 0.0002 | 0.0948 | 0.2081 | 0.0037 | 0.0807 |

You can also see the weighted co-occurrence matrix (*wecoma*) concept, described there as an exposure matrix, in action in the vignettes of the **raceland** package.

Another situation would be when we have two or more categorical raster variables. For example, let’s consider one raster with land cover categories and one with landform classes.

The question here is how to create a signature that incorporates the spatial patterns of both the land cover and landform data. The apparent solution would be to create a new raster with the joint distribution of class labels. For example, if agriculture is represented as `1` in the first raster and flat plains are represented as `1` in the second raster, then a value of `101` would represent agriculture on a flat plain in the new raster. Next, we could just calculate a regular co-occurrence matrix. However, this approach is not recommended: by creating joint labels in this example data, we would end up with 84 categories, and therefore with a co-occurrence matrix of 84 by 84. Large signatures not only occupy more storage but are also harder to meaningfully compare.
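The joint-label construction, and why it blows up the number of classes, can be shown in a couple of lines of base R (hypothetical class values):

```r
set.seed(1)
lc <- matrix(sample(1:7, 36, replace = TRUE), nrow = 6)   # e.g. 7 land cover classes
lf <- matrix(sample(1:12, 36, replace = TRUE), nrow = 6)  # e.g. 12 landform classes

joint <- lc * 100 + lf  # e.g. 101 = land cover class 1 on landform class 1

# up to 7 * 12 = 84 distinct joint categories are possible
length(unique(as.vector(joint)))
```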

An alternative approach is to use an integrated co-occurrence matrix (*incoma*). It consists of co-occurrence matrices (*coma*) and co-located co-occurrence matrices (*cocoma*). In the co-occurrence matrix, we only use one raster and count adjacent categories of each cell. The co-located co-occurrence matrix, on the other hand, uses two rasters and counts neighbors in the second raster for each cell in the first raster.

The *incoma* representation for two rasters consists of four sectors (see an example below):

- A co-occurrence matrix for the first raster.
- A co-located co-occurrence matrix between the first raster and the second raster. It is between the first and third column and the third and fourth row.
- A co-located co-occurrence matrix between the second and the first raster.
- A co-occurrence matrix for the second raster.

Similar to the previous signatures, it is possible to convert *incoma* to its 1D normalized representation called an integrated co-occurrence vector (*incove*).

Spatial signatures allow storing compressed information about spatial patterns for many types of data. This includes a co-occurrence matrix (*coma*) for regular categorical rasters, a weighted co-occurrence matrix (*wecoma*) for categorical rasters with related intensity rasters, and an integrated co-occurrence matrix (*incoma*) for two or more categorical rasters. A spatial signature can be represented by a 1D vector and compared using a large number of distance measures.

To learn more about how different spatial signatures can be calculated, read the Types of spatial patterns' signatures, A co-occurrence matrix (coma) representation, A weighted co-occurrence matrix (wecoma) representation, and An integrated co-occurrence matrix (incoma) representation vignettes.

See the Information theory provides a consistent framework for the analysis of spatial patterns blog post.↩︎

It also means dividing the diagonal by two.↩︎

Read https://users.uom.gr/~kouiruki/sung.pdf for a comprehensive review of distance measures.↩︎

Larger values of the **Jensen-Shannon distance** could occur when two rasters have different categories.↩︎

BibTeX citation:

```
@online{nowosad2021,
author = {Nowosad, Jakub},
title = {Describing Categorical Rasters with Spatial Signatures},
date = {2021-02-10},
url = {https://jakubnowosad.com/posts/2021-02-10-motif-bp2/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Describing Categorical Rasters with Spatial
Signatures.” February 10, 2021. https://jakubnowosad.com/posts/2021-02-10-motif-bp2/.

Discovering and describing spatial patterns is an important element of many geographical studies, with spatial patterns being related to ecological and sociological processes. While spatial patterns are often clearly visible on maps, it is not easy to unequivocally decide if two areas are much alike or to delineate regions with similar patterns. In this talk, Jakub Nowosad will present a set of consistent ideas on how spatial patterns can be described and analyzed, with a focus on categorical raster data. The core idea is to divide raster data consisting of cells with simple content (a single value) into a large number of smaller areas, and then characterize each area using a statistical description of a pattern - a spatial signature. Spatial signatures are multi-value representations of spatial composition and configuration, and therefore can be compared using a large number of existing distance or dissimilarity measures. This enables spatial analyses such as search, change detection, clustering, and segmentation. During this talk, a number of real-life examples of finding similar spatial patterns, detecting changes over time, and grouping areas with homogeneous patterns at regional, continental, and global scales will be shown.

You can find the slides for the talk at https://nowosad.github.io/giscience-webinar-2021.

BibTeX citation:

```
@online{nowosad2021,
author = {Nowosad, Jakub},
title = {Pattern-Based Spatial Analysis: An Approach for Discovering,
Describing and Studying Geographical Patterns},
date = {2021-02-04},
url = {https://jakubnowosad.com/posts/2021-02-04-pattern-based-spatial-analysis/},
langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Pattern-Based Spatial Analysis: An Approach
for Discovering, Describing and Studying Geographical Patterns.”
February 4, 2021. https://jakubnowosad.com/posts/2021-02-04-pattern-based-spatial-analysis/.

**motif** is an R package aimed at pattern-based spatial analysis. It allows spatial analyses such as search, change detection, and clustering to be performed on spatial patterns. This blog post introduces the basic ideas behind pattern-based spatial analysis and shows the types of problems to which it can be applied.

Discovering and describing patterns is a vital part of many spatial analyses. However, spatial data is gathered in many ways and forms, which requires different approaches to expressing spatial patterns. Different methods are applied when we work with numerical or categorical variables, and different methods are used to find patterns in point datasets, line datasets, or raster datasets. Next, patterns and their relevance depend on the studied scale, with different patterns found at small or large scales, or in data of different spatial resolutions. Finally, the way we describe patterns should depend on our main goal.

In this blog post, I only focus on a small subset of possible problems related to spatial patterns - I am only interested in categorical raster data. Categorical rasters, such as land cover maps, soil categories, or any other categorized images, express spatial patterns by two inter-related properties: composition and configuration. Composition shows how many different categories we have, and how much area they occupy, while configuration focuses on the spatial arrangement of the categories.

Spatial patterns in categorical raster data are most often described by landscape metrics (landscape indices). A landscape metric is a single numerical value expressing some property of a raster, such as the diversity of categories or the spatial aggregation of classes. In the last 40 or so years, several hundred different landscape metrics have been developed. They are widely used in the field of landscape ecology, but their applications can also be found in other, distant fields, even clinical pathology (laboratory medicine).

The **landscapemetrics** package allows calculating various landscape metrics in R. It contains a simple categorical raster named `landscape` with three classes, which we can use to calculate some metrics. To learn more about these ideas and the **landscapemetrics** package, visit https://r-spatialecology.github.io/landscapemetrics.

```
library(landscapemetrics)
library(raster)
plot(landscape)
```

For example, the `lsm_l_shdi()` function calculates Shannon’s diversity index, which shows how many categories are present and how abundant they are. It equals 0 when only one patch is present and increases, without limit, as the number of classes increases and their proportions become more evenly distributed.

`lsm_l_shdi(landscape)`

```
# A tibble: 1 × 6
  layer level     class    id metric value
  <int> <chr>     <int> <int> <chr>  <dbl>
1     1 landscape    NA    NA shdi    1.01
```
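The index itself is simple enough to compute by hand. A minimal base-R sketch of the underlying formula, SHDI = -sum(p_i * log(p_i)), using invented class proportions:

```r
# Shannon's diversity index from class proportions p_i:
# SHDI = -sum(p_i * log(p_i))
shdi = function(p) -sum(p * log(p))
shdi(1)                  # a single class gives 0
shdi(c(0.5, 0.3, 0.2))   # three unevenly distributed classes, about 1.03
```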

The `lsm_l_ai()` function focuses on the configuration of spatial patterns by calculating the aggregation index. It equals 0 for maximally disaggregated areas and 100 for maximally aggregated ones.

`lsm_l_ai(landscape)`

```
# A tibble: 1 × 6
  layer level     class    id metric value
  <int> <chr>     <int> <int> <chr>  <dbl>
1     1 landscape    NA    NA ai      81.1
```
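Under the hood, the aggregation index relates the number of observed like adjacencies to the maximum possible for the class area. A minimal base-R sketch for a single class, assuming rook adjacency counted once (following the FRAGSTATS formulation; the example matrix is invented):

```r
# aggregation index for one class: AI = g_ii / max_g_ii * 100,
# where g_ii counts like adjacencies (rook neighbors, single count)
ai_class = function(mask) {
  g = sum(mask[, -1] & mask[, -ncol(mask)]) +  # horizontal like adjacencies
      sum(mask[-1, ] & mask[-nrow(mask), ])    # vertical like adjacencies
  a = sum(mask)                                # class area in cells
  n = floor(sqrt(a)); m = a - n^2
  max_g = if (m == 0) 2 * n * (n - 1)          # class fits a perfect square
          else if (m <= n) 2 * n * (n - 1) + 2 * m - 1
          else 2 * n * (n - 1) + 2 * m - 2
  g / max_g * 100
}
block = matrix(c(1, 1, 0, 1, 1, 0, 0, 0, 0), nrow = 3, byrow = TRUE)
ai_class(block == 1)  # a compact 2x2 block is maximally aggregated: 100
```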

The above examples show how we condensed information about raster data into just one number. This can be useful in a multitude of cases where we want to connect some aspect of a spatial pattern to external processes. However, what to do if our goal is to find areas with similar spatial patterns?

In theory, we could calculate landscape metrics for many areas and then search for those which have the most similar values to our area of interest. This approach, however, leaves us with a number of problems, including which landscape metrics to use. Many landscape metrics are highly correlated, and their interrelations are hard to interpret.

An alternative approach, in this case, is to use a spatial signature. A spatial signature is a multi-number description that compactly stores information about the composition and configuration of a spatial pattern. Therefore, instead of having just one number representing a raster, we have several numbers that condense information about this location. We can calculate spatial signatures for many rasters, which allows us to find the most similar rasters, describe changes between rasters, or group (cluster) rasters based on the spatial patterns.
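To make the idea concrete, here is a simple illustration (not the exact encoding used by **motif**, which is based on a co-occurrence vector): normalized counts of category pairs in adjacent cells capture composition and configuration at once. The matrices are invented:

```r
# a simple spatial signature: normalized counts of category pairs
# in horizontally and vertically adjacent cells (both directions)
cooc_signature = function(m) {
  cats = sort(unique(as.vector(m)))
  h = cbind(as.vector(m[, -ncol(m)]), as.vector(m[, -1]))   # horizontal pairs
  v = cbind(as.vector(m[-nrow(m), ]), as.vector(m[-1, ]))   # vertical pairs
  pairs = rbind(h, v, h[, 2:1], v[, 2:1])                   # count both directions
  tab = table(factor(pairs[, 1], levels = cats),
              factor(pairs[, 2], levels = cats))
  as.vector(tab / sum(tab))                                 # normalized signature
}
a = matrix(c(1, 1, 2, 2), nrow = 2)  # aggregated arrangement
b = matrix(c(1, 2, 2, 1), nrow = 2)  # interleaved arrangement
cooc_signature(a)  # differs from the signature of b, even though
cooc_signature(b)  # both matrices have identical composition
```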

Search, change detection, and clustering of spatial patterns have been possible in GRASS GIS using the GeoPAT module or the command-line tool GeoPAT 2. All of the above actions can now also be performed natively in R with the **motif** package. In a series of blog posts, I plan to show and explain several use cases. They include:

*A. Finding areas similar to the area of interest*

*B. Comparing changes between two times*

*C. Clustering areas with similar patterns of more than one layer of data*

If you do not want to wait for the next blog post, you can install the **motif** package with:

`install.packages("motif")`

You can read more about it in the Landscape Ecology article or its preprint:

Nowosad, J. Motif: an open-source R tool for pattern-based spatial analysis. Landscape Ecol (2020). https://doi.org/10.1007/s10980-020-01135-0

You can also visit the package website at https://nowosad.github.io/motif and the GitHub repository with examples at https://github.com/Nowosad/motif-examples.

BibTeX citation:

```
@online{nowosad2021,
  author = {Nowosad, Jakub},
  title = {Pattern-Based Spatial Analysis in {R:} An Introduction},
  date = {2021-02-03},
  url = {https://jakubnowosad.com/posts/2021-02-03-motif-bp1/},
  langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Pattern-Based Spatial Analysis in R: An
Introduction.” February 3, 2021. https://jakubnowosad.com/posts/2021-02-03-motif-bp1/.

Bivariate color palettes are products of combining two separate color palettes. They are usually represented by a square with rows (one color palette) and columns (second color palette). You can read more about how they are made in the blog post “Bivariate Choropleth Maps: A How-to Guide” by Joshua Stevens.
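As a sketch of the construction, a small bivariate palette can be derived by blending two sequential single-hue palettes cell by cell (the hex colors are illustrative, and multiplicative blending is one common choice, not the only one):

```r
# derive a 3x3 bivariate palette by multiplicative blending of
# two single-hue sequential palettes (colors chosen for illustration)
pal_x = colorRampPalette(c("#E8E8E8", "#5AC8C8"))(3)  # light grey -> teal
pal_y = colorRampPalette(c("#E8E8E8", "#BE64AC"))(3)  # light grey -> purple
blend = function(c1, c2) {
  v = (col2rgb(c1) / 255) * (col2rgb(c2) / 255)  # multiply RGB channels
  rgb(v[1], v[2], v[3])
}
biv = outer(pal_x, pal_y, Vectorize(blend))  # 3x3 grid of hex colors
image(matrix(1:9, nrow = 3), col = biv, axes = FALSE, asp = 1)
```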

The main role of bivariate color palettes is to present the values of two variables simultaneously. For example, the map below uses a bivariate palette to represent both GDP per capita and life expectancy for countries in Africa.

The code to create this map is in the **tmap** issue tracker. Some other examples of bivariate maps can be found in the “Bivariate Mapping with ggplot2” vignette and the “Bivariate maps with ggplot2 and sf” blog post.

The above map has one issue, though. As pointed out by Frederico R Ramos, it is not suitable for people with color vision deficiencies, who are not able to distinguish between some of its colors and, therefore, cannot read the map correctly. This raises the main question: how to choose a proper bivariate color palette?

The **pals** R package has a dozen or so bivariate color palettes.

```
library(pals)
# plot a bivariate palette as a square grid of its colors
bivcol = function(pal) {
  tit = substitute(pal)
  pal = pal()
  ncol = length(pal)
  image(matrix(seq_along(pal), nrow = sqrt(ncol)),
        axes = FALSE,
        col = pal,
        asp = 1)
  mtext(tit)
}
```

Twelve of these palettes are presented below.

```
par(mfrow = c(3, 4), mar = c(1, 1, 2, 1))
bivcol(arc.bluepink)
bivcol(brewer.divdiv)
bivcol(brewer.divseq)
bivcol(brewer.qualseq)
bivcol(brewer.seqseq1)
bivcol(brewer.seqseq2)
bivcol(census.blueyellow)
bivcol(stevens.bluered)
bivcol(stevens.greenblue)
bivcol(stevens.pinkblue)
bivcol(stevens.pinkgreen)
bivcol(stevens.purplegold)
```

Now, we can use the **colorblindcheck** package to decide if the selected color palette is colorblind-friendly or not.

```
# remotes::install_github("nowosad/colorblindcheck")
library(colorblindcheck)
```

The main function in this package is `palette_check()`, which creates summary statistics comparing the original input palette with simulations of the three main color vision deficiencies. Let’s use it on two color palettes: `arc.bluepink()` and `brewer.seqseq2()`.

```
colorblindcheck::palette_check(arc.bluepink(),
plot = TRUE, bivariate = TRUE)
```

```
          name  n tolerance ncp ndcp  min_dist mean_dist max_dist
1       normal 16  7.135562 120  120 7.1355623  27.72463 53.76783
2 deuteranopia 16  7.135562 120  100 0.3450842  19.79323 52.46731
3   protanopia 16  7.135562 120   96 0.0000000  20.08030 50.20137
4   tritanopia 16  7.135562 120  120 7.9914570  31.48801 71.57927
```

The visual inspection of `arc.bluepink()` suggests that this palette is not suitable for people with color vision deficiencies, namely deuteranopia and protanopia. In the deuteranopia and protanopia simulations, it is almost impossible to distinguish some colors. This problem is also confirmed by the summary statistics, where the minimal distance between colors of the original palette is about 7, while it is only about 0.35 for deuteranopia and 0 (no difference at all) for protanopia.

```
colorblindcheck::palette_check(brewer.seqseq2(),
plot = TRUE, bivariate = TRUE)
```

```
          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9  13.21133  36   36 13.21133  39.99288 94.59810
2 deuteranopia 9  13.21133  36   34 10.99234  40.33172 94.22020
3   protanopia 9  13.21133  36   34 10.53062  38.99158 94.59810
4   tritanopia 9  13.21133  36   36 13.66888  39.60803 94.48661
```

On the other hand, the inspection of `brewer.seqseq2()` indicates that it is possible to differentiate between all of the colors in this palette, both in the original and in the simulations of color vision deficiencies. You can see more examples of **colorblindcheck** in action at https://nowosad.github.io/colorblindcheck.

Using the above function, I tested all of the bivariate color palettes from **pals**. I visualized all of the palettes and decided to keep only the ones for which the minimal distance between colors was above 6.
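Expressed as code, this selection is just a filter on the worst-case `min_dist` across the three deficiency simulations. The values below are rounded from the `palette_check()` outputs shown in this post (only the palettes reported here are included):

```r
# worst-case (minimum over deficiency simulations) min_dist per palette,
# rounded from the palette_check() outputs reported in this post
worst = c(arc.bluepink       = 0.00,
          brewer.divseq      = 6.78,
          brewer.seqseq2     = 10.53,
          stevens.greenblue  = 6.15,
          stevens.purplegold = 6.32)
names(worst[worst > 6])  # palettes kept for further use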

This allowed me to distinguish four palettes - `brewer.divseq`, `brewer.seqseq2`, `stevens.greenblue`, and `stevens.purplegold`. You can see the comparison between them and simulations of color vision deficiencies below.

```
colorblindcheck::palette_check(brewer.divseq(),
plot = TRUE, bivariate = TRUE)
```

```
          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9  9.237516  36   36 9.237516  38.32933 87.90123
2 deuteranopia 9  9.237516  36   36 9.267188  39.85751 90.88415
3   protanopia 9  9.237516  36   36 9.237516  40.79861 86.08385
4   tritanopia 9  9.237516  36   35 6.777558  32.82160 83.10774
```

```
colorblindcheck::palette_check(brewer.seqseq2(),
plot = TRUE, bivariate = TRUE)
```

```
          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9  13.21133  36   36 13.21133  39.99288 94.59810
2 deuteranopia 9  13.21133  36   34 10.99234  40.33172 94.22020
3   protanopia 9  13.21133  36   34 10.53062  38.99158 94.59810
4   tritanopia 9  13.21133  36   36 13.66888  39.60803 94.48661
```

```
colorblindcheck::palette_check(stevens.greenblue(),
plot = TRUE, bivariate = TRUE)
```

```
          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9   9.29651  36   36 9.296510  26.34666 50.19184
2 deuteranopia 9   9.29651  36   33 7.238684  24.60856 51.19105
3   protanopia 9   9.29651  36   35 7.693015  24.51814 47.10098
4   tritanopia 9   9.29651  36   29 6.154169  20.06474 50.20386
```

```
colorblindcheck::palette_check(stevens.purplegold(),
plot = TRUE, bivariate = TRUE)
```

```
          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9  11.97625  36   36 11.97625  30.13646 53.56032
2 deuteranopia 9  11.97625  36   35 10.57857  27.58839 46.59557
3   protanopia 9  11.97625  36   34 11.48625  29.32017 50.36899
4   tritanopia 9  11.97625  36   28  6.31650  20.96426 49.27898
```

Four palettes from the **pals** package - `brewer.divseq`, `brewer.seqseq2`, `stevens.greenblue`, and `stevens.purplegold` - seem to be the most adequate for bivariate visualizations.

All of them are suitable for people with color vision deficiencies. It is important to note that `brewer.divseq` is made of a sequential (from bottom to top) and a diverging (from left to right) palette. Therefore, its use should be limited to applications where we want to present one variable going from low to high (or vice versa) together with one variable whose values diverge around a central neutral point. `brewer.seqseq2`, `stevens.greenblue`, and `stevens.purplegold`, on the other hand, consist of a mix of two sequential palettes and, thus, should be used to present two variables with values going from low to high (or vice versa).

BibTeX citation:

```
@online{nowosad2020,
  author = {Nowosad, Jakub},
  title = {How to Choose a Bivariate Color Palette?},
  date = {2020-08-25},
  url = {https://jakubnowosad.com/posts/2020-08-25-cbc-bp2/},
  langid = {en}
}
```

For attribution, please cite this work as:

Nowosad, Jakub. 2020. “How to Choose a Bivariate Color
Palette?” August 25, 2020. https://jakubnowosad.com/posts/2020-08-25-cbc-bp2/.