Comparison of spatial patterns in categorical raster data for overlapping regions using R

Methods for comparing spatial patterns in raster data

This is the fourth part of a blog post series on comparing spatial patterns in raster data. More information about the whole series can be found in part one.

This blog post focuses on the comparison of spatial patterns in categorical raster data for overlapping regions. In other words, here we have two rasters with the same number of rows and columns, and we want to compare their spatial patterns.

For this blog post, we use two categorical raster datasets: the CORINE Land Cover (CLC) datasets for Tartu (Estonia) for the years 2000 and 2018.

library(terra)
clc2000_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/clc2000_tartu.tif")
clc2018_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/clc2018_tartu.tif")
plot(clc2000_tartu, main = "Tartu (2000)")
plot(clc2018_tartu, main = "Tartu (2018)")

In short, the red areas in the rasters represent urban areas, the green areas represent forests, the blue areas represent water bodies, and the yellow areas represent agricultural land.

Non-spatial context

Raster outcome

The binary difference between two rasters

The simplest way to compare two categorical rasters is to create a binary raster that indicates whether the values of the two rasters are the same or different. Here, the output highlights the areas where the values of the two rasters differ (yellow) and where they are the same (purple).

clc_diff = clc2018_tartu != clc2000_tartu
plot(clc_diff)

Multiple value outcome

To get a more detailed comparison, we can calculate the confusion matrix (also known as, e.g., contingency table) of the two rasters. It shows the number of pixels that have the same value in both rasters (diagonal) and the number of pixels with different values (off-diagonal).

cm = table(values(clc2000_tartu), values(clc2018_tartu))
cm

   
        1     2     3     4     6     7
  1  4821   357     2    10     0     0
  2  1342 67915   684   389     0     0
  3    59   415 33670  2896     9    22
  4   122   638  1435  7040   229    24
  6     0     3    13    20  2177    37
  7     0     0     0     0     3  1995

For example, the table shows that there are 4821 pixels with the value 1 in both rasters, and 357 pixels with the value 2 in the first raster and the value 1 in the second raster.

Confusion matrices is also a building block for many other statistics, including accuracy, that can be calculated to compare the two sets of categorical data.

Single value outcome

The proportion of changed pixels

A binary difference raster can be also used to calculate the proportion of changed pixels by dividing the number of changed pixels by the total number of non-NA pixels.

clc_diff = clc2018_tartu != clc2000_tartu
changed_pixels = freq(clc_diff)$count[2]
na_pixels = freq(clc_diff, value = NA)$count
total_pixels = ncell(clc_diff) - na_pixels 
proportion_changed = changed_pixels / total_pixels
proportion_changed

[1] 0.06894013

This outcome shows that about 0.07 of the pixels have changed between the two rasters.

The overall comparison

The overall comparison of the two rasters can be done by calculating the difference between the frequencies of the values of the two rasters (Pontius 2002).

clc2000_tartu_freq = freq(clc2000_tartu)
clc2018_tartu_freq = freq(clc2018_tartu)
freq = merge(clc2000_tartu_freq, clc2018_tartu_freq, by = "value", all = TRUE)
freq$diff = abs(freq$count.x - freq$count.y)
sum_diff = sum(freq$diff)
na_pixels = freq(clc_diff, value = NA)$count
total_pixels = ncell(clc_diff) - na_pixels 
1 - sum_diff / total_pixels

[1] 0.9640774

The overall comparison tells what is the percentage of pixels that have the same class in both rasters (however, it does not consider if the pixels are in the same location).

Statistics of the differences between rasters’ values

More statistics of the differences between the values of the two rasters can be calculated using the diffeR package (Pontius Jr. and Santacruz 2023). These statistics are based on the confusion matrix and include the overall allocation disagreement, overall difference, overall exchange disagreement, overall quantity disagreement, and overall shift disagreement.¹

library(diffeR)
clc_ct = crosstabm(clc2000_tartu, clc2018_tartu)
diffeR_df = data.frame(
  overallAllocD = overallAllocD(clc_ct),
  overallDiff = overallDiff(clc_ct),
  overallExchangeD = overallExchangeD(clc_ct),
  overallQtyD = overallQtyD(clc_ct),
  overallShiftD = overallShiftD(clc_ct)
)
diffeR_df

  overallAllocD overallDiff overallExchangeD overallQtyD overallShiftD
1          6440        8709             5280        2269          1160

Spatial context

Raster outcome

The difference between a focal measure of two rasters (e.g., selected landscape metric)

To include the spatial context in the comparison, we can calculate the difference between a focal measure of two rasters. An example of such a measure is the relative mutual information (relmutinf) metric, which quantifies the clumpiness of the landscape – the larger the value, the more clumped the landscape is [^See a blog post about this metric].

The below code chunk uses the landscapemetrics package (Hesselbarth et al. 2019) to specify a moving window of 5 by 5 and calculate the relmutinf metric for the two rasters. Next, it calculates the absolute difference between the two rasters and plots the result.

library(landscapemetrics)
window = matrix(1, nrow = 5, ncol = 5)
clc2000_tartu_relmutinf_mw = window_lsm(clc2000_tartu, window = window,
                                        what = "lsm_l_relmutinf")
clc2018_tartu_relmutinf_mw = window_lsm(clc2018_tartu, window = window, 
                                        what = "lsm_l_relmutinf")
clc2018_tartu_relmutinf_diff = abs(clc2018_tartu_relmutinf_mw[[1]][[1]] -
                                   clc2000_tartu_relmutinf_mw[[1]][[1]])
plot(clc2018_tartu_relmutinf_diff)

The largest values in the output indicate the areas where the clumpiness of the landscape has changed the most between the two rasters.

Alternatively, if we calculate the regular difference, the output will show the areas where the clumpiness of the landscape has increased (positive values) and decreased (negative values).

plot_div = function(r, ...){
  r_range = range(values(r), na.rm = TRUE, finite = TRUE)
  max_abs = max(abs(r_range))
  new_range = c(-max_abs, max_abs)
  plot(r, col = hcl.colors(100, palette = "prgn"), range = new_range, ...)
}
clc2018_tartu_relmutinf_diff2 = clc2018_tartu_relmutinf_mw[[1]][[1]] - 
                                clc2000_tartu_relmutinf_mw[[1]][[1]]
plot_div(clc2018_tartu_relmutinf_diff2)

Cross-entropy loss function

The moving window approach is also used in the raster.change() function from the spatialEco package (Evans and Murphy 2023). Its first two arguments are the two rasters to compare, the s argument specifies the size of the moving window, and the stat argument specifies the statistic to calculate. For example, stat = "cross-entropy" calculates the cross-entropy loss function, where the larger the value, the more different the two rasters are.

library(spatialEco)
clc_ce = raster.change(clc2000_tartu, clc2018_tartu, s = 5, 
                       stat = "cross-entropy")
plot(clc_ce)

Note that the above calculation may take a few minutes to complete.

Multiple value outcome

Various statistics from categorical data can be calculated at multiple scales with the waywiser package (Mahoney 2023). Here, the ww_multi_scale() function calculates the accuracy of the two rasters at different scales from 500 to 3000 meters (map units).

library(waywiser)
cell_sizes = seq(500, 3000, by = 500)
clc_multi_scale = ww_multi_scale(truth = as.factor(clc2000_tartu),
                                 estimate = as.factor(clc2018_tartu),
                                 metrics = list(yardstick::accuracy), 
                                 cellsize = cell_sizes,
                                 progress = FALSE)
clc_multi_scale

# A tibble: 6 × 6
  .metric  .estimator .estimate .grid_args       .grid            .notes  
  <chr>    <chr>          <dbl> <list>           <list>           <list>  
1 accuracy multiclass     0.781 <tibble [1 × 1]> <sf [6,400 × 5]> <tibble>
2 accuracy multiclass     0.607 <tibble [1 × 1]> <sf [1,600 × 5]> <tibble>
3 accuracy multiclass     0.453 <tibble [1 × 1]> <sf [729 × 5]>   <tibble>
4 accuracy multiclass     0.358 <tibble [1 × 1]> <sf [400 × 5]>   <tibble>
5 accuracy multiclass     0.246 <tibble [1 × 1]> <sf [256 × 5]>   <tibble>
6 accuracy multiclass     0.270 <tibble [1 × 1]> <sf [196 × 5]>   <tibble>

The output shows the accuracy of the two rasters at each scale: the largest value is at the scale of 500 meters, and the smallest value is at the scale of 3000 meters. It shows that with the increase of the scale, the agreement between the two rasters decreases – both rasters are similar at the local scale, but they differ at the regional one.

Single value outcome

Spatial association between regionalizations using V-measure

The sabre package (Nowosad and Stepinski 2018) provides a function to calculate a few measures of spatial association between two categorical maps.²

library(sabre)
clc_sabre = vmeasure_calc(clc2000_tartu, clc2018_tartu)
clc_sabre

The SABRE results:

 V-measure: 0.77 
 Homogeneity: 0.78 
 Completeness: 0.76 

 The spatial objects can be retrieved with:
 $map1 - the first map
 $map2 - the second map

Its output returns three values: the homogeneity, the completeness, and the V-measure. The homogeneity measures how well regions from the first map fit inside of regions from the second map, the completeness measures how well regions from the second map fit inside of regions from the first map, and the V-measure is the weighted harmonic mean of homogeneity and completeness. All of them range from 0 to 1, where larger values indicate better spatial agreement.

Additionally, the output contains two sets of maps of regions’ inhomogeneities (rih): the first set shows how the regions from the first map are inhomogenous in regard to the regions from the second map, and the second set shows how the regions from the second map are inhomogenous in regard to the regions from the first map.

plot(clc_sabre$map1[[2]])
plot(clc_sabre$map2[[2]])

References

Evans, Jeffrey S., and Melanie A. Murphy. 2023. spatialEco. Manual.

Hesselbarth, Maximilian H. K., Marco Sciaini, Kimberly A. With, Kerstin Wiegand, and Jakub Nowosad. 2019. “Landscapemetrics : An Open-Source R Tool to Calculate Landscape Metrics.” Ecography 42 (10): 1648–57. https://doi.org/10/gf4n9j.

Mahoney, Michael J. 2023. “Waywiser: Ergonomic Methods for Assessing Spatial Models.” https://doi.org/10.48550/arXiv.2303.11312.

Nowosad, J., and T. F. Stepinski. 2018. “Spatial Association Between Regionalizations Using the Information-Theoretical V -Measure.” International Journal of Geographical Information Science 32 (12): 2386–2401. https://doi.org/10/gf283f.

Pontius Jr., Robert Gilmore, and Ali Santacruz. 2023. diffeR: Metrics of Difference for Comparing Pairs of Maps or Pairs of Variables. Manual.

Pontius, R Gil. 2002. “Statistical Methods to Partition Effects of Quantity and Location During Comparison of Categorical Maps at Multiple Resolutions.” Photogrammetric Engineering.

Footnotes

And more, as can be found in the package documentation – see ?diffeR::overallAllocD.↩︎
Its output actually gives a few values and two maps, but the most important one is the V-measure.↩︎

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2024,
  author = {Nowosad, Jakub},
  title = {Comparison of Spatial Patterns in Categorical Raster Data for
    Overlapping Regions Using {R}},
  date = {2024-11-03},
  url = {https://jakubnowosad.com/posts/2024-11-03-spatcomp-bp4/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Comparison of Spatial Patterns in Categorical Raster Data for Overlapping Regions Using R.” November 3, 2024. https://jakubnowosad.com/posts/2024-11-03-spatcomp-bp4/.