Comparison of spatial patterns in continuous raster data for overlapping regions using R

Methods for comparing spatial patterns in raster data

This is the second part of a blog post series on comparing spatial patterns in raster data. More information about the whole series can be found in part one.

This blog post shows various methods for comparing spatial patterns in continuous raster data for overlapping regions, i.e., how to compare two rasters for the same region, but in different moments in time (or, in some cases, with different variables)¹ using R programming language.

Two continuous raster datasets are used in this blog post: the Normalized Difference Vegetation Index (NDVI) for Tartu (Estonia) for the years 2000 and 2018.

library(terra)
ndvi2018_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/ndvi2018_tartu.tif")
ndvi2023_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/ndvi2023_tartu.tif")
plot(ndvi2018_tartu, main = "Tartu (2000)")
plot(ndvi2023_tartu, main = "Tartu (2018)")

Non-spatial context

Raster outcome

The difference between values of two rasters for each cell

The most basic approach to compare two continuous rasters is to calculate the difference between their values for each cell.

ndvi_diff = ndvi2023_tartu - ndvi2018_tartu
plot(ndvi_diff)

The output is a raster with positive values indicating that the values of the first raster are higher than the values of the second raster, while negative values indicate the opposite.²

The differences may be even more highlighted by using a diverging color palette. The below function plot_div() visualizes a raster with a diverging color palette, where the middle color represents zero.

plot_div = function(r, ...){
  r_range = range(values(r), na.rm = TRUE, finite = TRUE)
  max_abs = max(abs(r_range))
  new_range = c(-max_abs, max_abs)
  plot(r, col = hcl.colors(100, palette = "prgn"), range = new_range, ...)
}
plot_div(ndvi_diff)

It shows that most of the area has negative values, indicating that the NDVI values were lower in 2023 than in 2018.

The interpretation of the difference between the values of two rasters is not always straightforward as it depends, among other things, on the properties of the studied variable. Some variables are linear, and thus, the difference between the values has a clear meaning, while others are not, and the difference may be harder to interpret.³

Multiple value outcome

The distribution of the difference between values of two rasters

ndvi_diff = ndvi2023_tartu - ndvi2018_tartu
hist(ndvi_diff)

Single value outcome

Statistics of the differences between rasters’ values

A similar approach, also based on the values of each cell independently, is to calculate the statistics of the differences between the two rasters’ values (i.e., error metrics).

One of the most common error metrics is the Root Mean Square Error (RMSE), which quantifies the average difference between the values of the two rasters. Its lower value indicates a better agreement between the two rasters. RMSE is implemented in many R packages, and here we use the version from the yardstick package (Kuhn, Vaughan, and Hvitfeldt 2024).⁴

In general, error metrics require two sets of values; thus, we are using the values() function to extract the values of the rasters before calculating the RMSE.

library(yardstick)
ndvi_rmse = rmse_vec(values(ndvi2023_tartu)[, 1], values(ndvi2018_tartu)[, 1])
ndvi_rmse

[1] 0.2191853

The diffeR package (Pontius Jr. and Santacruz 2023), on the other hand, is directly aimed at comparing rasters. One of its functions, MAD(), calculates the Mean Absolute Difference (MAD) between two rasters.

library(diffeR)
ndvi_mad = MAD(ndvi2023_tartu, ndvi2018_tartu)

[1] "The mean of grid1 is less than the mean of grid2"

ndvi_mad$Total

[1] 0.184306

This measure has a similar interpretation to RMSE, but it is less sensitive to outliers.

Spatial context

Raster outcome

The main general approach of incorporating spatial context into comparison of rasters is the use of calculations in a moving window. This way, we do not only consider the values of each cell independently, but also the values of the surrounding cells.⁵

Correlation coefficient between focal regions of two rasters

The focalPairs() function from the terra package (Hijmans 2024) uses a moving window to extract values from the focal regions of two rasters and calculates the correlation coefficient between them. It requires two rasters with the same number of rows and columns, the size of the moving window, and the function to calculate the correlation coefficient.⁶

ndvi_cor = focalPairs(c(ndvi2023_tartu, ndvi2018_tartu), w = 5,
                      fun = "pearson", na.rm = TRUE)
plot_div(ndvi_cor)

The values of the output raster range from -1 to 1, where 1 indicates a perfect positive correlation in the focal regions, 0 indicates no correlation (the focal values have no relationship to each other), and -1 indicates a perfect negative correlation.

The difference between a focal measure of two rasters

Similarly to the correlation coefficient, other statistics can be calculated in a moving window. This option is offered by several R packages, including geodiv (Smith et al. 2023), GLCMTextures (Ilich 2020), and rasterdiv (Rocchini et al. 2021).

The geodiv package provides a function focal_metrics() that calculates more than 20 different focal statistics. In the below example, we calculate the average surface roughness (SA) in a moving window separately for the two rasters and then calculate the difference between them.

library(geodiv)
window = matrix(1, nrow = 5, ncol = 5)
ndvi2018_tartu_sa_mw = focal_metrics(ndvi2018_tartu, window = window,
                               metric = "sa", progress = FALSE)
ndvi2023_tartu_sa_mw = focal_metrics(ndvi2023_tartu, window = window,
                               metric = "sa", progress = FALSE)
ndvi_diff_sa_mw = ndvi2023_tartu_sa_mw$sa - ndvi2018_tartu_sa_mw$sa
plot_div(ndvi_diff_sa_mw)

This outcome shows the difference in the average surface roughness between the two rasters in a moving window. The possitive values indicate that the surface roughness (in this case, the heterogeneity of NDVI) was higher in 2023 than in 2018, while the negative values indicate the opposite. Interestingly, the most extreme (negative) values are located in just a few small areas.

We can also compare various texture GLCM (Gray-Level Co-occurrence Matrix) metrics (Haralick, Shanmugam, and Dinstein 1973) in a moving window that characterizes many aspects of the spatial structure of the rasters. This example uses the GLCMTextures package (Ilich 2020). We calculate the homogeneity of the NDVI values in a moving window for both rasters and calculate the difference between them.

The positive values in the resulting map indicate that the homogeneity of the NDVI values was higher in 2023 than in 2018, while the negative values indicate the opposite.

The Rao’s quadratic entropy is a measure of diversity that takes into account not only the abundance of the classes (values), but also the dissimilarity between them (Rao 1982).

This measure can be calculated using the paRao() function from the rasterdiv package (Rocchini et al. 2021). The function requires the raster with integer values (thus we multiply the NDVI values by 100) and the size of the moving window.

library(rasterdiv)
ndvi2018_tartu_int = as.int(ndvi2018_tartu * 100)
ndvi2023_tartu_int = as.int(ndvi2023_tartu * 100)
ndvi2018_tartu_rao = paRao(ndvi2018_tartu_int, window = 5, progBar = FALSE)
ndvi2023_tartu_rao = paRao(ndvi2023_tartu_int, window = 5, progBar = FALSE)
ndvi_rao_diff = ndvi2023_tartu_rao[[1]][[1]] - ndvi2018_tartu_rao[[1]][[1]]
plot_div(ndvi_rao_diff)

Note that the above calculations may take a few minutes to complete.

The positive values in the resulting map indicate that the diversity (in this case, the heterogeneity of NDVI) was higher in 2023 than in 2018, while the negative values indicate the opposite.

Spatial autocorrelation analysis of the differences

An alternative approach for computing metrics in two rasters and then calculate the difference between them is to calculate the difference between the two rasters, and then compute the spatial autocorrelation of the differences (Cliff 1970).

The terra package provides a function autocor() that calculates the spatial autocorrelation of a raster using a selected method. By default, it uses a three by three moving window (excluding the focal cell) to calculate the Moran’s I index. The global = FALSE argument specifies that the spatial autocorrelation should be calculated for each cell separately, and thus the output is a raster.

ndvi_diff = ndvi2023_tartu - ndvi2018_tartu
ndvi_diff_autocor = autocor(ndvi_diff, method = "moran", global = FALSE)
plot_div(ndvi_diff, main = "Difference")
plot_div(ndvi_diff_autocor, main = "Moran's I of the difference")

The resulting map shows the spatial autocorrelation of the differences between the two rasters. Its high values indicate that the differences are spatially clustered (positive changes are close to positive changes, and negative changes are close to negative changes), while low values indicate that the differences are spatially random. Areas with no spatial autocorrelation (values around zero) indicate that the differences are spatially independent.

Structural similarity index (SSIM) between two rasters

Structural similarity index (SSIM) (Wang et al. 2004) is a measure used to compare similarity between two images (no surprises here). It is based on the comparison of three aspects of the images: luminance, contrast, and structure, and its idea is to mimic the human perception of similarity.

The SSIMmap package (Ha and Long 2023) provides a function ssim_raster() that calculates the SSIM between two rasters.⁷

library(SSIMmap)
ndvi_ssim = ssim_raster(ndvi2018_tartu, ndvi2023_tartu, global = FALSE, w = 5)
plot_div(ndvi_ssim[[1]])

The interpretation of the SSIM values is similar to the correlation coefficient: 1 indicates a perfect similarity, 0 indicates no similarity, and -1 indicates a perfect dissimilarity.

As of writing this blog post, the results from the SSIMmap package seem not to be consistent with other R implementations of SSIM. See https://github.com/Hailyee-Ha/SSIMmap/issues/1 for more information.

Multiple value outcome

Statistics of the differences between rasters’ values calculated at many scales

All of the single value outcomes can be also calculated at multiple scales, i.e., instead of using each original cell value, the values are aggregated in windows of different sizes.

We have already seen the use of RMSE to compare two rasters based on the values of each cell independently. Now, using the waywiser package (Mahoney 2023), we can calculate the RMSE at multiple scales.

Its function ww_multi_scale() requires the two rasters, the metrics to calculate, and the sizes of the windows (in cells) to aggregate the values. Here, we calculate the RMSE at six different scales, from 50 to 300 meters (map units).

library(waywiser)
cell_sizes = seq(50, 300, by = 50)
ndvi_multi_scale = ww_multi_scale(truth = ndvi2018_tartu, estimate = ndvi2023_tartu,
                                 metrics = list(yardstick::rmse), 
                                 cellsize = cell_sizes,
                                 progress = FALSE)
ndvi_multi_scale

# A tibble: 6 × 6
  .metric .estimator .estimate .grid_args       .grid             .notes  
  <chr>   <chr>          <dbl> <list>           <list>            <list>  
1 rmse    standard       0.205 <tibble [1 × 1]> <sf [10,000 × 5]> <tibble>
2 rmse    standard       0.198 <tibble [1 × 1]> <sf [2,500 × 5]>  <tibble>
3 rmse    standard       0.195 <tibble [1 × 1]> <sf [1,156 × 5]>  <tibble>
4 rmse    standard       0.193 <tibble [1 × 1]> <sf [625 × 5]>    <tibble>
5 rmse    standard       0.193 <tibble [1 × 1]> <sf [400 × 5]>    <tibble>
6 rmse    standard       0.193 <tibble [1 × 1]> <sf [289 × 5]>    <tibble>

This result (note the .estimate column) shows the RMSE values at each scale: the largest value is at the scale of 50 meters, and the smallest value is at the scale of 200, 250, and 300 meters. It indicates that with the increase of the scale, the agreement between the two rasters increases.

A similar approach can be found in the MAD() function from the diffeR package, which calculates the Mean Absolute Difference (MAD) between two rasters at multiple scales. Here, these scales start at the original resolution of the rasters and increase by a factor of 2.

library(diffeR)
ndvi_mad = MAD(ndvi2023_tartu, ndvi2018_tartu, eval = "multiple")

[1] "The mean of grid1 is less than the mean of grid2"

ndvi_mad

   Multiples Resolution  Quantity Strata      Element     Total
1          1         10 0.1783152      0 0.0059907211 0.1843060
2          2         20 0.1783152      0 0.0038461843 0.1821614
3          4         40 0.1783152      0 0.0016917875 0.1800070
4          8         80 0.1783152      0 0.0003588876 0.1786741
5         16        160 0.1783152      0 0.0001143444 0.1784296
6         32        320 0.1783152      0 0.0001143444 0.1784296
7         64        640 0.1783152      0 0.0000000000 0.1783152
8        128       1280 0.1783152      0 0.0000000000 0.1783152
9        256       2560 0.1783152      0 0.0000000000 0.1783152
10       512       5120 0.1783152      0 0.0000000000 0.1783152

Note the last column of the result, which shows the MAD values at each scale. It is similar to the RMSE result, with the largest value at the original resolution of the rasters and the smallest value at the largest scale.

Single value outcome

Average of Structural Similarity Index

The average (global) SSIM value can be calculated by averaging the SSIM values of all cells in the raster. It is a single value that indicates the overall similarity between the two rasters (Wang et al. 2004; Robertson et al. 2014).

It can be calculated using the ssim_raster() function from the SSIMmap package(Ha and Long 2023).

library(SSIMmap)
ssim_raster(ndvi2018_tartu, ndvi2023_tartu, global = TRUE)

SSIM: 0.63845 SIM: 0.90432 SIV: 0.90778 SIP: 0.75671

The result is a single value that ranges from -1 to 1, where 1 indicates a perfect similarity, 0 indicates no similarity, and -1 indicates a perfect dissimilarity.

References

Cliff, Andrew D. 1970. “Computing the Spatial Correspondence Between Geographical Patterns.” Transactions of the Institute of British Geographers, no. 50 (July): 143. https://doi.org/10.2307/621351.

Ha, Hui Jeong (Hailyee), and Jed Long. 2023. SSIMmap: The Structural Similarity Index Measure for Maps. Manual.

Haralick, Robert M, Karthikeyan Shanmugam, and Its’ Hak Dinstein. 1973. “Textural Features for Image Classification.” IEEE Transactions on Systems, Man, and Cybernetics, no. 6: 610–21.

Hijmans, Robert J. 2024. Terra: Spatial Data Analysis. Manual.

Ilich, Alexander R. 2020. “GLCMTextures.” https://doi.org/10.5281/zenodo.4310186.

Kuhn, Max, Davis Vaughan, and Emil Hvitfeldt. 2024. Yardstick: Tidy Characterizations of Model Performance. Manual.

Mahoney, Michael J. 2023. “Waywiser: Ergonomic Methods for Assessing Spatial Models.” https://doi.org/10.48550/arXiv.2303.11312.

Pontius Jr., Robert Gilmore, and Ali Santacruz. 2023. diffeR: Metrics of Difference for Comparing Pairs of Maps or Pairs of Variables. Manual.

Rao, C.Radhakrishna. 1982. “Diversity and Dissimilarity Coefficients: A Unified Approach.” Theoretical Population Biology 21 (1): 24–43. https://doi.org/10.1016/0040-5809(82)90004-1.

Robertson, Colin, Jed A. Long, Farouk S. Nathoo, Trisalyn A. Nelson, and Cameron C. F. Plouffe. 2014. “Assessing Quality of Spatial Models Using the Structural Similarity Index and Posterior Predictive Checks.” Geographical Analysis 46 (1): 53–74. https://doi.org/10.1111/gean.12028.

Rocchini, Duccio, Elisa Thouverai, Matteo Marcantonio, Martina Iannacito, Daniele Da Re, Michele Torresani, Giovanni Bacaro, et al. 2021. “Rasterdiv - An Information Theory Tailored R Package for Measuring Ecosystem Heterogeneity from Space: To the Origin and Back.” Methods in Ecology and Evolution 12 (6): 2195. https://doi.org/10.1111/2041-210X.13583.

Smith, Annie C., Phoebe Zarnetske, Kyla Dahlin, Adam Wilson, and Andrew Latimer. 2023. Geodiv: Methods for Calculating Gradient Surface Metrics. Manual.

Wang, Z., A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. “Image Quality Assessment: From Error Visibility to Structural Similarity.” IEEE Transactions on Image Processing 13 (4): 600–612. https://doi.org/10.1109/TIP.2003.819861.

Footnotes

In fact, they may also be used for comparing rasters for other regions, but they still should have the same number of rows and columns.↩︎
Alternatively, the absolute difference can be calculated to show the difference in the magnitude of the values.↩︎
Read the comment on this by Mathieu Gravey.↩︎
Implementation of RMSE is also straightforward: rmse = function(x, y) sqrt(mean((x - y)^2)).↩︎
This leaves a tough question of how to choose the size of the moving window.↩︎
The “pearson” method is the only one available by name, but other methods can be used by providing a custom function.↩︎
This function also returns three other rasters: SIM, SIV, and SIP that relate to the luminance, contrast, and structure, respectively.↩︎

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2024,
  author = {Nowosad, Jakub},
  title = {Comparison of Spatial Patterns in Continuous Raster Data for
    Overlapping Regions Using {R}},
  date = {2024-10-20},
  url = {https://jakubnowosad.com/posts/2024-10-20-spatcomp-bp2/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Comparison of Spatial Patterns in Continuous Raster Data for Overlapping Regions Using R.” October 20, 2024. https://jakubnowosad.com/posts/2024-10-20-spatcomp-bp2/.