Thinking in spatial patterns

Navigating the maze: Reflections on applying for the Marie Skłodowska-Curie Actions Postdoctoral Fellowship

Jakub Nowosad — Mon, 22 Jul 2024 00:00:00 GMT

I received a grant from the Marie Skłodowska-Curie Actions Postdoctoral Fellowships (MSCA-PF) program: from August 2024 to August 2026, I will be at the University of Muenster, Germany, working on a project named PRISM: PReservation and RecognItion of Spatial patterns using Machine learning. The project’s primary goal is to develop and compare methods for validating and including spatial patterns in machine learning. You can read a short description of the project here.¹

I am excited about the project and looking forward to the next two years, but this blog post is not about the project itself. Instead, I want to share some thoughts on the process of applying for the MSCA-PF grant. I hope this post will be helpful for others applying for this grant in the future, but also for grant providers and reviewers who might be interested in improving the process.²

MSCA-PF

In short, the MSCA-PF grant is a competitive grant aimed at early career researchers (up to 8 years after the PhD).³ The European Postdoctoral Fellowship allows researchers to work on a project of their choice for up to two years at a host institution in Europe or associated countries. The aim of this grant is not only to support the research project but also to develop new skills and promote knowledge transfer between the researcher and the host institution. The MSCA-PF grant website also lists the following benefits of the grant: a living allowance, a mobility allowance (plus possibly also family, long-term leave, and special needs allowances), funding for research, training, and networking activities, and management and indirect costs.

The grant proposal

The grant proposal consists of Part A (the administrative part filled in the online portal) and Part B (the scientific part uploaded as a PDF). Part B is divided into two subparts: B1 and B2.

Part B1 is the core part of the proposal and should contain details of the proposed research and training activities along with the practical arrangements proposed to implement them, etc. It is strictly restricted to 10 A4-sized pages, with font size and margin limitations. Part B2 has no page limit and contains the researcher’s CV, the capacity of the participating organization(s), and other related information.

The grant process has extensive documentation, which includes the main website, Q&A Blog, and many other documents, such as “The guide for Applicants”, “PF Handbook”, “Evaluation Form”, “How to complete your ethics self-assessment”, and many more. The extensive documentation, while comprehensive, can be overwhelming for applicants, as it far exceeds the length of the proposal itself.

Application process

After reading the documentation, the application process seems straightforward: you just fill out the online form, write Part B, and submit everything. I initially thought that my main issue would be the page limit of Part B1: how to fit all the ideas about the project into just 10 pages? The reality turned out to be quite different: the main issue was understanding what was actually expected in the proposal.

Part B1 has three main sections: (1) “Excellence”, (2) “Impact”, and (3) “Quality and Efficiency of the Implementation”. Then, each section has a number of subsections. For example, the “Excellence” section has the following subsections:

“Quality and pertinence of the project’s research and innovation objectives (and the extent to which they are ambitious, and go beyond the state of the art).”
“Soundness of the proposed methodology (including interdisciplinary approaches, consideration of the gender dimension and other diversity aspects if relevant for the research project, and the quality of open science practices).”
“Quality of the supervision, training and of the two-way transfer of knowledge between the researcher and the host.”
“Quality and appropriateness of the researcher’s professional experience, competences and skills.”

Then, each subsection has several bullet points that you are supposed to address in your proposal. For example, the first subsection has two bullet points, and the second subsection has five bullet points, etc. Spoiler alert: not addressing these bullet points may result in a lower reviewer’s score.

Moreover, throughout the grant proposal template, you may encounter various new terms and concepts that you need to understand and address. Some are probably well-known by seasoned grant writers but not by all early-career researchers. For example, you need to know the differences between dissemination, exploitation, and communication of the results; what a “Data Management Plan (DMP)” or a “Career Development Plan (CDP)” is; what a “Mobility declaration” or an “Evaluation questionnaire” is; how to address “Gender dimension and other diversity aspects”, “Environmental considerations in light of the MSCA Green Charter”, etc. These are defined in various documents available online, but understanding them takes time, and the proposal is required to address them.

To make things more complicated, the grant documentation also contains several hidden expectations. I got great help by talking to a previous MSCA-PF grant holder, local university advisors, and a KoWi advisor.⁴ Interestingly, they all pointed out different aspects (and hidden expectations) of the grant proposal. Thus, the process can feel like navigating a complex puzzle, with some elements not immediately apparent.

My rough estimation is that about 7 out of 10 pages of Part B1 relate to the expected information: you must address the bullet points, explain the concepts, and meet the hidden expectations. The remaining three pages, scattered throughout the proposal, are for the actual ideas behind the project (and some references).

The proposal’s structure may inadvertently encourage applicants to focus more on meeting specific criteria than on fully elaborating their research ideas. For example, there should be a part about “Open science practices” within the second subsection of Part B1. It does not matter if you are actually doing (or thinking about doing) open science⁵ – you need to write about it to get points. I think it teaches the wrong behavior to young researchers.

Let’s say you wrote a first draft of the text and are happy with it. Now, you need to format it to meet all formal expectations, such as font size, margins, page limits, and more. You may spend many hours moving the text around, changing the font sizes, and so on. Thus, I suggest leaving the formatting as one of the last steps of the proposal writing process.

In total, I spent about three months of part-time work on the grant proposal.⁶ I also suspect that the cumulative time investment across all applicants (of this single-person grant) is substantial, highlighting an opportunity to explore ways to streamline the application process.⁷

Reviewing process

After you submit the grant proposal (the deadline is usually in September), it goes through a reviewing process and the results are announced the following March. You can find the evaluation form at https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/horizon/temp-form/ef/ef_he-msca_en.pdf. The form mainly focuses on the expected information. On the one hand, it makes sense: the reviewers need to try to be objective and evaluate the proposals based on the same criteria. On the other hand, there’s a chance that proposals that best meet the formal criteria score highly, which may not always align perfectly with identifying the most innovative research ideas.⁸

Celebration?

Now, let’s say that you got the grant. It is time for ~~celebration~~ filling out the grant agreement. This brings me to the topic of the online portals related to the MSCA-PF grant.⁹ Yes, portals. This is because there are several portals that you need to use during the application and project management phases. These portals are not very user-friendly, and each of them uses a different technology and visual style. It takes a lot of time to get used to them, understand which portal you need to use for which task, and how to navigate them. There seems to be room for improving the user experience and integration of these systems.

The fine print

Let’s end with one important piece of information for those who are considering applying for the MSCA-PF grant. The grant promotes itself with a list of benefits: a living allowance, a mobility allowance (plus possibly also family, long-term leave, and special needs allowances), funding for research, training, and networking activities, and management and indirect costs. The living allowance (that is supposed to cover your salary) depends on the country where you will be working, based on a correction factor for the cost of living, while the rest of the allowances are fixed. For example, in the 2023 edition of the grant, the living allowance for Germany was about 5,000 EUR per month, plus the mobility allowance of 600 EUR per month, and the family allowance of 660 EUR per month (plus some money for the research, training, and networking activities, and management and indirect costs).

What is not directly mentioned, however, is that every country (and even institutions in one country) has different rules about salary, taxes, and other benefits. For example, some universities in Germany will just hire you as a regular employee, and you will get a salary based on the pay scale. Thus, you won’t get the mobility and family allowances.¹⁰ Moreover, the grant funding can be treated in some (?) institutions as gross gross (brutto brutto), which means that the money will be first used to cover the employer’s costs, and then you will need to pay the taxes on the remaining money. Thus, the actual amount the researcher receives (netto) is significantly less than the total amount granted.

Given that the way the grant is treated may vary between countries and institutions, I think it is essential to ask about the details before applying for the grant. The best way is probably to contact a previous MSCA-PF grant holder from the institution where you plan to work and ask about all these details.

Conclusions

This may not be obvious after reading this blog post, but I am very happy I got the grant and excited about the project. If I could go back in time knowing all of the above, I would still apply for the MSCA-PF grant. This grant format is an excellent opportunity for researchers to move to a different environment, learn new skills, and develop new ideas.

That being said, I think the grant proposal process could be greatly improved. The complexity of the application process (evidenced by the extensive documentation required for a relatively short proposal) suggests that there may be opportunities to improve the procedure. Currently, it puts a lot of pressure and a high time burden on the applicants and may lead to a situation where the best proposals are not funded.¹¹ This, combined with many hidden expectations, document formatting, and user-unfriendly online portals, makes the whole process even more time-consuming: it could require thousands of hours of young researchers’ work.

I think the grant proposal process should be much simplified and streamlined. While I understand the need for evaluation criteria and objectivity, I think the current system is not the best way to achieve this. In my opinion, the focus of the proposal should be on the research ideas, the potential of the applicants, and the transfer of knowledge, not on the ability to fill in the expected information. A good example of a grant proposal process that I think is much better¹² is the Humboldt Research Fellowship, which has one simple online portal and fairly straightforward expectations – the focus is on the research ideas and the potential of the applicants, and the whole application process is less surprising and much less time-consuming.

I hope you found this post helpful – if you have any questions or comments, feel free to email me. And now, it’s time to pack my bags.

Footnotes

1. I also plan to write a few blog posts about the project, so stay tuned!
2. It will be useful for me to clear my thoughts and reflect on the process.
3. The acceptance rate in 2024 was about 15%.
4. Thank you!
5. I strongly encourage it.
6. Of course, I was also working on several other projects, teaching, etc., at the same time.
7. About 8,000 applicants multiplied by 3 months is 2,000 years of part-time work of a highly educated person.
8. As pointed out by a peer, these types of reviews are still very good to filter out bad proposals. I agree with this statement. At the same time, I also think that science is a strong link problem, and we should focus our efforts on finding the best ideas.
9. And, I assume, to other EU grants as well.
10. This creates a situation where a person with a family gets the same salary as a person without a family.
11. And possibly not even written when the applicants do not have enough privilege to spend three months on a single-person grant proposal.
12. But still not perfect.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2024,
  author = {Nowosad, Jakub},
  title = {Navigating the Maze: {Reflections} on Applying for the
    {Marie} {Skłodowska-Curie} {Actions} {Postdoctoral} {Fellowship}},
  date = {2024-07-22},
  url = {https://jakubnowosad.com/posts/2024-07-22-msca-bp1/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Navigating the Maze: Reflections on Applying for the Marie Skłodowska-Curie Actions Postdoctoral Fellowship.” July 22, 2024. https://jakubnowosad.com/posts/2024-07-22-msca-bp1/.

Optimizing the parameters of the spatial kinetic Ising model to simulate spatial patterns

Jakub Nowosad — Sun, 07 Jan 2024 00:00:00 GMT

The spatial kinetic Ising model is a simple model that can be used to simulate the evolution of spatial patterns over time. Its two main parameters are B and J, which control the external pressure and the local autocorrelation tendency, respectively. Both of them have a strong effect on the results of the model. Thus, the question is how to find the best values of these parameters for a given situation.

This blog post shows how to use the simulated annealing algorithm to find the best values of B and J that minimize the difference between the metrics of the expected map and the metrics of the last simulation.

Data preparation

To reproduce the calculations in this post, you need to attach the following packages:

library(terra)
library(spatialising)
library(ggplot2)

We will also use and process the same input data as in the previous blog post.

twomaps = rast("/vsicurl/https://github.com/Nowosad/bp-data/raw/main/spatialising-bp/twomaps.tif")
rcl = matrix(c(1, -1, 2, 1), byrow = TRUE, ncol = 2)
twomaps = classify(twomaps, rcl)
map1 = twomaps[[1]]
map2 = twomaps[[2]]
plot(c(map1, map2))

Optimizing the parameters

In this case, we have two maps: the first map (map1) is the initial map, and the second map (map2) is the expected map. Our main goal is to find the best values of B and J that, given the initial map (map1), will result in a simulated map that is as similar as possible to the expected map (map2).

One possible approach to find these best values of B and J is to use an optimization algorithm. This algorithm’s goal is to minimize the difference between the metrics of the expected map (e.g., map2) and the metrics of the last simulation.

For that purpose, we can use the optim_sa() function from the optimization package that implements the simulated annealing algorithm. Firstly, we need to define a function that takes one variable (x) and returns a single numeric value. Here, we created the optimize_model() function that takes the x variable, which is a vector of two values: B and J. The function then runs the spatial kinetic Ising model with the given values of B and J, calculates the composition and texture indexes for the expected map (map2) and the last simulation, and returns the distance between the metrics of the expected map (map2) and the metrics of the last simulation.¹

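optimize_model = function(x){
  # x[1] is B (external pressure), x[2] is J (local autocorrelation tendency)
  sim = spatialising::kinetic_ising(x = map1, B = x[1], J = x[2],
                                    updates = 4)
  # metrics of the expected map and of the last (fourth) simulated raster
  map2_metrics = c(composition_index(map2), texture_index(map2))
  sim_metrics = c(composition_index(sim[[4]]), texture_index(sim[[4]]))
  # Euclidean distance between the two metric vectors (the value to minimize)
  dist(rbind(map2_metrics, sim_metrics))[[1]]
}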
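To make the returned value more tangible, here is a small sketch of what the last line of the function computes. The metric values below are made up for illustration (they are not results from the maps used in this post); the point is only that dist() on two stacked vectors gives their Euclidean distance.

```r
# Hypothetical metric vectors: (composition index, texture index)
expected_metrics = c(0.50, 0.62)
simulated_metrics = c(-0.42, 0.84)

# Same construction as in optimize_model(): one distance between two rows
d = dist(rbind(expected_metrics, simulated_metrics))[[1]]

# The same quantity written out by hand
d_manual = sqrt(sum((expected_metrics - simulated_metrics)^2))

all.equal(d, d_manual)
```

Note that both metrics enter the distance in their raw units. The composition index ranges from -1 to 1 and the texture index from 0 to 1, so, depending on the data, differences in composition may weigh more in the result.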

Secondly, we need to use the optim_sa() function, which takes the optimize_model() function, the initial values of B and J, and the lower and upper bounds for B and J. Here, we set the bounds for B to be between 0 and 0.9 as we know that the proportion of the 1 (forest) values should increase over time. This operation may take about one minute in this case.

optim_params = optimization::optim_sa(fun = optimize_model, 
                                      start = c(0.4, 0.4),
                                      lower = c(0, 0), 
                                      upper = c(0.9, 0.9))
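For intuition about what a simulated annealing search does with such a function, below is a minimal base R sketch. It is not the implementation used by the optimization package: the simulated_annealing() function, its arguments (n_iter, temp, cooling, step), and the toy objective are all assumptions made for illustration only.

```r
# A minimal, illustrative simulated annealing loop (hypothetical helper,
# not the algorithm as implemented in optimization::optim_sa())
simulated_annealing = function(fun, start, lower, upper,
                               n_iter = 2000, temp = 1,
                               cooling = 0.995, step = 0.1) {
  current = start
  current_value = fun(current)
  best = current
  best_value = current_value
  for (i in seq_len(n_iter)) {
    # propose a nearby candidate and keep it inside the bounds
    candidate = current + runif(length(current), -step, step)
    candidate = pmin(pmax(candidate, lower), upper)
    candidate_value = fun(candidate)
    delta = candidate_value - current_value
    # always accept improvements; accept worse moves with a probability
    # that shrinks as the temperature drops
    if (delta < 0 || runif(1) < exp(-delta / temp)) {
      current = candidate
      current_value = candidate_value
    }
    # remember the best point seen so far
    if (current_value < best_value) {
      best = current
      best_value = current_value
    }
    temp = temp * cooling
  }
  list(par = best, value = best_value)
}

# Toy objective with a known minimum at c(0.3, 0.6), used only for the demo
toy_fun = function(x) sum((x - c(0.3, 0.6))^2)

set.seed(1)
toy_result = simulated_annealing(toy_fun, start = c(0.4, 0.4),
                                 lower = c(0, 0), upper = c(0.9, 0.9))
toy_result$par
```

Because optimize_model() is itself stochastic (each call runs a new simulation), accepting occasional worse moves is one reason this family of algorithms suits the problem: the search is less likely to get stuck on a single lucky or unlucky evaluation.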

The output of the optim_sa() function is a list with several elements, including the best values of B and J in the par element.

optim_params$par

[1] 0.78 0.51

Here, our optimal values of B and J are 0.78 and 0.51, respectively. Now, we can use the newly derived values to simulate the spatial kinetic Ising model similar to the second map (map2).

sim_optim = kinetic_ising(map1, 
                          B = optim_params$par[1], J = optim_params$par[2], 
                          updates = 4)
plot(c(map2, sim_optim[[4]]))

The map on the left is the expected, true map, and the map on the right is our simulation. Although the two maps are not identical, they have a similar spatial pattern with a dominance of forest (1) (especially in the northeast part of the map) and a similar configuration of the patches.

Exploring the simulation results

In addition to looking at just the final map, we can also retrace the entire simulation process and its effect on the metrics of spatial patterns. Firstly, we can plot all of the simulated rasters:

names(sim_optim) = paste0("sim_year", 2:5)
plot(sim_optim, nr = 1)

Secondly, we can calculate their metrics of spatial patterns and compare their changes over time.

ci_df = data.frame(year = 1:5, metric = "composition index",
                   value = composition_index(c(map1, sim_optim)))
ti_df = data.frame(year = 1:5, metric = "texture index",
                   value = texture_index(c(map1, sim_optim)))
pred_df = rbind(ci_df, ti_df)
ggplot(pred_df, aes(year, value)) +
  geom_line() +
  facet_wrap(~metric, scales = "free_y")

As expected, the composition index increases over time, from negative values indicating a dominance of the -1 values to positive values indicating a dominance of the 1 values. The texture index, on the other hand, decreases for the first two simulations and then increases for the last two simulations. There is a good (and well-known) reason for that: the composition of values has an impact on the spatial texture. The larger the dominance of one category, the more clustered the values are, and thus the higher the texture index is.

Predicting the future changes

Given the assumption that the external pressure and the local autocorrelation tendency will remain the same, we can use the kinetic_ising() function to predict future spatial patterns.

map2_pred = kinetic_ising(map2, 
                          B = optim_params$par[1], J = optim_params$par[2], 
                          updates = 4)
names(map2_pred) = paste0("sim_year", 6:9)
plot(map2_pred, nr = 1)

The maps above show the predicted spatial patterns for years 6, 7, 8, and 9.

Conclusions

This blog post showed how to use the simulated annealing algorithm to find the best values of B and J that minimize the difference between the metrics of the expected map and the metrics of the last simulation. It allows not only the simulation of spatial patterns similar to the expected map but also the analysis of the simulation process and prediction of future spatial patterns.

Of course, there are many caveats and limitations of this approach. For example, the spatial kinetic Ising model assumes that the external pressure and the local autocorrelation tendency are constant over time over the entire area. Moreover, the model is not able to simulate patterns that create or modify linear features (e.g., rivers or roads).

To learn more about the spatial kinetic Ising model, its background, possible applications, and limitations, I encourage you to read Stepinski (2023) and Stepinski and Nowosad (2023).

References

Stepinski, Tomasz F. 2023. “Spatially Explicit Simulation of Deforestation Using the Ising-Like Neutral Model.” Environmental Research: Ecology 2 (2): 025003. https://doi.org/10.1088/2752-664x/acdbd2.

Stepinski, Tomasz F., and Jakub Nowosad. 2023. “The Kinetic Ising Model Encapsulates Essential Dynamics of Land Pattern Change.” Royal Society Open Science 10 (10): 231005. https://doi.org/10.1098/rsos.231005.

Footnotes

1. The optimization is also explained at https://jakubnowosad.com/spatialising/articles/Optimizing_spatialising_parameters.html.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2024,
  author = {Nowosad, Jakub},
  title = {Optimizing the Parameters of the Spatial Kinetic {Ising}
    Model to Simulate Spatial Patterns},
  date = {2024-01-07},
  url = {https://jakubnowosad.com/posts/2024-01-07-spatialising-bp2/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2024. “Optimizing the Parameters of the Spatial Kinetic Ising Model to Simulate Spatial Patterns.” January 7, 2024. https://jakubnowosad.com/posts/2024-01-07-spatialising-bp2/.

Simulating spatial patterns with the spatial kinetic Ising model

Jakub Nowosad — Sun, 17 Dec 2023 00:00:00 GMT

A two-dimensional Ising model is an idealized physical system that consists of a lattice of binary variables (magnetic spins) that can be in one of two states: up or down. Each spin’s state is influenced by its neighbors: the more neighbors in the same state, the more likely the spin will be in the same state. Thus, the change in the state of a spin impacts the state of its neighbors, which in turn affects the state of their neighbors, and so on. It is a simple model that can be used in physics to study, for example, phase transitions driven by the temperature of the system.

The above idea can, in principle, be applied to other two-dimensional systems, such as geographical (spatial) data. For example, we can think of a binary spatial raster as a two-dimensional system, where each cell can have one of two states: -1 or 1. These numbers can represent, for example, two land cover categories: non-forest (-1) and forest (1). Next, we can introduce two parameters influencing the state of each cell: B and J. B is the external pressure, which tries to align cells’ values with its sign (are we more likely to have more -1 or 1 values?). J is the strength of the local autocorrelation tendency, which tries to align the signs of neighboring cells (are we more likely to have cells of the same value in the neighborhood?). The spatial kinetic Ising model is a simulation of such a system, where each cell is given an opportunity to flip its value (1 to -1 or -1 to 1). The probability of a flip depends on the value of the cell, the values of its four neighbors (top, left, bottom, and right), and the values of B and J.

This blog post shows how to use the spatial kinetic Ising model to simulate the change in the spatial pattern of a binary raster using the spatialising R package. The package is available on GitHub at https://github.com/Nowosad/spatialising/.

Setup

To reproduce the calculations in this post, you need to attach the following packages:

library(terra)
library(spatialising)
library(ggplot2)

Data preparation

The twomaps.tif file contains two maps of the same area, but for different years. The first map represents year 1, and the second map represents year 5. Both of them have only two values: 1 for non-forest and 2 for forest:

twomaps = rast("/vsicurl/https://github.com/Nowosad/bp-data/raw/main/spatialising-bp/twomaps.tif")
map1 = twomaps[[1]]
map2 = twomaps[[2]]
plot(c(map1, map2))

The spatial kinetic Ising model requires binary raster data with just two values: -1 and 1. Thus, the first step in our case is to reclassify the original binary (1, 2) map into new (-1, 1) values. This can be done with the classify() function from the terra package.¹ Here, we replace the 1 value with -1 and the 2 value with 1:

rcl = matrix(c(1, -1, 2, 1), byrow = TRUE, ncol = 2)
map1 = classify(map1, rcl)
map2 = classify(map2, rcl)
plot(c(map1, map2))
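To see what the two-column reclassification matrix expresses, here is a small base R sketch on a plain vector. It only mirrors the simple "old value, new value" case used above and is not how terra implements classify(); the cell_values vector is made up for illustration.

```r
# Each row of the matrix reads as: old value, new value
rcl = matrix(c(1, -1, 2, 1), byrow = TRUE, ncol = 2)

# Hypothetical cell values standing in for a raster
cell_values = c(1, 2, 2, 1)

# Look up each old value in the first column and take the second column
new_values = rcl[match(cell_values, rcl[, 1]), 2]
new_values
```

Here, every 1 (non-forest) becomes -1 and every 2 (forest) becomes 1, which is the encoding the model expects.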

Running the spatial kinetic Ising model

Now, our data is ready for use in the spatialising package. Its main function is kinetic_ising(), which simulates the spatial kinetic Ising model. It requires the input raster (x), the strength of the external pressure (B), the strength of the local autocorrelation tendency (J), and also has some optional arguments, such as the number of updates.

Our goal is to simulate the change in the spatial pattern of the first map (map1) to make it similar to the second map (map2). The code below simulates the spatial kinetic Ising model for the first map (map1) with the value of B of 0.3 (meaning that the external pressure is toward increased forest cover) and the value of J of 0.7 (meaning that the local autocorrelation tendency is strong). We also set the number of updates to 4, which means that it will create four simulations (rasters), each of which will be based on the previous one.²

sim1 = kinetic_ising(map1, B = 0.3, J = 0.7, updates = 4)
plot(sim1, nr = 1)

The result consists of four simulated rasters, which are stored in the sim1 object. Each of them represents successive simulations of the spatial kinetic Ising model.

We can also compare the final simulation (sim1[[4]]) with the first (map1) and the second map (map2).

plot(c(map1, map2, sim1[[4]]), nr = 1)

Compared to the first map, the last simulation has slightly more forest cover, which is in line with the provided external pressure (B) toward increased forest cover. The forest category also tends to be spatially clustered, which is in line with the set local autocorrelation tendency (J). On the other hand, the simulation is still quite different from the second map (map2), possibly indicating that we should increase the value of B to make the simulation more like the second map.

Metrics of spatial patterns

The spatialising package also provides two functions to calculate metrics of spatial patterns of binary rasters: composition_index() and texture_index(). The composition imbalance index (composition_index()) is the sum of cell values over the entire site divided by the number of cells in the site. It has a range from -1 (site completely dominated by the -1 values) to 1 (site completely dominated by the 1 values). The value of 0 indicates that the site is equally divided between the two values.

composition_index(c(map1, map2, sim1[[4]]))

[1] -0.5000  0.5000 -0.4184

In our case, map1 has a dominance of the -1 values, map2 has a dominance of the 1 values, and sim1[[4]] has a dominance of the -1 values, but it is less pronounced than in map1.
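The definition of the composition index can be reproduced on a tiny matrix in base R. The composition_index_sketch() function below is a hypothetical helper that follows the description above (sum of values divided by the number of cells); it is not the package's own code, and the toy matrix is made up.

```r
# Hypothetical helper following the definition given in the text
composition_index_sketch = function(m) {
  sum(m) / length(m)
}

# Toy 2 x 2 site: three cells of -1 and one cell of 1
toy = matrix(c(-1, -1, -1, 1), nrow = 2)
ci = composition_index_sketch(toy)
ci

# The index translates directly into the share of cells with value 1
share_of_ones = (1 + ci) / 2
share_of_ones
```

By the same relation, an index of -0.5 (as for map1) corresponds to 25% of cells with the value of 1, and an index of 0.5 (as for map2) corresponds to 75%.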

The texture index (texture_index()) is a measure of the spatial autocorrelation of the values of a raster. Its value is between 0 (fine texture) and 1 (coarse texture).

texture_index(c(map1, map2, sim1[[4]]))

[1] 0.6477551 0.6216327 0.8387755

In our examples, map1 and map2 have a rather similar texture, while sim1[[4]] has a slightly coarser texture (its values have stronger spatial autocorrelation).

Spatial kinetic Ising model explained

How does the spatial kinetic Ising model work? The simulation starts with the input binary (-1, 1) raster and proceeds with one randomly selected cell at a time. The selected cell is given an opportunity to flip its value (1 to -1 or -1 to 1). The probability of a flip depends on the value of the cell and the values of its four neighbors (top, left, bottom, and right). It also depends on the values of B and J. B (positive or negative) is an external pressure: it tries to align cells’ values with its sign. J (always positive) is a strength of the local autocorrelation tendency: it tries to align signs of neighboring cells.
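The flip probability described above can be sketched with the textbook forms of the Glauber and Metropolis rules. This is an assumption-laden illustration: the flip_probability() function is hypothetical, the temperature is fixed at 1, and the package's exact formula may differ (see the references at the end of this post for the actual model).

```r
# Textbook-style flip probability for a single cell (hypothetical helper)
# cell: current value (-1 or 1); neighbors: values of the adjacent cells
flip_probability = function(cell, neighbors, B, J, rule = "glauber") {
  # energy change caused by flipping the cell
  delta_e = 2 * cell * (J * sum(neighbors) + B)
  if (rule == "glauber") {
    1 / (1 + exp(delta_e))
  } else {
    min(1, exp(-delta_e))
  }
}

# A -1 cell surrounded by 1s, with pressure toward 1: flip is very likely
p_surrounded_by_ones = flip_probability(-1, c(1, 1, 1, 1), B = 0.3, J = 0.7)

# A -1 cell surrounded by -1s: neighbors outweigh the pressure, flip is unlikely
p_surrounded_by_minus = flip_probability(-1, c(-1, -1, -1, -1), B = 0.3, J = 0.7)

c(p_surrounded_by_ones, p_surrounded_by_minus)
```

In this sketch, a positive B makes flips toward 1 more likely everywhere, while J makes a cell more likely to match whatever dominates its neighborhood.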

We can also control the model using a few additional arguments. The iter argument controls the number of iterations, that is, how many times the flip of a cell value is attempted before a new simulated raster is returned. By default, its value equals the number of cells in the input raster. Next, updates controls the number of simulated rasters returned, each of which is based on the previous one. The inertia parameter (0 by default), when positive, makes it less likely for a cell of -1 to change its value to 1 when surrounded by other -1 cells. As a result, it reduces the “salt and pepper” effect, where cells of different values are mixed together. The last important argument is rule, which controls how the probability of a flip is calculated: either using the "glauber" (default) or "metropolis" rule.
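The roles of iter and updates can be illustrated with a toy loop on a plain matrix. This is only a sketch under several assumptions: ising_update() is a hypothetical function, it uses a textbook Glauber rule, ignores inertia, and treats cells on the edge as simply having fewer neighbors, none of which is confirmed to match the package's internals.

```r
# One "update": iter flip attempts on randomly selected cells (toy version)
ising_update = function(m, B, J, iter = length(m)) {
  nr = nrow(m)
  nc = ncol(m)
  for (k in seq_len(iter)) {
    i = sample.int(nr, 1)
    j = sample.int(nc, 1)
    # values of the existing top, bottom, left, and right neighbors
    nb = c(if (i > 1) m[i - 1, j], if (i < nr) m[i + 1, j],
           if (j > 1) m[i, j - 1], if (j < nc) m[i, j + 1])
    delta_e = 2 * m[i, j] * (J * sum(nb) + B)
    # Glauber rule: flip with probability 1 / (1 + exp(delta_e))
    if (runif(1) < 1 / (1 + exp(delta_e))) {
      m[i, j] = -m[i, j]
    }
  }
  m
}

set.seed(42)
# Made-up 10 x 10 starting map dominated by -1 values
toy_map = matrix(sample(c(-1, 1), 100, replace = TRUE, prob = c(0.75, 0.25)),
                 nrow = 10)

# updates = 4: each simulated matrix is based on the previous one
toy_sims = vector("list", 4)
previous = toy_map
for (u in 1:4) {
  toy_sims[[u]] = ising_update(previous, B = 0.7, J = 0.1)
  previous = toy_sims[[u]]
}

# composition (mean of cell values) of the start and of each update
c(mean(toy_map), sapply(toy_sims, mean))
```

With the default iter, each update gives, on average, every cell one opportunity to flip, which is why it is convenient to think of one update as one time step.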

The code below compares the results of the spatial kinetic Ising model for different values of B and J.

sim2 = kinetic_ising(map1, B = -0.7, J = 0.1, updates = 4, inertia = 1)
sim3 = kinetic_ising(map1, B = -0.7, J = 0.7, updates = 4, inertia = 1)
sim4 = kinetic_ising(map1, B = 0, J = 0.1, updates = 4, inertia = 1)
sim5 = kinetic_ising(map1, B = 0, J = 0.7, updates = 4, inertia = 1)
sim6 = kinetic_ising(map1, B = 0.7, J = 0.1, updates = 4, inertia = 1)
sim7 = kinetic_ising(map1, B = 0.7, J = 0.7, updates = 4, inertia = 1)
all_sims = c(sim2[[4]], sim3[[4]], sim4[[4]], sim5[[4]], sim6[[4]], sim7[[4]])
names(all_sims) = c("B: -0.7, J: 0.1", "B: -0.7, J: 0.7", "B: 0, J: 0.1", 
                    "B: 0, J: 0.7", "B: 0.7, J: 0.1", "B: 0.7, J: 0.7")
plot(all_sims, nc = 2)

The top row shows the results for B values equal to -0.7, the middle row shows the results for B values equal to 0, and the bottom row shows the results for B values equal to 0.7. The left column shows the results of the spatial kinetic Ising model for values of J equal to 0.1, while the right column shows the results for values of J equal to 0.7.

A quick visual comparison shows that both parameters have a strong effect on the results of the spatial kinetic Ising model. Negative values of B tend to decrease the forest cover, while positive values of B tend to increase the forest cover. The effect of J is, on the other hand, more related to the configuration of the values, with lower values of J leading to more dispersed values, and higher values of J leading to more clustered values.

Summary

This blog post showed how to use the spatialising package to simulate the spatial kinetic Ising model, how selected parameters influence the results, and how to calculate metrics of spatial patterns. However, it leaves one important question unanswered: how do you find the best values of B and J to make the simulation more like the second map? That is the topic of the next blog post.

To learn more about the spatial kinetic Ising model, I encourage you to read Stepinski (2023) and Stepinski and Nowosad (2023).

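References

Stepinski, Tomasz F. 2023. “Spatially Explicit Simulation of Deforestation Using the Ising-Like Neutral Model.” Environmental Research: Ecology 2 (2): 025003. https://doi.org/10.1088/2752-664x/acdbd2.

Stepinski, Tomasz F., and Jakub Nowosad. 2023. “The Kinetic Ising Model Encapsulates Essential Dynamics of Land Pattern Change.” Royal Society Open Science 10 (10): 231005. https://doi.org/10.1098/rsos.231005.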

Footnotes

1. This function can also be used to binarize continuous data or data with many categories.
2. Here, we can think of each simulation as a year, and the number of updates as the number of years.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2023,
  author = {Nowosad, Jakub},
  title = {Simulating Spatial Patterns with the Spatial Kinetic {Ising}
    Model},
  date = {2023-12-17},
  url = {https://jakubnowosad.com/posts/2023-12-17-spatialising-bp1/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Simulating Spatial Patterns with the Spatial Kinetic Ising Model.” December 17, 2023. https://jakubnowosad.com/posts/2023-12-17-spatialising-bp1/.

Finding the most unique land cover spatial pattern

Jakub Nowosad — Sun, 03 Dec 2023 00:00:00 GMT

Spatial signatures represent spatial patterns of land cover in a given area. Thus, they can be used to search for areas with similar spatial patterns to a query region or to quantify changes in spatial patterns. The approaches above are implemented as lsp_search() and lsp_compare() functions of the motif R package, respectively.

At the same time, it is possible to create other, more customized workflows. Here, I will show how to compare spatial patterns of two different areas and find the most unique land cover spatial pattern in the process.

Spatial data

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") |>
        osf_ls_files(n_max = Inf) |>
        osf_download(path = "data",
                     conflicts = "skip")

You should also attach the following packages:

library(sf)
library(terra)
library(motif)
library(dplyr)
library(readr)
library(cluster)

Land cover in Africa

The data/land_cover.tif file contains land cover data for Africa. It is a categorical raster with a resolution of 300 meters that can be read into R using the rast() function.

lc = rast("data/land_cover.tif")

Additionally, the data/lc_palette.csv file contains information about the labels and colors of each land cover category.

lc_palette_df = read.csv("data/lc_palette.csv")

We will use this file to integrate labels and colors into the raster object:

levels(lc) = lc_palette_df[c("value", "label")]
coltab(lc) = lc_palette_df[c("value", "color")]
plot(lc)

Comparing spatial patterns of two areas

First, we need to define the areas for which we want to compare spatial patterns. For this example, we use two African countries: Cameroon and Congo. We can download their borders using the rnaturalearth package, and use them to crop the lc raster object:

library(rnaturalearth)
# download
cameroon = ne_countries(country = "Cameroon", returnclass = "sf") |>
  select(name) |>
  st_transform(crs = st_crs(lc))
congo = ne_countries(country = "Republic of the Congo", returnclass = "sf") |>
  select(name) |>
  st_transform(crs = st_crs(lc))
# crop
lc_cameroon = crop(lc, cameroon, mask = TRUE)
lc_congo = crop(lc, congo, mask = TRUE)
# plot
plot(lc_cameroon)
plot(lc_congo)

Both countries have similar shares of land cover categories: they are dominated by forests, with some agricultural and grassland areas, as we can see by calculating their "composition" signatures.

lc_cameroon_composition = lsp_signature(lc_cameroon, type = "composition", classes = 1:9)
lc_congo_composition = lsp_signature(lc_congo, type = "composition", classes = 1:9)
round(lc_cameroon_composition$signature[[1]], 2)

        1    2    3 4 5    6 7 8    9
[1,] 0.15 0.77 0.03 0 0 0.04 0 0 0.01

round(lc_congo_composition$signature[[1]], 2)

       1    2    3 4 5    6 7 8 9
[1,] 0.1 0.82 0.04 0 0 0.03 0 0 0
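To make the idea of a "composition" signature more concrete, here is a minimal base R sketch, assuming the signature is simply the share of cells in each category. The composition_sketch() helper and the cell values are made up for illustration; they are not part of motif.

```r
# Hypothetical helper (not from motif): share of cells in each expected category
composition_sketch = function(cells, classes) {
  counts = table(factor(cells, levels = classes))
  as.numeric(counts) / sum(counts)   # proportions that sum to 1
}

# Made-up cell values of a tiny "raster" with categories 1 to 4
cells = c(2, 2, 2, 1, 2, 3, 2, 2, 1, 2)
shares = composition_sketch(cells, classes = 1:4)
names(shares) = 1:4
shares
```

Categories that do not occur in the data (here, category 4) still get a share of 0, which keeps signatures of different areas comparable.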

We can also look at their spatial patterns (both composition and configuration) by calculating the "cove" signature.

lc_cameroon_cove = lsp_signature(lc_cameroon, type = "cove", classes = 1:9)
lc_congo_cove = lsp_signature(lc_congo, type = "cove", classes = 1:9)

Next, these signatures can be compared using dissimilarity measures. The philentropy package provides a wide range of such measures, including the Jensen-Shannon divergence. Here, we use this measure to calculate the dissimilarity between the spatial patterns (as represented with "cove") of Cameroon and Congo.

library(philentropy)
dist_cove = dist_one_one(lc_cameroon_cove$signature[[1]], 
                         lc_congo_cove$signature[[1]], 
                         method = "jensen-shannon")
dist_cove

[1] 0.008919291

This value is small (approximately 0.009), which means that, in general, Cameroon’s and Congo’s spatial patterns are fairly similar.
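For readers who want to see what is behind this number, below is a from-scratch sketch of the Jensen-Shannon divergence in base R. It assumes the natural logarithm and is not philentropy's implementation; js_divergence() is a hypothetical helper. As input, it uses the rounded "composition" values printed earlier (not the "cove" signatures), so its result is not expected to match dist_cove.

```r
# Hypothetical helper: Jensen-Shannon divergence between two vectors
# (natural logarithm assumed; inputs are rescaled to sum to 1)
js_divergence = function(p, q) {
  p = p / sum(p)
  q = q / sum(q)
  m = (p + q) / 2
  # Kullback-Leibler divergence; terms with a == 0 contribute 0
  kl = function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

# Rounded composition signatures shown above
cameroon = c(0.15, 0.77, 0.03, 0, 0, 0.04, 0, 0, 0.01)
congo = c(0.10, 0.82, 0.04, 0, 0, 0.03, 0, 0, 0)
js_divergence(cameroon, congo)
```

With the natural logarithm, the divergence is 0 for identical inputs and cannot exceed log(2) (about 0.693), which is reached when the two inputs share no categories.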

Comparing local spatial patterns

We can also look at the local spatial patterns of Cameroon and Congo, here on a scale of 100 by 100 cells (i.e., 30 by 30 km):

lc_cameroon_cove100 = lsp_signature(lc_cameroon, type = "cove",
                                    window = 100, classes = 1:9)
lc_congo_cove100 = lsp_signature(lc_congo, type = "cove", 
                                 window = 100, classes = 1:9)

To compare these signatures, we can calculate the Jensen-Shannon divergence for each pair of signatures in both datasets. This can be done using the dist_many_many() function from the philentropy package, which expects two matrices as input.

lc_cameroon_cove100_mat = do.call(rbind, lc_cameroon_cove100$signature)
lc_congo_cove100_mat = do.call(rbind, lc_congo_cove100$signature)
dist_cove_100 = dist_many_many(lc_cameroon_cove100_mat, 
                               lc_congo_cove100_mat, 
                               method = "jensen-shannon")

The result is a matrix with the Jensen-Shannon divergence between each pair of areas in both countries, in which rows represent areas in Cameroon and columns represent areas in Congo. Lower values indicate more similar spatial patterns, while higher values indicate more dissimilar ones. The summary below shows that some areas in the two countries have similar spatial patterns, and some are even identical (at least at the resolution of the source data and with its number and variety of categories):

summary(c(dist_cove_100))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.03228 0.08722 0.16375 0.22835 0.69315

Identifiers of the identical areas can be found using the which() function. For example, area 341 in Cameroon and area 4 in Congo have the same spatial pattern:

head(which(dist_cove_100 == 0, arr.ind = TRUE))

     row col
[1,] 341   4
[2,] 418   4
[3,] 423   4
[4,] 440   4
[5,] 462   4
[6,] 477   4
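The same logic can be tried on a toy example. The sketch below builds a small many-to-many dissimilarity matrix in base R and then looks for identical pairs with which(). The signatures and the js_divergence() helper are made up for illustration (natural logarithm assumed), and the exact comparison with 0 only works here because the matching rows are literally the same.

```r
# Hypothetical helper: Jensen-Shannon divergence (natural logarithm assumed)
js_divergence = function(p, q) {
  p = p / sum(p)
  q = q / sum(q)
  m = (p + q) / 2
  kl = function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

# Made-up signatures: each row is one area, columns are signature values
area_a = rbind(c(1, 0, 0), c(0.5, 0.5, 0), c(0.2, 0.3, 0.5))
area_b = rbind(c(0.5, 0.5, 0), c(1, 0, 0))

# Rows of the result correspond to area_a, columns to area_b
dist_mat = outer(seq_len(nrow(area_a)), seq_len(nrow(area_b)),
                 Vectorize(function(i, j) js_divergence(area_a[i, ], area_b[j, ])))

# Row/column identifiers of the identical pairs
identical_pairs = which(dist_mat == 0, arr.ind = TRUE)
identical_pairs
```

Each row of identical_pairs is one match: the row column points to an area in the first set, and the col column to an area in the second set.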

We can add spatial information to the lc_cameroon_cove100 and lc_congo_cove100 objects using the lsp_add_sf() function. Then, we are able to visualize these areas by cropping the land cover data using the new objects. In this case, both areas are fully covered by forest (although the second one is located at the border, and thus contains some NA values).

lc_cameroon_cove100_sf = lsp_add_sf(lc_cameroon_cove100)
lc_congo_cove100_sf = lsp_add_sf(lc_congo_cove100)
plot(crop(lc_cameroon, lc_cameroon_cove100_sf[341, ]), main = "Cameroon")
plot(crop(lc_congo, lc_congo_cove100_sf[4, ]), main = "Congo")

Grouping areas with similar local spatial patterns

We can group areas with similar spatial patterns of land cover using the pam() function from the cluster package. For this example, we will divide the areas into six groups.

my_pam = pam(rbind(lc_cameroon_cove100_mat, lc_congo_cove100_mat), 6)

Next, we can add the clustering results to the spatial object by naming both existing sf objects, combining them into one, and adding the clustering results as a new column.

lc_cameroon_cove100_sf$name = "Cameroon"
lc_congo_cove100_sf$name = "Congo"
lc_cove100_sf = rbind(lc_cameroon_cove100_sf, lc_congo_cove100_sf)
lc_cove100_sf$k = as.factor(my_pam$clustering)

Visualization of the results is shown below:

plot(subset(lc_cove100_sf, name == "Cameroon")["k"], pal = palette.colors, main = "Cameroon")
plot(subset(lc_cove100_sf, name == "Congo")["k"],  pal = palette.colors, main = "Congo")

You may quickly notice that the sixth and fifth clusters exist prominently in both countries. On the other hand, cluster 2 only exists in Cameroon.

We can look at each cluster representative (medoid) by subsetting the lc_cove100_sf object using the id.med element of the my_pam object.

lc_cove100_sf_subset = lc_cove100_sf[my_pam$id.med, ]
for (i in seq_len(nrow(lc_cove100_sf_subset))){
  plot(crop(lc, lc_cove100_sf_subset[i, ]), main = i)
}
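If you have not used pam() before, the toy example below may help to see what its clustering and id.med elements contain. The six points are made up for illustration and have nothing to do with the land cover signatures.

```r
library(cluster)

# Made-up data: six points forming two well-separated groups
x = rbind(c(0, 0), c(0, 1), c(0, 2),
          c(10, 10), c(10, 11), c(10, 12))
toy_pam = pam(x, k = 2)

toy_pam$clustering    # cluster label of each row
toy_pam$id.med        # row numbers of the medoids (cluster representatives)
x[toy_pam$id.med, ]   # the representative rows themselves
```

The medoids are actual observations, which is why they can be used to subset the original object and plotted as real examples of each cluster.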

Cluster 6 represents forest areas, and cluster 4 consists of areas predominantly covered by forests, with some agricultural and grassland areas. Cluster 5 is also represented by forest, but with a substantial share of agriculture and grasslands, and cluster 1 is a mix of highly aggregated agriculture and forest. Cluster 3 consists of areas with a large share of shrublands, agricultural, and forest areas. Finally, cluster 2, which only exists in Cameroon, represents large (30 by 30 km) areas of agriculture.

Finding the most unique land cover spatial pattern

The dist_cove_100 object contains the Jensen-Shannon divergence between each pair of areas in both countries, where rows represent areas in Cameroon and columns represent areas in Congo. Usually, it may be used to find the most similar areas (areas with the smallest divergence), but here, we will look for the most unique areas.

This can be done in two steps. First, we need to calculate the smallest value in each row and column, which can be done using the apply() function. This gives us the smallest divergence between each area in one country and all areas in the other country; in other words, it tells us how dissimilar an area in one country is to the most similar area in the other country.

lc_cameroon_cove100_sf$min_dist = apply(dist_cove_100, 1, min)
plot(lc_cameroon_cove100_sf["min_dist"], main = "Cameroon")
lc_congo_cove100_sf$min_dist = apply(dist_cove_100, 2, min)
plot(lc_congo_cove100_sf["min_dist"], main = "Congo")

Second, we can find the area with the largest value in the lc_cameroon_cove100_sf$min_dist column, which is the most unique area in Cameroon, and the area with the largest value in lc_congo_cove100_sf$min_dist, which is the most unique area in Congo. In other words, these areas are the most dissimilar to any area in the other country.

most_unique_cameroon = lc_cameroon_cove100_sf[which.max(lc_cameroon_cove100_sf$min_dist), ]
plot(crop(lc_cameroon, most_unique_cameroon), main = "Cameroon")
most_unique_congo = lc_congo_cove100_sf[which.max(lc_congo_cove100_sf$min_dist), ]
plot(crop(lc_congo, most_unique_congo), main = "Congo")
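The two steps are easier to follow on a small, made-up dissimilarity matrix. In this sketch, rows stand for areas in a hypothetical region A and columns for areas in a hypothetical region B; the numbers are invented.

```r
# Made-up dissimilarity matrix: rows are areas in region A, columns in region B
d = matrix(c(0.00, 0.20, 0.40,
             0.30, 0.10, 0.50,
             0.60, 0.70, 0.65),
           nrow = 3, byrow = TRUE)

# Step 1: distance from each area to its most similar area in the other region
min_dist_a = apply(d, 1, min)   # one value per row (region A)
min_dist_b = apply(d, 2, min)   # one value per column (region B)

# Step 2: the most unique area is the one whose best match is the worst
most_unique_a = which.max(min_dist_a)
most_unique_b = which.max(min_dist_b)
```

Here, the third area of region A is the most unique one: even its closest counterpart in region B is 0.6 away, while the other two areas have matches at 0 and 0.1.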

In the case of Cameroon, such an area is a mosaic of agriculture and grasslands; for Congo, it is a complex area with grasslands, agriculture, forest, and some shrublands. Interestingly, both areas are located at the border of the countries.¹

Summary

In this post, we have seen how to compare spatial patterns of land cover in two different areas and how to find the most unique land cover spatial pattern (try to find the most unique area in your country as compared to the rest of the world!). This approach can be used to find areas with unique spatial patterns of land cover or of any other categorical raster. To learn more about the motif package, see the other blog posts in the “motif” category.

Footnotes

You could change the threshold parameter in lsp_signature() to 0 to only include areas completely inside the countries’ borders.↩︎

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2023,
  author = {Nowosad, Jakub},
  title = {Finding the Most Unique Land Cover Spatial Pattern},
  date = {2023-12-03},
  url = {https://jakubnowosad.com/posts/2023-12-03-motif-bp8/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Finding the Most Unique Land Cover Spatial Pattern.” December 3, 2023. https://jakubnowosad.com/posts/2023-12-03-motif-bp8/.

Extracting information about spatial patterns from spatial signatures

Jakub Nowosad — Sat, 18 Nov 2023 00:00:00 GMT

The spatial signatures of categorical rasters are sets of numbers that describe the spatial patterns of the provided variables. They are usually used for further operations such as searching, comparing, or clustering. Less known is that they can also be used to extract information about the composition and configuration of spatial patterns. This blog post shows how to do it using the motif R package.

Spatial data

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") |>
        osf_ls_files(n_max = Inf) |>
        osf_download(path = "data",
                     conflicts = "skip")

You should also attach the following packages:

library(sf)
library(terra)
library(motif)
library(dplyr)
library(readr)
library(cluster)
library(ggplot2)

Land cover in Africa

The data/land_cover.tif file contains land cover data for Africa. It is a categorical raster with a resolution of 300 meters that can be read into R using the rast() function.

lc = rast("data/land_cover.tif")

Additionally, the data/lc_palette.csv file contains information about the labels and colors of each land cover category.

lc_palette_df = read.csv("data/lc_palette.csv")

We will use this file to integrate labels and colors into the raster object:

levels(lc) = lc_palette_df[c("value", "label")]
coltab(lc) = lc_palette_df[c("value", "color")]
plot(lc)

Extracting information from spatial signatures

As already shown in the previous blog posts about motif, the lsp_signature() function can be used to extract spatial signatures from a categorical raster object, and these signatures describe spatial patterns of land cover. The most fundamental signature is the co-occurrence matrix (coma), which is a matrix of co-occurrence frequencies of each pair of land cover categories. The lsp_signature() function can be used to extract the coma signature in non-overlapping windows of 300 by 300 cells (i.e., 90 by 90 km) as follows:

lc_coma = lsp_signature(lc, type = "coma", window = 300)

The output is a data frame with 3,843 rows and 3 columns. The most important one is the signature column, which contains coma signatures in each window.

The co-occurrence matrix can be thought of as a compression of information about the composition and configuration of land cover categories in a given window. However, as it consists of many numbers (here, 81), it is not easy to directly analyze, visualize, or interpret. Fortunately, we can further extract information from this signature using metrics from information theory, such as marginal entropy and relative mutual information (for more details, see Nowosad and Stepinski (2019) and the “Information theory provides a consistent framework for the analysis of spatial patterns” blog post).

The it_metric() function from the comat package can be used to calculate these metrics for each coma signature. Here, we calculate marginal entropy ("ent") and relative mutual information ("relmutinf"), and add them to the lc_coma data frame.

lc_coma$ent = vapply(lc_coma$signature, comat::it_metric,
       FUN.VALUE = numeric(1), metric = "ent")
lc_coma$relmutinf = vapply(lc_coma$signature, comat::it_metric,
       FUN.VALUE = numeric(1), metric = "relmutinf")

In short, the marginal entropy is a measure of diversity (thematic complexity, composition) of spatial categories: the larger the entropy, the more diverse the categories in the window are. The relative mutual information is a measure of spatial autocorrelation (configuration) of spatial categories: the larger the relative mutual information, the more autocorrelated the categories in the window are.
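To give a feeling for how these two numbers come out of a co-occurrence matrix, here is a from-scratch base R sketch. It is not the comat implementation: it assumes logarithms of base 2, takes the marginal distribution from the row sums, and returns NA for the relative mutual information when the entropy is 0. The it_metrics_sketch() helper and the example matrices are hypothetical.

```r
# Hypothetical helper: marginal entropy and relative mutual information
# of a co-occurrence matrix (log base 2 assumed)
it_metrics_sketch = function(coma) {
  entropy = function(p) {
    p = p[p > 0]                    # empty cells contribute nothing
    -sum(p * log2(p))
  }
  p_xy = coma / sum(coma)           # joint probabilities of category pairs
  p_x = rowSums(p_xy)               # marginal probabilities of categories
  ent = entropy(p_x)                # marginal entropy (composition)
  condent = entropy(p_xy) - ent     # conditional entropy
  mutinf = ent - condent            # mutual information
  relmutinf = if (ent > 0) mutinf / ent else NA_real_
  c(ent = ent, relmutinf = relmutinf)
}

# Made-up co-occurrence matrices for two categories with equal shares
clustered = matrix(c(40, 10, 10, 40), nrow = 2)   # mostly same-category neighbors
dispersed = matrix(c(25, 25, 25, 25), nrow = 2)   # neighbors are a coin flip
it_metrics_sketch(clustered)
it_metrics_sketch(dispersed)
```

Both matrices have the same composition, so their marginal entropy is identical; only the relative mutual information, which reflects the configuration, differs between them.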

Importantly, both metrics are uncorrelated, which means that they describe different aspects of spatial patterns of land cover:

plot(lc_coma$ent, lc_coma$relmutinf)

Visualizing spatial patterns’ metrics

We can visualize the spatial distribution of these metrics’ values by removing the signature column and converting the lc_coma object to an sf class:

lc_coma$signature = NULL
lc_coma_sf = lsp_add_sf(lc_coma)
plot(lc_coma_sf["ent"], border = NA)
plot(lc_coma_sf["relmutinf"], border = NA)

Representative examples

We are also able to look at some examples of areas with representative values of these metrics. For that purpose, we use the pam() method (Partitioning Around Medoids) to cluster the lc_coma data frame into six groups based on the scaled values of ent and relmutinf.

pam = pam(scale(lc_coma[, c("ent", "relmutinf")]), 6)

We can see all of the groups on the map by adding a new column with cluster labels to the lc_coma_sf object and plotting it:

lc_coma_sf$cluster = pam$clustering
plot(lc_coma_sf["cluster"], border = NA, pal = palette.colors(6))

Then, we can select one representative from each cluster in a loop using the crop() function and visualize it using plot().

lc_coma_sf_subset = lc_coma_sf[pam$id.med, ]
for (i in seq_len(nrow(lc_coma_sf_subset))){
  ent_sel = round(lc_coma_sf_subset[i, "ent", drop = TRUE], 2)
  relmutinf_sel = round(lc_coma_sf_subset[i, "relmutinf", drop = TRUE], 2)
  plot(crop(lc, lc_coma_sf_subset[i, ]), 
       main = paste0(i, " ent: ",  ent_sel, " relmutinf: ", relmutinf_sel))
}

As you can see above, the ent and relmutinf metrics can be used to describe various spatial patterns of land cover. The last group, 6, is the simplest one with just one land cover category. Next, groups 4 and 5 represent areas with a low diversity of land cover categories but with different spatial autocorrelation: group 4 has lower spatial autocorrelation (is more fragmented), while group 5 has higher spatial autocorrelation (is less fragmented). Groups 2 and 3 are areas with medium diversity of land cover categories, but with different spatial autocorrelation: group 2 has higher spatial autocorrelation, while group 3 has lower spatial autocorrelation. Finally, group 1 is an area with a high diversity of land cover categories and a medium spatial autocorrelation.

Additional possibilities

Importantly, these metrics do not provide any information about the actual land cover categories. Thus, to look at the results in more depth, we can add information about land cover shares in each window to the lc_coma_sf data frame and use it in further analysis.

Here, we can use the "composition" type of signature to extract information about land cover shares in each window, restructure it from a list column to a set of columns, and add it to the lc_coma_sf data frame.

lc_composition = lsp_signature(lc, type = "composition", window = 300)
lc_composition = lsp_restructure(lc_composition)
lc_coma_sf = left_join(lc_coma_sf, lc_composition)

Now, you are able to subset your data frame based on the land cover shares and analyze the spatial patterns for various types of areas. You can also repeat the above calculations for two time periods or two areas and compare the results.

Summary

This blog post shows how to extract information about the composition and configuration of spatial patterns, visualize it on a map, and look at representative examples. For more details about the information theory-based metrics, see the “Information theory provides a consistent framework for the analysis of spatial patterns” blog post. To learn more about the motif package, see the other blog posts in the “motif” category.

References

Nowosad, Jakub, and Tomasz F. Stepinski. 2019. “Information Theory as a Consistent Framework for Quantification and Classification of Landscape Patterns.” Landscape Ecology 34 (9): 2091–101. https://doi.org/10.1007/s10980-019-00830-x.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2023,
  author = {Nowosad, Jakub},
  title = {Extracting Information about Spatial Patterns from Spatial
    Signatures},
  date = {2023-11-18},
  url = {https://jakubnowosad.com/posts/2023-11-18-motif-bp7/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Extracting Information about Spatial Patterns from Spatial Signatures.” November 18, 2023. https://jakubnowosad.com/posts/2023-11-18-motif-bp7/.

Spatial regionalization using universal superpixels algorithm

Jakub Nowosad — Mon, 15 May 2023 00:00:00 GMT

This is the second blog post in a series about the supercells package. You can read the first one at “supercells: universal superpixels algorithm for applications to geospatial data”.

The main idea of supercells is to create groupings of adjacent cells that share common characteristics. This process often results in an over-segmentation – a situation when supercells are internally homogeneous, but not very different from their neighbors. Thus, we need to find the best way to merge similar adjacent supercells into larger entities (regions). It can be done with various approaches, both supervised and unsupervised. Here, we will focus on two examples of unsupervised approaches. The first one, k-means, is a general clustering method, while the second one, SKATER, is a spatial clustering method. The main goals of this blog post are to show how the merging of supercells can be performed and how to evaluate the obtained results.

Example

Let’s start by attaching the packages and reading the input data. We will use the same packages as in the first blog post with one addition – dplyr.

library(supercells)
library(terra)
library(sf)
library(regional)
library(tmap)
library(dplyr)

Here, we will also use the same dataset cropped to an area of 50 by 100 cells.

flc = rast("/vsicurl/https://github.com/Nowosad/supercells-examples/blob/main/raw-data/all_ned.tif?raw=true")
flc1 = project(flc, "EPSG:3035", method = "near")
flc1 = flc[101:150, 1:100, drop = FALSE]

Our next preparation step will be to create a set of supercells representing areas with homogeneous arrangements of fractional land cover values (Figure 1).

flc_sc = supercells(x = flc1,
                    step = 12, compactness = 0.1,
                    dist_fun = "jensen-shannon")


supercells: universal superpixels algorithm for applications to geospatial data

Jakub Nowosad — Sun, 30 Apr 2023 00:00:00 GMT

Segmentation is a process of partitioning space into smaller segments. For example, imagine looking at your family photo and trying to distinguish individual people. Similarly, we can look at a satellite image (in RGB colors) with the goal of delineating where the buildings, fields, roads, etc. are. In geography, segmentation can also be associated with regionalization. Here, our goal is not to detect objects (e.g., people or buildings) but rather areas with similar properties.

Segmentation/regionalization methods should minimize internal inhomogeneity and maximize external isolation. First, each segment (you can think of them as polygons) should contain similar values.¹ For example, let’s consider two segments representing two different roofs. One roof is entirely covered by red tiles, while the second one looks like a chessboard of red and brown tiles. Both segments are homogeneous; however, the latter is more complex than the former. The second segmentation property is the maximization of external isolation. This means that any given segment should differ from its neighbors as much as possible.

Segmentation is an optimization problem – trying and testing all possible segment borders and their properties may take a very long time, even for relatively small data. For that reason, some heuristics need to be used. One way to improve the output and reduce the time/processing cost of segmentation is to perform a preprocessing stage with superpixels.

Superpixels

The main idea of superpixels is to create groupings of adjacent cells that share common characteristics. This process often results in an over-segmentation – this is a situation when our segments (superpixels) are internally homogeneous, but not always very different from their neighbors.

Superpixels are used for two main reasons:

Pixels are not natural entities. They are rather a consequence of the discrete representation of data. For example, depending on the data resolution, our roofs from the previous example can consist of 20 pixels or be just a fraction of one pixel.
Superpixels, as groupings of adjacent cells, reduce the dimensionality of the data making further segmentation tasks easier. For example, we may end up with 5,000 superpixels, instead of 150,000 original pixels.

SLIC algorithm

Many superpixel algorithms have been developed; see Stutz, Hermans, and Leibe (2018). The SLIC algorithm (Achanta et al. 2012) is one of the most often used superpixel algorithms due to its simplicity, accuracy, and low computational cost. It starts with cluster centers spaced by the interval of $S$. Each cell is assigned to the nearest cluster center, and the distance is calculated between the cluster centers and cells in the $2S \times 2S$ region around each center. Afterward, new cluster centers (centroids) are updated for the new superpixels, and their color values are the average of all the cells belonging to the given superpixel. The SLIC algorithm works iteratively, repeating the above process until it reaches the expected number of iterations.

The distance $D$ is calculated as:

$$D = \sqrt{\left(\frac{d_c}{m}\right)^2 + \left(\frac{d_s}{S}\right)^2}$$

where $d_c$ is the color (spectral) distance, $m$ is the compactness parameter, $d_s$ is the spatial (Euclidean) distance, and $S$ is the interval between the initial cluster centers.
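The role of the compactness parameter is easier to see with numbers. The sketch below assumes the commonly used SLIC form of the combined distance, sqrt((d_c / m)^2 + (d_s / S)^2), with d_c the color (spectral) distance, d_s the spatial distance, m the compactness, and S the interval between the initial cluster centers. The slic_distance() helper and all input values are made up for illustration and are not taken from any package.

```r
# Hypothetical helper: combined SLIC distance (assumed form, see the text above)
slic_distance = function(d_c, d_s, m, S) {
  sqrt((d_c / m)^2 + (d_s / S)^2)
}

# Made-up example: the same cell and cluster center, two compactness values
low_compactness = slic_distance(d_c = 0.2, d_s = 6, m = 0.1, S = 12)
high_compactness = slic_distance(d_c = 0.2, d_s = 6, m = 1, S = 12)
c(low = low_compactness, high = high_compactness)
```

With a low compactness, the value (color) term dominates the combined distance, so the resulting segments follow the data values more closely; with a high compactness, the spatial term matters relatively more, which gives more regular, compact shapes.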

A typical workflow for the original SLIC algorithm is to convert an RGB image into the LAB color space and then use it to create superpixels. In that case, the distance depends on (a) the Euclidean spatial distance between a cell and a superpixel centroid and (b) the Euclidean color distance between a cell’s LAB values and a superpixel centroid’s average LAB values. As originally implemented by its authors, the SLIC algorithm has the RGB image hard-wired as input data. Thus, its geospatial applications remain restricted to images (RGB, multispectral, or hyperspectral).

SLIC extension

In Nowosad and Stepinski (2022), we propose an extension of SLIC that can be applied to non-imagery geospatial rasters. This includes rasters that carry:

Pattern information (co-occurrence matrices)
Compositional information (histograms)
Time-series information (ordered sequences)
Other forms of information for which the use of Euclidean distance may not be justified²

The extended SLIC allows using any distance measure to calculate the semantic distance: the color (spectral) distance $d_c$ can be replaced with any distance/dissimilarity measure. We implemented the above idea as an R package {supercells}. Note that we decided to use the supercells name (instead of superpixels) to highlight that the method can be applied to various spatial raster data. The package installation instructions can be found at https://jakubnowosad.com/supercells/.

Example

This blog post presents a short example of using spatial raster data with compositional information (histograms). For the study site located in the eastern Netherlands, we downloaded fractions of a pixel’s area covered by different land cover classes (source: Copernicus Global Land Service: 2019 Land Cover 100m-resolution data). Our goal is to create superpixels with similar fractions of land cover classes (Figure 1).

Let’s start by attaching the packages and reading the input data:

library(supercells)
library(terra)
library(sf)
library(regional)
library(tmap)
flc = rast("/vsicurl/https://github.com/Nowosad/supercells-examples/blob/main/raw-data/all_ned.tif?raw=true")

The input data, flc, is a raster of 507 by 1105 cells and eight layers (fractions of different land cover classes). We will resample our raster into a projected CRS and, for simplicity and to make the results easier to see, we will crop it to an area of 50 by 100 cells:

flc1 = project(flc, "EPSG:3035", method = "near")
flc1 = flc[101:150, 1:100, drop = FALSE]

The flc1 object represents an area mostly covered by croplands, with some forests in its south-eastern parts, and smaller fractions of grasslands and built-up classes:

tm_shape(flc1) +
  tm_raster(style = "cont", palette = "cividis", title = "Fraction:")

Figure 1: Example data representing fractions of land cover classes

Now, we are able to create supercells using the supercells package and its supercells() function.³ This function is very flexible and its results can be highly customized.⁴ Here, we will just use its basic arguments:

x: our input raster with one or more layers; flc1 in our case
step: our interval between initial cluster centers; here we use the value of 12 (cells). Decreasing this value will give us more supercells, and increasing it results in fewer supercells
compactness: $m$, the compactness parameter; here we use the value of 0.1 – the lower the value, the more impact the value distance has on the result
dist_fun: distance function used; here we use the Jensen-Shannon distance ("jensen-shannon"), which is suitable for measuring the dissimilarity between histograms

flc_sc = supercells(x = flc1,
                    step = 12, compactness = 0.1,
                    dist_fun = "jensen-shannon")

The flc_sc result is an sf (spatial vector) object with 28 polygons. We can visualize them on top of the two most prominent land cover classes for this area, forest and cropland (Figure 2):

tm_shape(flc1[[c(1, 5)]]) +
  tm_raster(style = "cont", palette = "cividis", title = "Fraction:") +
  tm_shape(flc_sc) +
  tm_borders(col = "red")

Figure 2: Created supercells on top of fractions of the forest and cropland classes

This visual inspection allows us to see that supercells serve their purpose: they delineate areas with homogeneous arrangements of fractional land cover values. Areas with dominating fractions of forests are encapsulated in different polygons compared to those with dominating fractions of croplands, or some mixes of land cover classes. At the same time, some supercells are more homogeneous than others. This is due to: (a) the set interval value (a lower value would result in a larger number of more homogeneous supercells), and (b) the fact that supercells are not designed especially for the detection of roads (or other linear features).

The quality of our result can also be determined numerically: we can calculate “inhomogeneity” of our supercells. The inhomogeneity metric represents an average distance between cells belonging to the same supercell. This value is small when all cells have similar values (land cover classes’ fractions, in our case), and large when cells’ values are very different.

Inhomogeneity can be calculated using the regional package’s function reg_inhomogeneity(). We just need to provide our “regions” (supercells), raster with values, and a distance function. Comparing values of many cells may take a lot of time; thus, usually, it is more efficient to use some subset of them for this comparison. We can specify the subset size with sample_size.

vars = c("Forest", "Shrubland", "Grassland", "Bare.Sparse.vegatation", 
         "Cropland", "Built.up", "Seasonal.inland.water", "Permanent.inland.water")
flc_sc$inh = reg_inhomogeneity(flc_sc[vars], flc1, 
                                dist_fun = "jensen-shannon", sample_size = 100)
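To show what an "average distance between cells belonging to the same supercell" means in practice, here is a simplified base R sketch. It is not how the regional package computes inhomogeneity: it uses no sampling and swaps in a plain Euclidean distance instead of the Jensen-Shannon one. The inhomogeneity_sketch() and euclidean() helpers and the cell values are made up for illustration.

```r
# Swapped-in distance for illustration (the post itself uses "jensen-shannon")
euclidean = function(a, b) sqrt(sum((a - b)^2))

# Hypothetical helper: mean pairwise distance between cells (rows) of one region
inhomogeneity_sketch = function(values, dist_fun) {
  n = nrow(values)
  if (n < 2) return(0)              # a single cell has nothing to differ from
  pairs = combn(n, 2)               # all pairs of row indices
  d = apply(pairs, 2, function(ij) dist_fun(values[ij[1], ], values[ij[2], ]))
  mean(d)
}

# Made-up regions: rows are cells, columns are fractions of two classes
homogeneous = rbind(c(0.9, 0.1), c(0.9, 0.1), c(0.9, 0.1))
mixed = rbind(c(0.9, 0.1), c(0.1, 0.9), c(0.5, 0.5))
inhomogeneity_sketch(homogeneous, euclidean)
inhomogeneity_sketch(mixed, euclidean)
```

The number of pairs grows quadratically with the number of cells, which is why comparing only a sample of cells is usually more practical for larger regions.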

The resulting inhomogeneity values can also be visualized, signaling the most and the least consistent supercells (Figure 3):

tm_shape(flc_sc) +
  tm_polygons("inh", title = "Inhomogeneity:", style = "cont") +
  tm_layout(legend.outside = TRUE)

Figure 3: Inhomogeneity values of the created supercells

Additionally, we can calculate an area-weighted inhomogeneity as a general metric of all the supercells:

flc_sc$area_km2 = as.numeric(st_area(flc_sc)) / 1000000
weighted.mean(flc_sc$inh, flc_sc$area_km2)

[1] 0.08516878
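The area weighting matters because supercells differ in size. The made-up numbers below (not the values from this example) show how weighted.mean() relates to the formula behind it and how it differs from a plain average.

```r
# Made-up inhomogeneity values and areas of three regions
inh = c(0.05, 0.20, 0.10)
area_km2 = c(6, 1, 3)

# Area-weighted mean written out explicitly
weighted_inh = sum(inh * area_km2) / sum(area_km2)

# The same value using the built-in function, and the unweighted mean
weighted_builtin = weighted.mean(inh, area_km2)
unweighted_inh = mean(inh)
c(weighted = weighted_inh, unweighted = unweighted_inh)
```

In this toy case, the small but very inhomogeneous region pulls the plain average up, while the area-weighted value stays closer to what most of the area looks like.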

Finally, you may notice that several adjacent supercells are very similar, and thus should be merged. Several approaches to merging supercells into larger segments/regions exist. I will discuss them in the next blog post.

Summary

We propose an extension of the SLIC algorithm that works with non-imagery data structures without data reduction and conversion to a false-color image. It allows for using a data distance measure most appropriate to a particular data structure and a custom function for averaging values of cluster centers. If you want to learn more about supercells, we encourage you to try a few entry points. One is the Nowosad and Stepinski (2022) article that explains the whole idea in more detail and compares our extension and the original SLIC algorithm on three examples of non-imagery data. Code related to these examples is available at https://github.com/Nowosad/supercells-examples. You can also see slides from a talk entitled “A method for universal superpixels-based regionalization” that I gave during the FOSS4G 2022 conference at https://jakubnowosad.com/foss4g-2022/. Finally, the package has extensive documentation, including several vignettes, that can be found at https://jakubnowosad.com/supercells/.

References

Achanta, R., A. Shaji, K. Smith, A. Lucchi, P. Fua, and Sabine Süsstrunk. 2012. “SLIC Superpixels Compared to State-of-the-Art Superpixel Methods.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11): 2274–82. https://doi.org/f39g5f.

Nowosad, Jakub, and Tomasz F. Stepinski. 2022. “Extended SLIC Superpixels Algorithm for Applications to Non-Imagery Geospatial Rasters.” International Journal of Applied Earth Observation and Geoinformation 112 (August): 102935. https://doi.org/10.1016/j.jag.2022.102935.

Stutz, David, Alexander Hermans, and Bastian Leibe. 2018. “Superpixels: An Evaluation of the State-of-the-Art.” Computer Vision and Image Understanding 166 (January): 1–27. https://doi.org/gcvnsc.

Footnotes

On a side note: homogeneity does not always imply simplicity.↩︎
Let me know (email/twitter) if you have any examples of such data!↩︎
Supercells!↩︎
Read the “The supercells() function” vignette for more details.↩︎

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2023,
  author = {Nowosad, Jakub},
  title = {Supercells: Universal Superpixels Algorithm for Applications
    to Geospatial Data},
  date = {2023-04-30},
  url = {https://jakubnowosad.com/posts/2023-04-30-supercells-bp1/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2023. “Supercells: Universal Superpixels Algorithm for Applications to Geospatial Data.” April 30, 2023. https://jakubnowosad.com/posts/2023-04-30-supercells-bp1/.

How to visualize landscape metrics for local landscapes?

Jakub Nowosad — Thu, 17 Feb 2022 00:00:00 GMT

In the past, I wrote a post on how to calculate landscape-level metrics for local landscapes. There, I showed how to divide the categorical input map into a number of smaller areas and calculate selected landscape metrics for each area. The result consists of a regular grid - a number of square polygons, where each polygon contains just one value of a calculated landscape-level metric (e.g., marginal entropy - ent). This output is easy to visualize on a map - each polygon can be colored according to its value.

There are, however, two more levels of landscape metrics - the class- and patch-levels. Calculation of class-level metrics returns as many values as there are unique classes in a local landscape, while the patch-level calculations result in as many values as there are patches¹ in each polygon. This makes visualizations of class- and patch-level metrics less straightforward than those of landscape-level metrics. The main goal of this blog post is to present different approaches for visualization of class- and patch-level metrics.

To reproduce the following results on your own computer, install and attach the packages:

library(landscapemetrics)     # landscape metrics calculation
library(raster)               # spatial raster data reading and handling
library(sf)                   # spatial vector data reading and handling
library(dplyr)                # data manipulation
library(tidyr)                # data manipulation
library(tmap)                 # spatial viz
library(geofacet)             # geofacet
library(ggplot2)              # geofacet

Reading the data

The first step is to read the input data. Here, we are going to use example data that is already included in the landscapemetrics package (Hesselbarth et al. 2019).

data("augusta_nlcd")
my_raster = augusta_nlcd

It is also possible to read any spatial raster file with the raster() function, for example my_raster = raster("path_to_my_file.tif")². However, the input file should fulfill two requirements: (1) contain only integer values that represent categories, and (2) be in a projected coordinate reference system. You can check if your file meets the requirements using the check_landscape() function, and learn more about coordinate reference systems in the Geocomputation with R book (Lovelace, Nowosad, and Muenchow 2019).

Our example data looks like this:

plot(my_raster)

Creating a grid

The next step is to create borders of local landscapes using the st_make_grid() function. This function accepts an sf object as the first argument, therefore we need to create a new object based on the bounding box of the input raster. Next, we also need to provide a second argument, either cellsize or n:

cellsize - vector of length 1 or 2 - the side length of each grid cell in map units (usually meters)
n - vector of length 1 or 2 - the number of grid cells in a row/column

my_grid_geom = st_make_grid(st_as_sfc(st_bbox(my_raster)), cellsize = 1500)
my_grid_template = st_sf(geom = my_grid_geom)
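
The difference between the two arguments can be seen in a small sketch. It assumes a made-up rectangular extent of 3000 by 1500 map units (not the extent of augusta_nlcd); all toy_ names are hypothetical:

```r
library(sf)

# Hypothetical extent: 3000 x 1500 map units, no CRS attached
toy_bbox = st_bbox(c(xmin = 0, ymin = 0, xmax = 3000, ymax = 1500))
toy_area = st_as_sfc(toy_bbox)

# cellsize: fix the side length and let sf decide how many cells are needed
grid_by_size = st_make_grid(toy_area, cellsize = 1500)

# n: fix the number of columns and rows and let sf derive the cell size
grid_by_n = st_make_grid(toy_area, n = c(4, 2))

length(grid_by_size)
length(grid_by_n)
```

With cellsize, the grid always has square cells of a known size, which is usually what we want for comparable local landscapes; with n, the cell size depends on the extent of the input.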

We should also add a unique identification number (id) to each grid cell (local landscape).

my_grid_template$plot_id = seq_len(nrow(my_grid_template))

Next, we can overlay the newly created grid on top of our input raster:

plot(my_raster)
plot(st_geometry(my_grid_template), add = TRUE)

Note that some cells cover smaller areas with data than the others.

Calculating a class-level metric

The calculation of landscape metrics for each cell can be done with the sample_lsm() function. It requires an input raster as the first argument, and a grid as the second one³. The function calculates the selected landscape metric independently for each cell. Next, we can specify which landscape metrics we want to calculate. For this example, we use aggregation index (ai) to be calculated on a class level. The complete list of the implemented metrics can be obtained with the list_lsm() function. Let us know if you are missing some metrics.

my_metric1 = sample_lsm(my_raster, my_grid_template,
                       level = "class", metric = "ai")
my_metric1

# A tibble: 1,313 × 8
   layer level class    id metric value plot_id percentage_inside
                         
 1     1 class    11    NA ai      83.3       1               100
 2     1 class    21    NA ai      34.5       1               100
 3     1 class    22    NA ai      23.0       1               100
 4     1 class    31    NA ai      86.8       1               100
 5     1 class    41    NA ai      63.9       1               100
 6     1 class    42    NA ai      73.2       1               100
 7     1 class    43    NA ai      39.5       1               100
 8     1 class    52    NA ai      43.1       1               100
 9     1 class    71    NA ai      66.9       1               100
10     1 class    81    NA ai      85.8       1               100
# … with 1,303 more rows

Each row in the my_metric1 object corresponds to one calculated value of ai, while the plot_id column specifies to which grid cell the results are related⁴. Because there are several classes in each cell, there are also several ai values for each cell. Next, we can connect the spatial grid (my_grid_template) with the calculation results (my_metric1) using the left_join() function:

my_grid1 = left_join(my_grid_template, my_metric1, by = "plot_id")
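
A side effect of this join is that each grid cell is repeated once per class. The sketch below shows it on plain data frames without geometries; the toy_ objects and their values are invented for illustration:

```r
library(dplyr)

# Hypothetical grid template with three cells (geometry omitted for brevity)
toy_template = tibble(plot_id = 1:3)

# Hypothetical class-level results: cell 1 has two classes, cell 2 has one,
# and cell 3 has no results at all
toy_metric = tibble(plot_id = c(1L, 1L, 2L),
                    class = c(11, 21, 11),
                    value = c(83.3, 34.5, 70.1))

toy_joined = left_join(toy_template, toy_metric, by = "plot_id")
toy_joined
```

The joined table has four rows: two for cell 1, one for cell 2, and one row of missing values for cell 3. With an sf object on the left side, the polygon geometry is duplicated in the same way.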

Visualizing the class-level results

For the class-level results, each local landscape has as many values as there are unique classes in it. This prevents us from creating a single comprehensive map, but, on the other hand, allows for some other visualizations.

Subsets

The most basic approach is to create a map just for a selected class, which can be done with the subset() function:

my_grid1_class1 = subset(my_grid1, class == 11)
plot(my_grid1_class1["value"])

The above plot shows the distribution of ai values for class 11.

Map facets

It is also possible to quickly visualize all of the subsets at the same time with the tmap package (Tennekes 2018).

tm_shape(my_grid1) +
  tm_polygons("value", style = "cont", title = "ai") +
  tm_facets(by = "class", free.coords = FALSE)

The result here contains a separate panel for each unique class in our dataset, while colors represent different ai values.

geofacet

As our local landscapes are made of a regular grid, we can also test some less traditional visualizations. One possibility is to use the geofacet package (Hafen 2020) - it allows creating many regular plots (such as histograms, scatterplots, boxplots, etc.) and arranging them spatially. Visit the package website at https://github.com/hafen/geofacet to find more examples.

The first step here is to create a plotting grid with grid_auto():

grd = grid_auto(my_grid_template, names = "plot_id")

Next, we can create a plot with the ggplot2 syntax (Wickham 2016) adding facet_geo(~plot_id, grid = grd) to it:

ggplot(my_grid1, aes(as.factor(class), value, fill = as.factor(class))) +
  geom_col() +
  facet_geo(~plot_id, grid = grd) +
  labs(x = NULL, y = "ai", fill = "Class") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        strip.background = element_blank(),
        strip.text.x = element_blank())

Here, we can see a visualization that consists of many separate bar plots. Each bar plot represents the ai values of the classes in one local landscape.

Calculating a patch-level metric

The calculation of patch-level metrics can also be done with the sample_lsm() function. In this example, we use Euclidean nearest-neighbor distance (enn) - a metric that can only be calculated on a patch level.

my_metric2 = sample_lsm(my_raster, my_grid_template,
                       level = "patch", metric = "enn")
my_metric2

# A tibble: 10,479 × 8
   layer level class    id metric value plot_id percentage_inside
                         
 1     1 patch    11     1 enn    201.        1               100
 2     1 patch    11     2 enn    201.        1               100
 3     1 patch    11     3 enn    495.        1               100
 4     1 patch    21     4 enn    433.        1               100
 5     1 patch    21     5 enn     67.1       1               100
 6     1 patch    21     6 enn     67.1       1               100
 7     1 patch    21     7 enn     67.1       1               100
 8     1 patch    21     8 enn     67.1       1               100
 9     1 patch    21     9 enn     60         1               100
10     1 patch    21    10 enn     60         1               100
# … with 10,469 more rows
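
To check on which levels a given metric is available, we can query the list_lsm() function mentioned earlier. This is only a short sketch; the object names enn_info and patch_metrics are made up for this example:

```r
library(landscapemetrics)

# All implemented metrics whose abbreviation is exactly "enn"
enn_info = list_lsm(metric = "enn")
enn_info

# Abbreviations of all metrics available on the patch level
patch_metrics = list_lsm(level = "patch")
patch_metrics$metric
```

The level column of enn_info tells us where the metric can be calculated, and patch_metrics lists the alternatives that could be used in its place.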

The result contains a large number of rows, where each row is related to a unique combination of a patch, a class, and a grid cell. We can connect spatial grid (my_grid_template) with the calculation results (my_metric2) again using the left_join() function:

my_grid2 = left_join(my_grid_template, my_metric2, by = "plot_id")

Visualizing the patch-level results

Visualization of the patch-level results could be the most challenging to create. Here, each local landscape can have as few as one value (a single large patch) and as many values as there are cells. Patches can also be related to just a few or to many different classes.

One patch map

The most straightforward approach here is to use the spatialize_lsm() function, which calculates a selected patch-level metric for a given raster, and returns a new raster with the metric values.

my_metric_r_all = spatialize_lsm(my_raster, level = "patch", metric = "enn")

The output of spatialize_lsm() is a nested list. Each list element is used to represent different input layers (e.g., land cover data for different years), while each sub-list element relates to the calculated metric (you are able to calculate many metrics at the same time).

plot(my_metric_r_all$layer_1$lsm_p_enn)

The spatialize_lsm() function is fine when we need to present patch-level values for a whole raster. However, if we have many local landscapes (or sub-rasters) then we need to repeat the spatialize_lsm() calculations for each area.

Patch maps

This can be done with the help of the following code.

patch_raster = function(my_raster, my_grid){
  result = vector(mode = "list", length = nrow(my_grid))
  for (i in seq_len(nrow(my_grid))){
    my_small_raster = crop(my_raster, my_grid[i, ])
    result[[i]] = spatialize_lsm(my_small_raster,
                                 level = "patch", metric = "enn")
  }
  return(result)
}

my_metric_r = patch_raster(my_raster, my_grid_template)
my_metric_r = unlist(my_metric_r)
names(my_metric_r) = NULL
my_metric_r = do.call(merge, my_metric_r)

In it, we go through each grid cell, crop a raster to its extent, and calculate our metric of interest. The result is a large list consisting of many separate small rasters, that we can combine to get a full-size raster in return with do.call() and merge().

plot(my_metric_r)
plot(st_geometry(my_grid_template), add = TRUE)

The resulting visualization looks much different from the previous one. In it, each local landscape is treated independently, therefore, a patch belonging to many grid cells is split into many patches.

Summary

The landscapemetrics package, together with many other open-source R packages, allows for spatial visualizations of class- and patch-level metrics. The class-level metrics can be presented independently for each class, or as a geofacet plot, while patch-level metrics might be visualized in combination with the spatialize_lsm() function. It is also worth mentioning that patch-level metrics can be visualized with the geofacet package as well; however, this works best when the number of local landscapes is relatively small. Otherwise, the resulting plot could be hard to read. To learn more about landscape metrics and the landscapemetrics package, visit https://r-spatialecology.github.io/landscapemetrics/ and http://dx.doi.org/10.1111/ecog.04617.

Acknowledgement

Many thanks to Maximilian H.K. Hesselbarth for reading and improving a draft of this blog post.

References

Hafen, Ryan. 2020. Geofacet: ’Ggplot2’ Faceting Utilities for Geographical Data. https://CRAN.R-project.org/package=geofacet.

Hesselbarth, Maximilian H. K., Marco Sciaini, Kimberly A. With, Kerstin Wiegand, and Jakub Nowosad. 2019. “Landscapemetrics: An Open-Source r Tool to Calculate Landscape Metrics.” Ecography 42: 1648–57.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. CRC Press.

Tennekes, Martijn. 2018. “tmap: Thematic Maps in R.” Journal of Statistical Software 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Footnotes

A patch is a group of adjacent cells of the same category.↩︎
Currently landscapemetrics accepts also objects from the terra and stars packages.↩︎
This function also allows for many more possibilities, including specifying a 2-column matrix with coordinates, SpatialPoints, SpatialLines, SpatialPolygons, sf points or sf polygons as the second argument. You can learn all of the valid options using ?sample_lsm.↩︎
To learn more about the structure of the output read the Efficient landscape metrics calculations for buffers around sampling points blog post.↩︎

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2022,
  author = {Nowosad, Jakub},
  title = {How to Visualize Landscape Metrics for Local Landscapes?},
  date = {2022-02-17},
  url = {https://jakubnowosad.com/posts/2022-02-17-lsm-bp3/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2022. “How to Visualize Landscape Metrics for Local Landscapes?” February 17, 2022. https://jakubnowosad.com/posts/2022-02-17-lsm-bp3/.

Considerations for the pattern-based spatial analysis

Jakub Nowosad — Wed, 10 Mar 2021 00:00:00 GMT

TL;DR:

This is the last blog post in a series about motif - an R package aimed at pattern-based spatial analysis. It sums up previous posts, but also underlines potential considerations when working with spatial patterns. Finally, it lists underexplored topics and future ideas related to pattern-based spatial analysis.

Pattern-based spatial analysis

The first blog post in this series introduces a basic concept of categorical data spatial patterns, and why commonly used landscape metrics are not best suited for finding areas with similar spatial patterns. A better approach is to derive a spatial signature - a multi-number description that compactly stores information about the composition and configuration of a spatial pattern.

The second blog post presents some basic spatial signatures, including coma (co-occurrence matrix) for single-variable categorical rasters, wecoma (weighted co-occurrence matrix) for single-variable categorical rasters that have another continuous raster representing the intensity of categories, and incoma (integrated co-occurrence matrix) for categorical rasters with two or more variables. All of the mentioned spatial signatures can be converted into a 1D vector - a probability function, and similarity between probability functions can be calculated using one of many distance measures (e.g., Jensen-Shannon distance). Now, having spatial signatures for two areas, we can find out how similar (or dissimilar) they are. This allows us to find the most similar rasters, describe changes between rasters, or group (cluster) rasters based on the spatial patterns.
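
To make the idea of comparing probability functions more concrete, here is a minimal base R sketch of the Jensen-Shannon divergence between normalized signatures. The entropy() and js_divergence() helpers and the three toy vectors are written for this illustration only; they are not the code used by motif, which may differ in details such as the logarithm base:

```r
# Shannon entropy (in bits); zero probabilities contribute nothing
entropy = function(p) {
  p = p[p > 0]
  -sum(p * log2(p))
}

# Jensen-Shannon divergence between two probability vectors:
# entropy of the average distribution minus the average entropy
js_divergence = function(p, q) {
  m = (p + q) / 2
  entropy(m) - (entropy(p) + entropy(q)) / 2
}

# Hypothetical normalized signatures of three areas
area_a = c(0.7, 0.2, 0.1, 0.0)
area_b = c(0.6, 0.3, 0.1, 0.0)
area_c = c(0.0, 0.0, 0.1, 0.9)

js_divergence(area_a, area_a)  # identical patterns
js_divergence(area_a, area_b)  # similar patterns
js_divergence(area_a, area_c)  # very different patterns
```

With a base 2 logarithm, the value is 0 for identical signatures and approaches 1 for signatures that share no categories, so smaller values mean more similar spatial patterns.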

The third blog post shows how we can search for areas with similar spatial patterns to a query region based on an example of finding areas of similar topography to the area of Suwalski Landscape Park. In the search process, spatial signatures are derived for the query region and many sub-areas of the search space, and distances between them are calculated. Next, sub-areas with the smallest distances from the query region are assumed to be the most similar to it.

The fourth blog post focused on finding areas with the largest change of land cover patterns in the Amazon between 1992 and 2018. The land cover data from the Amazon in 1992 and 2018 were subdivided into areas of 90 by 90 kilometers, and a spatial signature was calculated for each subarea in each year. Then, a distance between spatial signatures for each subarea was derived, with large distance values indicating a large change of spatial patterns.

The fifth blog post showcases clustering of similar joint spatial patterns of land cover and landforms in Africa. In this process, Africa was divided into many sub-areas and spatial signatures were derived for each sub-area. Distances between signatures for each sub-area were calculated and stored in a distance matrix, which was used as a basis for the creation of clusters of similar spatial patterns. The quality of clusters was assessed visually using a pattern mosaic and with dedicated quality metrics.

Potential applications

The role of the presented examples is to highlight the universality and extensibility of the pattern-based methods. They could be used in a wide range of local, regional, and global studies of global environmental changes, land management, sustainable development, environmental protection, forest cover change, urban growth monitoring, or agriculture expansion studies. Some example research ideas include:

studying global environmental changes by analysis of changes in patterns of different environmental features, such as land cover,
delineating of ecoregions - regionalization of land into homogeneous units of similar ecological and physiographic features (land cover, landform, soils, climate),
clustering of forest patterns, the results of which could be used for conservation, planning, and management
identifying spatial patterns of cropland usage
inventorying of landscape patterns and analysis of landscape spatial configuration

Additionally, the pattern-based spatial analysis methods and tools could be useful in various other disciplines that use categorical images, for example, medical science, astronomy, or social studies.

Study considerations

However, no matter if we analyze patterns in an environmental raster, demographic map, or categorized microscope image, we need to consider several questions.

How should we preprocess the input data? For example, do we need all 18 categories in our data, or is it better to reduce the number of categories to improve analysis and streamline interpretation of the results? When we are interested in forest fragmentation, do we really need several other land cover classes, or can we merge them into one or two categories? Additionally, preprocessing can be applied to derive new categories from the data. An example of this was shown in the third blog post, where elevation data was first converted into geomorphons before applying any other steps. Reprojecting the input data may also be important in some cases. In the pattern-based spatial analysis, each cell is treated equally, which means that we usually want to use data in some equal-area projection.
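
Merging categories can be as simple as applying a lookup table to the category codes. The sketch below uses a small matrix in place of a real raster; the codes and their grouping into "forest" and "other" are invented for illustration:

```r
# Hypothetical 3 x 4 categorical raster stored as a matrix of category codes
toy_lc = matrix(c(10, 10, 20, 30,
                  10, 40, 40, 30,
                  50, 50, 20, 30), nrow = 3, byrow = TRUE)

# Lookup table: original code -> simplified code (1 = forest, 2 = other)
lookup = c("10" = 1, "20" = 2, "30" = 2, "40" = 1, "50" = 2)

# Replace each code and restore the original matrix shape
toy_lc_simple = matrix(lookup[as.character(toy_lc)], nrow = nrow(toy_lc))

table(toy_lc)
table(toy_lc_simple)
```

The same lookup logic applies to real rasters, where the reclassification is usually done with a dedicated function from the raster package in use.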

What is the scale of the process we want to study? Are we interested in investigating patterns in 10 by 10 cell windows or maybe 100 by 100 cell windows? If we do not have any prior information or expectation about spatial scale, then there are two general approaches that could help. Firstly, we could apply the same analysis steps a few times using different sizes of a local window, and decide on a proper spatial scale afterward. Secondly, we could use the smallest meaningful windows we can think of¹, for example, 10 by 10 cells, and then apply the clustering process. After merging similar areas into larger regions, we can decide the spatial scale of homogeneous spatial patterns.

Which signature should we apply? The coma representation was developed for single-variable categorical rasters, wecoma for single-variable categorical rasters that have another continuous raster representing the intensity of categories, and incoma for categorical rasters with two or more variables. Which of the above representations suits your problem the best? Or maybe you need to create some new signature focused on the specifics of your case?

Which distance measure should we use? A few dozen distance/dissimilarity measures exist². Our previous experiences showed that the Jensen-Shannon distance is suitable to describe relations between spatial patterns of land cover data. However, there is no free lunch in selecting a distance measure, and I would usually recommend trying out a few measures before deciding on one of them.

General considerations and future work

There are also general considerations that would benefit from establishing a consistent methodology. For example, how to decide which scale is valid? What types of signatures are still missing and should be developed? How to integrate categorical and continuous spatial patterns in an analysis? What are the advantages and disadvantages of using different distance measures? What are the missing workflows that can be added to the pattern-based spatial analysis?

I encourage everyone to submit their issues or enhancement requests to the motif package, which will help me to prioritize my work. Furthermore, if you have any questions or ideas related to the pattern-based spatial analysis, please email me at nowosad.jakub@gmail.com.

Footnotes

This depends on the number of categories, their spatial arrangements, etc.↩︎
Read https://users.uom.gr/~kouiruki/sung.pdf for a comprehensive review of distance measures.↩︎

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2021,
  author = {Nowosad, Jakub},
  title = {Considerations for the Pattern-Based Spatial Analysis},
  date = {2021-03-10},
  url = {https://jakubnowosad.com/posts/2021-03-10-motif-bp6/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Considerations for the Pattern-Based Spatial Analysis.” March 10, 2021. https://jakubnowosad.com/posts/2021-03-10-motif-bp6/.

Clustering similar spatial patterns

Jakub Nowosad — Wed, 03 Mar 2021 00:00:00 GMT

TL;DR:

Clustering similar spatial patterns requires one or more raster datasets for the same area. Input data is divided into many sub-areas, and spatial signatures are derived for each sub-area. Next, distances between signatures for each sub-area are calculated and stored in a distance matrix. The distance matrix can be used to create clusters of similar spatial patterns. Quality of clusters can be assessed visually using a pattern mosaic or with dedicated quality metrics.

Spatial data

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") %>%
        osf_ls_files(n_max = Inf) %>%
        osf_download(path = "data",
                     conflicts = "overwrite")

You should also attach the following packages:

library(sf)
library(stars)
library(motif)
library(tmap)
library(dplyr)
library(readr)

Land cover and landforms in Africa

The data/land_cover.tif contains land cover data and data/landform.tif is landform data for Africa. Both are single categorical rasters of the same extent and the same resolution (300 meters) that can be read into R using the read_stars() function.

lc = read_stars("data/land_cover.tif")
lf = read_stars("data/landform.tif")

Additionally, the data/lc_palette.csv file contains information about colors and labels of each land cover category, and data/lf_palette.csv stores information about colors and labels of each landform class.

lc_palette_df = read_csv("data/lc_palette.csv")
lf_palette_df = read_csv("data/lf_palette.csv")
names(lc_palette_df$color) = lc_palette_df$value
names(lf_palette_df$color) = lf_palette_df$value

Both datasets can be visualized with tmap.

tm_lc = tm_shape(lc) +
        tm_raster(style = "cat",
                  palette = lc_palette_df$color,
                  labels = lc_palette_df$label,
                  title = "Land cover:") +
        tm_layout(legend.position = c("LEFT", "BOTTOM"))
tm_lc

tm_lf = tm_shape(lf) +
        tm_raster(style = "cat",
                  palette = lf_palette_df$color,
                  labels = lf_palette_df$label,
                  title = "Landform:") +
        tm_layout(legend.outside = TRUE)
tm_lf

We can combine these two datasets together with the c() function.

eco_data = c(lc, lf)

The problem now is how to find clusters of similar spatial patterns of both land cover categories and landform classes.

Clustering spatial patterns

The basic step in clustering spatial patterns is to calculate a proper signature for each spatial window using the lsp_signature() function. Here, we use the integrated co-occurrence vector (type = "incove") representation. In this example, we use a window of 300 cells by 300 cells (window = 300). This means that our search scale will be 90 km (300 cells x 300 meters of data resolution) - resulting in dividing the whole area into about 7,500 regular rectangles of 90 by 90 kilometers.

This operation could take a few minutes.

eco_signature = lsp_signature(eco_data,
                              type = "incove",
                              window = 300)

The output, eco_signature, contains a numerical representation for each 90 by 90 km area. Notice that it has 3,838 rows (not 7,500) - this is due to removing areas with a large number of missing values before calculations¹.

Distance matrix

Next, we can calculate the distance (dissimilarity) between patterns of each area. This can be done with the lsp_to_dist() function, where we must provide the output of lsp_signature() and a distance measure used (dist_fun = "jensen-shannon"). This operation also could take a few minutes.

eco_dist = lsp_to_dist(eco_signature, dist_fun = "jensen-shannon")

The output, eco_dist, is of a dist class, where small values show that two areas have a similar joint spatial pattern of land cover categories and landform classes.

class(eco_dist)

[1] "dist"

Hierarchical clustering

Objects of class dist can be used by many existing R functions for clustering. These include different approaches of hierarchical clustering (hclust(), cluster::agnes(), cluster::diana()) or fuzzy clustering (cluster::fanny()). In the example below, we use hierarchical clustering with hclust(), which expects a distance matrix as the first argument and a linkage method as the second one. Here, we use Ward’s minimum variance method (method = "ward.D2") that minimizes the total within-cluster variance.

eco_hclust = hclust(eco_dist, method = "ward.D2")
plot(eco_hclust)

A graphical representation of the hierarchical clustering is called a dendrogram, and based on the obtained dendrogram, we can divide our local landscapes into a specified number of groups using cutree(). In this example, we use eight clusters (k = 8) to keep their number fairly small while showcasing the presented methodology.

clusters = cutree(eco_hclust, k = 8)

However, a decision about the number of clusters in real-life cases should be based on the goal of the research.
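
The effect of the k argument can be explored on a toy example before running it on the full dataset. The sketch below uses six made-up two-value signatures and a Euclidean distance (chosen only for brevity, not the Jensen-Shannon distance used in this post):

```r
# Hypothetical signatures: two tight groups of three areas each
toy_signatures = rbind(c(0.90, 0.10), c(0.85, 0.15), c(0.88, 0.12),
                       c(0.10, 0.90), c(0.15, 0.85), c(0.12, 0.88))

# Euclidean distance is used here only for brevity
toy_dist = dist(toy_signatures)

toy_hclust = hclust(toy_dist, method = "ward.D2")

# Cutting the same dendrogram at different k gives nested groupings
toy_k2 = cutree(toy_hclust, k = 2)
toy_k3 = cutree(toy_hclust, k = 3)

table(toy_k2)
table(toy_k2, toy_k3)
```

Because the clustering is hierarchical, increasing k only splits existing clusters, so it is cheap to compare several values of k from a single hclust() result.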

Clustering results

The lsp_add_clusters() function adds a clust column with a cluster number to each area and converts the result to an sf object.

eco_grid_sf = lsp_add_clusters(eco_signature,
                               clusters)

The clustering results can be further visualized using tmap.

tm_clu = tm_shape(eco_grid_sf) +
        tm_polygons("clust", style = "cat", palette = "Set2", title = "Cluster:") +
        tm_layout(legend.position = c("LEFT", "BOTTOM"))
tm_clu

Most clusters form continuous regions, so we could merge areas of the same clusters into larger polygons.

eco_grid_sf2 = eco_grid_sf %>%
        dplyr::group_by(clust) %>%
        dplyr::summarize()
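
The merging works because summarize() applied to a grouped sf object unions the geometries within each group. A small sketch with three made-up unit squares (the unit_square() helper and the toy_ objects are hypothetical) shows the effect:

```r
library(sf)
library(dplyr)

# Helper creating a unit square with its lower-left corner at (x, y)
unit_square = function(x, y) {
  st_polygon(list(rbind(c(x, y), c(x + 1, y), c(x + 1, y + 1),
                        c(x, y + 1), c(x, y))))
}

# Hypothetical grid: three squares in a row, the first two in cluster 1
toy_cells = st_sf(clust = c(1, 1, 2),
                  geom = st_sfc(unit_square(0, 0), unit_square(1, 0),
                                unit_square(2, 0)))

# summarize() without arguments unions geometries within each group
toy_regions = toy_cells %>%
  group_by(clust) %>%
  summarize()

toy_regions
st_area(toy_regions)
```

The result has one row per cluster, and the two adjacent squares of the first cluster become a single polygon with an area of 2.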

The output polygons can then be superimposed on maps of land cover categories and landform classes.

tm_shape(eco_data) +
                tm_raster(style = "cat",
                          palette = list(lc_palette_df$color, lf_palette_df$color)) +
  tm_facets(ncol = 2) +
  tm_shape(eco_grid_sf2) +
  tm_borders(col = "black") +
  tm_layout(legend.show = FALSE, 
            title.position = c("LEFT", "TOP"))

We can see that many borders (black lines) separate areas with land cover or landform patterns distinct from their neighbors. Some clusters are also distinct for only one variable (e.g., look at the Sahara on the land cover map).

Clustering quality

We can also calculate the quality of the clusters with the lsp_add_quality() function. It requires an output of lsp_add_clusters() and an output of lsp_to_dist(), and adds three new variables: inhomogeneity, distinction, and quality.

eco_grid_sfq = lsp_add_quality(eco_grid_sf, eco_dist, type = "cluster")

Inhomogeneity (inhomogeneity) measures a degree of mutual distance between all objects in a cluster. This value is between 0 and 1, where the small value indicates that all objects in the cluster represent consistent patterns, so the cluster is pattern-homogeneous. Distinction (distinction) is an average distance between the focus cluster and all the other clusters. This value is between 0 and 1, where the large value indicates that the cluster stands out from the rest of the clusters. Overall quality (quality) is calculated as 1 - (inhomogeneity / distinction). This value is also between 0 and 1, where increased values indicate better quality of clustering.

We can create a summary of each cluster’s quality using the code below.

eco_grid_sfq2 = eco_grid_sfq %>%
        group_by(clust) %>%
        summarise(inhomogeneity = mean(inhomogeneity),
                  distinction = mean(distinction),
                  quality = mean(quality))

clust	inhomogeneity	distinction	quality
1	0.5064706	0.7724361	0.3443204
2	0.4038704	0.7023297	0.4249561
3	0.3377875	0.7065250	0.5219029
4	0.1161293	0.7921515	0.8534002
5	0.3043422	0.7366735	0.5868696
6	0.2774136	0.6849140	0.5949657
7	0.2926504	0.7149212	0.5906537
8	0.3486704	0.7579511	0.5399830

The created clusters show a different degree of quality metrics. The fourth cluster has the lowest inhomogeneity and the largest distinction, and therefore the best quality. The first cluster has the most inhomogeneous patterns, and while its distinction from other clusters is relatively large, its overall quality is the worst.
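
The relation between the three columns can be verified directly from the table. This short check only re-applies the formula 1 - (inhomogeneity / distinction) to the reported values:

```r
# Mean inhomogeneity and distinction values copied from the table above
inhomogeneity = c(0.5064706, 0.4038704, 0.3377875, 0.1161293,
                  0.3043422, 0.2774136, 0.2926504, 0.3486704)
distinction = c(0.7724361, 0.7023297, 0.7065250, 0.7921515,
                0.7366735, 0.6849140, 0.7149212, 0.7579511)

# Overall quality as defined earlier
quality = 1 - (inhomogeneity / distinction)
round(quality, 4)

which.max(quality)  # cluster with the best quality
which.min(quality)  # cluster with the worst quality
```

The recomputed values agree with the quality column, and the best and the worst clusters are the fourth and the first one, respectively.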

tm_inh = tm_shape(eco_grid_sfq2) +
        tm_polygons("inhomogeneity", style = "cont", palette = "magma")

tm_iso = tm_shape(eco_grid_sfq2) +
        tm_polygons("distinction", style = "cont", palette = "-inferno")

tm_qua = tm_shape(eco_grid_sfq2) +
        tm_polygons("quality", style = "cont", palette = "Greens")

tm_cluster3 = tmap_arrange(tm_clu, tm_qua, tm_inh, tm_iso, ncol = 2)
tm_cluster3

Understanding clusters

Inhomogeneity can also be assessed visually with a pattern mosaic. A pattern mosaic is an artificial rearrangement of a subset of randomly selected areas belonging to a given cluster.

Using the code below, we randomly select 100 areas for each cluster. It could take a few minutes.

eco_grid_sample = eco_grid_sf %>% 
  filter(na_prop == 0) %>% 
  group_by(clust) %>% 
  slice_sample(n = 100)

Next, we can extract a raster for each selected area with the lsp_add_examples() function.

eco_grid_examples = lsp_add_examples(eco_grid_sample, eco_data)

Finally, we can use the lsp_mosaic() function, which creates raster mosaics by rearranging spatial data for sample areas. Note that this function is still experimental and can change in the future.

eco_mosaic = lsp_mosaic(eco_grid_examples)

The output is a stars object with the third dimension (clust) representing clusters, from which we can use slice() to extract a raster mosaic for a selected cluster. For example, the raster mosaic for the fourth cluster looks like this:

eco_mosaic_c4 = slice(eco_mosaic, clust, 4)

tm_shape(eco_mosaic_c4) +
  tm_raster(style = "cat",
            palette = list(lc_palette_df$color, lf_palette_df$color)) +
  tm_facets(ncol = 2) +
  tm_layout(legend.show = FALSE)

We can see that the land cover patterns for this cluster are very simple and homogeneous. The landform patterns are slightly more complex and less homogeneous.

And the raster mosaic for the first cluster is:

eco_mosaic_c1 = slice(eco_mosaic, clust, 1)

tm_shape(eco_mosaic_c1) +
  tm_raster(style = "cat",
            palette = list(lc_palette_df$color, lf_palette_df$color)) +
  tm_facets(ncol = 2) +
  tm_layout(legend.show = FALSE)

Patterns of both variables in this cluster are more complex and heterogeneous. This result suggests that additional clusters may be necessary to distinguish some spatial patterns.

Summary

The pattern-based clustering allows for grouping areas with similar spatial patterns. The above example shows clustering based on two-variable raster data (land cover and landform), but by using a different spatial signature, it can be performed on a single-variable raster as well. R code for the pattern-based clustering can be found here, with other examples described in the Spatial patterns’ clustering vignette.

Footnotes

¹ See the threshold argument for more details.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2021,
  author = {Nowosad, Jakub},
  title = {Clustering Similar Spatial Patterns},
  date = {2021-03-03},
  url = {https://jakubnowosad.com/posts/2021-03-03-motif-bp5/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Clustering Similar Spatial Patterns.” March 3, 2021. https://jakubnowosad.com/posts/2021-03-03-motif-bp5/.

Quantifying changes of spatial patterns

Jakub Nowosad — Wed, 24 Feb 2021 00:00:00 GMT

TL;DR:

Quantifying changes of spatial patterns requires two datasets for the same variable in the same area. Both datasets are divided into many sub-areas, and spatial signatures are derived for each sub-area for each dataset. Next, distances for each pair of areas are calculated. Sub-areas with the largest distances represent the largest change.

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") %>%
        osf_ls_files(n_max = Inf) %>%
        osf_download(path = "data",
                     conflicts = "overwrite")

You should also attach the following packages:

library(stars)
library(motif)
library(tmap)
library(readr)

Spatial patterns changes

A standard approach for detecting changes between two rasters is to calculate a change for each cell independently. This allows quantifying how many cells changed their values from A to B, and how many from B to A. However, this approach does not tell us whether the spatial pattern has actually changed or stayed the same. For example, consider a regular checkerboard and a checkerboard with all colors reversed. While every cell changed its value, we still have the same classes, and their spatial arrangement is the same.

Here, we are interested in changes of spatial patterns; therefore, instead of looking at pixel-by-pixel change, we focus on pattern-by-pattern change¹.
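The checkerboard case can be sketched in a few lines of base R. The helper simple_coma() below is a hypothetical, simplified stand-in for a co-occurrence matrix (it counts only horizontally and vertically adjacent cells) and is not the motif implementation.

```r
# A 6 x 6 checkerboard with classes 1 and 2, and the same board with colors reversed
board = outer(1:6, 1:6, function(i, j) (i + j) %% 2 + 1)
board_rev = 3 - board

# Cell-by-cell comparison: share of cells that changed their value
cell_change = mean(board != board_rev)

# Simplified co-occurrence matrix: counts of horizontally and vertically
# adjacent pairs of classes, with each pair counted in both directions
simple_coma = function(m) {
  classes = sort(unique(as.vector(m)))
  from = c(m[, -ncol(m)], m[, -1], m[-nrow(m), ], m[-1, ])
  to = c(m[, -1], m[, -ncol(m)], m[-1, ], m[-nrow(m), ])
  table(factor(from, classes), factor(to, classes))
}

coma = simple_coma(board)
coma_rev = simple_coma(board_rev)

cell_change               # every cell changed its value
identical(coma, coma_rev) # but the pattern description is unchanged
```

In this toy case, the cell-based comparison reports a complete change, while the pattern-based description of both boards is the same.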

Land cover in the Amazon

The data/lc_am_1992.tif file contains land cover data for the year 1992 and data/lc_am_2018.tif for the year 2018. Both are single categorical rasters of the same extent (the Amazon) and the same resolution (300 meters), and both can be read into R using the read_stars() function.

lc92 = read_stars("data/lc_am_1992.tif")
lc18 = read_stars("data/lc_am_2018.tif")

Additionally, the data/lc_palette.csv file contains information about the colors and labels of each land cover category.

lc_palette_df = read_csv("data/lc_palette.csv")
names(lc_palette_df$color) = lc_palette_df$value

Both land cover datasets can be visualized with tmap. The lc_palette_df is used to set a color palette and the legend’s labels.

tm_compare1 = tm_shape(c(lc92, lc18)) +
        tm_raster(style = "cat",
                  palette = lc_palette_df$color,
                  labels = lc_palette_df$label,
                  title = "Land cover:") +
        tm_layout(legend.outside = TRUE,
                  panel.labels = c(1992, 2018))
tm_compare1

The above map clearly shows that there has been a large land cover change in many areas of the Amazon between 1992 and 2018. The problem now is to find out which areas changed the most.

Comparing spatial patterns

This could be solved with lsp_compare(). The lsp_compare() function expects two stars objects with the same extent and resolution. We also need to specify the spatial scale of comparison (window), signature (type), and distance method (dist_fun)².

In this example, we use a window of 300 cells by 300 cells (window = 300). This means that our search scale will be 90 km (300 cells x data resolution) - resulting in dividing the whole area into about 1,500 regular rectangles of 90 by 90 kilometers. We also use the "cove" signature and the "jensen-shannon" distance here.

lc_am_compare = lsp_compare(lc92, lc18, 
                            window = 300, 
                            type = "cove",
                            dist_fun = "jensen-shannon")

Comparing results

By default, the output is a stars object with four attributes: (1) id - an id of each window, (2) na_prop_x - share between 0 and 1 of NA cells for each window in the first stars object, (3) na_prop_y - share between 0 and 1 of NA cells for each window in the second stars object, (4) dist - derived distance between the pattern in the first object and the second object for each window.

lc_am_compare

stars object with 2 dimensions and 4 attributes
attribute(s):
           Min.      1st Qu.       Median         Mean      3rd Qu.
id            1 3.637500e+02 7.265000e+02 726.50000000 1.089250e+03
na_prop_x     0 0.000000e+00 0.000000e+00   0.02093060 0.000000e+00
na_prop_y     0 0.000000e+00 0.000000e+00   0.02112938 0.000000e+00
dist          0 1.504208e-03 3.550033e-03   0.01713996 1.100637e-02
                   Max. NA's
id         1452.0000000    0
na_prop_x     0.4933778  620
na_prop_y     0.4933778  620
dist          0.2469178  620
dimension(s):
  from to   offset  delta                       refsys point values x/y
x    1 44 -8834600  90000 Interrupted_Goode_Homolosine    NA   NULL [x]
y    1 33   964250 -90000 Interrupted_Goode_Homolosine    NA   NULL [y]
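The numbers in this printout can be tied back to the chosen window size with some simple arithmetic. The sketch below uses only values stated in this post (a window of 300 cells, a resolution of 300 meters, and the dimensions and NA counts printed above); the variable names are illustrative.

```r
window_cells = 300   # window = 300 in lsp_compare()
cell_size_m = 300    # resolution of the land cover rasters, in meters

# Side length of one window, in kilometers (matches the delta of 90000 m)
window_size_km = window_cells * cell_size_m / 1000

# Number of windows: 44 columns by 33 rows in the output grid
n_windows = 44 * 33

# Windows with a calculated distance (620 windows have NA in dist)
n_compared = n_windows - 620

c(window_size_km = window_size_km, n_windows = n_windows, n_compared = n_compared)
```

So the "about 1,500" windows mentioned earlier are exactly 1452, which is also the maximum id, and a distance was derived for 832 of them.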

We can visualize the result in the same way as a regular stars object, for example using the tmap package:

tm_compare2 = tm_shape(lc_am_compare) +
        tm_raster("dist", 
                  palette = "viridis", 
                  style = "cont",
                  title = "Distance (JSD):") +
        tm_layout(legend.outside = TRUE)
tm_compare2

The yellow color represents areas of the largest change. They are mostly located in the southern and south-eastern parts of the Amazon.

A comparison result can also be easily converted into an sf object with st_as_sf() for subsetting and analyzing the outcomes.

lc_am_compare_sf = st_as_sf(lc_am_compare)

Areas of the largest change in the pattern

In the previous blog post, we were interested in finding the most similar areas to the query region - smallest distance. Here, we are looking for the areas with the largest change, which is expressed by the largest dist values.

We can use slice_max() to subset the obtained result to a selected number of areas with the largest change between 1992 and 2018. The code below selects nine areas with the largest distance between the spatial pattern in 1992 and 2018.

library(dplyr)
lc_am_compare_sel = slice_max(lc_am_compare_sf, dist, n = 9)

If we want to look closer at the result, then we can extract each of the above regions with the lsp_add_examples() function. It adds a region column with a stars object to each row.

lc_am_compare_ex = lsp_add_examples(x = lc_am_compare_sel, y = c(lc92, lc18))

It allows us to visualize the area with the largest change:

tm_shape(lc_am_compare_ex$region[[1]]) + 
  tm_raster(style = "cat",
            palette = lc_palette_df$color,
            labels = lc_palette_df$label,
            title = "Land cover:") +
  tm_layout(legend.show = FALSE,
            panel.labels = c(1992, 2018))

Here, we can see an area mostly covered by forest in 1992, large parts of which were transformed into agriculture by 2018.

This approach can also be extended to plot all nine areas. We just need to create a visualization function (create_map2()) and use it iteratively on each region in lc_am_compare_ex. The output of this process, map_list, is a list of tmaps that can be plotted with tmap_arrange():

library(purrr)
create_map2 = function(x){
        tm_shape(x) +
                tm_raster(style = "cat",
                          palette = lc_palette_df$color,
                          labels = lc_palette_df$label,
                          title = "Land cover:") +
                tm_facets(ncol = 2) +
                tm_layout(legend.show = FALSE,
                          panel.labels = c(1992, 2018))
}
map_list = map(lc_am_compare_ex$region, create_map2)
tmap_arrange(map_list)

It shows that the majority of changes in the Amazon are related to the forest being removed for agricultural purposes.

Summary

The pattern-based comparison allows for finding areas with the largest change in spatial patterns. The above example shows the comparison based on single-variable raster data (land cover), but by using a different spatial signature, it can be performed on rasters with two or more variables (think of multi-variable change). R code for the pattern-based comparison can be found here, with other examples described in the Spatial patterns’ comparison vignette.

Footnotes

¹ For a more detailed explanation of spatial patterns’ changes, visit my older blog post.
² If you want more explanation about these arguments, please read the previous posts in this series.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2021,
  author = {Nowosad, Jakub},
  title = {Quantifying Changes of Spatial Patterns},
  date = {2021-02-24},
  url = {https://jakubnowosad.com/posts/2021-02-24-motif-bp4/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Quantifying Changes of Spatial Patterns.” February 24, 2021. https://jakubnowosad.com/posts/2021-02-24-motif-bp4/.

Finding similar spatial patterns

Jakub Nowosad — Wed, 17 Feb 2021 00:00:00 GMT

TL;DR:

Finding similar spatial patterns requires data for a query region and a search space. Spatial signatures are derived for the query region and many sub-areas of the search space, and distances between them are calculated. Sub-areas with the smallest distances from the query region are the most similar to it.

To reproduce the calculations in the following post, you need to download all of the relevant datasets using the code below:

library(osfr)
dir.create("data")
osf_retrieve_node("xykzv") %>%
        osf_ls_files(n_max = Inf) %>%
        osf_download(path = "data",
                     conflicts = "overwrite")

You should also attach the following packages:

library(sf)
library(stars)
library(tmap)

Suwalski Landscape Park

Spatial pattern search allows for quantifying similarity between the query region and the search space and finally finding regions that are the most similar to the query one. Here, we were interested in finding areas of similar topography to the area of Suwalski Landscape Park. Suwalski Landscape Park is a protected area in north-eastern Poland with a post-glacial landscape consisting of young morainic hills.

One possible approach to classify the topography of a given region is to use geomorphons. Geomorphons categorize cells in this area into one of ten forms: flat, summit, ridge, shoulder, spur, slope, hollow, footslope, valley, and depression ¹.

The "data/geomorphons_pol.tif" file contains a raster with geomorphons calculated for Poland’s area, while "data/suw_lp.gpkg" is a vector polygon with the Suwalski Landscape Park borders. Let’s start by reading these two files into R.

gm = read_stars("data/geomorphons_pol.tif")
suw_lp = read_sf("data/suw_lp.gpkg")

Now, we can visualize geomorphons and the location of Suwalski Landscape Park with tmap.

tm_gm = tm_shape(gm) +
        tm_raster(title = "Geomorphons:") +
        tm_shape(suw_lp) +
        tm_symbols(col = "black", shape = 6) +
        tm_layout(legend.outside = TRUE, frame = FALSE)
tm_gm

Search area

The geomorphon data for Poland is our search space. Now, we also need a second raster object with a query region. The query region is an area to which we want to find other similar areas.

There are two main ways to create a query region:

By cropping spatial data of a large area to the extent or borders of a query region.
By reading an external file. In the second case, the values in the external file should match the values in the search space.

Here, we are using the former approach by reading the Suwalski Landscape Park borders and then using it to crop the whole-country raster.

suw_lp = read_sf("data/suw_lp.gpkg")
gm_suw = st_crop(gm, suw_lp)

The query area has irregular spatial patterns represented by slopes and a limited number of flat areas.

tm_gm_suw = tm_shape(gm_suw) +
        tm_raster() +
        tm_shape(suw_lp) +
        tm_borders(col = "black") +
        tm_layout(legend.show = FALSE, frame = FALSE)
tm_gm_suw

Search process

The searching process consists of:

Selecting a query region and a search space. In our case, the query region is gm_suw, while the search space is the gm object.
Dividing the search space using regular (non-overlapping) squares or using polygons.
Creating numerical representation (called a signature) for the query region and the search space.
Comparing the signature of the query region with signatures for each part of the search space using a distance measure.
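The four steps above can be sketched on toy data in base R. This is only an illustration under simplifying assumptions: the search space is a random matrix, the query is simply its top-left block, and the helper functions signature() and jsd() are hypothetical stand-ins (a simplified co-occurrence vector and a Jensen-Shannon formula with base-2 logarithms), not the motif or philentropy implementations.

```r
set.seed(42)  # reproducible toy data

# Step 1: a hypothetical search space (40 x 40 cells, three classes)
# and a query region (here, simply the top-left 10 x 10 block)
classes = 1:3
space = matrix(sample(classes, 40 * 40, replace = TRUE, prob = c(0.6, 0.3, 0.1)),
               nrow = 40)
query = space[1:10, 1:10]

# Step 3: a simplified normalized co-occurrence vector used as a signature
signature = function(m) {
  from = factor(c(m[, -ncol(m)], m[-nrow(m), ]), levels = classes)
  to = factor(c(m[, -1], m[-1, ]), levels = classes)
  counts = unclass(table(from, to))
  counts = counts + t(counts)       # adjacencies in both directions
  diag(counts) = diag(counts) / 2   # count same-class pairs only once
  sig = counts[upper.tri(counts, diag = TRUE)]
  sig / sum(sig)
}

# Step 4: a distance between two signatures
entropy = function(p) {
  p = p[p > 0]
  -sum(p * log2(p))
}
jsd = function(a, b) entropy((a + b) / 2) - (entropy(a) + entropy(b)) / 2

# Step 2: divide the search space into regular, non-overlapping windows
window = 10
starts = seq(1, nrow(space), by = window)
windows = expand.grid(row = starts, col = starts)
windows$id = seq_len(nrow(windows))

# Compare the query signature with the signature of each window
query_sig = signature(query)
windows$dist = mapply(function(r, cl) {
  block = space[r:(r + window - 1), cl:(cl + window - 1)]
  jsd(query_sig, signature(block))
}, windows$row, windows$col)

# Windows most similar to the query region
head(windows[order(windows$dist), ], 3)
```

Because the query was cut out of the first window, that window comes back with a distance of zero; the remaining windows are ranked by how closely their adjacency structure resembles the query.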

Search scale

The first important consideration is the search scale - what is the size of areas we want to find? This is not an easy question and largely depends on the research problem. The current version of motif accepts either regular (non-overlapping) squares or polygons.

Search signature

The second consideration is the search signature. We are able to describe the above area in words; however, how can we translate the spatial pattern properties for a computer and, at the same time, make them more objective? Again, it is a complex question, and the answer largely depends on the type of input data.

In our case, we have a single categorical raster, and for this type of data, we found that the cove signature works well. Cove stands for co-occurrence vector - it is a 1D vector where each value represents the share of adjacencies between cells of one category and cells of another (or the same) category². More information about cove can be found in the previous blog post.

We can calculate cove for our query region using lsp_signature() with the type argument set to "cove".

library(motif)
gm_sum_sig = lsp_signature(gm_suw, type = "cove")
gm_sum_sig

# A tibble: 1 × 3
     id na_prop signature
* <int>   <dbl> <list>
1     1   0.381

The output object contains a signature column, which stores the cove signature for our region. We can see this signature with gm_sum_sig$signature[[1]].

Search distance

The third consideration is a distance measure. Many distance measures have been developed for different types of data, and each of them has different properties. The motif package allows using any distance measure implemented in the philentropy package, which includes more than 40 different measures³.

Searching

The lsp_search() function performs spatial pattern-based search. It expects two stars objects: a query region (gm_suw) and a search space (gm). Next, we need to specify the search scale (window), signature (type), and distance method (dist_fun).

In this example, we use a window of 100 cells by 100 cells (window = 100). This means that our search scale will be 2500 meters (100 cells x data resolution) - resulting in dividing the search space into about 70,000 regular rectangles of 2500 by 2500 meters. We also use the "cove" signature and the "jensen-shannon" distance here.

gm_search = lsp_search(gm_suw, gm,
                       window = 100,
                       type = "cove",
                       dist_fun = "jensen-shannon")

The above calculation could take several minutes on a modern computer.

Search results

By default, the output of the search is a stars object with three attributes:

id - an id of each window,
na_prop - share between 0 and 1 of NA cells for each window in the search space,
dist - derived distance between the query region and each window in the search space.

gm_search

stars object with 2 dimensions and 3 attributes
attribute(s):
                Min.      1st Qu.       Median         Mean      3rd Qu.
id       1.000000000 1.764075e+04 3.528050e+04 3.528050e+04 5.292025e+04
na_prop  0.000000000 0.000000e+00 0.000000e+00 3.111193e-03 0.000000e+00
dist     0.001737701 4.483980e-02 1.051218e-01 1.461936e-01 2.080158e-01
                 Max.  NA's
id       7.056000e+04     0
na_prop  4.997000e-01 20485
dist     6.451746e-01 20485
dimension(s):
  from  to  offset delta                       refsys point values x/y
x    1 288 4595300  2500 +proj=laea +lat_0=52 +lon...    NA   NULL [x]
y    1 245 3556600 -2500 +proj=laea +lat_0=52 +lon...    NA   NULL [y]

We can visualize the result in the same fashion as a regular stars object (see the final map at the end of the post):

tm_search2 = tm_shape(gm_search) +
          tm_raster("dist",
                  style = "log10", 
                  palette = "BrBG",
                  title = "Distance (JSD):",
                  legend.is.portrait = FALSE)

A search result can also be easily converted into an sf object with st_as_sf(). This allows for straightforward analysis and subsetting of the search results.

gm_search_sf = st_as_sf(gm_search)

The most similar areas

Spatial pattern-based search is similar to a search using internet search engines - we do not care about the most dissimilar areas. We just want to locate the ones most similar to the query region. Therefore, we should select only areas with the smallest distance values - this means that they are the most similar to the query region.

We can achieve it, for example, using the slice_min() function. The code below selects nine areas with the smallest distance from the query region.

library(dplyr)
gm_search_sel = slice_min(gm_search_sf, dist, n = 9)

If we want to look closer at the result, then we can extract each of the above regions with the lsp_add_examples() function. It adds a region column with a stars object to each row.

gm_search_ex = lsp_add_examples(x = gm_search_sel, y = gm)

It allows us to visualize any of the most similar areas.

tm_shape(gm_search_ex$region[[1]]) + 
  tm_raster() + 
  tm_layout(legend.show = FALSE)

This approach can also be extended to plot all nine of the most similar areas. We just need to create a visualization function (create_map()) and use it iteratively on each region in gm_search_ex. The output of this process, map_list, is a list of tmaps that can be plotted with tmap_arrange():

library(purrr)
create_map = function(x, y){
  tm_shape(x) + 
  tm_raster() + 
  tm_layout(legend.show = FALSE,
            title = y)
}
map_list = map2(gm_search_ex$region, gm_search_ex$id, create_map)
tmap_arrange(map_list)

Nine examples of the areas with the most similar patterns of geomorphons compared to the Suwalski Landscape Park are presented above. They are also similar to each other, suggesting a high-quality result.

The final map consists of two parts: (a) a distance raster and (b) symbols representing the nine most similar areas.

tm_search2 +
  tm_shape(gm_search_sel) +
  tm_symbols(shape = 2, col = "black") +
  tm_text("id", auto.placement = TRUE)

The brown color on the above map represents areas with the most similar patterns of geomorphons to the Suwalski Landscape Park. The majority of similar areas are located in northern Poland and form a belt with homogeneous topography.

Summary

The pattern-based search allows for finding areas with similar spatial patterns. The above example shows the search based on single-variable raster data (geomorphons), but by using a different spatial signature, it can be performed on rasters with two or more variables. Additionally, the search space can be divided not only into regular areas, but also into irregular ones - see an example. R code for the pattern-based search can be found here.

Footnotes

¹ Learn more about geomorphons by reading the dedicated paper or its preprint. You can also calculate geomorphons for your own data using a GRASS GIS module r.geomorphon.
² In other words, it is a vector containing a normalized form of the co-occurrence matrix.
³ You can check all of them using philentropy::getDistMethods().

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2021,
  author = {Nowosad, Jakub},
  title = {Finding Similar Spatial Patterns},
  date = {2021-02-17},
  url = {https://jakubnowosad.com/posts/2021-02-17-motif-bp3/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Finding Similar Spatial Patterns.” February 17, 2021. https://jakubnowosad.com/posts/2021-02-17-motif-bp3/.

Describing categorical rasters with spatial signatures

Jakub Nowosad — Wed, 10 Feb 2021 00:00:00 GMT

TL;DR:

Spatial signatures are multi-value representations of the patterns that compress information about spatial composition and configuration. Spatial signatures can be directly compared using various distance measures.

Describing categorical rasters

A categorical raster shown below represents land cover data for some area. This area is mainly covered by forest, with some small patches of agriculture, grasslands, and water.

If we want to describe this area, we could start by measuring areas of different land cover categories. Then, we would know that forest covers about 98.6% and agriculture covers about 1.3% of the area. We could also use landscape metrics to put a number on some property of this raster. Then, we would know that the entropy is 0.116, and relative mutual information is 0.331¹.

This approach can be applied to many categorical rasters, as you can see below.

id	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
ent	0.12	0.45	0.53	0.63	0.75	1.16	1.16	1.28	1.53	1.2	1.65	1.6	1.74	1.72	1.6	2.02
relmutinf	0.33	0.39	0.34	0.52	0.44	0.51	0.39	0.33	0.42	0.36	0.5	0.58	0.43	0.34	0.2	0.38
forest	0.99	0.93	0.9	0.84	0.83	0.76	0.69	0.68	0.59	0.57	0.53	0.5	0.44	0.4	0.39	0.36
agriculture	0.01	0.05	0.08	0.16	0.16	0.12	0.25	0.23	0.24	0.39	0.3	0.36	0.36	0.39	0.36	0.3

Now, each raster’s spatial properties are expressed by a vector of numbers representing its categories and selected landscape metrics.

Basic spatial signature

As I mentioned in my previous blog posts, we could represent categorical rasters with a large number of landscape metrics. However, many landscape metrics are highly correlated, and some of them depend on the resolution of the input data and the size of the study area.

An alternative approach is to derive a multi-value representation of the raster that compresses information about its spatial composition and configuration. One such representation is a co-occurrence matrix (coma).

The coma representation is calculated by moving through each cell, looking at its value, and counting how many neighbors of each class our central cell has. For example, the co-occurrence matrix below shows that the forest category cells are 38,778 times adjacent to other cells of this category, 218 times to the cells of the agriculture category, 32 times to the cells of the grassland category, and so on.

	agriculture	forest	grassland	water
agriculture	272	218	4	0
forest	218	38778	32	12
grassland	4	32	16	0
water	0	12	0	2

Importantly, this signature contains information about the categories and their shares (composition), and also the spatial relation between categories (configuration).

The co-occurrence matrix (coma) representation is two-dimensional, with values of categories in rows and columns. It can be converted into a one-dimensional representation called a co-occurrence vector (cove).

272	218	4	0	218	38778	32	12	4	32	16	0	0	12	0	2

As you can see, some elements of this vector represent the same relations. For example, the first value of 4 shows the relation between grassland and agriculture, and the second value of 4 represents the relation between agriculture and grassland. We can simplify the above vector by counting all relations only once²:

136	218	19389	4	32	8	0	12	0	1

This vector can be further transformed so that its values sum to one. The output vector is called the normalized co-occurrence vector.

0.0069	0.011	0.9792	0.0002	0.0016	0.0004	0	0.0006	0	0.0001

The role of normalization is to create a probability function, and thus be able to compare categorical rasters of different sizes using mathematical distance measures.
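The conversion from the co-occurrence matrix to the normalized co-occurrence vector can be retraced in base R. The sketch below starts from the matrix shown above and assumes that the simplified vector takes the upper triangle of the matrix (column by column) with the diagonal divided by two, which reproduces the numbers listed in this post.

```r
categories = c("agriculture", "forest", "grassland", "water")
coma = matrix(c(272,   218,  4,  0,
                218, 38778, 32, 12,
                  4,    32, 16,  0,
                  0,    12,  0,  2),
              nrow = 4, byrow = TRUE,
              dimnames = list(categories, categories))

# Co-occurrence vector (cove): the matrix written row by row
cove_full = as.vector(t(coma))

# Count each relation only once: halve the diagonal, keep the upper triangle
coma_half = coma
diag(coma_half) = diag(coma) / 2
cove = coma_half[upper.tri(coma_half, diag = TRUE)]

# Normalized co-occurrence vector: values sum to one
cove_norm = cove / sum(cove)

cove
round(cove_norm, 4)
```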

Measuring similarity between patterns

Let’s consider two rasters below. We want to know how similar they are to each other.

To answer this question, we need to perform three steps:

calculate a normalized co-occurrence vector for the first raster,
calculate a normalized co-occurrence vector for the second raster,
calculate a numerical distance between these two signatures.

Normalized co-occurrence vector for the first raster is:

0.0069	0.011	0.9792	0.0002	0.0016	0.0004	0	0.0006	0	0.0001

Normalized co-occurrence vector for the second raster is:

0.1282	0.0609	0.8105	0.0002	0.0002	0.0001	0	0	0	0

A large number of possible distance measures between probability functions exists³. In this example, we use the Jensen-Shannon distance.

$$ JSD(A, B) = H(\frac{A + B}{2}) - \frac{1}{2}[H(A) + H(B)] $$

It takes two probability functions (spatial signatures in our case) A and B, and calculates entropy values (H). The Jensen-Shannon distance is a value between 0 and 1, where 0 means that two probability functions are identical, and 1 means that they have nothing in common.
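The formula can be written down directly in base R. The sketch below assumes that H is the Shannon entropy with a base-2 logarithm (which keeps the result between 0 and 1) and that zero values contribute nothing to the entropy; it is applied to the two rounded signatures listed above, so the result may differ slightly from a calculation on unrounded values.

```r
# Shannon entropy with a base-2 logarithm; zero values are skipped
entropy = function(p) {
  p = p[p > 0]
  -sum(p * log2(p))
}

# JSD(A, B) = H((A + B) / 2) - 1/2 * [H(A) + H(B)]
jsd = function(a, b) entropy((a + b) / 2) - (entropy(a) + entropy(b)) / 2

# Normalized co-occurrence vectors of the two rasters (rounded values from the post)
a = c(0.0069, 0.011, 0.9792, 0.0002, 0.0016, 0.0004, 0, 0.0006, 0, 0.0001)
b = c(0.1282, 0.0609, 0.8105, 0.0002, 0.0002, 0.0001, 0, 0, 0, 0)

jsd(a, b)  # distance between the two signatures
jsd(a, a)  # identical signatures give 0
```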

Jensen-Shannon distance between our two rasters is 0.068, suggesting that their spatial composition and configuration are fairly similar, but not identical. Now, let’s consider two rasters that are visually very different. One is covered mostly by forest, while the second one is mostly a mosaic of forests, agricultural areas, and grasslands.

Normalized co-occurrence vector for the first raster is:

0.0069	0.011	0.9792	0.0002	0.0016	0.0004	0	0	0	0	0	0.0006	0	0	0.0001

Normalized co-occurrence vector for the second raster is:

0.2033	0.1335	0.2944	0.1747	0.0562	0.1307	0.0035	0.0002	0.0004	0.0015	0.0007	0.0005	0	0	0.0005

The Jensen-Shannon distance between this pair of rasters is 0.444, indicating that the two rasters are fairly different⁴.

Calculating spatial signatures for many areas allows us to find the most similar rasters, describe changes between rasters, or group (cluster) rasters with similar spatial patterns.

Other spatial signatures

The co-occurrence matrix (coma) is suitable to represent the pattern of a single categorical variable. There are, however, other spatial signatures aimed to describe spatial patterns of multi-variable cases.

A weighted co-occurrence matrix (wecoma) representation

Let’s consider a situation, in which we have two rasters: one with categories, and one with weights. So, now we have not only a category of each cell, but also its intensity.

A regular co-occurrence matrix (coma) based just on a categorical raster looks as follows:

	1	2	3	4	5
1	0	4	5	0	1
2	4	1652	493	86	316
3	5	493	1148	38	509
4	0	86	38	6	14
5	1	316	509	14	818

It represents a spatial pattern of the categories; however, it completely omits the secondary information about the weight of each raster cell. To utilize the secondary information, a weighted co-occurrence matrix (wecoma) was developed. It is a modification of the co-occurrence matrix, in which each adjacency contributes to the output based on the values from the weight raster. The contributed value is calculated as the average of the weights in the two adjacent cells.

	1	2	3	4	5
1	0.00	7.08	15.42	0.00	2.18
2	7.08	3513.53	1723.24	92.45	923.97
3	15.42	1723.24	4524.03	113.75	2029.07
4	0.00	92.45	113.75	3.72	36.37
5	2.18	923.97	2029.07	36.37	1574.14

As you can see above, the weighted co-occurrence matrix differs from regular coma.
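The weighting idea can be illustrated on a tiny example in base R. The function simple_wecoma() below is a hypothetical sketch, not the motif implementation: it considers only horizontally and vertically adjacent cells, and each adjacency adds the average of the two cells' weights to the matrix. The two small input matrices are made up for illustration.

```r
# Simplified weighted co-occurrence matrix for a categorical matrix (cat)
# and a weight matrix (w) of the same dimensions
simple_wecoma = function(cat, w) {
  classes = sort(unique(as.vector(cat)))
  # classes of all adjacent pairs, in both directions
  from = c(cat[, -ncol(cat)], cat[, -1], cat[-nrow(cat), ], cat[-1, ])
  to = c(cat[, -1], cat[, -ncol(cat)], cat[-1, ], cat[-nrow(cat), ])
  # weights of the same pairs of cells, in the same order
  w_from = c(w[, -ncol(w)], w[, -1], w[-nrow(w), ], w[-1, ])
  w_to = c(w[, -1], w[, -ncol(w)], w[-1, ], w[-nrow(w), ])
  pairs = data.frame(from = factor(from, classes),
                     to = factor(to, classes),
                     weight = (w_from + w_to) / 2)
  # sum the average weights for each combination of classes
  xtabs(weight ~ from + to, data = pairs)
}

# Made-up example: a 2 x 3 raster with two categories and cell weights
cat_r = matrix(c(1, 1, 2,
                 1, 2, 2), nrow = 2, byrow = TRUE)
w_r = matrix(c(2, 4, 6,
               1, 3, 5), nrow = 2, byrow = TRUE)

wecoma = simple_wecoma(cat_r, w_r)
wecoma

# With all weights equal to one, the result reduces to plain adjacency counts
simple_wecoma(cat_r, w_r * 0 + 1)
```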

Similarly to the previous case, we can also convert wecoma into a one-dimensional normalized representation now called a weighted co-occurrence vector (wecove):

0	0.0007	0.1802	0.0016	0.1767	0.232	0	0.0095	0.0117	0.0002	0.0002	0.0948	0.2081	0.0037	0.0807

You can also see the weighted co-occurrence matrix (wecoma) concept, there described as an exposure matrix, in action in the vignettes of the raceland package.

An integrated co-occurrence matrix (incoma) representation

Another situation would be when we have two or more categorical raster variables. For example, let’s consider one raster with land cover categories and one with landform classes.

The question here is how to create a signature that incorporates spatial patterns of both land cover and landform data. The apparent solution would be to create a new raster with the joint distribution of class labels. For example, if agriculture is represented as 1 in the first raster and flat plains are represented as 1 in the second raster, then a value of 101 would represent agriculture on a flat plain in a new raster. Next, we could just calculate a regular co-occurrence matrix. However, this approach is not recommended - by creating joint labels in this example data, we would end up with 84 categories, and therefore with a co-occurrence matrix of 84 by 84. Large signatures not only occupy more storage but also are harder to meaningfully compare.

An alternative approach is to use an integrated co-occurrence matrix (incoma). It consists of co-occurrence matrices (coma) and co-located co-occurrence matrices (cocoma). In the co-occurrence matrix, we only use one raster and count adjacent categories of each cell. The co-located co-occurrence matrix, on the other hand, uses two rasters and counts neighbors in the second raster for each cell in the first raster.

The incoma representation for two rasters consists of four sectors (see an example below):

A co-occurrence matrix for the first raster.
A co-located co-occurrence matrix between the first raster and the second raster. It is between the first and third column and the third and fourth row.
A co-located co-occurrence matrix between the second and the first raster.
A co-occurrence matrix for the second raster.

Similar to the previous signatures, it is possible to convert incoma to its 1D normalized representation called an integrated co-occurrence vector (incove).
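The four sectors can be illustrated with a minimal base R sketch. It is not the implementation used by motif: `count_adjacent()` is a hypothetical helper, the rasters are tiny made-up matrices, and the sketch assumes a four-cell (rook) neighborhood in which every adjacency is counted in both directions.

```r
# For every cell of `a`, count the categories of its four neighbors in `b`.
# With b = a the result is a co-occurrence matrix (coma);
# with a different b it is a co-located co-occurrence matrix (cocoma).
count_adjacent = function(a, b = a) {
  stopifnot(all(dim(a) == dim(b)))
  cats_a = as.character(sort(unique(as.vector(a))))
  cats_b = as.character(sort(unique(as.vector(b))))
  out = matrix(0, length(cats_a), length(cats_b),
               dimnames = list(cats_a, cats_b))
  shifts = rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(ncol(a))) {
      for (s in seq_len(nrow(shifts))) {
        ni = i + shifts[s, 1]
        nj = j + shifts[s, 2]
        # skip neighbors that fall outside of the raster
        if (ni < 1 || ni > nrow(a) || nj < 1 || nj > ncol(a)) next
        r = as.character(a[i, j])
        k = as.character(b[ni, nj])
        out[r, k] = out[r, k] + 1
      }
    }
  }
  out
}

r1 = matrix(c(1, 1, 2, 2), nrow = 2, byrow = TRUE)  # first raster (toy data)
r2 = matrix(c(1, 2, 1, 2), nrow = 2, byrow = TRUE)  # second raster (toy data)

# Four sectors: coma(r1), cocoma(r1, r2), cocoma(r2, r1), coma(r2)
incoma = rbind(
  cbind(count_adjacent(r1, r1), count_adjacent(r1, r2)),
  cbind(count_adjacent(r2, r1), count_adjacent(r2, r2))
)
incoma
```

Under these assumptions, the two cocoma sectors are transposes of each other, and the whole matrix stays at 4 by 4 instead of growing with the number of joint classes.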

Summary

Spatial signatures make it possible to store compressed information about spatial patterns of many types of data. They include a co-occurrence matrix (coma) for regular categorical rasters, a weighted co-occurrence matrix (wecoma) for categorical rasters with related intensity rasters, and an integrated co-occurrence matrix (incoma) for two or more categorical rasters. A spatial signature can be represented by a 1D vector and compared using a large number of distance measures.

To learn more about how different spatial signatures can be calculated, read the Types of spatial patterns’ signatures, A co-occurrence matrix (coma) representation, A weighted co-occurrence matrix (wecoma) representation, and An integrated co-occurrence matrix (incoma) representation vignettes.

Footnotes

See the Information theory provides a consistent framework for the analysis of spatial patterns blog post.↩︎
It also means dividing the diagonal by two.↩︎
Read https://users.uom.gr/~kouiruki/sung.pdf for a comprehensive review of distance measures.↩︎
Larger values of the Jensen-Shannon distance could occur when two rasters have different categories.↩︎

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2021,
  author = {Nowosad, Jakub},
  title = {Describing Categorical Rasters with Spatial Signatures},
  date = {2021-02-10},
  url = {https://jakubnowosad.com/posts/2021-02-10-motif-bp2/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Describing Categorical Rasters with Spatial Signatures.” February 10, 2021. https://jakubnowosad.com/posts/2021-02-10-motif-bp2/.

Pattern-based spatial analysis: an approach for discovering, describing and studying geographical patterns

Jakub Nowosad — Thu, 04 Feb 2021 00:00:00 GMT

I gave an overview of what pattern-based spatial analysis is and how it can be applied as part of the RGS-IBG GIScience Webinar Series. You can find the abstract, slides, and recording below.

Abstract

Discovering and describing spatial patterns is an important element of many geographical studies, with spatial patterns being related to ecological and sociological processes. While spatial patterns are often clearly visible on maps, it is not easy to unequivocally decide if two areas are much alike or to delineate regions with similar patterns. In this talk, Jakub Nowosad will present a set of consistent ideas on how spatial patterns can be described and analyzed, with a focus on categorical raster data. The core idea is to divide raster data consisting of cells having simple content (a single value) into a large number of smaller areas, and then characterize each area using a statistical description of a pattern - a spatial signature. Spatial signatures are multi-value representations of spatial composition and configuration, and therefore can be compared using a large number of existing distance or dissimilarity measures. This enables spatial analyses such as search, change detection, clustering, and segmentation. During this talk, a number of real-life examples of finding similar spatial patterns, detecting changes over time, and grouping areas with homogeneous patterns for regional, continental, and global scales will be shown.

Slides

You can find the slides for the talk at https://nowosad.github.io/giscience-webinar-2021.

Recording

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2021,
  author = {Nowosad, Jakub},
  title = {Pattern-Based Spatial Analysis: An Approach for Discovering,
    Describing and Studying Geographical Patterns},
  date = {2021-02-04},
  url = {https://jakubnowosad.com/posts/2021-02-04-pattern-based-spatial-analysis/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Pattern-Based Spatial Analysis: An Approach for Discovering, Describing and Studying Geographical Patterns.” February 4, 2021. https://jakubnowosad.com/posts/2021-02-04-pattern-based-spatial-analysis/.

Pattern-based spatial analysis in R: an introduction

Jakub Nowosad — Wed, 03 Feb 2021 00:00:00 GMT

TL;DR:

motif is an R package aimed at pattern-based spatial analysis. It allows spatial analyses such as search, change detection, and clustering to be performed on spatial patterns. This blog post introduces the basic ideas behind pattern-based spatial analysis and shows the types of problems to which it can be applied.

Spatial patterns

Discovering and describing patterns is a vital part of many spatial analyses. However, spatial data is gathered in many ways and forms, which requires different approaches to expressing spatial patterns. Different methods are applied to numerical and categorical variables, and different methods are used to find patterns in point, line, or raster datasets. Next, patterns and their relevance depend on the studied scale, with different patterns found at small or large scales, or in data of different spatial resolutions. Finally, the way we describe patterns should depend on our main goal.

In this blog post, I only focus on a small subset of possible problems related to spatial patterns - I am only interested in categorical raster data. Categorical rasters, such as land cover maps, soil categories, or any other categorized images, express spatial patterns by two inter-related properties: composition and configuration. Composition shows how many different categories we have, and how much area they occupy, while configuration focuses on the spatial arrangement of the categories.

Landscape metrics

Spatial patterns in categorical raster data are most often described by landscape metrics (landscape indices). A landscape metric is a single numerical value expressing some property of a raster, such as diversity of categories or spatial aggregation of classes. In the last 40 or so years, several hundred different spatial metrics have been developed. They are widely used in the field of landscape ecology, but their applications can also be found in more distant fields, such as clinical pathology (laboratory medicine).

The landscapemetrics package allows calculating various landscape metrics in R. It contains a simple categorical raster named landscape with three classes, which we can use to calculate some metrics. To learn more about these ideas and the landscapemetrics package visit https://r-spatialecology.github.io/landscapemetrics.

library(landscapemetrics)
library(raster)
plot(landscape)

For example, the lsm_l_shdi() function calculates Shannon’s diversity index, which shows how many categories we have and what their abundances are. It is 0 when only one patch is present and increases, without limit, as the number of classes increases while their proportions stay similar.

lsm_l_shdi(landscape)

# A tibble: 1 × 6
  layer level     class    id metric value
  <int> <chr>     <int> <int> <chr>  <dbl>
1     1 landscape    NA    NA shdi    1.01
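To show what this index measures, here is a minimal base R sketch of Shannon's diversity computed directly from class proportions with the natural logarithm. The `shannon_diversity()` helper and the two toy vectors are hypothetical and only illustrate the formula; they do not reproduce the internals of lsm_l_shdi().

```r
# Shannon's diversity: H = -sum(p_i * log(p_i)), with p_i the share of class i
shannon_diversity = function(x) {
  p = as.vector(table(x)) / length(x)
  -sum(p * log(p))
}

one_class = rep(1, 9)             # a single class only
three_equal = rep(1:3, each = 3)  # three equally abundant classes

shannon_diversity(one_class)
# 0
shannon_diversity(three_equal)
# log(3), about 1.0986
```

The value of 1.01 reported above is slightly below log(3), the maximum for three classes, which suggests that the three classes of the landscape raster are not equally abundant.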

The lsm_l_ai() function focuses on the configuration of spatial patterns by calculating the aggregation index. It equals 0 for maximally disaggregated areas and 100 for maximally aggregated ones.

lsm_l_ai(landscape)

# A tibble: 1 × 6
  layer level     class    id metric value
  <int> <chr>     <int> <int> <chr>  <dbl>
1     1 landscape    NA    NA ai      81.1

Spatial signatures

The above examples show how we condensed some information about raster data to just one number. It can be useful in a multitude of cases when we want to connect some aspect of a spatial pattern to external processes. However, what should we do if our goal is to find areas with similar spatial patterns?

In theory, we could calculate landscape metrics for many areas and then search for those which have the most similar values to our area of interest. This approach, however, leaves us with a number of problems, including which landscape metrics to use. Many landscape metrics are highly correlated, and their interrelations are hard to interpret.

An alternative approach, in this case, is to use a spatial signature. A spatial signature is a multi-number description that compactly stores information about the composition and configuration of a spatial pattern. Therefore, instead of having just one number representing a raster, we have several numbers that condense information about this location. We can calculate spatial signatures for many rasters, which allows us to find the most similar rasters, describe changes between rasters, or group (cluster) rasters based on the spatial patterns.
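To make the comparison of signatures more concrete, the sketch below treats three made-up signatures as normalized vectors and compares them with the Jensen-Shannon divergence, which is just one of many possible measures. The `jsd()` helper and the example vectors are hypothetical and are not taken from motif.

```r
# Jensen-Shannon divergence between two normalized vectors (base-2 logarithm)
jsd = function(p, q) {
  m = (p + q) / 2
  # Kullback-Leibler divergence; zero-probability terms contribute nothing
  kl = function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

# Hypothetical signatures of three areas (each sums to 1)
sig_a = c(0.5, 0.3, 0.2)
sig_b = c(0.45, 0.35, 0.2)
sig_c = c(0.1, 0.1, 0.8)

jsd(sig_a, sig_a)
# 0: identical signatures
jsd(sig_a, sig_b) < jsd(sig_a, sig_c)
# TRUE: area b is more similar to area a than area c is
```

With the base-2 logarithm, the divergence is bounded between 0 (identical signatures) and 1 (signatures that share no categories), which makes the values easy to compare across pairs of areas.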

motif

Search, change detection, and clustering of spatial patterns have been possible in GRASS GIS using the GeoPAT module or the command-line tool GeoPAT 2. All of the above actions can now also be performed natively in R with the motif package. In a series of blog posts, I plan to show and explain several use cases. They include:

A. Finding areas similar to the area of interest

B. Comparing changes between two times

C. Clustering areas with similar patterns of more than one layer of data

If you do not want to wait for the next blog post, you can install the motif package with:

install.packages("motif")

You can read more about it in the Landscape Ecology article or its preprint:

Nowosad, J. Motif: an open-source R tool for pattern-based spatial analysis. Landscape Ecol (2020). https://doi.org/10.1007/s10980-020-01135-0

You can also visit the package website at https://nowosad.github.io/motif and the GitHub repository with examples at https://github.com/Nowosad/motif-examples.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2021,
  author = {Nowosad, Jakub},
  title = {Pattern-Based Spatial Analysis in {R:} An Introduction},
  date = {2021-02-03},
  url = {https://jakubnowosad.com/posts/2021-02-03-motif-bp1/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2021. “Pattern-Based Spatial Analysis in R: An Introduction.” February 3, 2021. https://jakubnowosad.com/posts/2021-02-03-motif-bp1/.

How to choose a bivariate color palette?

Jakub Nowosad — Tue, 25 Aug 2020 00:00:00 GMT

Bivariate color palettes are products of combining two separate color palettes. They are usually represented by a square with rows (one color palette) and columns (second color palette). You can read more about how they are made in the blog post “Bivariate Choropleth Maps: A How-to Guide” by Joshua Stevens.

The main role of bivariate color palettes is to present the values of two variables simultaneously. For example, the map below uses a bivariate palette to represent both GDP per capita and life expectancy for countries in Africa.

The code to create this map is in the tmap issue tracker. Other examples of bivariate maps can be found in the “Bivariate Mapping with ggplot2” vignette and the “Bivariate maps with ggplot2 and sf” blog post.

The above map has one issue, though. As pointed out by Frederico R Ramos, it is not suitable for people with color vision deficiencies, who are not able to distinguish between some colors and, therefore, cannot read the map correctly. The main question, then, is how to choose a proper bivariate color palette.

Bivariate palettes

The pals R package has a dozen or so bivariate color palettes.

library(pals)
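bivcol = function(pal){
        # keep the name of the palette function to use it as the plot title
        tit = substitute(pal)
        # evaluate the palette function to get a vector of colors
        pal = pal()
        ncol = length(pal)
        # draw the colors as a square matrix (assumes a square number of colors)
        image(matrix(seq_along(pal), nrow = sqrt(ncol)),
              axes = FALSE, 
              col = pal, 
              asp = 1)
        mtext(tit)
}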

Twelve of these palettes are presented below.

par(mfrow = c(3, 4), mar = c(1, 1, 2, 1))
bivcol(arc.bluepink)
bivcol(brewer.divdiv)
bivcol(brewer.divseq)
bivcol(brewer.qualseq)
bivcol(brewer.seqseq1)
bivcol(brewer.seqseq2)
bivcol(census.blueyellow)
bivcol(stevens.bluered)
bivcol(stevens.greenblue)
bivcol(stevens.pinkblue)
bivcol(stevens.pinkgreen)
bivcol(stevens.purplegold)

Palettes’ properties

Now, we can use the colorblindcheck package to decide if the selected color palette is colorblind-friendly or not.

# remotes::install_github("nowosad/colorblindcheck")
library(colorblindcheck)

The main function in this package is palette_check(), which creates summary statistics comparing the original input palette and simulations of three main color vision deficiencies. Let’s use it on two color palettes: arc.bluepink() and brewer.seqseq2().

colorblindcheck::palette_check(arc.bluepink(),
                               plot = TRUE, bivariate = TRUE)

          name  n tolerance ncp ndcp  min_dist mean_dist max_dist
1       normal 16  7.135562 120  120 7.1355623  27.72463 53.76783
2 deuteranopia 16  7.135562 120  100 0.3450842  19.79323 52.46731
3   protanopia 16  7.135562 120   96 0.0000000  20.08030 50.20137
4   tritanopia 16  7.135562 120  120 7.9914570  31.48801 71.57927

The visual inspection of arc.bluepink() suggests that this palette is not suitable for people with color vision deficiencies, namely deuteranopia and protanopia. In deuteranopia and protanopia simulations, it is almost impossible to distinguish some colors. This problem is also confirmed by the summary statistics, where the minimal distance between colors of the original palette is about 7, while it is only about 0.345 for deuteranopia and 0 (no difference at all) for protanopia.
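The idea behind the distance columns can be sketched with base R only. Note the assumptions: the four hex colors are made up, and plain Euclidean distance in the CIE Lab color space is swapped in here for simplicity, so the values are not expected to match those reported by palette_check().

```r
# Hypothetical four-color palette (not one of the pals palettes)
pal = c("#E8E8E8", "#5AC8C8", "#BE64AC", "#3B4994")

# Convert hex colors to sRGB values in [0, 1] and then to CIE Lab
rgb_mat = t(grDevices::col2rgb(pal)) / 255
lab = grDevices::convertColor(rgb_mat, from = "sRGB", to = "Lab")

# Pairwise distances between all colors
d = dist(lab)

length(d)  # number of color pairs, n * (n - 1) / 2
min(d)     # the closest pair decides whether all colors can be told apart
```

The number of pairs explains the ncp column in the output above: `choose(16, 2)` gives 120 for the 16-color arc.bluepink() palette, and `choose(9, 2)` gives 36 for 9-color palettes.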

colorblindcheck::palette_check(brewer.seqseq2(), 
                               plot = TRUE, bivariate = TRUE)

          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9  13.21133  36   36 13.21133  39.99288 94.59810
2 deuteranopia 9  13.21133  36   34 10.99234  40.33172 94.22020
3   protanopia 9  13.21133  36   34 10.53062  38.99158 94.59810
4   tritanopia 9  13.21133  36   36 13.66888  39.60803 94.48661

On the other hand, the inspection of brewer.seqseq2() indicates that it is possible to differentiate between all of the colors in this palette based on the original colors and simulations of color vision deficiencies. You can see more examples of colorblindcheck in action at https://nowosad.github.io/colorblindcheck.

Colorblind-friendly palettes

Using the above function, I tested all of the bivariate color palettes from pals. I visualized all of the palettes and decided to keep only the ones for which the minimal distance between colors was above 6.

This left four palettes - brewer.divseq, brewer.seqseq2, stevens.greenblue, and stevens.purplegold. You can see the comparison between them and simulations of color vision deficiencies below.
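The selection rule can be written down as a short base R sketch. The numbers are the smallest min_dist values over the four vision types, copied from the palette_check() outputs printed in this post; only the five palettes with printed outputs are included, and the assumption is that the threshold of 6 applies to every simulation.

```r
# Smallest min_dist over normal vision and the three simulations,
# copied from the palette_check() outputs shown in this post
min_dist = c(
  arc.bluepink = 0.0000000,
  brewer.divseq = 6.777558,
  brewer.seqseq2 = 10.53062,
  stevens.greenblue = 6.154169,
  stevens.purplegold = 6.31650
)

# Keep only palettes whose closest color pair is more than 6 apart
kept = names(min_dist)[min_dist > 6]
kept
```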

colorblindcheck::palette_check(brewer.divseq(), 
                               plot = TRUE, bivariate = TRUE)

          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9  9.237516  36   36 9.237516  38.32933 87.90123
2 deuteranopia 9  9.237516  36   36 9.267188  39.85751 90.88415
3   protanopia 9  9.237516  36   36 9.237516  40.79861 86.08385
4   tritanopia 9  9.237516  36   35 6.777558  32.82160 83.10774

colorblindcheck::palette_check(brewer.seqseq2(),
                               plot = TRUE, bivariate = TRUE)

          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9  13.21133  36   36 13.21133  39.99288 94.59810
2 deuteranopia 9  13.21133  36   34 10.99234  40.33172 94.22020
3   protanopia 9  13.21133  36   34 10.53062  38.99158 94.59810
4   tritanopia 9  13.21133  36   36 13.66888  39.60803 94.48661

colorblindcheck::palette_check(stevens.greenblue(),
                               plot = TRUE, bivariate = TRUE)

          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9   9.29651  36   36 9.296510  26.34666 50.19184
2 deuteranopia 9   9.29651  36   33 7.238684  24.60856 51.19105
3   protanopia 9   9.29651  36   35 7.693015  24.51814 47.10098
4   tritanopia 9   9.29651  36   29 6.154169  20.06474 50.20386

colorblindcheck::palette_check(stevens.purplegold(),
                               plot = TRUE, bivariate = TRUE)

          name n tolerance ncp ndcp min_dist mean_dist max_dist
1       normal 9  11.97625  36   36 11.97625  30.13646 53.56032
2 deuteranopia 9  11.97625  36   35 10.57857  27.58839 46.59557
3   protanopia 9  11.97625  36   34 11.48625  29.32017 50.36899
4   tritanopia 9  11.97625  36   28  6.31650  20.96426 49.27898

Summary

Four palettes from the pals package, brewer.divseq, brewer.seqseq2, stevens.greenblue, and stevens.purplegold, seem to be the most suitable for bivariate visualizations.

All of them are suitable for people with color vision deficiencies. It is important to note that brewer.divseq is made of a sequential (from bottom to top) and a diverging (from left to right) palette. Therefore, its use should be limited to applications in which we want to present one variable going from high to low (or vice versa) and one variable that has values around a central neutral point. brewer.seqseq2, stevens.greenblue, and stevens.purplegold, on the other hand, consist of two sequential palettes and, thus, should be used to present two variables with values going from high to low (or vice versa).

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2020,
  author = {Nowosad, Jakub},
  title = {How to Choose a Bivariate Color Palette?},
  date = {2020-08-25},
  url = {https://jakubnowosad.com/posts/2020-08-25-cbc-bp2/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2020. “How to Choose a Bivariate Color Palette?” August 25, 2020. https://jakubnowosad.com/posts/2020-08-25-cbc-bp2/.

How to measure spatial diversity and segregation?

Jakub Nowosad — Mon, 03 Aug 2020 00:00:00 GMT

The raceland package implements a computational framework for a pattern-based, zoneless analysis and visualization of (ethno)racial topography.

The main concept in this package is a racial landscape (RL). It consists of many large and small patches (racial enclaves) formed by adjacent raster grid cells having the same race categories. The distribution of racial enclaves creates a specific spatial pattern, which can be quantified by two metrics (entropy and mutual information) derived from the Information Theory concept (IT). Entropy is the measure of racial diversity and mutual information measures racial segregation.

Methods in the raceland package are based on raster data and, unlike previous methods, do not depend on a division into specific zones (census tract, census block, etc.). Calculation of racial diversity (entropy) and racial segregation (mutual information) can be performed for the whole area of interest (e.g., a metropolitan area) or any portion of it without introducing any arbitrary divisions.

To learn more about this topic, read our Applied Geography article or its preprint:

Dmowska, A., Stepinski T., Nowosad J. Racial landscapes – a pattern-based, zoneless method for analysis and visualization of racial topography. Applied Geography. 122:1-9, DOI:10.1016/j.apgeog.2020.102239

Example calculations

To reproduce the results on your own computer, install and attach the following packages:

library(raceland)
library(raster)
library(sf)
library(tmap)
library(dplyr)

You also need to download and extract the data.zip file containing the example data.

temp_data_file = tempfile(fileext = ".zip")
download.file("https://github.com/Nowosad/raceland-bp1/raw/master/data.zip",
              destfile = temp_data_file,
              mode = "wb")
unzip(temp_data_file)

Input data

The presented approach requires a set of rasters, where each raster represents one of five race-groups: Asians, Blacks, Hispanic, others, and Whites. In this example, we use data limited to the city of Cincinnati, Ohio.

list_raster = dir("data", pattern = "\\.tif$", full.names = TRUE)
race_raster = stack(list_raster)

We also use vector data containing the city borders to ease the understanding of the results.

cincinnati = read_sf("data/cincinnati.gpkg")

We can visualize the data using the tmap package:

tm_race = tm_shape(race_raster) +
    tm_raster(style = "fisher",
              n = 10,
              palette = "viridis",
              title = "Number of people") +
    tm_facets(nrow = 3) +
    tm_shape(cincinnati) +
    tm_borders(lwd = 3, col = "black")
tm_race

The above maps show the distribution of people from different race-groups in Cincinnati. Each 30 by 30 meter cell represents the number of people living in that area. Data was obtained from https://www.socscape.edu.pl/ and preprocessed using the instructions at https://cran.r-project.org/web/packages/raceland/vignettes/raceland-intro3.html.

Basic example

Our goal is to measure racial diversity and racial segregation for different places in the city. We can use the quantify_raceland() function for this purpose.

results_metrics = quantify_raceland(race_raster,
                                    n = 30,
                                    window_size = 10, 
                                    fun = "mean",
                                    size = 20,
                                    threshold = 0.75) 
head(results_metrics)

Simple feature collection with 6 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 978285 ymin: 1858035 xmax: 984885 ymax: 1859235
CRS:           PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["Unknown_based_on_GRS80_ellipsoid",
            ELLIPSOID["GRS 1980",6378137,298.257222101004,
                LENGTHUNIT["metre",1],
                ID["EPSG",7019]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    CONVERSION["Albers Equal Area",
        METHOD["Albers Equal Area",
            ID["EPSG",9822]],
        PARAMETER["Latitude of false origin",23,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-96,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",29.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",45.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["northing",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
   row col       ent     mutinf                       geometry
30   1  30 1.0928666 0.01224849 POLYGON ((981885 1859235, 9...
31   1  31 1.2931567 0.03783377 POLYGON ((982485 1859235, 9...
33   1  33 1.1044932 0.01699973 POLYGON ((983685 1859235, 9...
34   1  34 1.6397933 0.05802896 POLYGON ((984285 1859235, 9...
74   2  24 0.9367936 0.01367241 POLYGON ((978285 1858635, 9...
80   2  30 1.4189221 0.04388582 POLYGON ((981885 1858635, 9...

The quantify_raceland() function requires several arguments:

x - a RasterStack with race-specific population densities assigned to each cell
n - the number of realizations
window_size - the side length, expressed in the number of cells, of a square-shaped block of cells for which local densities will be calculated
fun - the function used to calculate values from adjacent cells that contribute to the exposure matrix: "mean" calculates average values of local population densities from adjacent cells, "geometric_mean" calculates geometric mean values of local population densities from adjacent cells, and "focal" assigns the value from the focal cell
size - the side length, expressed in the number of cells, of a square-shaped block of cells. It defines the extent of a local pattern
threshold - the maximum share of NA cells in a local area for which the metrics are still calculated

The result is a spatial vector object containing areas of the size of 20 by 20 cells from input data (600 by 600 meters in this example). Its attribute table has five columns - row and col allowing for identification of each square polygon, ent - entropy measuring racial diversity, mutinf - mutual information, which is associated with measuring racial segregation, and geometry containing spatial geometries.

diversity_map = tm_shape(results_metrics) +
    tm_polygons(col = "ent",
                title = "Diversity",
                style = "cont",
                palette = "magma") +
    tm_shape(cincinnati) +
    tm_borders(lwd = 1, col = "black")
segregation_map = tm_shape(results_metrics) +
    tm_polygons(col = "mutinf",
                title = "Segregation",
                style = "cont", 
                palette = "cividis") +
    tm_shape(cincinnati) +
    tm_borders(lwd = 1, col = "black")
tmap_arrange(diversity_map, segregation_map)

The above results present areas with different levels of racial diversity and segregation. Interestingly, there is a low correlation between these two properties. Some areas inside of the city do not have any value attached - this indicates that either more than 75% of their area is covered with missing values or nobody lives there.

Extended example

The quantify_raceland() function is a wrapper around several steps implemented in raceland, namely create_realizations(), create_densities(), calculate_metrics(), and create_grid(). All of them can be used sequentially, as you can see below.

Additionally, the raceland package has the zones_to_raster() function that prepares input data based on spatial vector data with race counts.

Constructing racial landscapes

The racial landscape is a high-resolution grid in which each cell contains only inhabitants of a single race. It is constructed using the create_realizations() function, which expects a stack of race-specific rasters. Racial composition at each cell is translated into probabilities of drawing a person of a specific race from a cell. For example, if a cell has 100 people, where 90 are classified as Black (90% chance) and 10 as White (10% chance), then we can assign a specific race randomly based on these probabilities.

This approach generates a specified number (n = 30, in this case) of realizations with slightly different patterns.

realizations_raster = create_realizations(race_raster, n = 30)

The output of this function is a RasterStack, where each raster contains values from 1 to k, where k is the number of provided race-specific grids. In this case, we provided five race-specific grids (Asians, Blacks, Hispanic, others, and Whites); therefore, the value of 1 in the output object represents Asians, 2 represents Blacks, etc.
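The random assignment for a single cell can be sketched with base R. This is only an illustration of the idea described above, not the code used by create_realizations(): the counts come from the 90/10 example, and drawing one label per realization with sample() is an assumption.

```r
set.seed(1)  # only to make this sketch reproducible

# One cell with 100 people: 90 classified as Black and 10 as White
counts = c(Black = 90, White = 10)
probs = counts / sum(counts)

# Draw one race label for this cell in each of 30 realizations
draws = sample(names(counts), size = 30, replace = TRUE, prob = probs)
table(draws)
```

Most realizations assign this cell to the majority group, but some do not, which is why the realizations differ slightly from each other.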

my_pal = c("#F16667", "#6EBE44", "#7E69AF", "#C77213", "#F8DF1D")
tm_realizations = tm_shape(realizations_raster[[1:4]]) +
    tm_raster(style = "cat",
              palette = my_pal,
              labels = c("Asians", "Blacks", "Hispanic", "others", "Whites"),
              title = "") +
    tm_facets(ncol = 2) +
    tm_shape(cincinnati) +
    tm_borders(lwd = 3, col = "black") +
    tm_layout(panel.labels = paste("Realization", 1:4))
tm_realizations

The above plot shows four of 30 created realizations and makes it clear that all of them are fairly similar.

Local densities

Now, for each of the created realizations, we can calculate local densities of subpopulations (race-specific local densities) using the create_densities() function.

dens_raster = create_densities(realizations_raster,
                               race_raster,
                               window_size = 10)

The output is a RasterStack with local densities calculated separately for each realization.

tm_density = tm_shape(dens_raster[[1:4]]) +
    tm_raster(style = "fisher",
              n = 10,
              palette = "viridis",
              title = "Number of people") +
    tm_facets(ncol = 2) +
    tm_shape(cincinnati) +
    tm_borders(lwd = 3, col = "black") +
    tm_layout(panel.labels = paste("Realization", 1:4))
tm_density

Total diversity and segregation

We can use both the realizations and density rasters to calculate racial diversity and segregation using the calculate_metrics() function. It calculates four information theory-derived metrics: entropy (ent), joint entropy (joinent), conditional entropy (condent), and mutual information (mutinf). As we mentioned before, ent measures racial diversity, while mutinf is associated with racial segregation. These metrics can be calculated for a given spatial scale. For example, setting size to NULL, as in the example below, calculates the metrics for the whole area of each realization.

metr_df = calculate_metrics(x = realizations_raster, 
                            w = dens_raster, 
                            fun = "mean", 
                            size = NULL, 
                            threshold = 1)
head(metr_df)

  realization row col      ent  joinent  condent    mutinf
1           1   1   1 1.394937 2.617351 1.222414 0.1725237
2           2   1   1 1.395346 2.618786 1.223440 0.1719062
3           3   1   1 1.397898 2.621963 1.224065 0.1738325
4           4   1   1 1.397268 2.622896 1.225628 0.1716395
5           5   1   1 1.396112 2.617471 1.221359 0.1747527
6           6   1   1 1.398692 2.624289 1.225597 0.1730954
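The four columns are related to each other: in the output above, joinent equals ent plus condent, and mutinf equals ent minus condent. The base R sketch below shows these relations for a small adjacency matrix. The `it_metrics()` helper is hypothetical, the base-2 logarithm is an assumption, and the sketch is not the implementation used by calculate_metrics().

```r
# Information theory metrics from a (symmetric) matrix of adjacency counts
it_metrics = function(m) {
  p_joint = m / sum(m)         # joint probabilities of pairs of categories
  p_row = rowSums(p_joint)     # marginal probabilities of categories
  h = function(p) -sum(p[p > 0] * log2(p[p > 0]))
  ent = h(p_row)               # diversity of categories
  joinent = h(p_joint)         # overall complexity of the pattern
  condent = joinent - ent      # complexity left once a category is known
  mutinf = ent - condent       # information shared between adjacent cells
  c(ent = ent, joinent = joinent, condent = condent, mutinf = mutinf)
}

segregated = diag(c(50, 50))     # categories adjacent only to themselves
mixed = matrix(25, nrow = 2, ncol = 2)  # every adjacency equally common

it_metrics(segregated)
it_metrics(mixed)
```

Both toy patterns have the same diversity (two equally frequent categories), but the fully segregated one has the highest possible mutual information, while the fully mixed one has a mutual information of zero.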

Now, we can calculate average metrics across all realizations, which should give more accurate results.

metr_df %>% 
  summarise(
    mean_ent = mean(ent, na.rm = TRUE),
    mean_mutinf = mean(mutinf)
  )

  mean_ent mean_mutinf
1   1.3984   0.1740535

These values could be compared with values obtained for other US cities to evaluate which cities have high average racial diversity (larger values of mean_ent) and which have high average racial segregation (larger values of mean_mutinf).

Local diversity and segregation

The information theory-derived metrics can also be calculated for smaller, local scales using the size argument. It describes the size of a local area for metrics calculations. For example, size = 20 indicates that each local area will consist of 20 by 20 cells of the original raster.

metr_df_20 = calculate_metrics(x = realizations_raster,
                               w = dens_raster, 
                               fun = "mean", 
                               size = 20, 
                               threshold = 0.75)

Now, we can summarize the results for each local area independently (group_by(row, col)).

smr = metr_df_20 %>%
  group_by(row, col) %>%
  summarize(
    ent_mean = mean(ent, na.rm = TRUE),
    mutinf_mean = mean(mutinf, na.rm = TRUE),
  ) %>% 
  na.omit()
head(smr)

# A tibble: 6 × 4
# Groups:   row [2]
    row   col ent_mean mutinf_mean
              
1     1    30    1.08       0.0154
2     1    31    1.31       0.0356
3     1    33    1.12       0.0166
4     1    34    1.62       0.0552
5     2    24    0.951      0.0163
6     2    30    1.44       0.0494

Each row in the obtained results relates to a spatial location. We can create an empty grid with appropriate dimensions using the create_grid() function. Its size argument expects the same value as used in the calculate_metrics() function.

grid_sf = create_grid(realizations_raster, size = 20)

The result is a spatial vector object with three columns: row and col allowing for identification of each square polygon, and geometry containing spatial geometries.

tm_shape(grid_sf) +
    tm_polygons()

The first two columns, row and col, can be used to connect the grid with summary results.

grid_attr = dplyr::left_join(grid_sf, smr, by = c("row", "col"))
grid_attr = na.omit(grid_attr)

Finally, we are able to create two maps. The first one represents racial diversity (the larger the value, the larger the diversity; the ent_mean variable) and the second one shows racial segregation (the larger the value, the larger the segregation; the mutinf_mean variable).

diversity_map = tm_shape(grid_attr) +
    tm_polygons(col = "ent_mean",
                title = "Diversity",
                style = "cont",
                palette = "magma") +
    tm_shape(cincinnati) +
    tm_borders(lwd = 3, col = "black")
segregation_map = tm_shape(grid_attr) +
    tm_polygons(col = "mutinf_mean",
                title = "Segregation",
                style = "cont", 
                palette = "cividis") +
    tm_shape(cincinnati) +
    tm_borders(lwd = 3, col = "black")
tmap_arrange(diversity_map, segregation_map)

Bonus: visualizing racial landscapes

While the realizations created a few steps before represent the spatial distribution of race fairly well, they do not take the spatial variability of population density into consideration. An additional function, plot_realization(), displays a selected realization taking into account not only the spatial distribution of race, but also the population density.

plot_realization(x = realizations_raster[[2]],
                 y = race_raster,
                 hex = my_pal)

In the result, darker areas have larger populations, and brighter areas are less inhabited.

Summary

The raceland package implements a computational framework for a pattern-based, zoneless analysis and visualization of (ethno)racial topography. The most comprehensive description of the method can be found in the Racial landscapes – a pattern-based, zoneless method for analysis and visualization of racial topography article published in Applied Geography. Its preprint is available at https://osf.io/preprints/socarxiv/mejz5. Additionally, raceland has three extensive vignettes:

raceland: R package for a pattern-based, zoneless method for analysis and visualization of racial topography - introducing the package and its functions
raceland: Describing local racial patterns of racial landscapes at different spatial scales - showing how the calculations can be performed at different spatial scales
raceland: Describing local pattern of the racial landscape using SocScape grids - presenting how to use the raceland methods with SocScape race-specific grids to perform analysis for different spatial scales, using Cook County as an example.

This approach is based on the concept of ‘landscape’ used in the domain of landscape ecology. To learn more about information theory metrics used in this approach you can read the Information theory as a consistent framework for quantification and classification of landscape patterns article published in Landscape Ecology.

The raceland package requires race-specific grids. They can be obtained in two main ways. The first one is to download prepared grids from the SocScape project. It provides high-resolution raster grids for the years 1990, 2000, and 2010 for 365 metropolitan areas and each county in the conterminous US. The second way is to rasterize a spatial vector file (e.g., an ESRI Shapefile) with an attribute table containing race counts for some areas using the zones_to_raster() function.

Finally, while the presented methods have been applied to race-specific raster grids, they can also be used for many other problems where it is important to determine spatial diversity and segregation.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2020,
  author = {Nowosad, Jakub},
  title = {How to Measure Spatial Diversity and Segregation?},
  date = {2020-08-03},
  url = {https://jakubnowosad.com/posts/2020-08-03-raceland-bp1/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2020. “How to Measure Spatial Diversity and Segregation?” August 3, 2020. https://jakubnowosad.com/posts/2020-08-03-raceland-bp1/.

Introduction to landscape ecology with R

Jakub Nowosad — Tue, 26 May 2020 00:00:00 GMT

Maximilian H.K. Hesselbarth and I gave the Introduction to landscape ecology with R workshop during the IALE-North America 2020 Annual Meeting. You can find the workshop abstract, slides, and recordings below.

Abstract

R is a free, open-source programming language created as an environment for statistical computing and visualization. The advantages of using R include its flexibility, ease of collaboration, and focus on reproducibility. Additionally, the concept of packages - collections of R functions, data, and compiled code created by users - allowed for the growth of its capabilities and expansion into many scientific fields. In recent years, R also has become one of the most often used tools in ecology.

R also has a long history of supporting spatial data analysis, including spatial data downloading, preprocessing, visualizing, and modeling. Recently, however, some new packages have appeared which have significantly changed the work with spatial data in R; in particular, the sf package.

The workshop is divided into two parts. The first one introduces participants to the spatial data analysis system in R. The focus is on getting started, with demonstrations of key packages, spatial analysis, and making maps. The second part of the workshop focuses on how to use the landscapemetrics package. This package is based on the main concepts from FRAGSTATS, but it has a number of advantages. These include, among others, removing existing metric implementation errors, adding new landscape metrics, enabling landscape visualization, and allowing for calculations on large input data. A particular advantage is also the ability to integrate this package with other packages for spatial analysis, so it is possible to download spatial data, process it, calculate landscape metrics, and visualize them within one tool.

The workshop is a mixture of theory and practice. Pointers to further materials ensure that participants know where to get help and how to take confident next steps after the workshop.

Slides

You can find the slides for the talk at https://nowosad.github.io/whyr_webinar004/.

Recordings

Part I:

Part II:

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2020,
  author = {Nowosad, Jakub},
  title = {Introduction to Landscape Ecology with {R}},
  date = {2020-05-26},
  url = {https://jakubnowosad.com/posts/2020-05-26-intro-to-landscape-ecology/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2020. “Introduction to Landscape Ecology with R.” May 26, 2020. https://jakubnowosad.com/posts/2020-05-26-intro-to-landscape-ecology/.

How to calculate landscape metrics for local landscapes?

Jakub Nowosad — Thu, 23 Jan 2020 00:00:00 GMT

In the last few weeks, I was asked a similar question several times - how to calculate landscape metrics for local landscapes? In other words, how to divide the categorical input map into a number of smaller areas, and next calculate selected landscape metrics for each of the areas. Those areas have many names, such as tiles, squares, or motifels. The main goal of this post is to show how to calculate landscape metrics using the landscapemetrics R package.

To reproduce the results on your own computer, install and attach the following packages:

library(landscapemetrics)     # landscape metrics calculation
library(raster)               # spatial raster data reading and handling
library(sf)                   # spatial vector data reading and handling
library(dplyr)                # data manipulation

Reading the data

The first step is to read the input data.

data("augusta_nlcd")
my_raster = augusta_nlcd

In this example, we will use the augusta_nlcd dataset built into the landscapemetrics package. It is also possible to read any spatial raster file with the raster() function, for example my_raster = raster("path_to_my_file.tif"). The input file, however, should fulfill two requirements: (1) contain only integer values that represent categories, and (2) be in a projected coordinate reference system. You can check if your file meets the requirements using the check_landscape() function, and learn more about coordinate reference systems in the Geocomputation with R book.

Our example data looks like this:

plot(my_raster)

Creating a grid

One of the ways to create borders of local landscapes is to use the st_make_grid() function. This function accepts an sf object as the first argument, therefore we need to create a new object based on the bounding box of the input raster. Next, we also need to provide a second argument, either cellsize or n:

cellsize - vector of length 1 or 2 - the side length of each grid cell in map units (usually meters)
n - vector of length 1 or 2 - the number of grid cells in a row/column

my_grid_geom = st_make_grid(st_as_sfc(st_bbox(my_raster)), cellsize = 1500)
my_grid = st_sf(geom = my_grid_geom)
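
To compare the two ways of specifying the grid, here is a minimal sketch that uses a hypothetical bounding box of 6000 by 4500 map units instead of the augusta_nlcd extent. It requires the sf package; the object names are made up for illustration.

```r
library(sf)

# Hypothetical study area: 6000 by 4500 map units, no CRS assigned
bb = st_bbox(c(xmin = 0, ymin = 0, xmax = 6000, ymax = 4500))
area = st_as_sfc(bb)

# Option 1: fix the side length of each grid cell
grid_by_size = st_make_grid(area, cellsize = 1500)

# Option 2: fix the number of grid cells per row and per column
grid_by_n = st_make_grid(area, n = c(4, 3))

length(grid_by_size)
length(grid_by_n)
```

For this extent, both calls should describe the same layout of 4 by 3 cells. With cellsize, the number of cells follows from the extent; with n, the cell size does.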

Next, we can overlay the newly created grid on top of our input raster:

plot(my_raster)
plot(my_grid, add = TRUE)

Calculating a metric

The calculation of landscape metrics for each cell can be done with the sample_lsm() function. It requires the input raster as the first argument, and the grid as the second one¹. Next, we can specify which landscape metrics we want to calculate. For example, we selected marginal entropy to be calculated on a landscape level. The complete list of the implemented metrics can be obtained with the list_lsm() function. Let us know if you are missing some metrics.

my_metric = sample_lsm(my_raster, my_grid,
                       level = "landscape", metric = "ent")
my_metric

## # A tibble: 126 x 8
##    layer level     class    id metric value plot_id percentage_inside
##                              
##  1     1 landscape    NA    NA ent     2.54       1               100
##  2     1 landscape    NA    NA ent     2.52       2               100
##  3     1 landscape    NA    NA ent     2.47       3               100
##  4     1 landscape    NA    NA ent     2.49       4               100
##  5     1 landscape    NA    NA ent     2.38       5               100
##  6     1 landscape    NA    NA ent     2.48       6               100
##  7     1 landscape    NA    NA ent     2.65       7               100
##  8     1 landscape    NA    NA ent     2.41       8               100
##  9     1 landscape    NA    NA ent     2.28       9               100
## 10     1 landscape    NA    NA ent     2.44      10               100
## # … with 116 more rows

Each row in the my_metric object corresponds to one grid cell, with the values of marginal entropy in the value column². Next, we can connect the spatial grid with my_metric using the bind_cols() function:

my_grid = bind_cols(my_grid, my_metric)
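
Note that bind_cols() matches rows purely by position, so it relies on the metric table being in the same order as the grid. The base R sketch below, built on made-up tables, contrasts positional binding (cbind()) with a key-based join on plot_id (merge()); it is only an illustration of the difference, not part of the original workflow.

```r
# Hypothetical grid attributes, one row per cell
grid_tbl = data.frame(plot_id = 1:4,
                      name = c("a", "b", "c", "d"))

# Hypothetical metric table whose rows are NOT in grid order
metric_tbl = data.frame(plot_id = c(3L, 1L, 4L, 2L),
                        value = c(2.47, 2.54, 2.49, 2.52))

# Binding by position silently pairs cell 1 with the value of cell 3
by_position = cbind(grid_tbl, value = metric_tbl$value)

# Joining on the key pairs every cell with its own value
by_key = merge(grid_tbl, metric_tbl, by = "plot_id")
by_key
```

When the order of the two tables is in doubt, comparing nrow() of both objects and joining on plot_id is a cautious alternative.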

The quickest visualization of the results can be done with the plot() function.

plot(my_grid["value"])

More complex visualizations can be done with the tmap or ggplot2 packages.

Saving the results

The write_sf() function can save the results together with spatial geometry.

write_sf(my_grid, "my_grid.gpkg")

This output file can be read in any GIS software, such as QGIS, GRASS GIS, or ArcGIS.

Bonus 1 - show each raster

We can also see local landscapes for each grid cell, when we set the return_raster argument in sample_lsm() to TRUE:

my_metric_r = sample_lsm(my_raster, my_grid,
                         level = "landscape", metric = "ent",
                         return_raster = TRUE)
my_metric_r

## # A tibble: 126 x 9
##    layer level class    id metric value plot_id percentage_insi…
##                         
##  1     1 land…    NA    NA ent     2.54       1              100
##  2     1 land…    NA    NA ent     2.52       2              100
##  3     1 land…    NA    NA ent     2.47       3              100
##  4     1 land…    NA    NA ent     2.49       4              100
##  5     1 land…    NA    NA ent     2.38       5              100
##  6     1 land…    NA    NA ent     2.48       6              100
##  7     1 land…    NA    NA ent     2.65       7              100
##  8     1 land…    NA    NA ent     2.41       8              100
##  9     1 land…    NA    NA ent     2.28       9              100
## 10     1 land…    NA    NA ent     2.44      10              100
## # … with 116 more rows, and 1 more variable: raster_sample_plots

This adds a new column, raster_sample_plots, that contains each local landscape. Next, we can visualize whichever landscapes we want, for example, number 1, which is located in the bottom left corner of the study area:

plot(my_metric_r$raster_sample_plots[[1]],
     main = paste0("ent: ", round(my_metric_r$value[[1]], 2)))

Bonus 2 - calculate many metrics

We are not limited to calculating just one metric at a time. The landscapemetrics package makes it possible to calculate up to 65 metrics on the landscape level. There are several ways to specify the metrics we want to obtain, as you can learn in the ?calculate_lsm help file. For this example, we selected two metrics - marginal entropy (abbreviated as "ent") and mutual information (abbreviated as "mutinf"):

my_metrics = sample_lsm(my_raster, my_grid, 
                        level = "landscape", metric = c("ent", "mutinf"))
my_metrics

## # A tibble: 252 x 8
##    layer level     class    id metric value plot_id percentage_inside
##                              
##  1     1 landscape    NA    NA ent    2.54        1               100
##  2     1 landscape    NA    NA mutinf 1.07        1               100
##  3     1 landscape    NA    NA ent    2.52        2               100
##  4     1 landscape    NA    NA mutinf 0.841       2               100
##  5     1 landscape    NA    NA ent    2.47        3               100
##  6     1 landscape    NA    NA mutinf 0.973       3               100
##  7     1 landscape    NA    NA ent    2.49        4               100
##  8     1 landscape    NA    NA mutinf 1.09        4               100
##  9     1 landscape    NA    NA ent    2.38        5               100
## 10     1 landscape    NA    NA mutinf 1.04        5               100
## # … with 242 more rows

The output data frame has two rows for each grid cell; therefore, if we want to connect the result with a spatial object, we need to reformat it. It can be done with the pivot_wider() function from the tidyr package.

library(tidyr)
my_metrics = pivot_wider(my_metrics, names_from = metric, values_from = value)
my_metrics

## # A tibble: 126 x 8
##    layer level     class    id plot_id percentage_inside   ent mutinf
##                              
##  1     1 landscape    NA    NA       1               100  2.54  1.07 
##  2     1 landscape    NA    NA       2               100  2.52  0.841
##  3     1 landscape    NA    NA       3               100  2.47  0.973
##  4     1 landscape    NA    NA       4               100  2.49  1.09 
##  5     1 landscape    NA    NA       5               100  2.38  1.04 
##  6     1 landscape    NA    NA       6               100  2.48  1.15 
##  7     1 landscape    NA    NA       7               100  2.65  1.12 
##  8     1 landscape    NA    NA       8               100  2.41  0.779
##  9     1 landscape    NA    NA       9               100  2.28  0.914
## 10     1 landscape    NA    NA      10               100  2.44  0.857
## # … with 116 more rows

This function moves metrics values from the value column into two new columns - ent and mutinf. Now, we are able to connect it to the spatial object with bind_cols():

my_grid2 = bind_cols(my_grid, my_metrics)

Plotting of the result requires providing the name of the column of a metric:

plot(my_grid2["ent"])

plot(my_grid2["mutinf"])

Summary

The sample_lsm() function offers more than calculations for regular areas. It is also possible to provide irregular polygons as the second argument and calculate any landscape metrics for them. The landscapemetrics package also has many additional features, including calculation of metrics in a moving window with window_lsm() and in successive buffers around points of interest with scale_sample(). Learn more about landscape metrics and the landscapemetrics package at https://r-spatialecology.github.io/landscapemetrics/ and http://dx.doi.org/10.1111/ecog.04617.

Footnotes

¹ This function also allows for many more possibilities, including specifying a 2-column matrix with coordinates, SpatialPoints, SpatialLines, SpatialPolygons, sf points, or sf polygons as the second argument. You can learn all of the possible options using ?sample_lsm.
² To learn more about the structure of the output, read the Efficient landscape metrics calculations for buffers around sampling points blog post.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2020,
  author = {Nowosad, Jakub},
  title = {How to Calculate Landscape Metrics for Local Landscapes?},
  date = {2020-01-23},
  url = {https://jakubnowosad.com/posts/2020-01-23-lsm_bp2/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2020. “How to Calculate Landscape Metrics for Local Landscapes?” January 23, 2020. https://jakubnowosad.com/posts/2020-01-23-lsm_bp2/.

Evaluation of the new palette() for R

Jakub Nowosad — Sun, 01 Dec 2019 00:00:00 GMT

R version 4.0 is just around the corner. One of the changes in the new version is the improved default color palette using the palette() function.

Proposed colors

The new proposed palette() default is less saturated and more balanced, while at the same time, it follows the same basic pattern of colors (hues). You can read more about it at https://developer.r-project.org/Blog/public/2019/11/21/a-new-palette-for-r/.

col_ver3 = c("#000000", "#FF0000", "#00CD00", "#0000FF",
             "#00FFFF", "#FF00FF", "#FFFF00", "#BEBEBE")
col_ver4 = c("#000000", "#DF536B", "#61D04F", "#2297E6",
             "#28E2E5", "#CD0BBC", "#EEC21F", "#9E9E9E")

library(colorspace)
swatchplot("Version 3" = col_ver3,
           "Version 4" = col_ver4)

This proposal is still being discussed and modified, as mentioned by Achim Zeileis on Twitter.

Therefore, I decided it is a good time to test the properties of the proposed palette() default for color vision deficiencies - deuteranopia, protanopia, and tritanopia.

Comparison between palettes

I used the colorblindcheck package for this purpose.

# remotes::install_github("nowosad/colorblindcheck")
library(colorblindcheck)

This tiny R package provides tools for helping to decide if the selected color palette is colorblind-friendly. You can see examples of its use at https://nowosad.github.io/colorblindcheck.

The primary function in this package is palette_check(), which creates summary statistics comparing the original input palette and simulations of color vision deficiencies.

palette_check(col_ver3, plot = TRUE)

          name n tolerance ncp ndcp  min_dist mean_dist  max_dist
1       normal 8  26.64945  28   28 26.649454  58.57976 105.67883
2 deuteranopia 8  26.64945  28   26 12.790929  51.65788  99.81401
3   protanopia 8  26.64945  28   24  4.337187  54.06193  95.06426
4   tritanopia 8  26.64945  28   23 13.934054  51.49383  90.45153

Visual inspection of the old palette() default shows that it is not suitable for people with color vision deficiencies. For example, people with protanopia could have problems distinguishing the first from the second color and the fourth from the sixth color. This problem is also confirmed in the summary statistics, where the minimal distance between colors of the original palette is about 26, while it is only about 4 for protanopia.

palette_check(col_ver4, plot = TRUE)

          name n tolerance ncp ndcp  min_dist mean_dist max_dist
1       normal 8  23.51878  28   28 23.518780  50.21307 95.04017
2 deuteranopia 8  23.51878  28   22 12.094062  41.11547 78.45654
3   protanopia 8  23.51878  28   24  5.402646  42.28841 81.02547
4   tritanopia 8  23.51878  28   22 11.032589  44.47677 83.19068

The proposed palette() looks considerably better, as it is easier to distinguish between colors for each color vision deficiency. However, the minimal distance between colors for protanopia is just marginally better with a value of about 5.

Protanomaly

Let’s use the palette_dist() function to compare each pair of colors in the old and proposed palette() using the protanopia color vision deficiency.

palette_dist(col_ver3, cvd = "pro")

     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
[1,]   NA 13.89 63.29 40.08 85.59 41.23 90.98 66.30
[2,]   NA    NA 54.81 52.02 82.32 52.17 82.02 62.61
[3,]   NA    NA    NA 78.71 43.28 75.54 18.13 28.28
[4,]   NA    NA    NA    NA 52.63  4.34 95.06 52.40
[5,]   NA    NA    NA    NA    NA 48.08 44.74 14.58
[6,]   NA    NA    NA    NA    NA    NA 91.20 48.37
[7,]   NA    NA    NA    NA    NA    NA    NA 31.09
[8,]   NA    NA    NA    NA    NA    NA    NA    NA

The shortest distance between colors in the old palette() default was between the fourth and sixth color (4.34).
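
The summary columns reported by palette_check() can be related to this matrix with a few lines of base R. The sketch below copies the rounded protanopia distances printed above and assumes that ndcp counts the color pairs whose distance is at least the tolerance (here, the minimal distance in the original palette); that reading of ndcp is my understanding, not a quote from the package documentation.

```r
# Upper triangle of the printed distance matrix, filled column by column
d = matrix(NA_real_, nrow = 8, ncol = 8)
d[upper.tri(d)] = c(
  13.89,                                           # column 2
  63.29, 54.81,                                    # column 3
  40.08, 52.02, 78.71,                             # column 4
  85.59, 82.32, 43.28, 52.63,                      # column 5
  41.23, 52.17, 75.54,  4.34, 48.08,               # column 6
  90.98, 82.02, 18.13, 95.06, 44.74, 91.20,        # column 7
  66.30, 62.61, 28.28, 52.40, 14.58, 48.37, 31.09  # column 8
)

tolerance = 26.64945                   # tolerance value from palette_check(col_ver3)

n_pairs = sum(!is.na(d))               # number of color pairs
min_dist = min(d, na.rm = TRUE)        # smallest distance
closest = which(d == min_dist, arr.ind = TRUE)   # which pair it belongs to
n_distinct = sum(d >= tolerance, na.rm = TRUE)   # pairs at or above the tolerance

n_pairs
closest
n_distinct
```

Under this assumption, the counts agree with the ncp and ndcp values in the protanopia row of the palette_check(col_ver3) output.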

palette_dist(col_ver4, cvd = "pro")

     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
[1,]   NA 30.25 68.88 53.60 74.53 35.78 71.67 51.95
[2,]   NA    NA 41.26 27.23 34.35 26.52 43.84 21.75
[3,]   NA    NA    NA 55.72 37.58 78.62  5.40 26.47
[4,]   NA    NA    NA    NA 20.76 31.85 58.06 23.89
[5,]   NA    NA    NA    NA    NA 51.88 39.84 16.37
[6,]   NA    NA    NA    NA    NA    NA 81.03 45.51
[7,]   NA    NA    NA    NA    NA    NA    NA 29.48
[8,]   NA    NA    NA    NA    NA    NA    NA    NA

This pair of colors is substantially more distinguishable in the new proposed palette() default with a distance of about 32. However, the shortest distance in this palette was 5.40 between the third and seventh color.

Summary

The new proposed palette() default is a step in the right direction with more balanced luminance while keeping similar hues to the old version. This constraint, however, results in having a pair of very similar colors for people with protanopia.

What can be done then to ensure that the color palette we use is colorblind friendly? Fortunately, many additional color palettes are available in R. These include some of the palettes introduced in R 3.6 with the hcl.colors() function. Read more about them at https://developer.r-project.org/Blog/public/2019/04/01/hcl-based-color-palettes-in-grdevices/ or see them yourself using example("hcl.colors"). Additionally, a new palette.colors() function will be added to R 4.0 with several sensible predefined palettes for representing qualitative data.
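
A minimal sketch of both functions is shown below. It assumes R 4.0 or later, because palette.colors() is not available in earlier versions; the palette names "viridis" and "Okabe-Ito" are just two of the available options, picked here for illustration.

```r
# Sequential palette from hcl.colors() (available since R 3.6)
seq_pal = hcl.colors(8, palette = "viridis")

# Qualitative palette from palette.colors() (requires R >= 4.0)
qual_pal = palette.colors(8, palette = "Okabe-Ito")

seq_pal
qual_pal
```

Either vector can then be passed to palette_check() in the same way as col_ver3 and col_ver4 above.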

UPDATE 2019-12-04

The proposed palette() was updated based on the provided feedback and suggestions. The current version is:

col_ver4b = c("#000000", "#DF536B", "#61D04F", "#2297E6",
              "#28E2E5", "#CD0BBC", "#F5C710", "#9E9E9E")

swatchplot("Version 4b" = col_ver4b)

Let’s check how it changes our palette’s summaries:

palette_check(col_ver4b, plot = TRUE)

          name n tolerance ncp ndcp  min_dist mean_dist max_dist
1       normal 8  23.51878  28   28 23.518780  50.51179 95.04017
2 deuteranopia 8  23.51878  28   22 13.714363  41.47104 81.33288
3   protanopia 8  23.51878  28   24  6.851961  42.56735 81.90781
4   tritanopia 8  23.51878  28   22 11.032589  44.46748 83.19068

The new version is an improvement, with a greater distance between the most similar pair of colors for people with protanomaly. It was 4.34 in the version 3 default, 5.40 in the previously proposed version 4 default, and it is 6.85 now. That being said, it is relevant to mention that the evaluation of color palettes cannot be done in an entirely automated fashion: tools in colorblindcheck should be used together with visual judgments.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{nowosad2019,
  author = {Nowosad, Jakub},
  title = {Evaluation of the New Palette() for {R}},
  date = {2019-12-01},
  url = {https://jakubnowosad.com/posts/2019-12-01-cbc-bp1/},
  langid = {en}
}

For attribution, please cite this work as:

Nowosad, Jakub. 2019. “Evaluation of the New Palette() for R.” December 1, 2019. https://jakubnowosad.com/posts/2019-12-01-cbc-bp1/.