A method for universal superpixels-based regionalization

class: inverse, left, nonum, clear
background-image: url("figs/cover.png")
background-size: cover

.titlestyle2[A method for] <br>
.titlestyle2[universal] <br>
.titlestyle2[superpixels-based] <br>
.titlestyle2[regionalization]

.captionstyle[Jakub Nowosad, Tomasz Stepinski, Mateusz Iwicki]
.pull-right2[.captionstyle[FOSS4G 2022, 2022-08-26]]

---
# Spatial segmentation

.pull-left[
**Segmentation**: partitioning space into smaller segments while minimizing internal inhomogeneity and maximizing external isolation

In geography: **segmentation ~= regionalization.**

**Segmentation is an optimization problem**, and heuristics need to be used to keep computational time in check.

**One way to improve the output and reduce the cost of segmentation is to perform a preprocessing stage with superpixels.**
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="figs/esch.png" alt="Esch (2008), doi:10.1109/LGRS.2008.919622" width="75%" />
<p class="caption">Esch (2008), doi:10.1109/LGRS.2008.919622</p>
</div>
]

---
# Superpixels

**Main idea: create groupings of adjacent cells that share common characteristics**, which result in an **over-segmentation**.

(1) pixels are not natural entities; they are a consequence of the discrete representation of data

(2) superpixels reduce the dimensionality of the data, making a segmentation task easier

<div class="figure" style="text-align: center">
<img src="figs/blaschke.png" alt="Blaschke (2010), doi:10.1016/j.isprsjprs.2009.06.004" width="65%" />
<p class="caption">Blaschke (2010), doi:10.1016/j.isprsjprs.2009.06.004</p>
</div>

**The SLIC algorithm** (*Achanta et al. (2012), doi:10.1109/TPAMI.2012.120*) -- broadly used due to its simplicity, accuracy, and low computational cost.

---
# SLIC

.pull-left[
**SLIC starts with cluster centers** spaced by the interval of `$S$`.

**The distance `$D$` is calculated between each cluster center and the cells** in the `$2S \times 2S$` region around it, and **each cell is assigned to the nearest cluster center**.

Afterward, **the cluster centers (centroids) are updated for the new superpixels**, and their color values become the average of all the cells belonging to the given superpixel.

<hr>

$$
D = \sqrt{\left(\frac{d_c}{m}\right)^2 + \left(\frac{d_s}{S}\right)^2}
$$

where `$d_c$` is the color (spectral) distance, `$m$` is the compactness parameter, `$d_s$` is the spatial (Euclidean) distance, and `$S$` is the interval between the initial cluster centers.
]

.pull-right[
<img src="figs/ga0.gif" width="90%" style="display: block; margin: auto;" />
]

---
# SLIC

The color (spectral) distance is calculated between values `$I(x_i,y_i,s_p)$` and `$I(x_j,y_j,s_p)$` for a spectral band `$s_p$` in the set of spectral bands `$B$`:

$$
d_c = \sqrt{\sum_{p \in B}{(I(x_i,y_i,s_p)-I(x_j,y_j,s_p))^2}}
$$

The spatial (Euclidean) distance between cells represents spatial proximity:

$$
d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}
$$

**The color distance controls the homogeneity of superpixels.**

**The spatial distance is related to spatial contiguity.**
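
A minimal sketch of how the two distances combine into `$D$`, assuming Euclidean color distance. The function names and the example cell values are hypothetical and chosen only for illustration:

```python
import math

def color_distance(a, b):
    """Euclidean distance between the band values of two cells (d_c)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def spatial_distance(p, q):
    """Euclidean distance between the coordinates of two cells (d_s)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def slic_distance(d_c, d_s, m, S):
    """Combined distance D; m is the compactness, S the center interval."""
    return math.sqrt((d_c / m) ** 2 + (d_s / S) ** 2)

# Hypothetical cells: (x, y) coordinates and three band values each
cell_i = {"xy": (0, 0), "values": (0.2, 0.4, 0.6)}
cell_j = {"xy": (3, 4), "values": (0.2, 0.4, 0.9)}

d_c = color_distance(cell_i["values"], cell_j["values"])
d_s = spatial_distance(cell_i["xy"], cell_j["xy"])
D = slic_distance(d_c, d_s, m=1.0, S=10)
```

A larger `m` shrinks the contribution of `d_c`, so the spatial term dominates and superpixels become more compact.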

---
# SLIC

.pull-left[
**The SLIC algorithm works iteratively**, repeating the above process until it reaches the expected number of iterations.

]
.pull-right[
<img src="figs/ga.gif" width="90%" style="display: block; margin: auto;" />
]
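
The iteration can be sketched as a toy implementation on a small grid. This is not the authors' code: it assumes Euclidean color distance and mean averaging, and it omits refinements of the published algorithm such as connectivity enforcement. The function name `slic` and the example raster are hypothetical:

```python
import math

def slic(grid, S, m, iterations=10):
    """Toy SLIC on a 2-D grid of value tuples; returns a label per cell."""
    rows, cols = len(grid), len(grid[0])
    n_bands = len(grid[0][0])
    # Initial cluster centers spaced by the interval S: (row, col, values)
    centers = [(r, c, grid[r][c])
               for r in range(S // 2, rows, S)
               for c in range(S // 2, cols, S)]
    labels = [[-1] * cols for _ in range(rows)]
    for _ in range(iterations):
        best = [[math.inf] * cols for _ in range(rows)]
        for k, (cr, cc, cv) in enumerate(centers):
            # Only search cells in the 2S x 2S window around the center
            for r in range(max(0, int(cr) - S), min(rows, int(cr) + S + 1)):
                for c in range(max(0, int(cc) - S), min(cols, int(cc) + S + 1)):
                    d_c = math.dist(grid[r][c], cv)
                    d_s = math.hypot(r - cr, c - cc)
                    D = math.sqrt((d_c / m) ** 2 + (d_s / S) ** 2)
                    if D < best[r][c]:
                        best[r][c], labels[r][c] = D, k
        # Move each center to the mean position and mean values of its cells
        for k in range(len(centers)):
            members = [(r, c) for r in range(rows) for c in range(cols)
                       if labels[r][c] == k]
            if not members:
                continue
            n = len(members)
            mean_r = sum(r for r, _ in members) / n
            mean_c = sum(c for _, c in members) / n
            mean_v = tuple(sum(grid[r][c][b] for r, c in members) / n
                           for b in range(n_bands))
            centers[k] = (mean_r, mean_c, mean_v)
    return labels

# Hypothetical 8x8 single-band raster: left half 0.0, right half 1.0
grid = [[(0.0,) if c < 4 else (1.0,) for c in range(8)] for r in range(8)]
labels = slic(grid, S=4, m=0.1, iterations=5)
```

With a small `m`, the color term dominates, so no superpixel in this example mixes cells from the two halves of the raster.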

---
class: center, middle, clear

### As originally implemented by its authors, the SLIC algorithm has the RGB image hard-wired as input data.

### Thus, its geospatial applications remain restricted to images: RGB, multispectral, or hyperspectral.

---
class: center, middle, clear

### We propose an extension of SLIC that can be applied to non-imagery geospatial rasters that carry:

### - pattern information (co-occurrence matrices)
### - compositional information (histograms)
### - time-series information (ordered sequences)
### - other forms of information for which the use of Euclidean distance may not be justified

---
# Extension

**The extended SLIC allows using any distance measure to calculate the semantic distance --  `$d_c$` can be replaced with any distance/dissimilarity measure.**

For example, **raster time-series could be compared with dynamic time warping**, while **distances between sets of categorical variables could be calculated using the Jensen-Shannon divergence**:

$$
d_c = H(\frac{A + B}{2}) - \frac{1}{2}[H(A) + H(B)]
$$

where `$A$` and `$B$` are normalized sets of values characterizing the compared cells, and `$H(A)$` and `$H(B)$` indicate the values of Shannon's entropy for these sets:

$$
H(A) = -\sum_{p \in A}{A_p \log_2 A_p}
$$

`$A_p$` is the `$p$`-th value of the first of the compared cells.
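
A minimal sketch of this calculation, assuming the two inputs are already normalized and treating `0 * log2(0)` as 0. The function names and the example fractions are hypothetical:

```python
import math

def shannon_entropy(p):
    """Shannon entropy (base 2) of a normalized sequence; zeros are skipped."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon(a, b):
    """H((A + B) / 2) - (H(A) + H(B)) / 2 for normalized a and b."""
    mid = [(x + y) / 2 for x, y in zip(a, b)]
    return shannon_entropy(mid) - (shannon_entropy(a) + shannon_entropy(b)) / 2

# Hypothetical class fractions for two cells (each sums to 1)
cell_a = [0.5, 0.5, 0.0]
cell_b = [0.0, 0.5, 0.5]
d_c = jensen_shannon(cell_a, cell_b)
```

Identical inputs give a value of 0, and inputs with no shared non-zero classes give the maximum value of 1.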

---
# Extension

.pull-left[
Also, notice that in the SLIC iterations, **new cluster centers (centroids) have color values that are the average of all the cells belonging to the given superpixel**.

However, different types of input variables could require different averaging functions.

Therefore, **our extension also allows applying averaging functions other than just the mean**.
]

.pull-right[
<img src="figs/logo.png" width="52%" style="display: block; margin: auto;" />

We implemented the above idea in the R programming language as an open-source package **supercells**.

The package installation instructions and documentation can be found at https://jakubnowosad.com/supercells/.
]
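
The idea of a pluggable averaging function can be sketched as follows. The names `average_cells` and `avg_fun` are illustrative only and are not presented as the package's interface:

```python
from statistics import mean, median

def average_cells(cells, avg_fun=mean):
    """Summarize the cells of one superpixel band by band with avg_fun."""
    return tuple(avg_fun(band) for band in zip(*cells))

# Hypothetical superpixel with three cells and two bands; one cell is an outlier
cells = [(1.0, 10.0), (2.0, 10.0), (9.0, 40.0)]
center_mean = average_cells(cells)            # per-band mean
center_median = average_cells(cells, median)  # per-band median
```

Here the median is less affected by the outlying cell than the mean, which is one reason a different averaging function may suit some inputs better.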

---
# Methods and software

Workflow:

<div id="htmlwidget-5ced0e33b38866a72b09" style="width:100%;height:300px;" class="grViz html-widget"></div>
<script type="application/json" data-for="htmlwidget-5ced0e33b38866a72b09">{"x":{"diagram":"digraph {\n\ngraph [layout = dot, rankdir = LR]\n\n# define the global styles of the nodes. We can override these in box if we wish\nnode [shape = rectangle, style = filled, fillcolor = Linen]\n\ninputraster [label = \"Geospatial\nraster data\", shape = square, fillcolor = Beige, penwidth = 3]\n\nsuperpixels1 [label = \"Original SLIC\"]\nsuperpixels2 [label = \"Extended SLIC\", penwidth = 3]\n\nkmeans [label = \"k-means clustering\"]\nconcom [label = \"Connected-component\nlabeling\"]\nskater [label = \"Graph-based \nregionalization\", penwidth = 3]\n\nresults [label= \"Resulting\nregions\", shape = square, fillcolor = Beige, penwidth = 3]\n\n# edge definitions with the node IDs\ninputraster  -> kmeans    [ style=dashed headlabel=\"too many\nregions\" labeldistance=10, labelangle=0  fontsize=11];\ninputraster  -> skater    [ style=dashed headlabel=\"too \ncomputationally \ndemanding\" labeldistance=10, labelangle=6 fontsize=11];\n\ninputraster  -> {superpixels1, superpixels2} -> {kmeans, skater} \nkmeans -> concom \n{concom, skater} -> results\n\n}","config":{"engine":"dot","options":null}},"evals":[],"jsHooks":[]}</script>

<hr>

.lc[

Software:
<img src="figs/Rlogo.png" width="50%" style="display: block; margin: auto;" />
]

.rc[
<br><br>

and its packages: **supercells**, **rgeoda**, **regional**, **sf**, **terra**, **ggplot2**, and **tmap**
]
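
The connected-component labeling step of the k-means branch can be sketched on a plain grid of cluster labels. This is an illustration only: it assumes 4-neighbor (rook) adjacency and raster cells rather than superpixel polygons, and the function name and input are hypothetical:

```python
from collections import deque

def label_regions(classes):
    """Give each 4-connected patch of equal class values its own region id."""
    rows, cols = len(classes), len(classes[0])
    regions = [[-1] * cols for _ in range(rows)]
    next_id = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if regions[r0][c0] != -1:
                continue  # cell already belongs to a region
            regions[r0][c0] = next_id
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill of one patch
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and regions[nr][nc] == -1
                            and classes[nr][nc] == classes[r0][c0]):
                        regions[nr][nc] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return regions, next_id

# Hypothetical k-means output with two classes
classes = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
regions, n_regions = label_regions(classes)
```

In this example, two classes yield three regions, because the cells of class 0 form two separate patches.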

---
# Example 1: local texture or pattern

The study area of 38x21 km is located in western Algeria and features a field of dunes (*Copernicus DEM*).

**The goal:** to count individual dunes automatically.

---
# Example 1: local texture or pattern

For extended and original SLIC, we selected parameters that **resulted in supercells large enough to contain dunes**.

Next, **we classified supercells as dune/no dune** using the k-means clustering algorithm.

.lc2[
<img src="figs/dunes-results.png" width="90%" style="display: block; margin: auto;" />
]

.rc2[
<br>

|                   |Extended |Original |
|:------------------|:--------|:--------|
|Accuracy           |**0.86** |0.78     |
|True positive rate |**0.82** |0.66     |
|True negative rate |**0.91** |0.87     |
]

---
# Example 2: discrete probability distributions

.lc[
The study site is located in the eastern Netherlands and has a size of 507x1105 cells.

Fractions of a pixel's area covered by different land cover classes (*Copernicus Global Land Service: 2019 Land Cover 100m-resolution data*).

**The goal:** to regionalize fractional land cover data.

<hr>

The workflow: (a) **delineating supercells using SLIC**, and (b) **delineating regions** by performing regionalization using the SKATER algorithm.
]

.rc[
<img src="figs/fig-data.png" width="90%" style="display: block; margin: auto;" />
]

---
# Example 2: discrete probability distributions

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
]

.pull-right[

**Extended SLIC**: an entire histogram (eight dimensions) and the Jensen-Shannon divergence.

**Original SLIC**: a false-color image with values of RGB derived from the first three principal components of the data and the Euclidean distance.

|                 |Area-weighted inhomogeneity |Isolation |
|:----------------|:---------------------------|:---------|
|Supercells (JSD) |**0.09**                    |**0.26**  |
|Supercells (EUC) |0.1                         |0.25      |
]

---
# Example 2: discrete probability distributions

.pull-left[
SKATER workflows resulted in an approximately normal distribution of region areas.

The k-means workflow had a power-law-like distribution of region areas.

**Extended SLIC also performed better than the original one.**

]

.pull-right[

|                    |Area-weighted inhomogeneity |Isolation |
|:-------------------|:---------------------------|:---------|
|Extended SLIC (468) |**0.13**                    |**0.48**  |
|K-means (468)       |0.2                         |0.39      |
|Original SLIC (468) |0.17                        |0.35      |
]

---
# Example 3: time-series

**Great Britain**. *WorldClim gridded climate data* was normalized to be between 0 and 1.
 
<img src="index_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

**The goal:** to regionalize Great Britain's climates.

---
# Example 3: time-series

**Extended SLIC workflow uses the dynamic time warping (DTW) distance** function rather than the Euclidean distance.

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
]

.pull-right[
<img src="index_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />
]
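
A minimal sketch of the classic DTW recurrence, assuming univariate series and an absolute-difference local cost. The function name and the two example series are hypothetical and unrelated to the WorldClim data:

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of the first i values of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical normalized series: same shape, shifted by one time step
series_a = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0]
series_b = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
d_dtw = dtw_distance(series_a, series_b)
d_euc = math.dist(series_a, series_b)
```

For these two series, DTW aligns the shifted peaks and returns 0, while the Euclidean distance is 1, which shows how the two measures can rank the same pair of cells differently.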

---
# Example 3: time-series

.lc2[
<img src="figs/meteo-regs.png" width="80%" style="display: block; margin: auto;" />
*Seven regions*
]

.rc2[
**Extended SLIC**: a more homogeneous regionalization.

**Original SLIC**: more isolated regions.

|SLIC     |Inhomogeneity |Isolation |
|:--------|:-------------|:---------|
|extended |**0.30**      |0.59      |
|original |0.37          |**0.75**  |

**The raster of time series was compressed from 24 dimensions to three principal components, preserving 99% of variability.**

]

---
class: left, top, clear2

.pull-left[

## Summary
- We propose an extension of the SLIC algorithm that works with non-imagery data structures without data reduction and conversion to a false-color image

- It allows for using a data distance measure most appropriate to a particular data structure and a custom function for averaging values of cluster centers

- We compared our extension with the original SLIC algorithm on three examples of non-imagery data

- The advantage of the extended SLIC is inversely proportional to the compressibility of the data to just three dimensions

<img src="figs/dunes-results2b.png" width="844" style="display: block; margin: auto;" />
]

.pull-right[
## Contact

Twitter:  [jakub_nowosad](https://twitter.com/jakub_nowosad)

Website: https://jakubnowosad.com/, http://sil.uc.edu/

## Resources

Slides: [jakubnowosad.com/foss4g-2022](https://jakubnowosad.com/foss4g-2022)

Articles: [conference paper](https://talks.osgeo.org/media/foss4g-2022-academic-track/submissions/VUQSVM/resources/foss4g2022_nowosadV3_REuhruh.pdf), [journal paper](https://doi.org/10.1016/j.jag.2022.102935)

Software: [*spatial superpixels*](https://github.com/Nowosad/supercells), [*quality of regions*](https://github.com/Nowosad/regional), [*regionalization*](https://github.com/geodacenter/rgeoda)

<hr>

.font70[*This work was supported by the National Science Centre (Poland) under grant number 2019/03/X/ST10/00776, and the grant 038/04/NP/0020 funded by the Initiative of Excellence - Research University project at Adam Mickiewicz University, Poznan.*]

]