Processing math: 100%
+ - 0:00:00
Notes for current slide
Notes for next slide

A method for
universal
superpixels-based
regionalization






Jakub Nowosad, Tomasz Stepinski, Mateusz Iwicki FOSS4G 2022, 2022-08-26

1 / 20

Spatial segmentation

Segmentation: partitioning space into smaller segments while minimizing internal inhomogeneity and maximizing external isolation

In geography: segmentation ~= regionalization.

Segmentation is an optimization problem and heuristics needs to be used to keep computational time in check.




One way to improve the output and reduce the cost of segmentation is to perform a preprocessing stage with superpixels.

Esch (2008), doi:10.1109/LGRS.2008.919622

Esch (2008), doi:10.1109/LGRS.2008.919622

2 / 20

Superpixels

Main idea: create groupings of adjacent cells that share common characteristics, which result in an over-segmentation.

(1) pixels are not natural entities; they are a consequence of the discrete representation of data

(2) superpixels reduces the dimensionality of the data making a segmentation task easier

Blaschke (2010), doi:10.1016/j.isprsjprs.2009.06.004

Blaschke (2010), doi:10.1016/j.isprsjprs.2009.06.004

The SLIC algorithm (Achanta et al. (2012), doi:10.1109/TPAMI.2012.120) -- broadly used due to its simplicity, accuracy, and low computational cost.

3 / 20

SLIC

SLIC starts with cluster centers spaced by the interval of S.

Each cell is assigned to the nearest cluster center, and the distance D is calculated between the cluster centers and cells in the 2S×2S region.

Afterward, new cluster centers (centroids) are updated for the new superpixels, and their color values are the average of all the cells belonging to the given superpixel.


D=(dcm)2+(dsS)2

where dc is the color (spectral) distance, m is the compactness parameter, ds is the spatial (Euclidean) distance, and S is the interval between the initial cluster centers.

4 / 20

SLIC

The color (spectral) distance is calculated between values I(xi,yi,sp) and I(xj,yj,sp) for a spectral band sp in the set of spectral bands B:

dc=pB(I(xi,yi,sp)I(xj,yj,sp))2




The spatial (Euclidean) distance between cells represents spatial proximity:

ds=(xjxi)2+(yjyi)2




The color distance controls the homogeneity of superpixels.

The spatial distance is related to spatial contiguity.

5 / 20

SLIC

The SLIC algorithm works iteratively, repeating the above process until it reaches the expected number of iterations.

6 / 20

As originally implemented by its authors, the SLIC algorithm has the RGB image hard-wired as input data.

Thus, its geospatial applications remain restricted to images, RGB, multispectral, or hyperspectral.

7 / 20

We propose extension of SLIC that can be applied to non-imagery geospatial rasters that carry:

- pattern information (co-occurrence matrices)

- compositional information (histograms)

- time-series information (ordered sequences)

- other forms of information for which the use of Euclidean distance may not be justified

8 / 20

Extension

The extended SLIC allows using any distance measure to calculate the semantic distance -- dc can be replaced with any distance/dissimilarity measure.




9 / 20

Extension

The extended SLIC allows using any distance measure to calculate the semantic distance -- dc can be replaced with any distance/dissimilarity measure.




For example, raster time-series could be compared with dynamic time warping, while distances between sets of categorical variables could be calculated using Jenson-Shannon distance:

dc=H(A+B2)12[H(A)+H(B)]

where A and B are normalized sets of values characterizing the compared cells, and H(A) and H(B) indicates values of Shannon's entropy for these sets:

H(A)=pAAplog2Ap

Ap is the pth value of the first of the compared cell.

9 / 20

Extension

Also, notice that in the SLIC iterations, new cluster centers (centroids) have color values that are the average of all the cells belonging to the given superpixel.

However, different types of input variables could require different averaging functions.

Therefore, our extension also allows applying other averaging functions that just the mean.

We implemented the above idea in the R programming language as an open-source package supercells.

The package installation instructions and documentation can be found at https://jakubnowosad.com/supercells/.

10 / 20

Methods and software

Workflow:

%0 inputraster Geospatial raster data superpixels1 Original SLIC inputraster->superpixels1 superpixels2 Extended SLIC inputraster->superpixels2 kmeans k-means clustering inputraster->kmeans too many regions skater Graph-based regionalization inputraster->skater too computationally demanding superpixels1->kmeans superpixels1->skater superpixels2->kmeans superpixels2->skater concom Connected-component labeling kmeans->concom results Resulting regions concom->results skater->results
11 / 20

Methods and software

Workflow:

%0 inputraster Geospatial raster data superpixels1 Original SLIC inputraster->superpixels1 superpixels2 Extended SLIC inputraster->superpixels2 kmeans k-means clustering inputraster->kmeans too many regions skater Graph-based regionalization inputraster->skater too computationally demanding superpixels1->kmeans superpixels1->skater superpixels2->kmeans superpixels2->skater concom Connected-component labeling kmeans->concom results Resulting regions concom->results skater->results

Software:



and it's packages: supercells, rgeoda, regional, sf, terra, ggplot2, and tmap

11 / 20

Example 1: local texture or pattern

The study area of 38x21 km is located in western Algeria and features a field of dunes (Copernicus DEM).

The goal: to count individual dunes automatically.

12 / 20

Example 1: local texture or pattern

For extended and original SLIC, we selected parameters that resulted in supercells large enough to contain dunes.

Next, we classified supercells as dune/no dune using the k-means clustering algorithm.

13 / 20

Example 1: local texture or pattern

For extended and original SLIC, we selected parameters that resulted in supercells large enough to contain dunes.

Next, we classified supercells as dune/no dune using the k-means clustering algorithm.

13 / 20

Example 1: local texture or pattern

For extended and original SLIC, we selected parameters that resulted in supercells large enough to contain dunes.

Next, we classified supercells as dune/no dune using the k-means clustering algorithm.


Extended Original
Accuracy 0.86 0.78
True positive rate 0.82 0.66
True negative rate 0.91 0.87
13 / 20

Example 2: discrete probability distributions

A site located in the eastern Netherlands having the size of 507x1105 cells.

Fractions of a pixel's area covered by different land cover classes (Copernicus Global Land Service: 2019 Land Cover 100m-resolution data).

The goal: to regionalize fractional land cover data.


The workflow: (a) delineating supercells using SLIC, and (b) delineating regions by performing regionalization using the SKATER algorithm.

14 / 20

Example 2: discrete probability distributions

Extended SLIC: an entire histogram (eight dimensions) and the Jensen-Shannon divergence.

Original SLIC: a false-color image with values of RGB derived from the first three principal components of the data and the Euclidean distance.

Area-weighted inhomogeneity Isolation
Supercells (JSD) 0.09 0.26
Supercells (EUC) 0.1 0.25
15 / 20

Example 2: discrete probability distributions

SKATER workflows resulted in an approximately normal distribution of region areas.

K-means workflow had power-law-like distribution of region areas.

Extended SLIC also performed better than the original one.

16 / 20

Example 2: discrete probability distributions

SKATER workflows resulted in an approximately normal distribution of region areas.

K-means workflow had power-law-like distribution of region areas.

Extended SLIC also performed better than the original one.

Area-weighted inhomogeneity Isolation
Extended SLIC (468) 0.13 0.48
K-means (468) 0.2 0.39
Original SLIC (468) 0.17 0.35
16 / 20

Example 3: time-series

Great Britain. WorldClim gridded climate data was normalized to be between 0 and 1.

The goal: to regionalize Great Britain's climates

17 / 20

Example 3: time-series

Extended SLIC workflow uses the dynamic time warping (DTW) distance function rather than the Euclidean distance.

18 / 20

Example 3: time-series

Extended SLIC workflow uses the dynamic time warping (DTW) distance function rather than the Euclidean distance.

18 / 20

Example 3: time-series

Seven regions

19 / 20

Example 3: time-series

Seven regions

Extended SLIC: a more homogeneous regionalization.

Original SLIC: more isolated regions.

SLIC Inhomogeneity Isolation
extended 0.30 0.59
original 0.37 0.75




The raster of time series compressed from 24 dimensions to three principal components preserving 99% of variability.

19 / 20

Summary

  • We propose the SLIC algorithm extension to work with non-imagery data structures without data reduction and conversion to the false-color image

  • It allows for using a data distance measure most appropriate to a particular data structure and a custom function for averaging values of clusters centers

  • We compared our extension and original SLIC algorithms on three examples of non-imagery data

  • An advantage of the extended SLIC is inversely proportional to the compressibility of the data to just three dimensions

Contact

Twitter: jakub_nowosad

Website: https://jakubnowosad.com/, http://sil.uc.edu/

Resources

Slides: jakubnowosad.com/foss4g-2022

Articles: conference paper, journal paper

Software: spatial superpixels, quality of regions, regionalization


This work was supported by the National Science Centre (Poland) under grant number 2019/03/X/ST10/00776, and the grant 038/04/NP/0020 funded by the Initiative of Excellence - Research University project at Adam Mickiewicz University, Poznan.

20 / 20

Spatial segmentation

Segmentation: partitioning space into smaller segments while minimizing internal inhomogeneity and maximizing external isolation

In geography: segmentation ~= regionalization.

Segmentation is an optimization problem and heuristics needs to be used to keep computational time in check.




One way to improve the output and reduce the cost of segmentation is to perform a preprocessing stage with superpixels.

Esch (2008), doi:10.1109/LGRS.2008.919622

Esch (2008), doi:10.1109/LGRS.2008.919622

2 / 20
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow