A method for
universal
superpixels-based
regionalization
Jakub Nowosad, Tomasz Stepinski, Mateusz Iwicki FOSS4G 2022, 2022-08-26
Segmentation: partitioning space into smaller segments while minimizing internal inhomogeneity and maximizing external isolation
In geography: segmentation ~= regionalization.
Segmentation is an optimization problem and heuristics needs to be used to keep computational time in check.
One way to improve the output and reduce the cost of segmentation is to perform a preprocessing stage with superpixels.
Esch (2008), doi:10.1109/LGRS.2008.919622
Main idea: create groupings of adjacent cells that share common characteristics, which result in an over-segmentation.
(1) pixels are not natural entities; they are a consequence of the discrete representation of data
(2) superpixels reduces the dimensionality of the data making a segmentation task easier
Blaschke (2010), doi:10.1016/j.isprsjprs.2009.06.004
The SLIC algorithm (Achanta et al. (2012), doi:10.1109/TPAMI.2012.120) -- broadly used due to its simplicity, accuracy, and low computational cost.
SLIC starts with cluster centers spaced by the interval of S.
Each cell is assigned to the nearest cluster center, and the distance D is calculated between the cluster centers and cells in the 2S×2S region.
Afterward, new cluster centers (centroids) are updated for the new superpixels, and their color values are the average of all the cells belonging to the given superpixel.
D=√(dcm)2+(dsS)2
where dc is the color (spectral) distance, m is the compactness parameter, ds is the spatial (Euclidean) distance, and S is the interval between the initial cluster centers.
The color (spectral) distance is calculated between values I(xi,yi,sp) and I(xj,yj,sp) for a spectral band sp in the set of spectral bands B:
dc=√∑p∈B(I(xi,yi,sp)−I(xj,yj,sp))2
The spatial (Euclidean) distance between cells represents spatial proximity:
ds=√(xj−xi)2+(yj−yi)2
The color distance controls the homogeneity of superpixels.
The spatial distance is related to spatial contiguity.
The SLIC algorithm works iteratively, repeating the above process until it reaches the expected number of iterations.
The extended SLIC allows using any distance measure to calculate the semantic distance -- dc can be replaced with any distance/dissimilarity measure.
The extended SLIC allows using any distance measure to calculate the semantic distance -- dc can be replaced with any distance/dissimilarity measure.
For example, raster time-series could be compared with dynamic time warping, while distances between sets of categorical variables could be calculated using Jenson-Shannon distance:
dc=H(A+B2)−12[H(A)+H(B)]
where A and B are normalized sets of values characterizing the compared cells, and H(A) and H(B) indicates values of Shannon's entropy for these sets:
H(A)=−∑p∈AAplog2Ap
Ap is the pth value of the first of the compared cell.
Also, notice that in the SLIC iterations, new cluster centers (centroids) have color values that are the average of all the cells belonging to the given superpixel.
However, different types of input variables could require different averaging functions.
Therefore, our extension also allows applying other averaging functions that just the mean.
We implemented the above idea in the R programming language as an open-source package supercells.
The package installation instructions and documentation can be found at https://jakubnowosad.com/supercells/.
Workflow:
Workflow:
Software:
and it's packages: supercells, rgeoda, regional, sf, terra, ggplot2, and tmap
The study area of 38x21 km is located in western Algeria and features a field of dunes (Copernicus DEM).
The goal: to count individual dunes automatically.
For extended and original SLIC, we selected parameters that resulted in supercells large enough to contain dunes.
Next, we classified supercells as dune/no dune using the k-means clustering algorithm.
For extended and original SLIC, we selected parameters that resulted in supercells large enough to contain dunes.
Next, we classified supercells as dune/no dune using the k-means clustering algorithm.
For extended and original SLIC, we selected parameters that resulted in supercells large enough to contain dunes.
Next, we classified supercells as dune/no dune using the k-means clustering algorithm.
Extended | Original | |
---|---|---|
Accuracy | 0.86 | 0.78 |
True positive rate | 0.82 | 0.66 |
True negative rate | 0.91 | 0.87 |
A site located in the eastern Netherlands having the size of 507x1105 cells.
Fractions of a pixel's area covered by different land cover classes (Copernicus Global Land Service: 2019 Land Cover 100m-resolution data).
The goal: to regionalize fractional land cover data.
The workflow: (a) delineating supercells using SLIC, and (b) delineating regions by performing regionalization using the SKATER algorithm.
Extended SLIC: an entire histogram (eight dimensions) and the Jensen-Shannon divergence.
Original SLIC: a false-color image with values of RGB derived from the first three principal components of the data and the Euclidean distance.
Area-weighted inhomogeneity | Isolation | |
---|---|---|
Supercells (JSD) | 0.09 | 0.26 |
Supercells (EUC) | 0.1 | 0.25 |
SKATER workflows resulted in an approximately normal distribution of region areas.
K-means workflow had power-law-like distribution of region areas.
Extended SLIC also performed better than the original one.
SKATER workflows resulted in an approximately normal distribution of region areas.
K-means workflow had power-law-like distribution of region areas.
Extended SLIC also performed better than the original one.
Area-weighted inhomogeneity | Isolation | |
---|---|---|
Extended SLIC (468) | 0.13 | 0.48 |
K-means (468) | 0.2 | 0.39 |
Original SLIC (468) | 0.17 | 0.35 |
Great Britain. WorldClim gridded climate data was normalized to be between 0 and 1.
The goal: to regionalize Great Britain's climates
Extended SLIC workflow uses the dynamic time warping (DTW) distance function rather than the Euclidean distance.
Extended SLIC workflow uses the dynamic time warping (DTW) distance function rather than the Euclidean distance.
Seven regions
Seven regions
Extended SLIC: a more homogeneous regionalization.
Original SLIC: more isolated regions.
SLIC | Inhomogeneity | Isolation |
---|---|---|
extended | 0.30 | 0.59 |
original | 0.37 | 0.75 |
The raster of time series compressed from 24 dimensions to three principal components preserving 99% of variability.
We propose the SLIC algorithm extension to work with non-imagery data structures without data reduction and conversion to the false-color image
It allows for using a data distance measure most appropriate to a particular data structure and a custom function for averaging values of clusters centers
We compared our extension and original SLIC algorithms on three examples of non-imagery data
An advantage of the extended SLIC is inversely proportional to the compressibility of the data to just three dimensions
Twitter: jakub_nowosad
Website: https://jakubnowosad.com/, http://sil.uc.edu/
Slides: jakubnowosad.com/foss4g-2022
Articles: conference paper, journal paper
Software: spatial superpixels, quality of regions, regionalization
This work was supported by the National Science Centre (Poland) under grant number 2019/03/X/ST10/00776, and the grant 038/04/NP/0020 funded by the Initiative of Excellence - Research University project at Adam Mickiewicz University, Poznan.
Segmentation: partitioning space into smaller segments while minimizing internal inhomogeneity and maximizing external isolation
In geography: segmentation ~= regionalization.
Segmentation is an optimization problem and heuristics needs to be used to keep computational time in check.
One way to improve the output and reduce the cost of segmentation is to perform a preprocessing stage with superpixels.
Esch (2008), doi:10.1109/LGRS.2008.919622
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |