class: inverse, left, nonum, clear
background-image: url("figs/cover.png")
background-size: cover

<link rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Georama" />

.titlestyle2[A method for]
<br>
.titlestyle2[universal]
<br>
.titlestyle2[superpixels-based]
<br>
.titlestyle2[regionalization]
<br><br><br><br><br>
.captionstyle[Jakub Nowosad, Tomasz Stepinski, Mateusz Iwicki]
.pull-right2[.captionstyle[FOSS4G 2022, 2022-08-26]]

---
# Spatial segmentation

.pull-left[

**Segmentation**: partitioning space into smaller segments while minimizing internal inhomogeneity and maximizing external isolation

In geography: **segmentation ~= regionalization.**

<!-- (Geographic) **object-based image analysis (OBIA)** applies many techniques, including the multiresolution segmentation (MRS) that uses cells as the underlying representation. -->

**Segmentation is an optimization problem**, and heuristics need to be used to keep computational time in check.

<br><hr><br>

**One way to improve the output and reduce the cost of segmentation is to perform a preprocessing stage with superpixels.**

]

.pull-right[

<div class="figure" style="text-align: center">
<img src="figs/esch.png" alt="Esch (2008), doi:10.1109/LGRS.2008.919622" width="75%" />
<p class="caption">Esch (2008), doi:10.1109/LGRS.2008.919622</p>
</div>

]

---
# Superpixels

**Main idea: create groupings of adjacent cells that share common characteristics**, which results in an **over-segmentation**.

<!-- **Each superpixel represents a desired level of homogeneity** while at the same time maintains boundaries and structures.
-->

(1) pixels are not natural entities; they are a consequence of the discrete representation of data

(2) superpixels reduce the dimensionality of the data, making a segmentation task easier

<div class="figure" style="text-align: center">
<img src="figs/blaschke.png" alt="Blaschke (2010), doi:10.1016/j.isprsjprs.2009.06.004" width="65%" />
<p class="caption">Blaschke (2010), doi:10.1016/j.isprsjprs.2009.06.004</p>
</div>

<!-- Superpixels are groups of adjacent pixels that share common characteristics. The motivations for superpixels are: (1) pixels are not natural entities; they are merely a consequence of the discrete representation of data; and (2) superpixels' representation of the scene reduces the dimensionality of the data by orders of magnitude making a segmentation task easier \citep{ren2003learning}. Superpixels take advantage of autocorrelation of spatial distributions in nature to capture scene redundancy. They are meaningful atomic regions of the scene. -->

<!-- **Superpixels** also **carry more information than each cell alone**, and thus they **can speed up the subsequent processing efforts**. -->

<!-- A large number of methods for creating superpixels were developed (*Stutz et al (2018), doi:10.1016/j.cviu.2017.03.007*), with **the SLIC algorithm** (*Achanta et al. (2012), doi:10.1109/TPAMI.2012.120*) **being the most prominent**. -->

<!-- the Simple Linear Iterative Clustering (SLIC) -->

**The SLIC algorithm** (*Achanta et al. (2012), doi:10.1109/TPAMI.2012.120*) -- broadly used due to its simplicity, accuracy, and low computational cost.

---
# SLIC

.pull-left[

**SLIC starts with cluster centers** spaced by the interval of `\(S\)`.
**Each cell is assigned to the nearest cluster center**, and **the distance `\(D\)` is calculated between the cluster centers and cells** in the `\(2S \times 2S\)` region.
Afterward, **new cluster centers (centroids) are updated for the new superpixels**, and their color values are the average of all the cells belonging to the given superpixel.

<hr>

$$ D = \sqrt{\left(\frac{d_c}{m}\right)^2 + \left(\frac{d_s}{S}\right)^2} $$

where `\(d_c\)` is the color (spectral) distance, `\(m\)` is the compactness parameter, `\(d_s\)` is the spatial (Euclidean) distance, and `\(S\)` is the interval between the initial cluster centers.

]

.pull-right[

<img src="figs/ga0.gif" width="90%" style="display: block; margin: auto;" />

]

---
# SLIC

The color (spectral) distance is calculated between values `\(I(x_i,y_i,s_p)\)` and `\(I(x_j,y_j,s_p)\)` for a spectral band `\(s_p\)` in the set of spectral bands `\(B\)`:

$$ d_c = \sqrt{\sum_{p \in B}{(I(x_i,y_i,s_p)-I(x_j,y_j,s_p))^2}} $$

<br><hr><br>

The spatial (Euclidean) distance between cells represents spatial proximity:

$$ d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} $$

<br><hr><br>

**The color distance controls the homogeneity of superpixels.**
**The spatial distance is related to spatial contiguity.**

---
# SLIC

.pull-left[

**The SLIC algorithm works iteratively**, repeating the above process until it reaches the expected number of iterations.

<!-- Experiments of Achanta et al. (2012) showed that **between 4 and 10 iterations suffices in the case of RGB images**. -->

]

.pull-right[

<img src="figs/ga.gif" width="90%" style="display: block; margin: auto;" />

]

---
class: center, middle, clear

<!-- # Problem -->

### As originally implemented by its authors, the SLIC algorithm has the RGB image hard-wired as input data.

### Thus, its geospatial applications remain restricted to images: RGB, multispectral, or hyperspectral.

<!-- This short survey shows that the SLIC algorithm is as popular in the remote sensing domain as it is in the computer vision domain. However, its applications remain restricted to images, RGB, multispectral, or hyperspectral.
-->

<!-- To the best of our knowledge, the SLIC algorithm was not previously applied to non-imagery geospatial rasters. -->

---
class: center, middle, clear

### We propose an extension of SLIC that can be applied to non-imagery geospatial rasters that carry:

### - pattern information (co-occurrence matrices)

### - compositional information (histograms)

### - time-series information (ordered sequences)

### - other forms of information for which the use of Euclidean distance may not be justified

---
# Extension

**The extended SLIC allows using any distance measure to calculate the semantic distance -- `\(d_c\)` can be replaced with any distance/dissimilarity measure.**

<br><hr><br>

--

For example, **raster time-series could be compared with dynamic time warping**, while **distances between sets of categorical variables could be calculated using Jensen-Shannon distance**:

$$ d_c = H\left(\frac{A + B}{2}\right) - \frac{1}{2}\left[H(A) + H(B)\right] $$

where `\(A\)` and `\(B\)` are normalized sets of values characterizing the compared cells, and `\(H(A)\)` and `\(H(B)\)` indicate values of Shannon's entropy for these sets:

$$ H(A) = -\sum_{p \in A}{A_p \log_2 A_p} $$

`\(A_p\)` is the `\(p\)`-th value for the first of the compared cells.

---
# Extension

.pull-left[

Also, notice that in the SLIC iterations, **new cluster centers (centroids) have color values that are the average of all the cells belonging to the given superpixel**.

However, different types of input variables could require different averaging functions.

<!-- Simple averaging (mean) is proper for most of the continuous variables but is not appropriate for categorical ones. -->

Therefore, **our extension also allows applying averaging functions other than just the mean**.

]

.pull-right[

<img src="figs/logo.png" width="52%" style="display: block; margin: auto;" />

We implemented the above idea in the R programming language as an open-source package **supercells**.
<!-- **The package allows for:** -->
<!-- - any number of variables (raster layers) -->
<!-- - the use of about 50 built-in distance measures (including Euclidean, Manhattan, Jensen-Shannon, and dynamic time warping), and accepts any user-defined ones -->
<!-- - the use of any defined averaging function, including mean and median -->

The package installation instructions and documentation can be found at https://jakubnowosad.com/supercells/.

]

---
# Methods and software

Workflow:
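The supercell-delineation step of this workflow rests on the combined distance `\(D\)` with a replaceable `\(d_c\)`. The block below is a minimal Python sketch of that idea, not the code of the **supercells** package: the function names (`shannon_entropy`, `jensen_shannon`, `slic_distance`) and the `value_dist` argument are hypothetical and introduced only for illustration.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (base 2) of a normalized set of values; zeros are skipped."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def jensen_shannon(a, b):
    """d_c = H((A + B) / 2) - 1/2 [H(A) + H(B)] for two normalized histograms."""
    mid = [(x + y) / 2 for x, y in zip(a, b)]
    return shannon_entropy(mid) - 0.5 * (shannon_entropy(a) + shannon_entropy(b))


def slic_distance(values_i, values_j, xy_i, xy_j, m, S, value_dist):
    """D = sqrt((d_c / m)^2 + (d_s / S)^2) with a pluggable value distance.

    value_dist is an assumed callable taking the values of two cells;
    m is the compactness and S the spacing of the initial cluster centers.
    """
    d_c = value_dist(values_i, values_j)   # any distance/dissimilarity measure
    d_s = math.dist(xy_i, xy_j)            # spatial (Euclidean) distance
    return math.sqrt((d_c / m) ** 2 + (d_s / S) ** 2)


# Hypothetical cells: two land cover histograms and their cell coordinates
cell_a = [0.7, 0.2, 0.1]
cell_b = [0.1, 0.2, 0.7]
d = slic_distance(cell_a, cell_b, (10, 10), (13, 14), m=1.0, S=5.0,
                  value_dist=jensen_shannon)
```

Passing the value distance as an argument mirrors the idea of the extension: swapping Jensen-Shannon for another measure changes only `value_dist`, while the spatial term stays the same.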
--

<hr>

.lc[
Software:

<img src="figs/Rlogo.png" width="50%" style="display: block; margin: auto;" />
]

.rc[
<br><br>
and its packages: **supercells**, **rgeoda**, **regional**, **sf**, **terra**, **ggplot2**, and **tmap**
]

---
# Example 1: local texture or pattern

The study area of 38x21 km is located in western Algeria and features a field of dunes (*Copernicus DEM*).

<img src="figs/dunes_data.png" width="75%" style="display: block; margin: auto;" />

**The goal:** to count individual dunes automatically.

---
# Example 1: local texture or pattern

For extended and original SLIC, we selected parameters that **resulted in supercells large enough to contain dunes**.

Next, **we classified supercells as dune/no dune** using the k-means clustering algorithm.

--

.lc2[
<img src="figs/dunes-results.png" width="90%" style="display: block; margin: auto;" />
]

--

.rc2[
<br>

|                   |Extended |Original |
|:------------------|:--------|:--------|
|Accuracy           |**0.86** |0.78     |
|True positive rate |**0.82** |0.66     |
|True negative rate |**0.91** |0.87     |

]

---
# Example 2: discrete probability distributions

.lc[

The site is located in the eastern Netherlands and has a size of 507x1105 cells.

The data are fractions of a cell's area covered by different land cover classes (*Copernicus Global Land Service: 2019 Land Cover 100m-resolution data*).

**The goal:** to regionalize fractional land cover data.

<hr>

The workflow: (a) **delineating supercells using SLIC**, and (b) **delineating regions** by performing regionalization using the SKATER algorithm.

]

.rc[
<img src="figs/fig-data.png" width="90%" style="display: block; margin: auto;" />
]

---
# Example 2: discrete probability distributions

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
]

.pull-right[

**Extended SLIC**: an entire histogram (eight dimensions) and the Jensen-Shannon divergence.
**Original SLIC**: a false-color image with values of RGB derived from the first three principal components of the data and the Euclidean distance.

|                 |Area-weighted inhomogeneity |Isolation |
|:----------------|:---------------------------|:---------|
|Supercells (JSD) |**0.09**                    |**0.26**  |
|Supercells (EUC) |0.1                         |0.25      |

]

---
# Example 2: discrete probability distributions

<img src="index_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

.pull-left[

The SKATER workflows resulted in an approximately normal distribution of region areas.

The k-means workflow had a power-law-like distribution of region areas.

<!-- with more than half of the area concentrated in a single, largest region, and a large number of very small regions. These maps also reveal undersegmentation errors of the KM-CCL workflow. -->

**Extended SLIC also performed better than the original one.**

<!-- Recall that regionalization is obtained from partitioning of a weighted graph constructed on supercells where the weight values are data distances between linked supercells (section 2.2). Thus, the extents of regions will depend on whether the extended or original SLIC are used because the hierarchy of distances between supercells depends on how they are delineated. -->

]

--

.pull-right[

|                    |Area-weighted inhomogeneity |Isolation |
|:-------------------|:---------------------------|:---------|
|Extended SLIC (468) |**0.13**                    |**0.48**  |
|K-means (468)       |0.2                         |0.39      |
|Original SLIC (468) |0.17                        |0.35      |

]

---
# Example 3: time-series

**Great Britain**. *WorldClim gridded climate data* was normalized to be between 0 and 1.

<img src="index_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

The goal: to regionalize Great Britain's climates.

---
# Example 3: time-series

**The extended SLIC workflow uses the dynamic time warping (DTW) distance** function rather than the Euclidean distance.
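To illustrate why DTW can suit time series better than the Euclidean distance, here is a minimal, univariate Python sketch of the classic DTW recurrence. It is an illustration under simplifying assumptions, not the distance function used by the **supercells** package; the names `dtw_distance`, `euclidean_distance`, `series_a`, and `series_b` are hypothetical, and the two short series are made-up values.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = step + min(cost[i - 1][j],      # repeat an element of b
                                    cost[i][j - 1],      # repeat an element of a
                                    cost[i - 1][j - 1])  # advance both series
    return cost[n][m]


def euclidean_distance(a, b):
    """Element-by-element Euclidean distance for sequences of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two made-up normalized series with the same peak, shifted by one time step
series_a = [0.0, 0.0, 1.0, 0.0, 0.0]
series_b = [0.0, 1.0, 0.0, 0.0, 0.0]

dtw_value = dtw_distance(series_a, series_b)
euclidean_value = euclidean_distance(series_a, series_b)
```

For these two series, the warping path aligns the shifted peaks, so the DTW distance is zero, while the Euclidean distance treats the shift as a mismatch at two time steps.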
.pull-left[
<img src="index_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="index_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />
]

---
# Example 3: time-series

.lc2[
<img src="figs/meteo-regs.png" width="80%" style="display: block; margin: auto;" />

*Seven regions*
]

--

.rc2[

**Extended SLIC**: a more homogeneous regionalization.

**Original SLIC**: more isolated regions.

|SLIC     |Inhomogeneity |Isolation |
|:--------|:-------------|:---------|
|extended |**0.30**      |0.59      |
|original |0.37          |**0.75**  |

<br><hr><br>

**The raster of time series was compressed from 24 dimensions to three principal components, preserving 99% of the variability.**

<!-- The most compressible data was the raster of climatic time series, which compressed from 24 dimensions to three principal components preserving 99% of variability. -->

]

---
class: left, top, clear2

.pull-left[

<img src="figs/dunes-results2a.png" width="844" style="display: block; margin: auto;" />

## Summary

- We propose an extension of the SLIC algorithm that works with non-imagery data structures without data reduction and conversion to a false-color image
- It allows for using the data distance measure most appropriate to a particular data structure and a custom function for averaging values of cluster centers
- We compared our extension and the original SLIC algorithm on three examples of non-imagery data
- The advantage of the extended SLIC is inversely proportional to the compressibility of the data to just three dimensions

<!-- - other applications -->

<img src="figs/dunes-results2b.png" width="844" style="display: block; margin: auto;" />

]

.pull-right[

## Contact

Twitter: [jakub_nowosad](https://twitter.com/jakub_nowosad)

Website: https://jakubnowosad.com/, http://sil.uc.edu/

## Resources

Slides: [jakubnowosad.com/foss4g-2022](https://jakubnowosad.com/foss4g-2022)

Articles: [conference
paper](https://talks.osgeo.org/media/foss4g-2022-academic-track/submissions/VUQSVM/resources/foss4g2022_nowosadV3_REuhruh.pdf), [journal paper](https://doi.org/10.1016/j.jag.2022.102935)

Software: [*spatial superpixels*](https://github.com/Nowosad/supercells), [*quality of regions*](https://github.com/Nowosad/regional), [*regionalization*](https://github.com/geodacenter/rgeoda)

<hr>

.font70[*This work was supported by the National Science Centre (Poland) under grant number 2019/03/X/ST10/00776, and the grant 038/04/NP/0020 funded by the Initiative of Excellence - Research University project at Adam Mickiewicz University, Poznan.*]

]