III Congreso & XIV Jornadas de Usuarios de R, Sevilla, Spain
2024-11-07
The following perspectives are incomplete and subjective
I am happy to discuss them further – catch me after the talk!
How to represent the real world with data?
Spatial vector data model: describing the real world as discrete objects
Simple features (sf) is an open standard developed and endorsed by the Open Geospatial Consortium (OGC)
Simple feature collection with 1 feature and 1 field
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 0.8 ymin: 1 xmax: 1 ymax: 1.2
Geodetic CRS: WGS 84
# A tibble: 1 × 2
id geom
<chr> <LINESTRING [°]>
1 1 (0.8 1, 0.8 1.2, 1 1.2)
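An object like the one printed above can be built by hand, which may help show how a geometry, a CRS, and attributes fit together. This is a minimal sketch, assuming the sf package is installed. The object names (`coords`, `line_geom`, `line_sf`) are illustrative, and the result is backed by a plain data frame rather than the tibble shown in the output.

```r
library(sf)

# Coordinates copied from the printed output above (x = longitude, y = latitude)
coords = rbind(c(0.8, 1), c(0.8, 1.2), c(1, 1.2))

# Build the geometry and attach a CRS (WGS 84, EPSG:4326)
line_geom = st_sfc(st_linestring(coords), crs = "EPSG:4326")

# Combine one attribute column (id) with the geometry column (geom)
line_sf = st_sf(id = "1", geom = line_geom)

line_sf
```

Printing `line_sf` should give a header with the geometry type, dimension, bounding box, and CRS, followed by the attribute table.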
Spatial raster data model: describing the real world as continuous surfaces
Raster maps represent continuous phenomena such as elevation, temperature, population density or spectral data
They can also represent discrete features such as soil or land-cover classes
class : SpatRaster
dimensions : 1450, 1115, 1 (nrow, ncol, nlyr)
resolution : 1000, 1000 (x, y)
extent : 995456.5, 2110457, 4741961, 6191961 (xmin, xmax, ymin, ymax)
coord. ref. : NZGD2000 / New Zealand Transverse Mercator 2000 (EPSG:2193)
source : nz_elev.tif
name : elevation
min value : 0.000
max value : 4140.333
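The same structure (a grid definition plus cell values) can be reproduced with a tiny in-memory raster. This is a sketch only, assuming the terra package is installed. The extent and elevation values are made up for illustration and do not come from `nz_elev.tif`.

```r
library(terra)

# A 3 x 4 grid with 1000 m cells; the extent is hypothetical
r = rast(nrows = 3, ncols = 4,
         xmin = 0, xmax = 4000, ymin = 0, ymax = 3000,
         crs = "EPSG:2193")

# Cell values are filled row by row, starting from the top-left cell
values(r) = c(0, 50, 120, 300,
              10, 80, 200, 450,
              20, 90, 260, 600)
names(r) = "elevation"

r                 # header with dimensions, resolution, extent, and CRS
res(r)            # cell size in CRS units (metres here)
global(r, "max")  # maximum cell value
```

The geometry is never stored cell by cell: the header (origin, resolution, dimensions) is enough to locate every value.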
Data cubes/Spatiotemporal Arrays: https://r-spatial.github.io/stars/
Meshes: https://rpubs.com/cyclemumner/mdsumner-geomesh-foss4g
Point clouds: https://r-lidar.github.io/lidRbook/
Geographical coordinates
Projected coordinates
CRSs are crucial for spatial data analysis:
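As one illustration of why this matters, degrees are not units of distance: one degree of longitude covers a different ground distance depending on latitude. The sketch below uses base R only and assumes a spherical Earth with a radius of 6371 km. The function name `haversine_km` is hypothetical.

```r
# Great-circle (haversine) distance in km between two lon/lat points,
# assuming a spherical Earth with a radius of 6371 km
haversine_km = function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad = pi / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# One degree of longitude at the equator and at 60 degrees north
d_equator = haversine_km(0, 0, 1, 0)
d_north = haversine_km(0, 60, 1, 60)

c(equator = d_equator, lat60 = d_north)
```

The second distance comes out at roughly half of the first, so treating geographic coordinates as if they were planar can distort distances, areas, and buffers.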
Spatial statistics is a branch of statistics that explicitly takes location into account when describing and drawing inferences about the data.
Analysis type | Goal | Data types |
---|---|---|
Point pattern analysis | Understand the spatial distribution of points | Points, polygons (observation window), raster (e.g., density estimation) |
Network analysis | Understand the spatial relationships between objects | Lines (network edges), points (network nodes) |
Geostatistics | Understand the spatial variation of a continuous variable, with interpolation being a common task | Points (with values), raster (e.g., interpolation or simulation results) |
Areal data analysis | Understand the spatial variation of a variable aggregated over areas | Polygons (with values), neighborhood lists, weight lists |
Data type conversions exist in many R spatial statistics packages
spatstat:
sfnetworks:
spdep:
The First Law of Geography:
“Everything is related to everything else, but near things are more related than distant things.”
—Tobler, 1970
…but, at the same time:
Autocorrelation may also appear in the data for other reasons, for example, omitted variables or an inappropriate level of aggregation
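Spatial autocorrelation is often summarized with a single number such as global Moran's I. The sketch below computes it in base R for values on a regular grid, assuming binary rook (edge-sharing) neighbors. The function name `morans_i` and both example grids are made up for illustration. Packages such as spdep provide tested implementations with inference.

```r
# Global Moran's I for values on a regular grid with rook neighbors
morans_i = function(m) {
  n = length(m)
  # Row and column index of every cell, in the same order as as.vector(m)
  rows = as.vector(row(m))
  cols = as.vector(col(m))
  # Binary weights: TRUE when two cells share an edge
  w = (abs(outer(rows, rows, "-")) + abs(outer(cols, cols, "-"))) == 1
  # Deviations from the mean
  z = as.vector(m) - mean(m)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# A smooth gradient: neighboring cells have similar values
gradient = matrix(1:25, nrow = 5)
# A checkerboard: neighboring cells always differ
checker = (row(gradient) + col(gradient)) %% 2

morans_i(gradient)  # positive: similar values cluster together
morans_i(checker)   # negative: neighbors are systematically dissimilar
```

A positive value alone does not say why nearby values are similar, which is the point made above about omitted variables and aggregation.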
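library(sf)     # needed for st_as_sf()
library(tmap)
library(gstat)
# Landslide data and study area mask from the spDataLarge package
data("lsl", "study_mask", package = "spDataLarge")
# Convert the data frame to an sf point object using its x and y columns
lsl_sf = st_as_sf(lsl, coords = c("x", "y"))
plot(lsl_sf["lslpts"], cex = 2)
# Empirical variogram of the lslpts column
plot(variogram(lslpts ~ 1, locations = lsl_sf))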
Strategies when a model's residuals show spatial autocorrelation:
Spatial simultaneous autoregressive error model:
Machine learning
The results of the hyperparameter tuning, feature selection, and (often) model evaluation depend on the selected cross-validation strategy.
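To see why the cross-validation strategy matters, compare random folds with spatially grouped folds. The sketch below uses base R only and made-up point locations. It builds spatial folds with k-means clustering on the coordinates, which is one simple option among several and not necessarily the approach used in the talk. The helper `nearest_other_fold` is hypothetical.

```r
set.seed(2024)

# Made-up sample locations (projected coordinates in metres)
n = 200
pts = data.frame(x = runif(n, 0, 10000), y = runif(n, 0, 10000))
xy = pts[, c("x", "y")]

k = 5
# Random CV: fold membership ignores location
pts$fold_random = sample(rep(1:k, length.out = n))
# Spatial CV: k-means on the coordinates puts nearby points in the same fold
pts$fold_spatial = kmeans(xy, centers = k)$cluster

# Distance from each point to its nearest point in a different fold,
# that is, to the nearest training point when its own fold is held out
nearest_other_fold = function(xy, fold) {
  d = as.matrix(dist(xy))
  sapply(seq_len(nrow(d)), function(i) min(d[i, fold != fold[i]]))
}

dist_random = mean(nearest_other_fold(xy, pts$fold_random))
dist_spatial = mean(nearest_other_fold(xy, pts$fold_spatial))

c(random = dist_random, spatial = dist_spatial)
```

With random folds, test points tend to sit right next to training points, so autocorrelated data can make performance estimates look too optimistic. Spatial folds increase that separation.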
Spatial data is stored in various file formats
Vector data:
Raster data:
Point clouds:
A shapefile example
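library(terra)  # provides rast()
# The /vsicurl/ prefix lets GDAL read the remote file over HTTP
# without downloading it in full
myurl = paste0("/vsicurl/https://zenodo.org/record/6211990/files/",
"copernicus_DSM_100m_mood_bbox_epsg3035.tif") #~9.2 GB
dem = rast(myurl)
dem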
class : SpatRaster
dimensions : 73590, 78430, 1 (nrow, ncol, nlyr)
resolution : 100, 100 (x, y)
extent : 869000, 8712000, -485000, 6874000 (xmin, xmax, ymin, ymax)
coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035)
source : copernicus_DSM_100m_mood_bbox_epsg3035.tif
name : copernicus_DSM_100m_mood_bbox_epsg3035
min value : -427025
max value : 5606973
The availability of open, big Earth Observation (EO) data has driven cloud computing solutions
Google Earth Engine (GEE):
Microsoft Planetary Computer (MPC):
R cartography
A lot of developments in the last 10 years
At the same time, there seem to be many challenges and opportunities ahead – how to reimagine cartography using programming languages?
(Remember to use the right tool for the job)
Many geospatial methods are generalizable
Learn
A lot of resources are available: books, websites, and courses
Apply
Consider whether you can use R for your spatial data analysis: your domain expertise is crucial
Collaborate
Conversation is needed: learning from others' successes and mistakes; learning from other fields
Extend
More people are welcome to the R spatial community: there are still many things to do
Mastodon: fosstodon.org/@nowosad
Website: https://jakubnowosad.com
http://jakubnowosad.com/IIIRqueR/