class: center, middle, inverse, title-slide # Reproducible spatial data analysis: ## An example of transportation analysis for Bristol ### Jakub Nowosad
https://nowosad.github.io
### Collegium Da Vinci, 2019-01-29, Poznań --- # Spatial data analysis .pull-left[ - Traditionally spatial analysis is done using a desktop Geographic Information System (GIS) such as QGIS, ArcMap, GRASS or SAGA <table> <thead> <tr> <th style="text-align:left;"> Attribute </th> <th style="text-align:left;"> Desktop GIS (GUI) </th> <th style="text-align:left;"> R </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Home disciplines </td> <td style="text-align:left;"> Geography </td> <td style="text-align:left;"> Computing, Statistics </td> </tr> <tr> <td style="text-align:left;"> Software focus </td> <td style="text-align:left;"> Graphical User Interface </td> <td style="text-align:left;"> Command line </td> </tr> <tr> <td style="text-align:left;"> Reproducibility </td> <td style="text-align:left;"> Minimal </td> <td style="text-align:left;"> Maximal </td> </tr> </tbody> </table> ] .pull-right[ <img src="figs/qgis.png" width="90%" style="display: block; margin: auto;" /><img src="figs/r.png" width="90%" style="display: block; margin: auto;" /> ] --- # Geocomputation .pull-left[ Geocomputation: - Facilitates **the automation of repetitive tasks** - Enables **transparency and reproducibility**, the backbone of good scientific practice and data science - Encourages **software development** by providing tools to modify existing functions and implement new ones - Helps develop **future-proof programming skills** which are in high demand in many disciplines and industries - Is **user-friendly and fast**, allowing an efficient workflow ] .pull-right[ <img src="figs/qgis.png" width="90%" style="display: block; margin: auto;" /><img src="figs/r.png" width="90%" style="display: block; margin: auto;" /> ] --- # Geocomputation with R An important feature of R (and Python) is that it is an interpreted language. This is advantageous because it enables interactive programming in a Read–Eval–Print Loop (REPL): code entered into the console is immediately executed and the result is printed, rather than waiting for the intermediate stage of compilation. R packages: - **sf**, **raster** - spatial classes - **dplyr**, **rmapshaper** - processing of atttribute tables/geometries - **rnaturalearth**, **osmdata**, **getlandsat** - spatial data download - **rgrass7**, **RQGIS**, **RSAGA**, **link2GI** - connecting with GIS software - **gstat**, **mlr**, **CAST** - spatial data modeling - **rasterVis**, **tmap**, **ggplot** - static visualizations - **leaflet**, **mapview**, **mapdeck** - interactive visualizations - many more... Learn more at https://cran.r-project.org/web/views/Spatial.html. --- class: inverse, center, middle # Basics --- # R <img src="figs/geocompr.gif" width="90%" style="display: block; margin: auto;" /> --- # Vector data .pull-left[ <img src="figs/sf-classes.png" height="550" style="display: block; margin: auto;" /> ] .pull-right[ Examples: - Location of a building - Road - River - Administrative unit ] --- # Vector data .pull-left[
] .pull-right[ ```r library(sf) head(seine) ``` ``` ## Simple feature collection with 3 features and 1 field ## geometry type: MULTILINESTRING ## dimension: XY ## bbox: xmin: 518344.7 ymin: 6660431 xmax: 879955.3 ymax: 6938864 ## epsg (SRID): 2154 ## proj4string: +proj=lcc +lat_1=49 +lat_2=44 +lat_0=46.5 +lon_0=3 +x_0=700000 +y_0=6600000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## name geometry ## 1 Marne MULTILINESTRING ((879955.3 ... ## 2 Seine MULTILINESTRING ((828893.6 ... ## 3 Yonne MULTILINESTRING ((773482.1 ... ``` <!-- - Model wektorowy danych oparty jest na punktach położonych wewnątrz danego układu współrzędnych --> - The **sf** package provides a class system for spatial vector data - Additionally, this package uses the PROJ, GDAL and GEOS libraries, which allows for transformation between different coordinate reference systems, reading and writing many spatial data file formats, and executing geometric operations ] --- # Raster data <img src="figs/raster-intro-plot2-1.png" height="400" style="display: block; margin: auto;" /> - Raster maps usually represent continuous phenomena such as elevation, temperature, population density or spectral data - Of course, we can represent discrete features such as soil or land-cover classes also with the help of a raster data model --- # Raster data .lc[ - The **raster** package provides a class system for spatial raster data, which consists of simple `RasterLayer`, and multilayer `RasterStack` and `RasterBrick` - Operations on small raster are executed in RAM, and there is also a possibility of processing of large rasters by dividing them into smaller chunks ] .rc[ <img src="figs/raster-intro-plot-1.png" height="200" style="display: block; margin: auto;" /> ```r library(raster) elev ``` ``` ## class : RasterLayer ## dimensions : 6, 6, 36 (nrow, ncol, ncell) ## resolution : 0.5, 0.5 (x, y) ## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ## data source : in memory ## names : layer ## values : 1, 36 (min, max) ``` ] --- class: inverse, center, middle # Transportation analysis --- # Introduction > The purpose of transportation is to overcome space (Rodrigue, Comtois, and Slack 2013) - Transport involves traversing continuous geographic space between A and B, and infinite localities in between - **The purpose of geographic transport modeling is simplifying the complexity in a way that captures the essence of transport problems** - Selecting an appropriate level of geographic analysis can help simplify this complexity <!-- about transportation --> --- # Introduction The geographic analysis of transport systems could be done at different geographic levels, including: - **Areal units**: transport patterns can be understood with reference to zonal aggregates such as the main mode of travel (by car, bike or foot, for example) and the average distance of trips made by people living in a particular zone - **Desire lines**: straight lines that represent ‘origin-destination’ data that records how many people travel (or could travel) between places (points or zones) in geographic space - **Routes**: these are lines representing a path along the route network along the desire lines defined in the previous bullet point - **Nodes**: these are points in the transport system that can represent common origins and destinations and public transport stations such as bus stops and rail stations - **Route networks**: these represent the system of roads, paths and other linear features in an area. They can be represented as geographic features (representing route segments) or structured as an interconnected graph, with the level of traffic on different segments referred to as ‘flow’ by transport modelers <!-- different geographic levels --> --- # Bristol .pull-left[ Typically, models are designed to solve a particular problem. - **Bristol - a city in the west of England with a population of half a million people** - In terms of transport, Bristol is well served by rail and road links and has a relatively high level of active travel. 19% of its citizens' cycle and 88% walk at least once per month according to the Active People Survey (the national average is 15% and 81%, respectively). - 8% of the population said they cycled work in the 2011 census, compared with only 3% nationwide - Despite impressive walking and cycling statistics, **the city has a major congestion problem** ] .pull-right[ <img src="figs/bristol.png" height="550" style="display: block; margin: auto;" /> ] --- # Aims .pull-left[ - For this reason, the question is **how best to increase the share of walking and cycling in the city of Bristol?** This high-level aim will be met via the following objectives: - Describe the geographical pattern of transport behavior in the city - Identify key public transport nodes and routes along which cycling to rail stations could be encouraged, as the first stage in multi-model trips - Analyze travel ‘desire lines’, to find where many people drive short distances - Identify cycle route locations that will encourage less car driving and more cycling ] .pull-right[ <img src="figs/bristol.png" height="550" style="display: block; margin: auto;" /> ] --- # Setup ```r library(sf) # spatial data classes library(dplyr) # data manipulation library(spDataLarge) # example datasets library(stplanr) # transport planning library(tmap) # visualization package ``` --- # Transportation zones Two zone types will typically be of particular interest: - **origin** (typically residential areas) zones - **destination** (typically containing ‘trip attractors’ such as offices, schools, and shops) zones The origin and destination zones used here are officially defined zones of intermediate geographic resolution (their official name is Middle layer Super Output Areas or MSOAs). Each houses around 8,000 people .lc[ <img src="index_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> ] .rc[ <table> <thead> <tr> <th style="text-align:left;"> o </th> <th style="text-align:left;"> d </th> <th style="text-align:right;"> all </th> <th style="text-align:right;"> bicycle </th> <th style="text-align:right;"> foot </th> <th style="text-align:right;"> car_driver </th> <th style="text-align:right;"> train </th> <th style="text-align:right;"> Active </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02002985 </td> <td style="text-align:right;"> 209 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 127 </td> <td style="text-align:right;"> 59 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 63.157895 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02002987 </td> <td style="text-align:right;"> 121 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 35 </td> <td style="text-align:right;"> 62 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 34.710744 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02003036 </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 9.375000 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02003043 </td> <td style="text-align:right;"> 141 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 56 </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 2.127660 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02003049 </td> <td style="text-align:right;"> 56 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 10.714286 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02003054 </td> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 9.523810 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02003100 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.000000 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02003106 </td> <td style="text-align:right;"> 48 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 33 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 8.333333 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02003108 </td> <td style="text-align:right;"> 31 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.000000 </td> </tr> <tr> <td style="text-align:left;"> E02002985 </td> <td style="text-align:left;"> E02003121 </td> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 34 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 7.142857 </td> </tr> </tbody> </table> ] --- # Transportation zones .pull-left[ ```r zones_origin = bristol_od %>% group_by(o) %>% summarize_if(is.numeric, sum) %>% dplyr::select(geo_code = o, all_origin = all) zones_origin ``` ``` ## # A tibble: 102 x 2 ## geo_code all_origin ## <chr> <int> ## 1 E02002985 868 ## 2 E02002987 898 ## 3 E02003005 786 ## 4 E02003012 3312 ## 5 E02003013 3715 ## 6 E02003014 2220 ## 7 E02003015 1633 ## 8 E02003016 2411 ## 9 E02003017 1590 ## 10 E02003018 1690 ## # ... with 92 more rows ``` ] .pull-right[ ```r zones_dest = bristol_od %>% group_by(d) %>% summarize_if(is.numeric, sum) %>% dplyr::select(geo_code = d, all_dest = all) zones_dest ``` ``` ## # A tibble: 102 x 2 ## geo_code all_dest ## <chr> <int> ## 1 E02002985 744 ## 2 E02002987 561 ## 3 E02003005 427 ## 4 E02003012 701 ## 5 E02003013 940 ## 6 E02003014 3469 ## 7 E02003015 4980 ## 8 E02003016 297 ## 9 E02003017 1459 ## 10 E02003018 128 ## # ... with 92 more rows ``` ] --- # Transportation zones ```r zones_origin = left_join(bristol_zones, zones_origin, by = "geo_code") ``` <img src="index_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> --- # Desire lines - Unlike zones, which represent trip origins and destinations, **desire lines connect the centroid of the origin and the destination zone**, and thereby represent where people desire to go between zones - They represent the quickest ‘crow flies’ route between A and B that would be taken, if it were not for obstacles such as buildings and windy roads getting in the way ```r od_inter = filter(bristol_od, o != d) desire_lines = od2line(od_inter, zones_od) ``` ```r bristol_od$Active = (bristol_od$bicycle + bristol_od$foot) / bristol_od$all * 100 ``` --- # Desire lines .lc[ - **The map shows that the city center dominates transport patterns in the region, suggesting policies should be prioritized there**, although a number of peripheral sub-centers can also be seen - Next, it would be interesting to have a look at the distribution of interzonal modes, e.g. between which zones is cycling the least mean of transport ] .rc[ <img src="index_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> ] --- # Routes - **Routes are desire lines that are no longer straight** - **Routes are generated from desire lines** — or more commonly origin-destination pairs — **using routing services** - The benefits of cycling trips are greatest when they replace car trips - **5 km Euclidean distance (or around 6-8 km of route distance) can realistically be cycled by many people** - We will therefore only route desire lines along which a high (300+) number of car trips take place that are up to 5 km in distance: ```r desire_lines$distance = as.numeric(st_length(desire_lines)) desire_carshort = dplyr::filter(desire_lines, car_driver > 300 & distance < 5000) route_carshort = line2route(desire_carshort, route_fun = route_osrm) ``` --- # Routes .pull-left[ - **Plotting the results shows that many short car trips take place in and around Bradley Stoke** - Why? Bradley Stoke is “Europe’s largest new town built with private investment”, suggesting limited public transport provision - Furthermore, the town is surrounded by large (cycling unfriendly) road structures, “such as junctions on both the M4 and M5 motorways” ] .pull-right[
] --- # Nodes .pull-left[ - Nodes in geographic transport data are zero-dimensional features (points) - From an active travel perspective, public transport ‘legs’ of longer journeys divide trips into three: * **The origin leg**, typically from residential areas to public transport stations * **The public transport leg**, which typically goes from the station nearest a trip’s origin to the station nearest its destination * **The destination leg**, from the station of alighting to the destination ] .pull-right[ - We will use railway stations to illustrate public transport nodes
] --- # Nodes - As an example, we will select only the top three desire lines in terms of rails use: ```r desire_rail = top_n(desire_lines, n = 3, wt = train) ``` <center>
</center> --- # Nodes - The challenge now is to ‘break-up’ each of these lines into three pieces, representing travel via public transport nodes: ```r desire_rail = line_via(desire_rail, bristol_stations) ``` <center>
</center> --- # Route networks - Route networks can usefully be represented as mathematical graphs, with nodes on the network connected by edges - As an example, we will select only the roads with maximum speed of 70 mph (~113 km/h) : ```r ways_freeway = bristol_ways %>% filter(maxspeed == "70 mph") ``` <center>
</center> --- # Route networks - In the example below, the ‘edge betweenness’, meaning the number of shortest paths passing through each edge, is calculated: ``` ways_sln = SpatialLinesNetwork(ways_freeway) ways_sln@sl$e = igraph::edge_betweenness(ways_sln@g) ``` <center>
</center> --- # Prioritizing new infrastructure .pull-left[ ```r route_rail = desire_rail %>% st_set_geometry("leg_orig") %>% line2route(route_fun = route_osrm) %>% st_set_crs(4326) route_cycleway = rbind(route_rail, route_carshort) route_cycleway$all = c(desire_rail$all, desire_carshort$all) ``` - The results (blue lines) shows routes with high levels of car dependency and highlights opportunities for cycling rail stations ] .pull-right[