Best practices in code organization

“R as GIS” course

Jakub Nowosad, https://jakubnowosad.com/
Remote Sensing and Spatial Modelling Research Group, University of Münster

March, 2025

Reproducibility spectrum

Based on: https://doi.org/10.1126/science.1213847

Reproducible/Replicable
Not only for publications!

Yes, but why?

Internal reasons:

To reproduce; To replicate
To fix/update/modify
To extend
To share
(To not repeat ourselves)

External reproducibility:

Reproducible
Robust
Transparent
Reusable
Shareable

Story of my life

Issue: human memory
http://dx.doi.org/10.2478/quageo-2014-0005
Solution: code!

R code

library(terra)
library(supercells)
# setwd("")
s <- rast(system.file("ex/logo.tif", package = "terra"))   
sc = supercells(s, 500, compactness = 50,transform='to_LAB')
sck = kmeans(sf::st_drop_geometry(sc[4:6]), centers = 10)
plot(sf::st_geometry(sc[0]), col = sck$cluster)

Issue: working directory
Issue: code style
Issue: randomness
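
One possible cleaned-up version of the script above is sketched below. It keeps the same function calls and arguments, drops the commented-out setwd(), uses descriptive object names and consistent spacing, and fixes the random seed before kmeans(). The object names and the seed value (2025) are arbitrary choices made for this illustration.

```r
# Attach packages at the top of the script
library(terra)
library(supercells)

# Read the example raster shipped with terra; no setwd() is needed
logo = rast(system.file("ex/logo.tif", package = "terra"))

# Create supercells with the same arguments as in the script above
logo_supercells = supercells(logo, k = 500, compactness = 50,
                             transform = "to_LAB")

# Columns 4 to 6 are used as clustering input, as in the script above
logo_values = sf::st_drop_geometry(logo_supercells[4:6])

# kmeans() starts from random centers; a fixed seed (arbitrary value)
# makes the cluster assignment repeatable between runs
set.seed(2025)
logo_clusters = kmeans(logo_values, centers = 10)

# Plot the supercell polygons colored by their cluster
plot(sf::st_geometry(logo_supercells), col = logo_clusters$cluster)
```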

R code organization: a code style

Code style:

Name your objects in an understandable way, e.g., temperature = c(10, 15) vs x = c(10, 15)
Pick a naming convention, e.g., snake case: snake_case
Use spaces properly, e.g.,

average = mean(values, na.rm = TRUE)
# vs
average=mean ( values,na.rm= TRUE )

Use a consistent code style
Follow an existing code style guide, e.g., the tidyverse style guide
Use tools that help to keep a consistent style, e.g., {styler} or {lintr}
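
A small base R sketch can show why style is a matter for human readers only: R parses the tidy and the messy line from the example above into the same call. The object names are made up for this illustration.

```r
# Two spellings of the same statement, taken from the example above
tidy_code = "average = mean(values, na.rm = TRUE)"
messy_code = "average=mean ( values,na.rm= TRUE )"

# parse() turns text into R expressions; whitespace is discarded
tidy_call = parse(text = tidy_code)[[1]]
messy_call = parse(text = messy_code)[[1]]

# Compare the two parsed calls
same_call = identical(tidy_call, messy_call)
print(same_call)

# deparse() prints the parsed call back with R's default spacing
print(deparse(messy_call))
```

Since the computer does not care, a consistent style is an investment in the people who will read the code later.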

R code organization: a single script

Naming of the scripts:

File names should meet three requirements: be easy (i) to read by a computer, (ii) to read by a human, (iii) to sort.
Do not use spaces in the file names
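
The three requirements can be sketched with a few hypothetical script names. A zero-padded numeric prefix makes the files sort in the order they should run, and a simple pattern check confirms that the names contain no spaces or special characters. The step names and the pattern are assumptions chosen for this example, not a fixed rule.

```r
# Hypothetical analysis steps and their position in the workflow
steps = c("load-data", "clean-data", "plot-results")
step_numbers = c(1, 2, 10)

# A zero-padded prefix keeps alphabetical and numeric order the same
file_names = sprintf("%02d_%s.R", step_numbers, steps)
print(file_names)

# Machine-readable check: lowercase letters, digits, "_" and "-" only
is_machine_readable = grepl("^[a-z0-9_-]+\\.R$", file_names)

# Sorting the names keeps the intended order of the steps
sorted_names = sort(file_names)
```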

Code organization:

Attach packages at the beginning of the script
Comment your code (for yourself and others), but you do not need to explain every line of code
Specify the random number generation seed (set.seed) when necessary
Try to keep each line of code under 80 characters
Use relative paths over absolute paths
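
Two of the points above, the seed and relative paths, can be illustrated with base R only. In this sketch the built-in iris data stands in for real project data, and the seed value and the output file name are arbitrary assumptions.

```r
# Hypothetical input: built-in iris measurements stand in for real data
measurements = scale(iris[, 1:4])

# kmeans() picks random starting centers, so set the seed before each run
set.seed(2025)
first_run = kmeans(measurements, centers = 3)

set.seed(2025)
second_run = kmeans(measurements, centers = 3)

# Same seed, same data, same call: the cluster assignments match
same_clusters = identical(first_run$cluster, second_run$cluster)
print(same_clusters)

# A relative path is resolved against the project directory,
# so it keeps working when the project is moved or shared
output_path = file.path("data", "clusters.csv")
print(output_path)
```

An absolute path such as one starting with a user's home directory would break on any other computer, while the relative path only assumes that the script is run from the project directory.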

R code

RStudio: File > New Project > New Directory > New Project > …

Also: clear environment + restart R

R project organization

A possible organization of the files in a project:

project  
    ├── README.md
    ├── data  
    ├── figs  
    ├── R  
    └── raw-data

Note the distinction between raw data and processed data.
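
The layout above could be created from R with a few base functions. This sketch builds the structure inside a temporary directory so that it does not touch any real files; in practice, project_dir would point to the actual project location.

```r
# Build the example layout in a temporary directory (assumption:
# replace tempdir() with the real project location when needed)
project_dir = file.path(tempdir(), "project")

# Subdirectories from the layout above
sub_dirs = file.path(project_dir, c("data", "figs", "R", "raw-data"))
for (sub_dir in sub_dirs) {
  dir.create(sub_dir, recursive = TRUE, showWarnings = FALSE)
}

# Empty README.md to be filled with the project description
readme_path = file.path(project_dir, "README.md")
file.create(readme_path)

# List what was created
print(list.files(project_dir))
```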

{reprex}: reproducible example

Why: to ask a question; to report a bug; to fix a bug; to showcase some examples; …
Input: minimal code that reproduces your problem/example (strip away everything that is not directly related to your problem)
Output: resulting runnable code + output as Markdown (including code results and plots) + (optionally) session info
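
As an illustration, the input for a reproducible example could be as small as the sketch below. It uses made-up values instead of project data and contains only the calls related to the (hypothetical) problem: an unexpected NA returned by mean().

```r
# Minimal, self-contained input: a few made-up values, no external files
values = c(10, 15, NA)

# The unexpected result that the question is about
average = mean(values)
print(average)

# What was expected instead
average_without_na = mean(values, na.rm = TRUE)
print(average_without_na)

# To render this example: copy the lines above to the clipboard
# and run reprex::reprex() in the R console
```

Base R's sessionInfo() can be added at the end when the R version or package versions may matter.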

Additional topics

Version control, e.g., git with GitHub or Codeberg: helps to keep track of changes in the code, to collaborate with others, and to share the code
R packages: help to organize the code, to share the code, and to document the code
{renv}: helps to create reproducible environments for your R projects (e.g., each project has its own private library)
{targets}: creates a reproducible workflow; it skips costly runtime for tasks that are already up to date; it allows easy parallelization of the tasks
Docker: creates containers that include all dependencies and more
CI/CD: continuous integration (CI) and continuous deployment (CD)