Best practices in code organization

“R as GIS” course

Jakub Nowosad, https://jakubnowosad.com/
Remote Sensing and Spatial Modelling Research Group, University of Münster

March, 2025

Reproducibility spectrum

  • Reproducible/Replicable
  • Not only for publications!

Yes, but why?

Internal reasons:

  • To reproduce; To replicate
  • To fix/update/modify
  • To extend
  • To share
  • (To not repeat ourselves)

External reproducibility:

  • Reproducible
  • Robust
  • Transparent
  • Reusable
  • Shareable

Story of my life

R code

library(terra)
library(supercells)
# setwd("")
s <- rast(system.file("ex/logo.tif", package = "terra"))   
sc = supercells(s, 500, compactness = 50,transform='to_LAB')
sck = kmeans(sf::st_drop_geometry(sc[4:6]), centers = 10)
plot(sf::st_geometry(sc[0]), col = sck$cluster)

  • Issue: working directory
  • Issue: code style
  • Issue: randomness
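
One possible cleanup of the script above, addressing the three listed issues, is sketched below. It assumes the {terra}, {supercells}, and {sf} packages are installed; the object names and the seed value are illustrative choices and not part of the original script.

```r
# Attach packages at the top; setwd() is not needed because the
# example file is located with system.file()
library(terra)
library(supercells)
library(sf)

# Fix the seed (an arbitrary value) so that kmeans() returns
# the same clusters on every run
set.seed(2025)

logo = rast(system.file("ex/logo.tif", package = "terra"))
logo_supercells = supercells(logo, k = 500, compactness = 50,
                             transform = "to_LAB")

# Columns 4 to 6 are used, as in the original script
logo_values = st_drop_geometry(logo_supercells[4:6])
logo_kmeans = kmeans(logo_values, centers = 10)

plot(st_geometry(logo_supercells), col = logo_kmeans$cluster)
```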

R code organization: a code style

Code style:

  • Name your objects in an understandable way, e.g., temperature = c(10, 15) vs x = c(10, 15)
  • Pick a naming convention and use it consistently, e.g., snake case: snake_case
  • Use spaces consistently, e.g.:
average = mean(values, na.rm = TRUE)
# vs
average=mean ( values,na.rm= TRUE ) 

R code organization: a single script

Naming of the scripts:

  • File names should meet three requirements: be easy (i) for a computer to read, (ii) for a human to read, and (iii) to sort.
  • Do not use spaces in file names
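
The naming requirements above can be illustrated with a small sketch in base R. The script names are hypothetical; the zero-padded numeric prefix is only one possible convention for making files sort in the intended order.

```r
# Hypothetical script names: a zero-padded number, then a short description
steps = c("download-data", "clean-data", "fit-model", "make-figures")
script_names = sprintf("%02d_%s.R", seq_along(steps), steps)
script_names

# Alphabetical sorting keeps the intended order of the workflow
names_sort_correctly = identical(sort(script_names), script_names)

# None of the names contains a space
names_have_spaces = any(grepl(" ", script_names))
```

With this convention, a file browser and list.files() both show the scripts in the order in which they should be run.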

Code organization:

  • Attach packages at the beginning of the script
  • Comment your code (for yourself and others), but you do not need to explain every line of code
  • Specify the random number generation seed (set.seed()) when necessary
  • Try to keep each line of code under 80 characters
  • Prefer relative paths to absolute paths
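
A minimal sketch of two of the points above, the seed and relative paths, using base R only. The input data, the seed values, and the file name temperature.csv are made up for this illustration.

```r
# Hypothetical input: 100 two-dimensional points created only for this example
set.seed(1)
coords = matrix(rnorm(200), ncol = 2)

# kmeans() picks random starting centers; setting the same seed
# before each call gives identical results
set.seed(2025)
clusters_first = kmeans(coords, centers = 3)$cluster
set.seed(2025)
clusters_second = kmeans(coords, centers = 3)$cluster
identical(clusters_first, clusters_second)

# A relative path is resolved against the project directory ...
input_path = file.path("raw-data", "temperature.csv")
# ... while an absolute path usually exists on one computer only, e.g.,
# "C:/Users/me/project/raw-data/temperature.csv"
```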

R code

RStudio: File > New Project > New Directory > New Project > …

Also: clear environment + restart R

R project organization

A possible organization of the files in a project:

project  
    ├── README.md
    ├── data  
    ├── figs  
    ├── R  
    └── raw-data  

Note the distinction between raw data and processed data.
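
The layout above could be created from R, for example, as follows. This is a sketch in base R; the project is placed in a temporary directory here only to keep the example self-contained, and in practice the path would be the project's own folder.

```r
# Location of the example project (temporary directory, for illustration only)
project_dir = file.path(tempdir(), "project")
subdirs = c("data", "figs", "R", "raw-data")

# Create each subdirectory (and the project directory itself, if missing)
for (subdir in subdirs) {
  dir.create(file.path(project_dir, subdir),
             recursive = TRUE, showWarnings = FALSE)
}

# Add an empty README file
readme_path = file.path(project_dir, "README.md")
file.create(readme_path)

list.files(project_dir)
```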

{reprex}: reproducible example

  • Why: to ask a question; to report a bug; to fix a bug; to showcase some examples; …

  • Input: minimal code that reproduces your problem/example (strip away everything that is not directly related to your problem)

  • Output: resulting runnable code + output as Markdown (including code results and plots) + (optionally) session info
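
A minimal sketch of the input and output described above. It assumes that the {reprex} package and its rendering dependencies are installed; the values object is a made-up example of a "problem" (a mean that returns NA).

```r
# Input: only the lines needed to show the problem
reprex_output = reprex::reprex({
  values = c(10, NA, 15)
  mean(values)
}, venue = "gh", session_info = TRUE)

# Output: the code and its results as Markdown (one line per element)
reprex_output
```

In an interactive session, the rendered Markdown is typically also copied to the clipboard, ready to be pasted into an issue or a question.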

Additional topics

  • Version control, e.g., git with GitHub or Codeberg: helps to keep track of changes in the code, to collaborate with others, and to share the code
  • R packages: help to organize the code, to share the code, and to document the code
  • {renv}: helps to create reproducible environments for your R projects (e.g., each project has its own private library)
  • {targets}: creates a reproducible workflow; it skips costly runtime for tasks that are already up to date; it allows easy parallelization of tasks
  • Docker: creates containers that include all dependencies and more
  • CI/CD: continuous integration (CI) and continuous deployment (CD)