4 min read

R package check with Docker on Codeship

Motivation

At Emarsys we have several internal private R packages which are essential to our day-to-day work. They are also under constant development. We want to work on these in a collaborative and safe way. Fast and trustworthy CI & CD are essential: they reduce the cost and risk of adding a small but useful feature to one of our packages.

Although in the R community Travis is the standard for CI, at Emarsys the whole company uses Codeship happily so we wanted to give it a go.

Our first try

At first we started with Codeship Basic, i.e. without using Docker. Starting from this script we managed to setup a build. However the build was extremely slow: ~ 15 minutes. We managed to cache installed R packages, however we did not quickly found a way to avoid reinstalling R from source every time which took up most of the check time. We also missed a straightforward way to handle missing system dependencies thus we could not buid vignettes using rmarkdown v2. You can ask Codeship support to install something you need on its standard machines. However, I prefer more direct control. You can see the details here.

Using Docker with Codeship Pro

Although I barely used Docker before, setting up a build with Docker based on tutorials was fairly easy and more importantly, builds are much faster now: under 2 minutes. (similar to Travis in case packages are cached on both.)

Besides these obvious benefits I also gained a better understanding of system dependencies of R packages and Docker in general. You can also test building the docker image locally and running commands within your docker container:

docker build -t dockerimagename .

docker run --rm -i -t dockerimagename /bin/bash

Here is a dummy R package showing the usage of Dockerfile, codeship-steps.yml and codeship-services.yml.

Dockerfile

If you haven’t heard of Docker yet, think of it as a virtual machine with Linux.

rocker/verse provides an installed R and critical system dependencies. Based on this you can install system dependencies, run R commands, etc in your Docker container or add files from your repository to your Docker container with the ADD command. Everything else is run in Docker. Every command creates a new Docker image which is cached independently from others. It means if you change a command in your Dockerfile (or an added file) codeship will reuse the cached image up until that command and re-run all commands after the effected command.

FROM rocker/r-ver:latest

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
  libcurl4-openssl-dev \
  zlib1g-dev \
  libssl-dev \
  libxml2-dev

RUN mkdir /rpkg
WORKDIR /rpkg

ADD ./packrat /rpkg/packrat
RUN R --vanilla --slave -f "packrat/init.R" --args --bootstrap-packrat

ADD . /rpkg

This practically means that you only have to wait for installing system depencies once. You can start with a clean cache anytime you want.

The codeship-steps.yml and codeship-services.yml could be much more complicated, e.g. generally you can have more services in one repository, use external dependencies like database, etc. The case of an R package is fairly simple though. I used devtools::check but you can run any R command.

Packrat

We use packrat for tracking used package versions in each of our R projects - thus internal packages as well. The packrat folder contains not only the packages our package imports but packages solely used for development as well, most importantly devtools. So that we do not duplicate the work packrat does we wanted to install packages from the project specific packrat library on Codeship within Docker as well. This is achieved by running

RUN R --vanilla --slave -f "packrat/init.R" --args --bootstrap-packrat

after adding only the packrat folder (without lib, lib-R, lib-ext) to our Docker image first. This is crucial so that the relatively slow installing of packages runs only if a new dependency is added, not after every change in the R code. However, note that this way every package is reinstalled if you add a new dependency as for Docker one command is a unit, it does not know anything about packrat or R packages.

After installing packages we can move the rest of our repository to the docker image with ADD . /rpkg

Dependencies

Dependencies needed for devtools, knitr, roxygen2:

  • libcurl4-openssl-dev
  • zlib1g-dev
  • libssl-dev
  • libxml2-dev

Custom dependencies:

  • libmariadb-client-lgpl-dev (for RMySQL)
  • libpq-dev (RPostgreSQL)
  • pandoc (rmarkdown v2)
  • pandoc-citeproc (rmarkdown v2)
  • qpdf (vignette pdf check ?)

The so called recommended packages are usually installed alongside R and not necessarily show up in the packrat Requires field which can cause surprises as rocker/verse does not install them. So far we needed MASS for ggplot2 and lattice for zoo. We installed these before packrat bootstrap with RUN R -e "install.packages("MASS").

Future

The next step is to build an actual pipeline: after checking and building our R package we should push to our internal drat repository (triggered by a release tag).

We could also build a Docker image based on rocker/verse to contain the system dependencies we need to check most of our packages.

We will think about installing recommended packages always in the Dockerfile or choose a different base Docker image containing them.