Reproducibility in R: three things

A T rex in sunglasses with the text 'works on my machine'.

Avoid being this guy (Threddy the T. rex via Giphy)

Reproducevangelism

I spoke at the Department for Education’s Data Science Week. I wanted everyone – newer and more experienced users alike – to learn at least one new thing about reproducibility with R and RStudio.

The slides are embedded below and you can also get them fullscreen online (press ‘F’ for fullscreen and ‘P’ for presenter notes) and find the source on GitHub.

Three things

The three things for achieving reproducibility were deliberately broad. I focused on R and some specific packages that could be helpful, but the ideas are transferable and there are lots of ways to achieve the same thing.

The things were:

1. Centralise everything

Get code, functions, data, documentation in one place. Use R Projects in RStudio and write packages. This makes code more shareable and improves the chance that others can recreate things on their machine.
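As a rough sketch of what "one place" can look like, the base R snippet below builds a throwaway project folder and then reads everything by relative path from its root. The folder name, file names and the `summarise_mpg()` function are all made up for illustration, and `setwd()` only stands in for opening the project's `.Rproj` file in RStudio, which sets the working directory for you.

```r
# Hypothetical project layout, created in a temporary directory
project <- file.path(tempdir(), "my-analysis")
for (d in c("R", "data", "docs")) {
  dir.create(file.path(project, d), recursive = TRUE, showWarnings = FALSE)
}

# Functions live in R/, data lives in data/
writeLines(
  "summarise_mpg <- function(df) mean(df$mpg)",
  file.path(project, "R", "functions.R")
)
write.csv(mtcars, file.path(project, "data", "cars.csv"), row.names = FALSE)

# Stand-in for opening the .Rproj file: work from the project root
old_wd <- setwd(project)

# Relative paths only, so the same code runs on someone else's machine
source(file.path("R", "functions.R"))
cars <- read.csv(file.path("data", "cars.csv"))
result <- summarise_mpg(cars)

# Restore the original working directory
setwd(old_wd)
```

Because nothing refers to an absolute path like a personal Documents folder, the whole folder can be zipped or cloned and should run the same way elsewhere.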

2. Report with code

Put code inside your report so that updates to data and code are reflected as soon as you re-render it. Use R Markdown and related formats, like Yihui Xie’s {xaringan} for reproducible slides and {bookdown} for reproducible books.
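To make that concrete, here is a minimal sketch that writes a tiny R Markdown file and renders it. The file name and the report text are invented for the example; the numbers in the sentence come from inline R code, so they would change automatically if the data changed. Rendering is skipped if {rmarkdown} or pandoc is not available.

```r
# A tiny R Markdown document with inline R code (contents are illustrative)
rmd <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "The dataset has `r nrow(mtcars)` rows and a mean mpg of `r round(mean(mtcars$mpg), 1)`."
)

# Write it to a temporary file (hypothetical file name)
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd, rmd_path)

# Render only if the tooling is installed; render() returns the output path
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_path <- rmarkdown::render(rmd_path, quiet = TRUE)
}
```

Nothing in the sentence is typed by hand, so there is no number to forget to update when the data are refreshed.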

3. Manage workflows

Don’t use your brain to store information about the dependencies within your analysis. Use {drake} by Will Landau instead. It remembers all the relationships between the files, objects and functions in your analysis and only re-runs what needs to be re-run following changes.
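A small sketch of the idea, assuming {drake} is installed: the target names and the model below are placeholders rather than anything from the talk. {drake} works out from the plan that `model` depends on `raw` and `coefs` depends on `model`, and stores results in a `.drake` cache in the working directory.

```r
library(drake)

# Each target is a named step; dependencies are inferred from the code
plan <- drake_plan(
  raw   = mtcars,                      # illustrative input data
  model = lm(mpg ~ wt, data = raw),    # depends on `raw`
  coefs = coef(model)                  # depends on `model`
)

make(plan)  # first run: builds raw, model and coefs
make(plan)  # second run: nothing changed, so nothing is rebuilt

# Read a finished target back from the cache
model_coefs <- readd(coefs)
```

If you later change only the model formula, a further `make(plan)` should rebuild `model` and `coefs` but leave `raw` alone.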

Acknowledgements

I keep referring to the same resources about reproducibility. Take a look at: