Thank you package-makers
I've used a lot of packages in 2019 and many have brought great joy to my R experience. Thank you to everyone who has created, maintained or contributed to a package this year.
Some particular packages of note for me have been:
- {usethis} by Hadley Wickham and Jenny Bryan
- {drake} by Will Landau
- {purrr} by Lionel Henry and Hadley Wickham
And some honourable mentions are:
- {blogdown} by Yihui Xie
- {xaringan} by Yihui Xie
- {polite} by Dmytro Perepolkin
- {arsenal} by Ethan Heinzen, Jason Sinnwell, Elizabeth Atkinson, Tina Gunderson and Gregory Dougherty
Click the package name to jump to that section.
Packages of note
{usethis}
The format and content of R packages is objectively odd. What files are necessary? What structure should it have? The {usethis} package from RStudio's Hadley Wickham and Jenny Bryan makes package setup far easier for newcomers and experienced useRs alike.
In fact, you can make a minimal package in two lines:
- create_package() to create the necessary package structure
- use_r() to create an R script for your functions in the right place
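As a rough sketch of those two lines in practice, assuming a recent version of {usethis} is installed. The package name (mypkg), the script name (hello) and the temporary location are placeholders I've picked for illustration, and the extra arguments are only there so the example runs quietly outside RStudio.

```r
library(usethis)

# Hypothetical location: a throwaway folder in the session's temp directory
pkg_path <- file.path(tempdir(), "mypkg")

# Line 1: create the package skeleton (DESCRIPTION, NAMESPACE, R/)
# rstudio = FALSE skips the .Rproj file; open = FALSE stops a new session opening
create_package(pkg_path, rstudio = FALSE, open = FALSE)

# Point {usethis} at the new package, since we didn't open it
proj_set(pkg_path)

# Line 2: create R/hello.R, ready for your functions
use_r("hello", open = FALSE)

# See what was created
list.files(pkg_path, recursive = TRUE)
```

In an interactive RStudio session you'd more likely call create_package() with a path and let it open the new project for you, then run use_r() from there.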
But there are many more functions to help you set up your package. To name a few that I use regularly:
- use_vignette() and use_readme_md() for more documentation
- use_testthat() and use_test() for setting up tests
- use_package() to add packages to the Imports section of the DESCRIPTION file
- use_data() and use_data_raw() to add data sets to the package and the code used to create them
- use_*_license() to add a license
There are also other flavours of function like git_*() and pr_*() to work with version control and proj_*() for working with RStudio Projects.
I focused this year on making different types of package. {usethis} made it much easier to develop:
- {altcheckr} to read and assess image alt text from web pages
- {oystr} to handle London travel-history data from an Oyster card
- {gdstheme} to use a {xaringan} presentation theme and template
- {blogsnip} to insert blog-related code snippets via an RStudio addin (there's even a use_addin() function to create the all-important inst/rstudio/addins.dcf file)
For more package-development info, I recommend Emil Hvitfeldt's {usethis} workflow, as well as Karl Broman's R Package Primer and Hadley Wickham's R Packages book. To help me remember this stuff, I also wrote some slides about developing a package from scratch with {usethis} functions.
{drake}
Your analysis has got 12 input data files. They pass through 15 functions. There are some computationally-intensive, long-running processes. Plots and tables are produced and R Markdown files are rendered. How do you keep on top of this? Is it enough to have a set of numbered script files (01_read.R, etc) or a single script file that sources the rest? What if something changes? Do you have to re-run everything from scratch?
You need a workflow manager. Save yourself some hassle and use Will Landau's {drake} package, backed by rOpenSci's peer review process. {drake} "remembers" all the dependencies between files and only re-runs what needs to be re-run if any errors are found or changes are made. It also provides visualisations of your workflow and allows for high-performance computing.
In short, you:
- Supply the steps of your analysis as functions to drake_plan(), which generates a data frame of commands (functions) to operate over a set of targets (objects)
- Run make() on your plan to run the steps and generate the outputs
- If required, make changes anywhere in your workflow and re-make() the plan: {drake} will only re-run things that are dependent on what you changed
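Here's a minimal sketch of that plan-then-make cycle, assuming {drake} is installed. The target names (raw, avg_mpg, rounded) and the use of the built-in mtcars data are my own placeholders, and I've put the cache in a temporary folder so the example doesn't leave a .drake folder in your working directory.

```r
library(drake)

# Keep the cache in a temporary folder rather than the working directory
cache <- new_cache(tempfile())

# Each argument is a target; each right-hand side is the command that builds it
plan <- drake_plan(
  raw = mtcars,
  avg_mpg = mean(raw$mpg),
  rounded = round(avg_mpg, 1)
)

plan # a data frame with one row per target

make(plan, cache = cache) # first run: builds all three targets
make(plan, cache = cache) # second run: nothing changed, so nothing is rebuilt

# Read a finished target back out of the cache
result <- readd(rounded, cache = cache)
result
```

In a real project the commands would usually be calls to your own functions, and you'd let {drake} use its default cache in the project folder.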
Below is an extreme example from a happy customer (click through to the image if you can't see the embedded tweet). Each point on the graph is an object or function; black ones are out of date and will be updated when make() is next run.
I'm *so* glad {drake} is tracking those dependencies between #rstats computations for me. pic.twitter.com/QsqCAH8Kg7
— FrederikAust@fediscience.org (@FrederikAust) December 12, 2019
It's hard to do {drake} justice in just a few paragraphs, but luckily it's one of the best-documented packages out there. Take a look at:
- the {drake} rOpenSci website
- the thorough user manual
- the learndrake GitHub repo, which can be launched in the cloud
- the drakeplanner Shiny app
- Will's {drake} examples page
- this rOpenSci community call
- a Journal of Open Source Software (JOSS) paper
- more things listed in the documentation section of the user manual
I wrote about {drake} earlier in the year and made a demo and some slides. I think it could be useful for reproducibility of statistical publications in particular.
{purrr}
You want to apply a function over the elements of some list or vector.
The map() family of functions from the {purrr} package, by Lionel Henry and Hadley Wickham of RStudio, has a concise and consistent syntax for doing this.
You can choose what gets returned from your iterations by selecting the appropriate map_*() variant: map() for a list, map_df() for a data frame, map_chr() for a character vector and so on. Here's a trivial example that counts the number of Street Fighter characters from selected countries. Here's a list:
# Create the example list
street_fighter <- list(
  china = "Chun Li", japan = c("Ryu", "E Honda"),
  usa = c("Ken", "Guile", "Balrog"), `???` = "M Bison"
)
street_fighter # take a look at the list
## $china
## [1] "Chun Li"
##
## $japan
## [1] "Ryu" "E Honda"
##
## $usa
## [1] "Ken" "Guile" "Balrog"
##
## $`???`
## [1] "M Bison"
Now to map the length() function to each element of the list and return a named integer vector.
library(purrr) # load the package
# Get the length of each list element
purrr::map_int(
  street_fighter, # list
  length # function
)
## china japan   usa   ???
##     1     2     3     1
But what if you want to iterate over two or more vectors or lists at once? You can use map2() or pmap(). And what if you only want the side effects? walk() and pwalk().
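A quick sketch of those variants, assuming {purrr} is installed. The fighters, moves and hits vectors are made up for illustration.

```r
library(purrr)

fighters <- c("Ryu", "Ken", "Guile")
moves <- c("Hadouken", "Shoryuken", "Sonic Boom")
hits <- c(3, 2, 1)

# map2_chr(): iterate over two vectors in parallel, return a character vector
two_inputs <- map2_chr(fighters, moves, ~ paste(.x, "uses", .y))

# pmap_chr(): iterate over any number of inputs, supplied together as a list
three_inputs <- pmap_chr(
  list(fighters, moves, hits),
  function(fighter, move, n) paste0(fighter, " uses ", move, " x", n)
)

# walk(): call a function purely for its side effect (here, printing)
walk(three_inputs, print)
```

The same naming logic applies as for map(): the suffix tells you what comes back, and walk() returns its input invisibly rather than a new object.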
{purrr} is also great for working with data frames with columns that contain lists (listcols), like the starwars data from the {dplyr} package. Let's use the length() function again, but in the context of a listcol, to get the characters in the most films.
# Load packages
suppressPackageStartupMessages(library(dplyr))
library(purrr)
# map() a listcol within a mutate() call
starwars %>%
  mutate(films_count = map_int(films, length)) %>%
  select(name, films, films_count) %>%
  arrange(desc(films_count)) %>%
  head()
## # A tibble: 6 x 3
##   name           films     films_count
##   <chr>          <list>          <int>
## 1 R2-D2          <chr [7]>           7
## 2 C-3PO          <chr [6]>           6
## 3 Obi-Wan Kenobi <chr [6]>           6
## 4 Luke Skywalker <chr [5]>           5
## 5 Leia Organa    <chr [5]>           5
## 6 Chewbacca      <chr [5]>           5
Why not just write a loop or use the *apply functions? Jenny Bryan has a good {purrr} tutorial that explains why you might consider either choice. Basically, do what you feel; I like the syntax consistency and the ability to predict what function I need based on its name.
Check out the excellent {purrr} cheatsheet for some prompts and visual guidance.
Honourable mentions
{blogdown}
This blog, and I'm sure many others, wouldn't exist without {blogdown} by Yihui Xie. {blogdown} lets you write and render R Markdown files into blog posts via static site generators like Hugo. This is brilliant if you're trying to get R output into a blog post with minimal fuss. The {blogdown} book by Yihui, Amber Thomas and Alison Presmanes Hill is particularly helpful.
{xaringan}
{xaringan} is another great package from Yihui Xie that lets you turn R Markdown into a slideshow using remark.js. It's very customisable via CSS, to the extent that I was able to mimic the house style of my organisation this year. One of my favourite functions [1] is inf_mr() (Infinite Moon Reader), which lets you live-preview your outputs as they're written.
{polite}
Web scraping is ethically dubious if you fail to respect the terms of the sites you're visiting. Dmytro Perepolkin has made it easy to be a good citizen of the internet with the {polite} package, which has just hit version 1.0.0 and is on CRAN (congratulations!). First you introduce yourself to the site with a bow() and collect any information about limits and no-go pages from the robots.txt file, then you can modify search paths with a nod() and collect information from them with a scrape(). Very responsible.
{arsenal}
I've been using the handy [2] {arsenal} package to compare data frames as part of a quality assurance process. First, you supply two data frames to comparedf() to create a "compare" object. Run diffs() on that object to create a new data frame where each row is a mismatch, given a tolerance, with columns for the location and values that are causing problems. We managed to quality assure nearly a million values with this method in next to no time. Check out their vignette on how to do this.
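As a small sketch of that process, assuming {arsenal} is installed. The two toy data frames (old and new) and their column names are invented for illustration, and the comparison uses the package's default tolerance settings.

```r
library(arsenal)

# Two versions of the "same" data, with one deliberate mismatch in row 2
old <- data.frame(id = 1:3, score = c(10, 20, 30))
new <- data.frame(id = 1:3, score = c(10, 25, 30))

# Create the compare object, matching rows on the id column
cmp <- comparedf(old, new, by = "id")

# One row per mismatch, showing where it is and the two conflicting values
mismatches <- diffs(cmp)
mismatches
```

Calling summary() on the compare object gives a fuller report if you want more than the list of mismatches.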
Bonus!
{govdown}
Aha, well done for reading this far. As a bonus, I'm calling out Duncan Garmonsway's {govdown} package. Duncan grappled with the complexities of things like Pandoc and Lua filters to build a package that applies the accessibility-friendly GOV.UK design system to R Markdown. This means you can create things like the Reproducible Analytical Pipelines (RAP) website in the style of GOV.UK. Endorsed by Yihui Xie himself! Check out Duncan's {tidyxl} and {unpivotr} packages for handling nightmare Excel files while you're at it.
Session info
## [1] "Last updated 2020-01-02"
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.1 (2019-07-05)
## os macOS Sierra 10.12.6
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2020-01-02
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
## backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
## blogdown 0.17 2019-11-13 [1] CRAN (R 3.6.0)
## bookdown 0.16 2019-11-22 [1] CRAN (R 3.6.0)
## cli 2.0.0 2019-12-09 [1] CRAN (R 3.6.1)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
## digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.0)
## dplyr * 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
## emo 0.0.0.9000 2019-12-23 [1] Github (hadley/emo@3f03b11)
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
## fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
## glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
## htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
## knitr 1.26 2019-11-12 [1] CRAN (R 3.6.0)
## lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
## pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
## purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
## R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
## Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.0)
## rlang 0.4.2 2019-11-23 [1] CRAN (R 3.6.0)
## rmarkdown 2.0 2019-12-12 [1] CRAN (R 3.6.0)
## rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
## stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
## tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
## tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
## utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
## vctrs 0.2.1 2019-12-17 [1] CRAN (R 3.6.1)
## withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
## xfun 0.11 2019-11-12 [1] CRAN (R 3.6.0)
## yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
## zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
##
## [1] /Users/matt.dray/Library/R/3.6/library
## [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
[1] Along with yolo: true, of course.
[2] Unlike Arsenal FC in 2019, rofl.