tl;dr
R has an interactive spreadsheet editor for dataframes that you can access with edit()
. I made the function tide::tide()
to generate automatically some code that will reproduce the changes you made manually with edit()
.
Edit
R’s edit()
function invokes a text editor so you can amend an R object.1
Something special happens If you edit()
a data.frame object: a somewhat-janky interactive spreadsheet-like editor appears in a new window.2
Click in a cell to amend a value, or click in the header for a menu that lets you change the column name, or switch between real and character classes. There are even buttons to copy and paste values.
Click the ‘quit’ button to confirm your changes. The edited data is returned to you back in the console.
But that’s not very reproducible. How can anyone recreate the amended dataframe from the original if your clicks and keypresses weren’t recorded?
Can we make edit()
more reproducible?
Tide
Basic premise: create a function that accepts a dataframe as input, opens the edit menu, observes the updated values and generates code to reproduce the new object from the old.
I’ve created the concept package, {tide}, to do this.3 It has only one, eponymous function: tide()
.
Why ‘tide’? Well, it’s ‘edit’ backwards. And we’re ‘turning the tide’ on the edit()
function to make it reproducible, geddit?4
You can install {tide} from GitHub. The {clipr} package, which can copy text to your clipboard, will also be installed.
if (!require(remotes)) install.packages("remotes")
install_github("matt-dray/tide")
So let’s get our feet wet with an example. Here’s a thematically-related data.frame of the tide table for London Bridge for May 1 2022.
tide_table <- data.frame(
type = c("High", "Low", NA_character_, "Low"),
time_bst = c("02:58", "09:42", "15:20", "21:58"),
height_m = c(7.0, 0.5, 6.9, 70)
)
tide_table
## type time_bst height_m
## 1 High 02:58 7.0
## 2 Low 09:42 0.5
## 3 <NA> 15:20 6.9
## 4 Low 21:58 70.0
But whoops: the missing value should be ‘High’ and the height is wrong by two orders of magnitude for the 21:58 low tide.
So, let’s use tide::tide()
on the dataframe to edit those values.
library(tide)
tide(tide_table)
This opens a separate data-editor window. Here’s how it looks when it opens:5
And once I’ve made the adjustments manually:
And here’s what’s returned to the console once I’ve clicked the ‘Quit’ button:
Wrote code to clipboard
## type time_bst height_m
## 1 High 02:58 7.0
## 2 Low 09:42 0.5
## 3 High 15:20 6.9
## 4 Low 21:58 0.7
You can see the edits have been successfully returned. This is also what you’d see if you just used edit()
.
The extra feature from tide()
is evident in the message Wrote code to clipboard
: the function generated some lines of code that will take you from the original to the edited object.
So if we now paste from the clipboard we get:
tide_table[3, 1] <- "High"
tide_table[4, 3] <- 0.7
In other words, ‘replace the value in row 3, column 1 of the tide_table
object with the string value "High"
’, for example.
And if we actually run those lines, we can recreate the amended data.frame from the original:
tide_table
## type time_bst height_m
## 1 High 02:58 7.0
## 2 Low 09:42 0.5
## 3 High 15:20 6.9
## 4 Low 21:58 0.7
So, hurrah, we now have a method of manually editing the table and getting some code back that can reproduce it.
Diet
To borrow another anagram of ‘edit’ the capability of the package is quite… lightweight. Some issues are that:
- the function currently only works if you amend individual values (cells), not if you change headers, or add rows and columns
- the returned code will operate on a cell-by-cell basis, so you might get
x[1, 1] <- "A"
andx[2, 1] <- "B"
where actually it could have been the more convenient to getx[1:2, 1] <- c("A", "B")
- the returned code refers to columns by index, even though it’s more explicit to refer to them by name, like
x[1, "col1"] <- "A"
- the returned code will be written in base R and will edit in place by index (i.e.
[<-
), it doesn’t return {data.table}- or tidyverse-compliant code - you only get the code in your clipboard, it isn’t returned from the function
I might update the package to handle this stuff in future, or you can do it for me with a pull request in the GitHub repo.
But to be honest, the data editor is probably a bit too clunky and simple to be useful for most use cases. So there’s not much point expanding this package beyond a concept.
Or maybe the approach will pick up pace like the Severn Estuary tidal bore, who knows? Or maybe you think this post is a bore.6
Session info
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.1.1 (2021-08-10)
## os macOS Mojave 10.14.6
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2022-04-27
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## blogdown 1.5 2021-09-02 [1] CRAN (R 4.1.0)
## bookdown 0.25 2022-03-16 [1] CRAN (R 4.1.2)
## bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.0)
## cli 3.2.0 2022-02-14 [1] CRAN (R 4.1.2)
## digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.0)
## evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.2)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
## jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.1.2)
## knitr 1.38 2022-03-25 [1] CRAN (R 4.1.2)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.1.2)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
## rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.2)
## rmarkdown 2.13 2022-03-10 [1] CRAN (R 4.1.2)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
## sass 0.4.1 2022-03-23 [1] CRAN (R 4.1.2)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
## stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.0)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.2)
## xfun 0.30 2022-03-02 [1] CRAN (R 4.1.2)
## yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
##
## [1] /Users/matt.dray/Library/R/x86_64/4.1/library
## [2] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
I think this was new to quite a lot of people, based on a recent tweet I wrote about it. The
edit()
function feels like another one of those ‘hidden’ base functions. There’s not really a good reason for you to know about it unless someone told you about it. I got it from an old-school textbook when first learning R, just like thelocator()
function for retrieving interactively the coordinates from a plot, which I wrote about in a previous post.↩︎Note that the help for
?dataentry
, which underliesedit()
, says ‘the data entry editor is only available on some platforms and GUIs’ and ‘the details of interface to the data grid may differ by platform and GUI’.↩︎‘Concept’ means I can put it out there without any guarantees or polish, and I don’t even have to complete it if I don’t want to!↩︎
Also consider the allusion to King Canute trying to turn back the tide. It can be pretty hard to stop people writing non-reproducible code. And this package isn’t going to change that.↩︎
Note that the help for
?dataentry
, which underliesedit()
, says ‘the data entry editor is only available on some platforms and GUIs’ and ‘the details of interface to the data grid may differ by platform and GUI’.↩︎You came here for the wordplay anyway, right?↩︎