A Workflow for Data Analysis
workflow.Rmd
https://r4ds.hadley.nz/workflow-scripts.html
https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
The only way to write complex software that won’t fall on its face is to build it out of simple modules connected by well-defined interfaces, so that most problems are local and you can have some hope of fixing or optimizing a part without breaking the whole. http://catb.org/%7Eesr/writings/taoup/html/modularitychapter.html
https://swcarpentry.github.io/r-novice-inflammation/06-best-practices-R.html
“Another way you can be explicit about the requirements of your code and improve its reproducibility is to limit the ‘hard-coding’ of the input and output files for your script. If your code will read in data from a file, define a variable early in your code that stores the path to that file. For example”
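The quotation stops at "For example", so here is a minimal sketch of the idea in R. The file names and the small data set are hypothetical stand-ins, written to a temporary directory only so that the block runs on its own.

```r
# Paths are defined once, at the top of the script (hypothetical names).
data_dir    <- tempdir()
input_file  <- file.path(data_dir, "measurements.csv")
output_file <- file.path(data_dir, "summary.csv")

# Create a small stand-in input file so the example is self-contained.
write.csv(data.frame(id = 1:3, value = c(2.5, 4.0, 5.5)),
          input_file, row.names = FALSE)

# The rest of the script refers only to the variables, never to literal paths.
measurements <- read.csv(input_file)
result <- data.frame(n = nrow(measurements),
                     mean_value = mean(measurements$value))
write.csv(result, output_file, row.names = FALSE)
```

Moving the analysis to another data set then means changing the two path variables at the top and nothing else.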
The problem of writing safe, flexible data analysis code
-
classic example: big file
- difficult to understand
- slow, or confusing and unreliable, because everything has to be re-run
- difficult to debug
-
split into logical parts (modular), then source
- helps
- but doesn’t fully solve the problem: the sourced files still share one environment, which is confusing
better: break into pieces, and run each in own environment
a bit like functions (but without relying on the global environment)
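One way to run each piece in its own environment can be sketched with base R's `sys.source()`. The script names and their contents are hypothetical, and `run_isolated()` is an illustrative helper, not part of any package.

```r
# Write two tiny stand-in scripts to a temporary directory (hypothetical names).
script_dir   <- tempdir()
clean_script <- file.path(script_dir, "01-clean.R")
model_script <- file.path(script_dir, "02-model.R")
writeLines(c("x <- c(1, 2, 3)", "tmp <- 'scratch value'"), clean_script)
writeLines("total <- sum(c(10, 20))", model_script)

# Hypothetical helper: evaluate a script in a fresh environment and return it.
run_isolated <- function(path) {
  env <- new.env(parent = globalenv())
  sys.source(path, envir = env)
  env
}

clean_env <- run_isolated(clean_script)
model_env <- run_isolated(model_script)

# Each script's objects live only in that script's environment.
ls(clean_env)
exists("tmp", envir = model_env, inherits = FALSE)
```

This keeps what each script creates out of the shared workspace. It does not stop a script from reading objects that already exist in the global environment, so full isolation would need a separate R process per script.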
-
push harder: make inputs and outputs completely transparent
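A rough sketch of what transparent inputs and outputs could look like: each step takes only file paths, reads its inputs from disk, and writes its outputs to disk. The names `clean_step()` and `run_step()`, the file names, and the data are all hypothetical.

```r
work_dir   <- tempdir()
raw_file   <- file.path(work_dir, "raw.csv")
clean_file <- file.path(work_dir, "clean.csv")

# Stand-in raw data with one missing value.
write.csv(data.frame(id = 1:4, value = c(1, NA, 3, 5)),
          raw_file, row.names = FALSE)

# A step depends only on the files named in its arguments,
# not on anything left over in the workspace.
clean_step <- function(input, output) {
  raw <- read.csv(input)
  write.csv(raw[!is.na(raw$value), ], output, row.names = FALSE)
}

# Hypothetical helper: check the declared input before the step
# and the declared output after it.
run_step <- function(step, input, output) {
  stopifnot(file.exists(input))
  step(input, output)
  stopifnot(file.exists(output))
  invisible(output)
}

run_step(clean_step, input = raw_file, output = clean_file)
nrow(read.csv(clean_file))
```

Because every dependency is a named file, it is visible at the call site which steps must be re-run when a given file changes.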
- have to re-run, which is slow
- difficult to change
- polluted environment
classic software solution: break into small files
but if
how to run them?