Skip to contents

https://r4ds.hadley.nz/workflow-scripts.html

https://www.tidyverse.org/blog/2017/12/workflow-vs-script/

The only way to write complex software that won’t fall on its face is to build it out of simple modules connected by well-defined interfaces, so that most problems are local and you can have some hope of fixing or optimizing a part without breaking the whole. http://catb.org/%7Eesr/writings/taoup/html/modularitychapter.html

https://swcarpentry.github.io/r-novice-inflammation/06-best-practices-R.html

“Another way you can be explicit about the requirements of your code and improve it’s reproducibility is to limit the “hard-coding” of the input and output files for your script. If your code will read in data from a file, define a variable early in your code that stores the path to that file. For example”

The problem of writing safe, flexible data analysis code

  • classic example: big file

    • difficult to understand
    • slow or confusing/unreliable - have to re-run
    • difficult to debug
  • split into logical parts (modular), then source

    • helps
    • but doesn’t totally solve - confusing environment
  • better: break into pieces, and run each in own environment

  • a bit like functions (but without global environment)

  • push harder - make inputs outputs completely transparenet

    have to re-run - slow

    • difficult to change
    • polluted environment
  • classic software solution: break into small files

  • but if

  • how to run them?

[https://stemurphy.com/post/rep_manu_think_about/]

Other solutions

  • commandArgs
  • docopt
  • other?

Principles of file organization

  • don’t distinguish intermediate and final outputs (everything in out)
  • don’t number files