Introduction
Functions extract_make() and makefile() help with writing a Makefile to control a data analysis workflow. extract_make() and makefile() do much of the work, but some extra hand coding is also necessary.
An example workflow
To demonstrate, we use a simple workflow where all the code, aside from the shell script, has already been written. The project directory contains the following files:
.
├── data
│   └── raw_data.csv
├── out
├── report.qmd
└── src
    ├── cleaned_data.R
    ├── fig_fitted.R
    ├── model.R
    └── vals_fitted.R
- src/cleaned_data.R reads in the raw data, cleans it, and creates out/cleaned_data.rds
- src/model.R fits a model to the cleaned data, and creates out/model.rds
- src/vals_fitted.R uses the model and the cleaned data to extract fitted values, and creates out/vals_fitted.rds
- src/fig_fitted.R uses out/vals_fitted.rds to create the plot in out/fig_fitted.png
- report.qmd uses out/fig_fitted.png to create the document report.html
cleaned_data.R, model.R, vals_fitted.R, and fig_fitted.R all contain calls to cmd_assign(). For instance, src/model.R contains the lines
cmd_assign(.cleaned_data = "out/cleaned_data.rds",
           .out = "out/model.rds")
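When the script is run through Rscript, as in the Makefile later in this article, the values are supplied on the command line instead. Our understanding is that cmd_assign() then matches the command-line arguments to the names in the call by position, so that Rscript src/model.R out/cleaned_data.rds out/model.rds sets .cleaned_data to the first argument and .out to the second. The base-R sketch below illustrates only that positional matching. match_args() is a hypothetical helper, not the command package's implementation, and it skips anything else cmd_assign() may do, such as converting types.

```r
# Hypothetical helper, NOT the command package's cmd_assign():
# match command-line arguments, by position, to the names in a call.
match_args <- function(defaults, args = commandArgs(trailingOnly = TRUE)) {
  # No command-line arguments (e.g. running interactively): keep the defaults
  if (length(args) == 0L)
    return(defaults)
  if (length(args) != length(defaults))
    stop("expected ", length(defaults), " arguments, got ", length(args))
  # The i-th argument goes to the i-th name
  stats::setNames(as.list(args), names(defaults))
}

# The values written in the call in src/model.R
defaults <- list(.cleaned_data = "out/cleaned_data.rds",
                 .out = "out/model.rds")

# Simulates: Rscript src/model.R tmp/cleaned.rds tmp/model.rds
vals <- match_args(defaults, args = c("tmp/cleaned.rds", "tmp/model.rds"))

# Create the objects .cleaned_data and .out from the matched values
invisible(list2env(vals, envir = environment()))
```

Because matching is by position, the order of the arguments in the call is what ties the script to the command line; the argument names only determine which objects are created.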
extract_make()
extract_make() extracts the call to cmd_assign() from an R file, and turns it into a Makefile rule. For instance, the call
extract_make("src/model.R")
has return value
out/model.rds: /home/runner/work/command/command/vignettes/articles/src/model.R \
  out/cleaned_data.rds
	Rscript $^ $@
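In this rule, out/model.rds is the target, and the script and out/cleaned_data.rds are its prerequisites. In the recipe, make expands $^ to the list of prerequisites and $@ to the target, so the command it runs is Rscript followed by the script, the input file, and finally the output file. The sketch below builds a rule of the same shape from a script path and the values in its cmd_assign() call. make_rule() is a hypothetical helper, not the code behind extract_make(); it assumes the output file is the last argument, which is what lets Rscript $^ $@ pass the arguments in the same order as the call.

```r
# Hypothetical helper, NOT the code behind extract_make():
# turn a script path plus the values in its cmd_assign() call into a rule.
# Assumes the last value is the output file (the target).
make_rule <- function(script, vals) {
  n <- length(vals)
  target <- vals[[n]]
  # The script comes first, then every other value, in call order
  prereqs <- c(script, unlist(vals[-n], use.names = FALSE))
  paste0(target, ": ", paste(prereqs, collapse = " \\\n  "),
         "\n\tRscript $^ $@\n")  # recipe lines must start with a tab
}

rule <- make_rule("src/model.R",
                  list(.cleaned_data = "out/cleaned_data.rds",
                       .out = "out/model.rds"))
cat(rule)
```

With these values, the recipe expands to Rscript src/model.R out/cleaned_data.rds out/model.rds, which is the command that appears in the make output further down.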
The easiest way to use extract_make() is to call it from the R console, and then cut and paste the results into the Makefile.
makefile()
makefile() creates a draft of the whole Makefile. It loops through all the R files in a directory, extracting cmd_assign() calls and converting them into Makefile rules, and then puts the rules into a Makefile. For instance, the call
makefile("src")
creates a new file called Makefile with the lines
.PHONY: all
all:

out/cleaned_data.rds: src/cleaned_data.R \
  data/raw_data.csv
	Rscript $^ $@

out/fig_fitted.png: src/fig_fitted.R \
  out/vals_fitted.rds
	Rscript $^ $@

out/model.rds: src/model.R \
  out/cleaned_data.rds
	Rscript $^ $@

out/vals_fitted.rds: src/vals_fitted.R \
  out/cleaned_data.rds \
  out/model.rds
	Rscript $^ $@

.PHONY: clean
clean:
	rm -rf out
	mkdir out
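The rules in this draft appear in alphabetical order of the script names. The sketch below imitates the loop under stated assumptions: it parses (but does not run) each R file in a directory, picks out the first top-level cmd_assign() call, and turns its arguments into a rule. find_cmd_assign(), make_rule(), and draft_makefile() are hypothetical helpers, not the command package's code; they only handle literal argument values and assume the output file is the last argument. The demo builds a throwaway project in a temporary directory.

```r
# Hypothetical sketch, NOT the command package's makefile():
# build draft Makefile rules from every R file in a directory.

# Return the arguments of the first top-level cmd_assign() call in a file,
# or NULL if there is none. The file is parsed but never run.
find_cmd_assign <- function(path) {
  exprs <- parse(file = path, keep.source = FALSE)
  for (i in seq_along(exprs)) {
    e <- exprs[[i]]
    if (is.call(e) && identical(e[[1L]], as.name("cmd_assign")))
      return(lapply(as.list(e)[-1L], eval))  # evaluates the literal values
  }
  NULL
}

# Assumes the last argument is the output file (the target)
make_rule <- function(script, vals) {
  n <- length(vals)
  prereqs <- c(script, unlist(vals[-n], use.names = FALSE))
  paste0(vals[[n]], ": ", paste(prereqs, collapse = " \\\n  "),
         "\n\tRscript $^ $@\n")
}

draft_makefile <- function(dir) {
  # list.files() sorts alphabetically, hence the rule order in the draft
  paths <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  rules <- character(0)
  for (p in paths) {
    vals <- find_cmd_assign(p)
    if (!is.null(vals))
      rules <- c(rules, make_rule(p, vals))
  }
  c(".PHONY: all\nall:\n", rules)
}

# Demo: a temporary project with two scripts
proj <- tempfile("proj")
dir.create(file.path(proj, "src"), recursive = TRUE)
writeLines(c('cmd_assign(.cleaned_data = "out/cleaned_data.rds",',
             '           .out = "out/model.rds")',
             'fit <- readRDS(.cleaned_data)'),
           file.path(proj, "src", "model.R"))
writeLines('cmd_assign(.raw_data = "data/raw_data.csv", .out = "out/cleaned_data.rds")',
           file.path(proj, "src", "cleaned_data.R"))
old <- setwd(proj)
draft <- draft_makefile("src")
setwd(old)
cat(draft, sep = "\n")
```

Parsing rather than sourcing the files means the scripts' real work (reading data, fitting models) never runs while the draft is being built.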
The output from makefile() needs some editing before it is ready for use. We need to add a rule for creating the report, whose recipe uses $< (which make expands to the first prerequisite only, report.qmd), and make the report a prerequisite of all at the top of the file:
.PHONY: all
all: report.html

out/cleaned_data.rds: src/cleaned_data.R \
  data/raw_data.csv
	Rscript $^ $@

out/fig_fitted.png: src/fig_fitted.R \
  out/vals_fitted.rds
	Rscript $^ $@

out/model.rds: src/model.R \
  out/cleaned_data.rds
	Rscript $^ $@

out/vals_fitted.rds: src/vals_fitted.R \
  out/cleaned_data.rds \
  out/model.rds
	Rscript $^ $@

report.html: report.qmd \
  out/fig_fitted.png
	quarto render $<

.PHONY: clean
clean:
	rm -rf out
	mkdir out
The rules in the Makefile follow the alphabetical ordering of the files in the src directory, not the order in which the rules should be executed. This does not matter to make, which constructs its own dependency map. But for the benefit of human readers, we rearrange the rules to match the execution order.
.PHONY: all
all: report.html

out/cleaned_data.rds: src/cleaned_data.R \
  data/raw_data.csv
	Rscript $^ $@

out/model.rds: src/model.R \
  out/cleaned_data.rds
	Rscript $^ $@

out/vals_fitted.rds: src/vals_fitted.R \
  out/cleaned_data.rds \
  out/model.rds
	Rscript $^ $@

out/fig_fitted.png: src/fig_fitted.R \
  out/vals_fitted.rds
	Rscript $^ $@

report.html: report.qmd \
  out/fig_fitted.png
	quarto render $<

.PHONY: clean
clean:
	rm -rf out
	mkdir out
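The execution order can be worked out from the rules themselves: each target has to come after every target among its prerequisites. The sketch below finds such an order with a depth-first topological sort over the rules above, written as a named R list of targets and their prerequisites. topo_order() is a hypothetical helper for illustration; it is not how make is implemented internally, although make has to respect the same ordering constraint.

```r
# Targets and prerequisites from the Makefile, in the draft's (alphabetical) order
rules <- list(
  "out/cleaned_data.rds" = c("src/cleaned_data.R", "data/raw_data.csv"),
  "out/fig_fitted.png"   = c("src/fig_fitted.R", "out/vals_fitted.rds"),
  "out/model.rds"        = c("src/model.R", "out/cleaned_data.rds"),
  "out/vals_fitted.rds"  = c("src/vals_fitted.R", "out/cleaned_data.rds",
                             "out/model.rds"),
  "report.html"          = c("report.qmd", "out/fig_fitted.png")
)

# Hypothetical helper: depth-first topological sort of the targets
topo_order <- function(rules) {
  done <- character(0)
  visit <- function(target, path = character(0)) {
    # Skip targets already placed, and plain source files (no rule)
    if (target %in% done || !(target %in% names(rules)))
      return(invisible(NULL))
    if (target %in% path)
      stop("circular dependency involving ", target)
    # Place every prerequisite target before this one
    for (dep in rules[[target]])
      visit(dep, c(path, target))
    done <<- c(done, target)
  }
  for (target in names(rules))
    visit(target)
  done
}

print(topo_order(rules))
```

Starting from the alphabetical draft order, the sort returns out/cleaned_data.rds, out/model.rds, out/vals_fitted.rds, out/fig_fitted.png, and report.html, which is the order of the rearranged Makefile.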
Running the Makefile
Our project directory looks like this.
.
├── Makefile
├── data
│   └── raw_data.csv
├── out
├── report.qmd
└── src
    ├── cleaned_data.R
    ├── fig_fitted.R
    ├── model.R
    └── vals_fitted.R
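We run the Makefile by calling make from the project directory. Because all is the first target in the Makefile, make with no arguments builds report.html and everything it depends on.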
Rscript src/cleaned_data.R data/raw_data.csv out/cleaned_data.rds
✔ Assigned object `.raw_data` with value "data/raw_data.csv" and class "character".
✔ Assigned object `.out` with value "out/cleaned_data.rds" and class "character".
Rscript src/model.R out/cleaned_data.rds out/model.rds
✔ Assigned object `.cleaned_data` with value "out/cleaned_data.rds" and class "character".
✔ Assigned object `.out` with value "out/model.rds" and class "character".
Rscript src/vals_fitted.R out/cleaned_data.rds out/model.rds out/vals_fitted.rds
✔ Assigned object `.cleaned_data` with value "out/cleaned_data.rds" and class "character".
✔ Assigned object `.model` with value "out/model.rds" and class "character".
✔ Assigned object `.out` with value "out/vals_fitted.rds" and class "character".
Rscript src/fig_fitted.R out/vals_fitted.rds out/fig_fitted.png
✔ Assigned object `.vals_fitted` with value "out/vals_fitted.rds" and class "character".
✔ Assigned object `.out` with value "out/fig_fitted.png" and class "character".
null device
          1
quarto render report.qmd
processing file: report.qmd
1/3
2/3 [unnamed-chunk-1]
3/3
output file: report.knit.md
pandoc
  to: html
  output-file: report.html
  standalone: true
  section-divs: true
  html-math-method: mathjax
  wrap: none
  default-image-extension: png
  variables: {}
metadata
  document-css: false
  link-citations: true
  date-format: long
  lang: en
  title: Swiss Fertility
Output created: report.html
Our project directory now looks like this.
.
├── Makefile
├── data
│   └── raw_data.csv
├── out
│   ├── cleaned_data.rds
│   ├── fig_fitted.png
│   ├── model.rds
│   └── vals_fitted.rds
├── report.html
├── report.qmd
├── report_files
│   └── libs
└── src
    ├── cleaned_data.R
    ├── fig_fitted.R
    ├── model.R
    └── vals_fitted.R
We have created cleaned_data.rds, model.rds, vals_fitted.rds, and fig_fitted.png in the out directory, and report.html (with its supporting folder report_files) in the main directory.
Other resources
- shell_script(): The shell script equivalent of makefile()
- Data Analysis Workflows: Safe, flexible workflows for data analysis
- Project Management with Make: Makefiles in data analysis workflows
- GNU make: Definitive guide
- Command-Line Programs: Introduction to Rscript