Reproducing simulator package Quick Start
This tutorial reimplements the execution part of the Getting Started with Simulator example. It demonstrates how compound module executables are specified by combining scripts with inline executables.
The material needed to run this tutorial can be found in the DSC vignettes repo, which also contains the code to reproduce the original simulator simulation.
DSC Specification
The problem is fully specified in DSC syntax below:
simulate: model.R + R(m = simulate(n, prob))
  seed: R(1:10)
  n: 50
  prob: R(seq(0,1,length=6))
  $model: m

my_method, their_method: (my.R, their.R) + R(fit = method($(model)$x)$fit)
  $fit: fit

abs, mse: (herloss.R, hisloss.R) + R(score = metric($(model)$mu, $(fit)))
  $score: score

DSC:
  define:
    method: my_method, their_method
    score: abs, mse
  run: simulate * method * score
  output: simulator_results
  exec_path: R
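To make the run line more concrete, here is a minimal Python sketch of how simulate * method * score can be read as "every combination of one module from each group". It is an illustration only, not DSC's actual parser: the expand_run helper and the groups dictionary are hypothetical names introduced for this example, and the single-module group for simulate is an assumption made so that all three stages can be handled uniformly.

```python
from itertools import product

# Module groups, mirroring the `define:` entries in the DSC section.
# `simulate` is not a declared group in the spec; it is listed here as a
# one-element group purely for convenience (an assumption of this sketch).
groups = {
    "simulate": ["simulate"],
    "method": ["my_method", "their_method"],
    "score": ["abs", "mse"],
}


def expand_run(run_spec, groups):
    """Expand an `a * b * c` expression into concrete module sequences.

    Hypothetical helper: it only handles the plain `*` form used in this
    tutorial, not the full DSC run syntax.
    """
    stages = [name.strip() for name in run_spec.split("*")]
    choices = [groups[stage] for stage in stages]
    return [" * ".join(path) for path in product(*choices)]


pipelines = expand_run("simulate * method * score", groups)
for pipeline in pipelines:
    print(pipeline)
```

Under this reading, the specification describes four module sequences, one for each pairing of a method module with a score module, and each sequence is then run for every parameter setting of simulate.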
Run DSC
cd ~/GIT/dsc/vignettes/simulator_example
dsc main.dsc
INFO: Checking R library dscrutils@stephenslab/dsc/dscrutils ...
INFO: DSC script exported to simulator_results.html
INFO: Constructing DSC from main.dsc ...
INFO: Building execution graph & running DSC ...
[###############] 15 steps processed (428 jobs completed)
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 35.413 seconds.
You may notice that for this trivial benchmark, execution is a lot slower than in the original implementation. This is a limitation of DSC when running lightweight jobs: building and executing the DAG, checking file signatures, and monitoring the runtime environment add substantial overhead. For real-world benchmarks, where execution time is much longer, this overhead is relatively insignificant and therefore more tolerable. Also notice that under simulator_results there are 420 intermediate *.rds output files, compared with only 31 intermediate *.Rdata files generated when you run the simulator version of the example. This is because the DSC implementation demonstrated here is a lot more modular. You can compare the data-generating code in DSC with that in simulator to see the more modular setup adopted here, although it is entirely possible to develop a different style of DSC that reproduces exactly the simulator example's output (stored in rds rather than Rdata format).
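The figure of 420 files is consistent with a simple count over the specification, assuming that every module instance writes exactly one intermediate file. That assumption is an inference from the numbers, not something stated by DSC itself, and the variable names below are introduced only for this sketch.

```python
# Parameter values declared for the simulate module in the DSC file.
seeds = list(range(1, 11))            # seed: R(1:10)
n_values = [50]                       # n: 50
probs = [i / 5 for i in range(6)]     # prob: R(seq(0,1,length=6))

# Module counts taken from the `define:` section.
n_method_modules = 2                  # my_method, their_method
n_score_modules = 2                   # abs, mse

# One simulate instance per combination of seed, n and prob.
n_simulate = len(seeds) * len(n_values) * len(probs)

# Each method module runs once on every simulated data set.
n_method = n_simulate * n_method_modules

# Each score module runs once on every (data set, method) pair.
n_score = n_method * n_score_modules

# Assumption: one intermediate file per module instance.
total_files = n_simulate + n_method + n_score

print(n_simulate, n_method, n_score, total_files)
```

The three stages contribute 60, 120 and 240 instances respectively, which sum to the 420 files reported above. This is why a modular layout, with one output per module instance, produces far more files than a layout that bundles several steps into one output.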