Implementation details

DSC is implemented in Python 3. It relies on a number of libraries:

  • DSC relies heavily on code from the SoS project to execute pipelines. SoS implements job dispatch and management via networkx and, unlike timestamp-based workflow tools, tracks files with a signature system based on xxHash. Development work on DSC is contributed directly to SoS whenever appropriate.
  • sympy is used to expand the DSC benchmark specification into pipelines, and to expand the logic of the @FILTER decorator.
  • pandas is used to ensure proper conversion between R and Python data frames. It is also used to manipulate output data.
  • scipy provides the sparse module, so that matrices of scipy.sparse type can be stored in the DSC default storage format for Python.
  • sqlalchemy allows dsc-query to accept SQL-like syntax.
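
To illustrate why a file signature system behaves differently from a timestamp-based one, here is a minimal sketch. It is not the SoS implementation: the function names file_signature, needs_rerun and demo are hypothetical, and hashlib.blake2b from the Python standard library stands in for xxHash so that the example runs without extra packages. The idea is only that a step becomes stale when the content of a file changes, not when its modification time does.

```python
import hashlib
import os
import tempfile


def file_signature(path, chunk_size=1 << 20):
    """Return a hex digest of the file content, read in chunks.

    Assumption: blake2b is a stand-in for xxHash, used here only
    because it ships with the standard library.
    """
    digest = hashlib.blake2b(digest_size=8)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_rerun(path, recorded_signature):
    """A step is stale only if the content signature has changed."""
    return file_signature(path) != recorded_signature


def demo():
    """Compare the effect of touching a file with that of editing it."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "input.txt")
        with open(path, "w") as handle:
            handle.write("x = 1\n")
        recorded = file_signature(path)

        # Touch the file: the modification time changes, the content does not.
        os.utime(path, (0, 0))
        after_touch = needs_rerun(path, recorded)

        # Edit the file: the content changes, so the signature changes.
        with open(path, "a") as handle:
            handle.write("y = 2\n")
        after_edit = needs_rerun(path, recorded)
    return after_touch, after_edit


if __name__ == "__main__":
    print(demo())
```

A timestamp-based tool would treat the touched file as modified, whereas the signature comparison above flags only the edited file.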

In addition,

  • Preliminary cross-language communication from R to Python is implemented with rpy2, and from Python to R with reticulate. This might be replaced in future versions by a data bus implementation from the SoS project.