Processing data with Drake

Introducing ‘Drake’, a “Make for Data”

We call this tool Drake, and today we are excited to share Drake with the world, as an open source project. It is written in Clojure.

Drake is a text-based command line data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs.  It automatically resolves dependencies and provides a rich set of options for controlling the workflow. It supports multiple inputs and outputs and has HDFS support built-in.

We use Drake at Factual on various internal projects. It serves as a primary way to define, run, and manage data workflow. Some core benefits we’ve seen:
    • Non-programmers can run Drake and fully manage a workflow
    • Encourages repeatability of the overall data building process
    • Encourages consistent organization (e.g., where supporting scripts live, and how they’re run)
    • Precise control over steps (for more effective testing, debugging, etc.)
    • Unifies different tools in a single workflow (shell commands, Ruby, Python, Clojure, pushing data to production, etc.)

Drake official blog:


Comments are closed.