Creating A Module

Jonathan Stray edited this page Oct 1, 2018 · 35 revisions

Workbench comes with many modules for loading data, cleaning it, visualizing it, etc. But it's also a "package manager" for all those little pieces of code that are necessary to do data work. You can create your own modules with Python, and they can optionally include JavaScript to produce embeddable visualizations or custom UI elements.

Quickstart

  • Clone the Hello Workbench module
  • Clone the main workbench repo into a sibling directory and set up a Workbench development environment
  • Fire up Workbench with CACHE_MODULES=false bin/dev start
  • Watch the module directory with bin/dev develop-module hello-workbench. This will re-import the module whenever you make any changes.
  • Browse to 127.0.0.1:8000 to use Workbench and try out your module

The Basics

Workbench loads custom modules from GitHub on production, or from a local copy of the repo when developing. There must be at least two files in your repo: a JSON configuration file which defines module metadata and parameters, and a Python script which does the actual data processing. You can also add a JavaScript file which produces output in the right pane, as Workbench's built-in charts do.

We recommend you also write tests for your new module.
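
As a rough idea of what such a test could look like, here is a minimal sketch using Python's built-in unittest and the render(table, params) function described later on this page. The render function shown is only a stand-in so the example is self-contained; a real test would import it from your module's Python file, and the file layout is up to you.

```python
import unittest

import pandas as pd


def render(table, params):
    # Stand-in for the function under test; normally imported from the module file
    return table.replace(params['search'], params['replace'])


class RenderTest(unittest.TestCase):
    def test_replaces_matching_cells(self):
        table = pd.DataFrame({'city': ['NYC', 'LA']})
        result = render(table, {'search': 'NYC', 'replace': 'New York'})
        expected = pd.DataFrame({'city': ['New York', 'LA']})
        pd.testing.assert_frame_equal(result, expected)

    def test_leaves_table_alone_when_nothing_matches(self):
        table = pd.DataFrame({'city': ['NYC', 'LA']})
        result = render(table, {'search': 'Paris', 'replace': 'x'})
        pd.testing.assert_frame_equal(result, table)
```

Saved as a file, tests like these can be run with python -m unittest.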

Once you've checked your module into GitHub, you can add it with the Import Module from GitHub command in Workbench.

Here are some examples of existing Workbench modules:

The Module JSON file

The JSON file is required for every module and defines metadata including the module name, an internal unique identifier, and most importantly all the parameters that are displayed in the module UI.

For example, if you were to create a search-and-replace module that lets users replace text in one column or in all columns, your module configuration could look something like this:

{
  "name": "Search and Replace",
  "id_name": "S&R",
  "category": "Clean",
  "description": "Search for text and replace it with something else",
  "help_url": "https://mymodules.com/docs/search-and-replace",
  "parameters": [
    {
      "name": "Search for",
      "id_name": "search",
      "type": "string"
    },
    {
      "name": "Replace with",
      "id_name": "replace",
      "type": "string"
    },
    {
      "name": "Column (or None for all)",
      "id_name": "column",
      "type": "column"
    }
  ]
}

This module has three parameters: two strings and a column selector.

The JSON module description in detail

All modules must define the following keys

  • name - The user-visible name of the module
  • id_name - An internal unique identifier. It must never change, or currently applied modules will break.
  • category - The category the module appears under in the Module Library

The following keys are optional but recommended:

  • description - An optional one-line description used to help users search for modules
  • help_url - An optional link to a help page describing how to use the module
  • icon - Must be one of a set of internal icons; see other module JSON files for options.
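
Before importing, it can help to sanity-check the JSON file locally. The sketch below is not part of Workbench: check_module_spec is a hypothetical helper that parses the file's text and checks only for the three required module keys listed above.

```python
import json

# The three keys this page lists as required for every module
REQUIRED_MODULE_KEYS = ("name", "id_name", "category")


def check_module_spec(text):
    """Return a list of problems found in a module JSON string (hypothetical helper)."""
    try:
        spec = json.loads(text)
    except json.JSONDecodeError as err:
        # Catches slips such as trailing commas or unquoted values
        return ["not valid JSON: %s" % err]
    if not isinstance(spec, dict):
        return ["top level must be a JSON object"]
    return ["missing module key: " + key
            for key in REQUIRED_MODULE_KEYS if key not in spec]


good = '{"name": "Search and Replace", "id_name": "S&R", "category": "Clean"}'
print(check_module_spec(good))             # []
print(check_module_spec('{"name": "x"}'))  # ['missing module key: id_name', 'missing module key: category']
```

Python's json module is strict, so a trailing comma after the last key is reported as invalid JSON rather than silently accepted.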

Parameters

Each parameter must define the following keys:

  • name - The user-visible label of the parameter
  • id_name - Internal unique identifier. Must not change, or Workbench will think it's a brand new parameter. However, different modules can use the same id_name.
  • type - One of the parameter types listed under Parameter types

They can have several optional keys:

  • default - The initial value of the parameter.
  • placeholder - The text that appears when the parameter field is empty, or when no column is selected.
  • visible_if - Hides or shows this parameter based on the value of a menu or checkbox parameter.

The visible_if key is a JSON object (its content is inside braces) which itself has the following keys:

  • id_name - Which parameter controls the visibility of this parameter. It must be a menu or checkbox.
  • values - A list of menu values separated by |, or true or false for a checkbox
  • invert - Optional. If set to true, the parameter is visible if the controlling parameter does not have one of the values.
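
The rule these three keys describe can be sketched in a few lines of Python. This is only an illustration of the semantics, not Workbench's implementation: is_visible is a hypothetical function, and it assumes that a menu's controlling value is the text of the selected item and that a checkbox's values entry is a JSON boolean.

```python
def is_visible(visible_if, controlling_value):
    """Decide visibility from a visible_if object and the controlling value.

    Assumption: for a menu, controlling_value is the selected item's text and
    `values` is a pipe-separated string; for a checkbox, both are booleans.
    """
    values = visible_if["values"]
    if isinstance(values, str):
        matches = controlling_value in values.split("|")
    else:
        matches = controlling_value == values
    # invert flips the rule: visible when the value does NOT match
    if visible_if.get("invert", False):
        return not matches
    return matches


rule = {"id_name": "mode", "values": "Mean|Median"}
print(is_visible(rule, "Mean"))                       # True
print(is_visible(rule, "Count"))                      # False
print(is_visible({**rule, "invert": True}, "Count"))  # True
```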

Some parameter types also support custom flags; see below.

Parameter types

Workbench currently supports the following parameter types:

  • string - A text value. Set multiline to true if you want an expandable text field.
  • integer - An integer value
  • float - A decimal value
  • column - Allows the user to select a column. The placeholder value appears when no column is selected.
  • multicolumn - Allows the user to select multiple columns.
  • menu - A fixed list of menu items, separated by the pipe character (|). Menu item indices are zero-based when setting the default and when passed to the render function.
  • checkbox - A simple boolean control.
  • statictext - Just shows the name as text, has no parameter value. Useful to explain to the user what to do.
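
Because a menu value reaches the render function as a zero-based index, the Python code usually has to map it back to the item it stands for. A minimal sketch, assuming a hypothetical menu parameter with id_name direction and items Ascending|Descending, and a hypothetical table column n:

```python
import pandas as pd

# Hypothetical menu parameter whose items are "Ascending|Descending"
DIRECTIONS = "Ascending|Descending".split("|")


def render(table, params):
    # A menu parameter arrives as a zero-based index, not as the item text
    direction = DIRECTIONS[params["direction"]]
    return table.sort_values("n", ascending=(direction == "Ascending"))


table = pd.DataFrame({"n": [2, 3, 1]})
print(render(table, {"direction": 1})["n"].tolist())  # [3, 2, 1]
```

Keeping the item list in the same order as the JSON file matters here: reordering the menu items changes what each stored index means.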

The Module Python file

The Python file must contain a single function called render which takes two parameters: a Pandas dataframe and a dictionary of parameter values, and returns a Pandas dataframe, like this:

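def render(table, params):
    s = params['search']
    r = params['replace']
    col = params['column']

    if col is None:
        # No column selected: replace in every column
        return table.replace(s, r)
    else:
        # Replace in the selected column only, and return the whole table
        table[col] = table[col].replace(s, r)
        return table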

To help the user set parameters, this function should return the input unchanged when the module is initially applied. If the module produces no output, return table or the empty dataframe, pd.DataFrame(). If the module encounters an error, return an error string instead of a Pandas table.

The module may change the input table parameter, such as operating in place and returning the modified input dataframe, or simply using it for scratch space.
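
The three return conventions above can be combined in one function. The following is a sketch rather than a reference implementation: it reuses the parameter names from the search-and-replace example and assumes that an unset string parameter arrives as an empty string.

```python
import pandas as pd


def render(table, params):
    search = params['search']
    replace = params['replace']
    col = params['column']

    # Nothing to search for yet (assumed to arrive as ''): return the input unchanged
    if not search:
        return table

    # Report a problem as a string instead of a table
    if col is not None and col not in table.columns:
        return "There is no column named '%s'" % col

    if col is None:
        return table.replace(search, replace)
    # Operating in place on the input table is allowed
    table[col] = table[col].replace(search, replace)
    return table


table = pd.DataFrame({'city': ['NYC', 'LA'], 'team': ['NYC', 'SF']})
print(render(table, {'search': '', 'replace': '', 'column': None}) is table)  # True
print(render(table, {'search': 'NYC', 'replace': 'New York', 'column': 'state'}))
print(render(table, {'search': 'NYC', 'replace': 'New York', 'column': 'city'}))
```

The last call changes only the city column; the matching value in team is left alone because a column was selected.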

The Module HTML file

Set "html_output": true in your module JSON file to create an HTML output pane. Add a [modulename].html file to your module's directory, and it will appear in that output pane.

Workbench will display your HTML page in an iframe whenever your module is selected in a workflow. The most common reason to do this is to render a chart.

Your HTML page can include inline JavaScript.

HTML: producing JSON data from Python

Every Python module produces "embed data": JSON destined for the embedded iframe. By default, that data is null.

To produce non-null embed data, make your Python render function return a tuple of three values in this exact order: (dataframe, error_str, json_dict). For instance:

def render(table, params):
    return (table, 'Code not yet finished', {'foo': 'bar'})

Workbench will encode json_dict as JSON, so it must be a dict that is compatible with json.dumps().
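
One common stumbling block is that values taken straight from a dataframe are NumPy types, which json.dumps() rejects. The sketch below builds chart data with .tolist(), which converts them to plain Python values. The column names name and count are hypothetical, and using an empty string to mean "no error" is an assumption, not something this page specifies.

```python
import json

import pandas as pd


def render(table, params):
    # .tolist() converts NumPy values to plain Python ones that json.dumps accepts
    json_dict = {
        'labels': table['name'].tolist(),
        'values': table['count'].tolist(),
    }
    # Assumption: an empty error string means there is no error to report
    return (table, '', json_dict)


table = pd.DataFrame({'name': ['a', 'b'], 'count': [3, 5]})
_, _, embed = render(table, {})
print(json.dumps(embed))  # {"labels": ["a", "b"], "values": [3, 5]}
```

Calling json.dumps() on the dict yourself, as in the last line, is a quick way to check compatibility before importing the module.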

HTML: reading JSON data from your HTML page's inline JavaScript

On page load: Workbench will inject a <script> tag with a global variable at the top of your HTML's <head>. You can access it by reading window.workbench.embeddata. For instance:

<!DOCTYPE html>
<html>
  <head><!-- You _must_ have a <head> element -->
    <title>Embeddata is set</title>
  </head>
  <body>
    <main></main>
    <script>
      document.querySelector('main').textContent = JSON.stringify(window.workbench.embeddata)
    </script>
  </body>
</html>

After page load: Workbench adds a #revision=N hash to your iframe's URL. That means the hashchange event fires every time the JSON data is recomputed. You can query the embeddata API endpoint to load the new data.

<!DOCTYPE html>
<html>
  <head>
    <title>Let's query embeddata from the server</title>
  </head>
  <body>
    <main></main>
    <script>
      function renderData (data) {
        document.querySelector('main').textContent = JSON.stringify(data)
      }

      function reloadEmbedData () {
        const url = String(window.location).replace(/\/output.*/, '/embeddata')
        fetch(url, { credentials: 'same-origin' })
          .then(function(response) {
            if (!response.ok) {
              throw new Error('Invalid response code: ' + response.status)
            }
            return response.json()
          })
          .then(renderData)
          .catch(console.error)
      }

      // Reload data whenever it may have changed
      window.addEventListener('hashchange', reloadEmbedData)

      // Don't forget to render the data on page load, _before_ the first change
      renderData(window.workbench.embeddata)
      // (alternatively: `reloadEmbedData()`)
    </script>
  </body>
</html>

Importing a module

Simply click on “Import from GitHub” to add a module from GitHub. Workbench will ensure that your module is ready to load and let you know if it runs into any trouble. Once you fix the issue, and commit the changes to GitHub, you can attempt to import the module from GitHub once again.

All imported modules are versioned, by tying the imported code to the GitHub revisions. Currently applied modules are automatically updated to new module code versions (which can involve adding, removing, and resetting parameters).

Developing a module

First, set up a development environment.

Start it, but disable the part that saves compiled modules: CACHE_MODULES=false bin/dev start

Next, create a new directory (at the same level as the cjworkbench directory, a sibling to it) called modulename. Add these files:

  • README.md -- optional but highly recommended
  • LICENSE -- optional but highly recommended
  • [modulename].py -- Python code, including def render(table, params) function
  • [modulename].json -- JSON file
  • [modulename].html -- if outputting a custom iframe

In a shell in the cjworkbench directory, start a process that watches that directory for changes and auto-imports the module into the running Workbench: bin/dev develop-module modulename

Now, edit the module's code. Every time you save, the module will reload in Workbench. To see changes to HTML and JSON, refresh the page. To see changes to Python, refresh the page and trigger a render() by changing a parameter.