Human Experiment Data

This subdirectory is the home of all human experiment data for the Overcooked game. All data was collected through Mturk and is fully anonymized. While the data collection code is proprietary, it relies heavily on the open source Overcooked-demo project.

Data was collected on behalf of the Center for Human-Compatible AI (CHAI) at UC Berkeley. Do not distribute in any manner without the express consent of CHAI or its affiliates. If you have questions regarding data rights, privacy, or distribution, please contact either Nathan Miller at [email protected] or Micah Carroll at [email protected].

Overview

Data Directory Structure

This directory is subdivided into three subdirectories as follows:

human_data/

  • raw/
    • Contains all unprocessed, unfiltered data in CSV form
    • Data is divided into 2019 experiments, collected for this paper, and 2020 experiments, collected on more complex layouts with updated dynamics
  • cleaned/
    • Contains processed, filtered data as pickled pandas DataFrames
    • Data is again divided into 2019 and 2020 experiments
    • Data is further divided into 'all', 'train', and 'test' sets
    • Code for performing this pre-processing is available here, with further info found below
  • dummy/
    • A strict subset of the data in the other two subdirectories
    • Useful for making tests more lightweight and reproducible
    • Do NOT use for production purposes

Schema

Raw Schema

The current raw data schema is as follows:

NEW_SCHEMA = set(['state', 'joint_action', 'reward', 'time_left', 'score', 'time_elapsed', 'cur_gameloop', 'layout', 
              'layout_name', 'trial_id', 'player_0_id', 'player_1_id', 'player_0_is_human', 'player_1_is_human'])

Each row in the CSV corresponds to a single, discrete timestep in the underlying MDP.

Note: A 'trial' refers to a singular Overcooked game on a single layout.

  • state (JSON): A JSON-serialized version of an OvercookedState instance. Support for converting JSON into an OvercookedState Python object is found in the Overcooked-ai repo
  • joint_action (JSON): A JSON-serialized version of a joint Overcooked action. Player 0's action is at index 0 and player 1's action is at index 1.
  • reward (int): The sparse reward achieved in this particular transition
  • time_left (float): The wall-clock time remaining in the trial
  • score (float): Cumulative sparse reward achieved by both players at this point in the game
  • time_elapsed (float): Wall-clock time since the beginning of the trial
  • cur_gameloop (int): Number of discrete MDP timesteps since beginning of trial
  • layout (string): The 'terrain', or all static parts (pots, ingredients, counters, etc) of the layout, serialized in a string encoding
  • layout_name (string): Human readable name given to the specific layout
  • trial_id (string): Unique identifier given to the trial (again, note this is a single Overcooked game; a single player pair could experience many trials).
  • player_0_id (string): Anonymized ID given to this particular Psiturk worker. Note that these were independently generated by us on the backend, so there is no relation to the Turk ID. If the player is an AI, the player ID is a hardcoded AI_ID constant
  • player_1_id (string): Symmetric to player_0_id
  • player_0_is_human (bool): Indicates whether player_0 is controlled by human or AI
  • player_1_is_human (bool): Symmetric to player_0_is_human
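The column descriptions above can be turned into a small decoding step. The following is a minimal sketch using only the standard library, not the repo's own loader. It assumes booleans are serialized as the strings "True" and "False", and the `state` and `joint_action` payloads in the sample row are placeholders rather than the real encodings. `parse_raw_row` and `demo_row` are hypothetical names introduced for illustration.

```python
import csv
import io
import json

# Column names copied from the raw schema above.
RAW_COLUMNS = {
    "state", "joint_action", "reward", "time_left", "score", "time_elapsed",
    "cur_gameloop", "layout", "layout_name", "trial_id", "player_0_id",
    "player_1_id", "player_0_is_human", "player_1_is_human",
}


def parse_raw_row(row):
    """Decode the JSON, numeric, and boolean columns of one raw CSV row."""
    missing = RAW_COLUMNS - set(row)
    if missing:
        raise ValueError(f"row is missing columns: {sorted(missing)}")
    parsed = dict(row)
    parsed["state"] = json.loads(row["state"])
    parsed["joint_action"] = json.loads(row["joint_action"])
    parsed["reward"] = int(float(row["reward"]))
    parsed["cur_gameloop"] = int(row["cur_gameloop"])
    for col in ("time_left", "score", "time_elapsed"):
        parsed[col] = float(row[col])
    # Assumption: booleans are written as "True"/"False" in the CSV.
    for col in ("player_0_is_human", "player_1_is_human"):
        parsed[col] = row[col].strip().lower() == "true"
    return parsed


def demo_row():
    """Round-trip one made-up row through the csv module and parse it."""
    sample = {col: "" for col in RAW_COLUMNS}
    sample.update({
        "state": json.dumps({"placeholder": True}),  # not the real state format
        "joint_action": json.dumps(["p0_action", "p1_action"]),  # placeholders
        "reward": "0", "time_left": "57.5", "score": "20",
        "time_elapsed": "2.5", "cur_gameloop": "3", "trial_id": "demo_trial",
        "player_0_is_human": "True", "player_1_is_human": "False",
    })
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(RAW_COLUMNS))
    writer.writeheader()
    writer.writerow(sample)
    buf.seek(0)
    return parse_raw_row(next(csv.DictReader(buf)))
```

Calling `demo_row()["joint_action"][0]` returns player 0's action, following the index convention stated above. For real data, the decoded `state` would still need to be converted into an OvercookedState with the support provided in the Overcooked-ai repo.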

Processed Schema

In the course of pre-processing, several additional columns are computed from the underlying raw data and added for convenience. They are as follows:

  • cur_gameloop_total (int): Total number of MDP timesteps in this trial. Note that this is constant across all rows with the same trial_id
  • score_total (int): Final score of the trial
  • button_press (int): Whether a keyboard stroke was performed by a human at this timestep. Each non-wait action counts as one button press
  • button_press_total (int): Total number of (human) button presses performed in entire trial
  • button_presses_per_timestep (float): button_press_total / cur_gameloop_total
  • timesteps_since_interact (int): Number of MDP timesteps since the last human-input 'INTERACT' action
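To make the column definitions above concrete, here is a rough pandas sketch of how such columns could be derived. It is not the repo's implementation. It assumes `joint_action` is already decoded into a two-element list, that the wait and INTERACT actions are represented by the hypothetical labels `WAIT` and `INTERACT`, that `cur_gameloop_total` equals the number of rows in the trial, and that `timesteps_since_interact` counts from the start of the trial until the first human INTERACT. All function names are illustrative.

```python
import pandas as pd

WAIT = "wait"          # hypothetical label for the no-op action
INTERACT = "interact"  # hypothetical label for the INTERACT action


def human_actions(row):
    """Return the actions taken by human-controlled players in one timestep."""
    flags = (row["player_0_is_human"], row["player_1_is_human"])
    return [a for a, is_human in zip(row["joint_action"], flags) if is_human]


def steps_since_interact(interacted):
    """Count timesteps since the last True; the counter resets to 0 on a True."""
    counts, since = [], 0
    for flag in interacted:
        since = 0 if flag else since + 1
        counts.append(since)
    return pd.Series(counts, index=interacted.index)


def add_derived_columns(df):
    """Return a sorted copy of df with the processed-schema columns added."""
    df = df.sort_values(["trial_id", "cur_gameloop"]).reset_index(drop=True)
    # Assumption: one row per timestep, so the row count is the trial length.
    df["cur_gameloop_total"] = df.groupby("trial_id")["cur_gameloop"].transform("count")
    df["score_total"] = df.groupby("trial_id")["score"].transform("last")
    # Each non-wait action by a human counts as one button press.
    df["button_press"] = df.apply(
        lambda r: sum(a != WAIT for a in human_actions(r)), axis=1
    )
    df["button_press_total"] = df.groupby("trial_id")["button_press"].transform("sum")
    df["button_presses_per_timestep"] = (
        df["button_press_total"] / df["cur_gameloop_total"]
    )
    interacted = df.apply(lambda r: INTERACT in human_actions(r), axis=1)
    df["timesteps_since_interact"] = (
        interacted.groupby(df["trial_id"]).transform(steps_since_interact)
    )
    return df


def demo_df():
    """A made-up four-step trial: player 0 is human, player 1 is an AI."""
    return pd.DataFrame({
        "trial_id": ["t0"] * 4,
        "cur_gameloop": [0, 1, 2, 3],
        "score": [0.0, 0.0, 20.0, 20.0],
        "joint_action": [["up", WAIT], [WAIT, INTERACT],
                         [INTERACT, WAIT], [WAIT, WAIT]],
        "player_0_is_human": [True] * 4,
        "player_1_is_human": [False] * 4,
    })
```

In the demo trial, the AI's INTERACT at the second timestep is ignored because only human input counts toward `button_press` and `timesteps_since_interact`.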

Processing Utils

All data processing utils are found in the human directory.

  • process_dataframes.py
    • High level, user facing methods are found in this file
    • get_human_human_trajectories accepts a layout name and returns deserialized Overcooked trajectories if data for that layout exists. This is the highest-level function used by the BC training pipeline
    • csv_to_df_pickle loads, processes, and filters raw CSV data and saves the result as a pickled DataFrame
    • format_trials_df is a helper to csv_to_df_pickle; it handles all pre-processing and builds the processed schema described in the previous section
    • filter_trials is a helper to csv_to_df_pickle; it lets the user specify a filter function that filters entire trials
    • filter_transitions lets the user specify a filter function that filters at a transition-by-transition level
  • data_processing_utils.py
    • Lower level helper functions found in this file
    • Primarily for converting CSV and DataFrame representations into python Overcooked objects
    • One abstraction level lower than process_dataframes.py; recommended for advanced users only
  • data_wrangling.ipynb
    • Interactive Jupyter Notebook exemplifying use of the process_dataframes functionality
  • process_human_trials.py
    • Script for converting legacy dynamics into a form compatible with current dynamics
    • Previously, the Overcooked MDP began cooking a soup automatically once a valid recipe was in the pot; now an INTERACT action is explicitly required to begin cooking
    • This script imputes dummy INTERACT actions at every timestep where soup cooking begins
    • See overcooked-ai for more details on MDP dynamics and game rules
  • human_data_forward_comp.py
    • Utility script for converting the deprecated schema to the updated schema listed in the previous section
    • All data currently in repo is under updated schema, so this is only included for legacy reasons
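
The difference between trial-level and transition-level filtering, as described for filter_trials and filter_transitions above, can be illustrated with plain pandas. This is only a sketch of the idea: `keep_trials` and `keep_transitions` are hypothetical helpers, not the functions in process_dataframes.py, whose actual signatures are not shown here. The demo data is made up.

```python
import pandas as pd


def keep_trials(df, trial_predicate):
    """Keep all rows of each trial for which trial_predicate(trial_df) is true."""
    return df.groupby("trial_id").filter(trial_predicate)


def keep_transitions(df, row_predicate):
    """Keep only the individual rows for which row_predicate(row) is true."""
    mask = df.apply(row_predicate, axis=1)
    return df[mask]


def demo_trials():
    """Two made-up trials: 'A' scores 40 in total, 'B' never scores."""
    return pd.DataFrame({
        "trial_id": ["A", "A", "A", "B", "B"],
        "cur_gameloop": [0, 1, 2, 0, 1],
        "reward": [0, 20, 20, 0, 0],
        "score": [0, 20, 40, 0, 0],
    })


if __name__ == "__main__":
    df = demo_trials()
    # Trial-level: drop whole trials whose best score is below 20.
    print(keep_trials(df, lambda t: bool(t["score"].max() >= 20)))
    # Transition-level: keep only the timesteps that earned a reward.
    print(keep_transitions(df, lambda r: r["reward"] > 0))
```

A trial-level filter either keeps or discards every row sharing a trial_id, so per-trial columns such as cur_gameloop_total stay consistent. A transition-level filter can leave gaps within a trial, which matters if downstream code expects contiguous timesteps.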