MultiWOZ 2.2

This dataset consists of a schema file, schema.json, which describes the ontology, and dialogue files named dialogues_*.json, which are stored under the train, dev, and test folders.

Notes:

Compared to MultiWOZ 2.1, we remove SNG01862.json as it's an invalid dialogue.
MultiWOZ 2.2 is also available on Hugging Face and ParlAI.
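
The folder layout described above can be read with a few lines of Python. This is a minimal sketch rather than an official loader: the helper name load_split is hypothetical, and it assumes that each dialogues_*.json file holds a JSON list of dialogue objects.

```python
import json
from pathlib import Path


def load_split(data_dir, split):
    """Load every dialogues_*.json file under <data_dir>/<split> into one list."""
    dialogues = []
    # Sort the paths so that the loading order is the same on every run.
    for path in sorted(Path(data_dir, split).glob("dialogues_*.json")):
        with open(path, encoding="utf-8") as f:
            # Assumption: each file contains a JSON list of dialogues.
            dialogues.extend(json.load(f))
    return dialogues
```

For example, load_split(data_dir, "train") would return all training dialogues, where data_dir is the directory that contains the train, dev, and test folders.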

Schema file

schema.json defines the new ontology using the schema representation in the schema-guided dialogue dataset.

The table below shows the categorical slots, non-categorical slots and intents defined for each domain.

Domain	Categorical slots	Non-categorical slots	Intents
Restaurant	pricerange, area, bookday, bookpeople	food, name, booktime, address, phone, postcode, ref	find, book
Attraction	area, type	name, address, entrancefee, openhours, phone, postcode	find
Hotel	pricerange, parking, internet, stars, area, type, bookpeople, bookday, bookstay	name, address, phone, postcode, ref	find, book
Taxi	-	destination, departure, arriveby, leaveat, phone, type	book
Train	destination, departure, day, bookpeople	arriveby, leaveat, trainid, ref, price, duration	find, book
Bus	day	departure, destination, leaveat	find
Hospital	-	department, address, phone, postcode	find
Police	-	name, address, phone, postcode	find
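
The split between categorical and non-categorical slots shown in the table can be recovered from the schema programmatically. The sketch below assumes the field names used by the schema-guided dialogue dataset ("service_name", "slots", "name", "is_categorical"); please check them against schema.json before relying on them. The function name slots_by_kind and the toy schema fragment are illustrative and are not the real file contents.

```python
def slots_by_kind(schema):
    """Map each service name to a (categorical, non_categorical) pair of slot name lists."""
    result = {}
    for service in schema:
        categorical, non_categorical = [], []
        for slot in service["slots"]:
            # Assumption: every slot carries a boolean "is_categorical" flag.
            target = categorical if slot.get("is_categorical") else non_categorical
            target.append(slot["name"])
        result[service["service_name"]] = (categorical, non_categorical)
    return result


# Toy schema fragment in the assumed layout (illustrative only).
toy_schema = [
    {
        "service_name": "bus",
        "slots": [
            {"name": "bus-day", "is_categorical": True},
            {"name": "bus-departure", "is_categorical": False},
        ],
    }
]
print(slots_by_kind(toy_schema))
```

With the real file, the schema argument would be the result of json.load applied to schema.json.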

Of the 61 slots in the schema, the following 35 slots are tracked in the dialogue state:

{
  "attraction-area",
  "attraction-name",
  "attraction-type",
  "bus-day",
  "bus-departure",
  "bus-destination",
  "bus-leaveat",
  "hospital-department",
  "hotel-area",
  "hotel-bookday",
  "hotel-bookpeople",
  "hotel-bookstay",
  "hotel-internet",
  "hotel-name",
  "hotel-parking",
  "hotel-pricerange",
  "hotel-stars",
  "hotel-type",
  "restaurant-area",
  "restaurant-bookday",
  "restaurant-bookpeople",
  "restaurant-booktime",
  "restaurant-food",
  "restaurant-name",
  "restaurant-pricerange",
  "taxi-arriveby",
  "taxi-departure",
  "taxi-destination",
  "taxi-leaveat",
  "train-arriveby",
  "train-bookpeople",
  "train-day",
  "train-departure",
  "train-destination",
  "train-leaveat"
}

Dialogue files

Dialogues are formatted following the data presentation of the schema-guided dialogue dataset.

Because the state value of a slot can be mentioned in different ways in the dialogues (e.g. 8pm and 20:00), the ground truth state values are presented as a list of values to cover such cases. Predicting any of them is considered correct in the evaluation. Specifically, the state values of each turn are represented as:

{
  "state":{
    "active_intent": String. User intent of the current turn.
    "requested_slots": List of string representing the slots, the values of which are being requested by the user.
    "slot_values": Dict of state values. The key is slot name in string. The value is a list of values.
  }
}
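
The rule that any listed value counts as correct can be sketched as follows. This is not the official evaluation script: the function names are hypothetical, predictions are assumed to be a dict from slot name to a single string, and no value normalization is applied.

```python
def slot_is_correct(predicted_value, gold_values):
    """A prediction is correct if it equals any of the acceptable gold values."""
    return predicted_value in gold_values


def state_is_correct(predicted_state, gold_slot_values):
    """Compare a predicted state against "slot_values" from one turn.

    predicted_state:  dict mapping slot name -> predicted string.
    gold_slot_values: dict mapping slot name -> list of acceptable strings.
    """
    # The set of predicted slots must match the set of gold slots exactly.
    if set(predicted_state) != set(gold_slot_values):
        return False
    return all(
        slot_is_correct(predicted_state[slot], values)
        for slot, values in gold_slot_values.items()
    )


gold = {"restaurant-booktime": ["8pm", "20:00"]}
print(state_is_correct({"restaurant-booktime": "20:00"}, gold))
print(state_is_correct({"restaurant-booktime": "9pm"}, gold))
```

The first call accepts the prediction because "20:00" is one of the listed variants; the second rejects it because "9pm" is not.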

In addition, for non-categorical slots we add span annotations that identify where the slot values are mentioned in the utterances. These span annotations are represented as follows:

{
  "slots": [
    {
      "slot": String of slot name.
      "start": Int denoting the index of the starting character in the utterance corresponding to the slot value.
      "exclusive_end": Int denoting the index of the character just after the last character corresponding to the slot value in the utterance. In Python, utterance[start:exclusive_end] gives the slot value.
      "value": String of value. It equals utterance[start:exclusive_end], where utterance is the current utterance as a string.
    }
  ]
}
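
The slicing relationship between "start", "exclusive_end" and "value" is easy to verify in code. The sketch below uses a made-up utterance and a hypothetical helper name, check_spans; it returns the annotations whose "value" does not match the slice of the utterance.

```python
def check_spans(utterance, slots):
    """Return the span annotations whose value differs from the utterance slice."""
    mismatches = []
    for annotation in slots:
        # Skip annotations that carry no character span.
        if "start" not in annotation or "exclusive_end" not in annotation:
            continue
        text = utterance[annotation["start"]:annotation["exclusive_end"]]
        if text != annotation["value"]:
            mismatches.append(annotation)
    return mismatches


# Made-up example utterance; the indices are computed rather than hard-coded.
utterance = "I need a table at 18:30 please."
start = utterance.index("18:30")
slots = [
    {
        "slot": "restaurant-booktime",
        "start": start,
        "exclusive_end": start + len("18:30"),
        "value": "18:30",
    }
]
print(check_spans(utterance, slots))  # [] because the span matches its value
```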

There are some non-categorical slots whose values are carried over from another slot in the dialogue state. Their values don't explicitly appear in the utterances.

For example, a user utterance can be "I also need a taxi from the restaurant to the hotel.", in which the state values of "taxi-departure" and "taxi-destination" are carried over from those of "restaurant-name" and "hotel-name", respectively. For these slots, instead of annotating them as spans, we use a "copy from" annotation to identify the slot the value is copied from. This annotation is formatted as follows:

{
  "slots": [
    {
      "slot": Slot name string.
      "copy_from": The slot to copy from.
      "value": A list of the slot values being copied. It corresponds to the state values of the "copy_from" slot.
    }
  ]
}
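
A "copy from" annotation can be resolved against the dialogue state as sketched below. The helper name resolve_copied_slots and the restaurant name in the example are hypothetical; the sketch assumes that the state's "slot_values" dict is available for the same turn.

```python
def resolve_copied_slots(slots, slot_values):
    """Return {slot: values} for every annotation that has a "copy_from" field.

    slots:       the "slots" list of one turn.
    slot_values: the "slot_values" dict of the dialogue state.
    """
    resolved = {}
    for annotation in slots:
        source = annotation.get("copy_from")
        if source is not None:
            # Look up the values of the source slot; empty list if it is absent.
            resolved[annotation["slot"]] = slot_values.get(source, [])
    return resolved


# Illustrative data only; "some restaurant" is a placeholder value.
state_values = {"restaurant-name": ["some restaurant"]}
slots = [
    {
        "slot": "taxi-departure",
        "copy_from": "restaurant-name",
        "value": ["some restaurant"],
    }
]
print(resolve_copied_slots(slots, state_values))
```

The resolved values should agree with the "value" list stored in the annotation itself, which makes this a convenient consistency check.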

Action annotation

There are 8,333 turns missing dialogue action annotations in MultiWOZ 2.1. We used a fine-tuned T5 model to annotate actions for these turns, then manually verified and corrected them. Please note that 749 turns still lack dialogue action annotations because the semantics of their utterances can't be appropriately expressed using the dialogue actions defined by ConvLab, for example "Sure. Just a moment." and "said to skip.".

The action annotations are stored in "dialog_acts.json". The file is formatted in the same style as MultiWOZ 2.1, except that we use character-level indexing instead of token-level indexing for the action values.

{
  "$dialogue_id": [
    "$turn_id": {
      "dialogue_acts": {
        "$act_name": [
          [
            "$slot_name",
            "$action_value"
          ]
        ]
      },
      "span_info": [
        [
          "$act_name",
          "$slot_name",
          "$action_value",
          "$start_character_index",
          "$exclusive_end_character_index"
        ]
      ]
    }
  ]
}
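
Character-level indexing means that each "span_info" entry can be checked by slicing the utterance string directly, with no tokenization step. The sketch below assumes the five-element entry order shown in the template above; the helper name, the act name, and the utterance are illustrative. Real action values may differ from the utterance text in casing or normalization, so a mismatch is not necessarily an annotation error.

```python
def spans_match(utterance, span_info):
    """For each span_info entry, report whether the character slice equals the value."""
    results = []
    for act_name, slot_name, value, start, exclusive_end in span_info:
        results.append(utterance[start:exclusive_end] == value)
    return results


# Made-up utterance; the indices are computed rather than hard-coded.
utterance = "The train leaves at 09:15."
start = utterance.index("09:15")
span_info = [["Train-Inform", "leaveat", "09:15", start, start + len("09:15")]]
print(spans_match(utterance, span_info))  # [True]
```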

Conversion to the data format of MultiWOZ 2.1

To apply the corrections from MultiWOZ 2.2 to MultiWOZ 2.1, in the format used by the MultiWOZ 2.1 dataset, download the MultiWOZ 2.1 zip file, unzip it, and run:

python convert_to_multiwoz_format.py --multiwoz21_data_dir=<multiwoz21_data_dir> --output_file=<output json file>

Please refer to our paper for more details about the dataset.

Questions

We are continuously working to improve this dataset. If you have any questions, please feel free to contact us at [email protected] or [email protected].
