Skip to content

Latest commit

 

History

History
78 lines (55 loc) · 4.76 KB

README.md

File metadata and controls

78 lines (55 loc) · 4.76 KB

Tidy Up Task

The idea behind this task is that the robot is encountering a cluttered household environment where different objects are misplaced. This problem is encapsulated by the following two commonsense questions the robot should be able to answer:

  • What objects in my environment are out of place?
  • What are the prototypical locations for these objects?

To answer these questions, we provide a list of 868 household objects and their associated locations.

Data Structure

The data in the tidy_up_data_csv is collected in two columns: The first column (Object) contains all object names. The second column (Locations) contains a Python dictionary with a ranked list of possible locations. The key in the dictionary describes the rank of the location. The value in the dictionary is the name of the location as well as a trust value on which the ranking is based. The trust value depends on the data source from which this location is gathered.

Example: ladle,"{1: ('kitchen', 1.5), 2: ('drawer', 0.5)}"

The object ladle is probably found in the kitchen otherwise it is found in a drawer.

Data Sources

We combine five different data sources to gather the information on prototypical object locations. For each source, we provide a single file focused on the data extraction. We use WordNet [1] to ensure that the found candidates are actually household objects and household locations. Additionally, to create the ranking of the locations, we manually assign each source a specific trust value based on the reliability.

Source Trust Value Candidates Included Ref
AI2Thor 1.00 118 118 [2]
Ascent++ 0.50 4309 131 [3]
CSKG 0.50 9264 462 [4]
Housekeep 1.00 268 268 [5]
Microsoft COCO 0.75 7 7 [6]
SUM - 13966 986

After removing duplicates from the list of 986 included objects, 868 distinct objects and their locations remain.

Experiments

To evaluate this tasks, we provide the LLMs with the following prompt:

System: Imagine you are a robot tidying up a household.
User: What are the prototypical locations in a household where the following object can be found? Please only answer with a comma separated & ranked list of locations.
Object: [Object] 
Locations:

We analyse the results for each model using the following metrics:

  • Mean Reciprocal Rank (MRR) of the generated answers when compared to the gold standard ranked list
  • Mean Average Precision @k (MAP@k) over all objects (evaluated for k=1, k=3 & k=5)
  • Mean Average Recall @k (MAR@k) over all objects (evaluated for k=1, k=3 & k=5)

Results

LLM MRR MAP@1 MAP@3 MAP@5 MAR@1 MAR@3 MAR@5
GPT-4o 0.412 0.345 0.387 0.365 0.100 0.146 0.163
Llama-3.3-70B-Instruct 0.424 0.364 0.392 0.372 0.104 0.144 0.154
gemma-2-27b-it 0.276 0.218 0.255 0.248 0.086 0.117 0.125

References

[1] G. A. Miller, ‘WordNet: A Lexical Database for English’, Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995, doi: 10.1145/219717.219748.

[2] E. Kolve et al., ‘AI2-THOR: An Interactive 3D Environment for Visual AI’, 2017, arXiv. doi: 10.48550/ARXIV.1712.05474.

[3] T.-P. Nguyen, S. Razniewski, J. Romero, and G. Weikum, ‘Refined Commonsense Knowledge from Large-Scale Web Contents’, IEEE Trans. Knowl. Data Eng., pp. 1–16, 2022, doi: 10.1109/TKDE.2022.3206505.

[4] F. Ilievski, P. Szekely, and B. Zhang, ‘CSKG: The CommonSense Knowledge Graph’, in The Semantic Web, vol. 12731, R. Verborgh, K. Hose, H. Paulheim, P.-A. Champin, M. Maleshkova, O. Corcho, P. Ristoski, and M. Alam, Eds., in Lecture Notes in Computer Science, vol. 12731. , Cham: Springer International Publishing, 2021, pp. 680–696. doi: 10.1007/978-3-030-77385-4_41.

[5] Y. Kant et al., ‘Housekeep: Tidying Virtual Households Using Commonsense Reasoning’, in Computer Vision – ECCV 2022, S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, and T. Hassner, Eds., in Lecture Notes in Computer Science, vol. 13699. Cham: Springer Nature Switzerland, 2022, pp. 355–373. doi: 10.1007/978-3-031-19842-7_21.

[6] T.-Y. Lin et al., ‘Microsoft COCO: Common Objects in Context’, in Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., in Lecture Notes in Computer Science, vol. 8693. Zurich, Switzerland: Springer International Publishing, 2014, pp. 740–755. doi: 10.1007/978-3-319-10602-1_48.