Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

This repository contains a PyTorch implementation of the paper "Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos", accepted at NeurIPS 2021 as a spotlight. If you find this implementation or the paper helpful, please consider citing:

@InProceedings{tanCOMMA2021,
     author={Reuben Tan and Bryan A. Plummer and Kate Saenko and Hailin Jin and Bryan Russell},
     title={Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos},
     booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
     year={2021} }

Dependencies

Python 3.6
PyTorch 1.7.0
FFmpeg
OpenCV

Project Code Files

The code is currently getting cleaned up and tested. It will be released very soon! Thank you for your patience.

Download YouCook2-Interactions Dataset

Please go to this link to download the YouCook2-Interactions evaluation dataset and unzip it. The unzipped folder contains the following files:

final_dataset_segments.pkl - contains all the video segments used for evaluation. Each segment is a tuple whose elements are the video name and the start and end times in seconds.
final_dataset_annotations.pkl - contains the frame-level bounding box annotations for the video segments.
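
To sanity-check the download before running the plotting or evaluation scripts, a minimal sketch along these lines can be used. It assumes each segment unpacks into exactly three values (video name, start, end), as described above; the internal layout of the annotations file is not documented here, so the sketch only reports its top-level type. The helper names `load_pickle` and `describe_dataset` are hypothetical and are not part of this repository.

```python
import os
import pickle


def load_pickle(path):
    """Load one pickle file and return the stored object."""
    # Only unpickle files from a source you trust.
    with open(path, "rb") as f:
        return pickle.load(f)


def describe_dataset(data_dir, limit=3):
    """Summarize the two YouCook2-Interactions files found in data_dir.

    Assumption: each segment is a (video_name, start_sec, end_sec) tuple.
    """
    segments = load_pickle(os.path.join(data_dir, "final_dataset_segments.pkl"))
    annotations = load_pickle(os.path.join(data_dir, "final_dataset_annotations.pkl"))

    first_segments = []
    for video_name, start_sec, end_sec in list(segments)[:limit]:
        first_segments.append((video_name, float(start_sec), float(end_sec)))

    return {
        "num_segments": len(segments),
        "first_segments": first_segments,
        # The annotation layout is not assumed; only its type is reported.
        "annotations_type": type(annotations).__name__,
    }
```

Calling `describe_dataset` with the path to the unzipped folder returns the number of segments, the first few segments, and the Python type of the annotations object, which is usually enough to confirm that both files were unpacked correctly.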

If you are interested in visualizing the YouCook2-Interactions dataset, you can do so by running the following command:

python plot_local_annotations.py --video_dir {directory where YouCook2 videos are stored} --annotations_path {path to final_dataset_annotations.pkl} --segments_path {path to final_dataset_segments.pkl} --output_dir {directory to store annotated frames}

Preprocess videos into bytestream files (optional)

Training code

To run the training code, please run the following command:

python main_distributed.py --batch_size 512 --lr 1e-4 --epochs 10 --multiprocessing-distributed --checkpoint_dir {path to directory for saving checkpoints}

Evaluation code

Before starting the evaluation, please download the original YouCook2 train and validation annotations here.
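
As a quick check that the downloaded annotation file is the one the evaluation expects, the sketch below lists the annotated segments it contains. It assumes the commonly distributed YouCook2 JSON layout, with a top-level "database" mapping from video id to an entry whose "annotations" list holds a "segment" pair and a "sentence"; verify this against your copy of the file. The function name `load_youcook2_segments` is hypothetical and is not part of this repository.

```python
import json


def load_youcook2_segments(annotations_path):
    """Return (video_id, start_sec, end_sec, sentence) rows from a YouCook2 JSON file.

    Assumption: the file looks like
    {"database": {video_id: {"annotations": [{"segment": [s, e], "sentence": "..."}]}}}.
    """
    with open(annotations_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    rows = []
    for video_id, entry in data.get("database", {}).items():
        for annotation in entry.get("annotations", []):
            start_sec, end_sec = annotation["segment"]
            rows.append((video_id, start_sec, end_sec, annotation.get("sentence", "")))
    return rows
```

An empty result most likely means the file uses a different layout than the one assumed here.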

To run the evaluation code, please run the following command:

python -W ignore eval_youcook_interactions_localization.py --eval_video_root {directory containing YouCook2 videos} --youcook2_annotations_path {path to json file containing YouCook2 annotations} --interactions_annotations_path {path to YouCook2-Interactions annotations file} --interactions_segments_path {path to YouCook2-Interactions segments file} --checkpoint_eval {path to trained model weights}
