[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Updated Dec 8, 2023 - Python
Awesome papers & datasets specifically focused on long-term videos.
Code for the paper Learning the Predictability of the Future (CVPR 2021)
PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529
[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv.org/abs/2212.04500)
Implementation of "Generating Videos with Scene Dynamics" in TensorFlow
Official repository of the “Mask Again: Masked Knowledge Distillation for Masked Video Modeling” (ACM MM 2023)
Winning SubNetwork (WSN), Fourier Subneural Operator (FSO), Video-Incremental Learning (VIL), Sequential Neural Implicit Representation (NIR)
This is the code accompanying the AAAI 2022 paper "Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives" (https://arxiv.org/abs/2201.11736). The method allows you to use additional ranking information for representation learning.
Official PyTorch implementation of EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [ICML 2024].
Official repository of the "Fine-grained Key-Value Memory Enhanced Predictor for Video Representation Learning" (ACM MM 2023)
Actor-agnostic Multi-label Action Recognition with Multi-modal Query [ICCVW '23]
👆PyTorch Implementation of JEDi Metric described in "Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality"
Official code for CVPR2024 “VideoMAC: Video Masked Autoencoders Meet ConvNets”
Chainer implementation of Networks for Learning Video Representations
The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).
[Asilomar 2022] Contextual Explainable Video Representation: Human Perception-based Understanding
The code for the paper "GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval" (AAAI'24)
📚 Paper Notes (Computer vision)
The official repository for creating the causal action-effect (CAE) dataset for the IJCNLP-AACL 2023 paper: Implicit Affordance Acquisition via Causal Action–Effect Modeling in the Video Domain