
Question about usage of VFI model #3

Open
Qianqian3764 opened this issue Jan 14, 2025 · 2 comments

@Qianqian3764

Hi, thanks for your great work!

For the TA method, why use a VFI model to generate the intermediate frames? Aren't intermediate frames themselves used as labels when training VFI models? My question is: since the middle frame is already available, why do we need an additional VFI model to generate it?
If the VFI model instead serves the subsequent multi-frame feature fusion, that would be reasonable, since it is needed to generate the optical flow from the target frame to the source frame.

@LiuJF1226
Owner

LiuJF1226 commented Jan 14, 2025

Thank you for your attention! The data split used in VFI training is identical to that employed in self-supervised MDE training. In standard self-supervised MDE training, a data sample consists of three consecutive frames ($I_{t-1}$, $I_{t}$, $I_{t+1}$), where $I_{t}$ is the target frame and the other two are source frames. In VFI training, we interpolate $I_{t}$ from $I_{t-1}$ and $I_{t+1}$, with $I_{t}$ as the label; no other intermediate frames are used as labels. Once the VFI model is trained, we use ($I_{t-1}$, $I_{t}$) to synthesize an intermediate frame $I_{t-0.5}$, and ($I_{t}$, $I_{t+1}$) to synthesize $I_{t+0.5}$. $I_{t-0.5}$ and $I_{t+0.5}$ then serve as two additional target frames in the subsequent self-supervised MDE training, for both the single-frame and multi-frame depth models. This diversifies the data distribution along the temporal dimension. Also, as you said, the VFI model serves the subsequent multi-frame feature fusion.
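To make the frame-expansion step above concrete, here is a minimal sketch. The `vfi` function is a stand-in for the trained VFI network (its real interface in this repo is not shown in the thread); here it is faked with simple linear blending just so the sketch runs. The point is only the data flow: two extra target frames, $I_{t-0.5}$ and $I_{t+0.5}$, are synthesized from adjacent frame pairs and added alongside the original target $I_t$.

```python
import numpy as np

def vfi(frame_a, frame_b, t=0.5):
    """Placeholder for the trained VFI network.

    The real model would synthesize the frame at time t between
    frame_a and frame_b; linear blending is used here only to keep
    the sketch self-contained and runnable.
    """
    return (1.0 - t) * frame_a + t * frame_b

# Three consecutive frames of a training sample (toy 2x2 "images").
I_prev = np.zeros((2, 2))        # I_{t-1}
I_curr = np.ones((2, 2))         # I_{t}
I_next = np.full((2, 2), 2.0)    # I_{t+1}

# Synthesize the two additional target frames described above.
I_half_before = vfi(I_prev, I_curr)   # I_{t-0.5}
I_half_after  = vfi(I_curr, I_next)   # I_{t+0.5}

# Self-supervised MDE training can then treat each of these as a
# target frame, keyed here by its temporal offset from I_{t}.
targets = {
    -0.5: I_half_before,
     0.0: I_curr,
     0.5: I_half_after,
}
```

This is only an illustration of the augmentation idea, not the repository's actual training code.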

@Qianqian3764
Author

Qianqian3764 commented Jan 15, 2025

Thanks very much for your reply!
That completely resolves my question. It's a great idea!
I looked at the training code in train.py.

[screenshot of the loss computation in train.py]

Why isn't the loss over the [-0.5, 0, 0.5] target frames also computed for the single-frame model?
