Video Alignment and Synchronization for VapourSynth

Useful when two sources are available and you would like to combine them in certain ways that only become possible once they are perfectly aligned and synchronized. Examples include doing a color transfer, patching black-crushed areas, transferring textures, creating a paired dataset, combining high-res Blu-ray chroma with better DVD luma, and more.

Requirements

  • PyTorch with CUDA
  • pip install numpy
  • pip install timm (optional, only for Temporal Alignment Precision 3)
  • julek-plugin (optional, only for Temporal Alignment Precision 2)

Setup

Put the entire vs_align folder into your VapourSynth scripts folder,
or install via pip: pip install git+https://github.com/pifroggi/vs_align.git


Spatial Alignment

Aligns and removes distortions by warping a frame towards a reference frame. See this collection of Comparisons and this one for Mask Usage.

import vs_align
clip = vs_align.spatial(clip, ref, mask=None, precision=3, iterations=1, lq_input=False, device="cuda")

clip
Misaligned clip. Must be in RGB format.

ref
Reference clip that misaligned clip will be aligned to. Output will have these dimensions. Must be in RGB format.

mask (optional)
Use a mask clip to exclude areas (in white) from warping, like for example a watermark or text. Masked areas will instead be warped like the surrounding pixels. Can be a static single frame or a moving mask.
Can be any format and dimensions.

precision
Speed/quality tradeoff in the range 1-4, with higher values giving finer, more stable alignment down to the subpixel level. Higher is slower and requires more VRAM. 2 or 3 works well in most cases.

iterations (optional)
More iterations can fix larger misalignments (greater than about 50 pixels), but are slower. Not needed in most cases. If the misalignment is roughly consistent, a manual shift/crop is recommended over increasing this.

lq_input (optional)
Enables better handling for low-quality input clips. When set to True, general shapes are prioritized over high-frequency details like noise, grain, or compression artifacts by averaging the warping across a small area. This also fixes an issue sometimes noticeable in 2D animation, where lines can get slightly thicker or thinner due to warping.

device (optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu". This will be very slow on CPU.

Tip

While this is pretty good at aligning very different-looking clips (see comparisons), you will make it easier and get better results by prefiltering to make ref as close to clip as possible. For example:

  • If clip is cropped, crop ref too so they roughly match. Always crop black bars.
  • If clip is much brighter than ref, make ref brighter too.
  • If the misalignment is larger than around 50 pixels, shift it manually so they roughly align.
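The prefiltering steps above might look something like the following sketch. The crop amounts, gain value, and shift offset are placeholder numbers you would tune for your own sources, and clip and ref are assumed to be loaded already in RGB format:

```python
import vapoursynth as vs
import vs_align

core = vs.core

# Crop black bars on ref so it roughly matches clip (values are examples).
ref = core.std.Crop(ref, top=60, bottom=60)

# If clip is much brighter than ref, brighten ref to roughly match.
ref = core.std.Expr(ref, "x 1.2 *")

# If the offset is larger than ~50 pixels, shift ref manually first,
# e.g. crop 60 pixels off the left and pad the right to keep dimensions.
ref = core.std.Crop(ref, left=60)
ref = core.std.AddBorders(ref, right=60)

aligned = vs_align.spatial(clip, ref, precision=3)
```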

Temporal Alignment

Synchronizes a clip with a reference clip by frame matching. It works by searching through a clip and finding the frame that most closely matches the reference clip frame. Sometimes also known as automatic frame remapping.

import vs_align
clip = vs_align.temporal(clip, ref, out=None, tr=20, precision=1, fallback=None, thresh=100.0, clip_num=None, clip_den=None, ref_num=None, ref_den=None, device="cuda", debug=False)

clip
Unsynced clip. Must have the same format and dimensions as ref.

ref
Reference clip that the unsynced clip will be synced to. Must have the same format and dimensions as clip.

out (optional)
Output clip from which matched frames are copied. By default, frames are matched and copied from clip. However, if providing an out clip, the script will still use clip and ref for frame matching but will copy the actual frames in the final output from out. A common use case is downscaling clip and ref for faster matching while preserving the original high res frames in the output. Can be any format and dimensions.

precision

Precision   Quality   Speed       Use Case                                                        Method
1           Worst     Very Fast   Clips look identical, frames are just in the wrong place.       PlaneStats
2           Better    Slow        Slight differences like compression, grain, halos.              Butteraugli
3           Best      Slow        Large differences like warping, colors, spatial misalignment.   TOPIQ

tr
Temporal radius: the number of frames to search forwards and backwards for a match. Higher is slower.

fallback (optional)
Fallback clip in case no good match is found. Must have the same format and dimensions as clip (or out if used).

thresh (optional)
Threshold for the fallback clip. If frames differ by more than this value, the fallback clip is used. Use debug=True to get an idea of the values; the ranges differ for each precision value. Does nothing if no fallback clip is set.

clip_num, clip_den, ref_num, ref_den (optional)
Framerate numerator and denominator for clip and ref. Only needed if clip and ref have different framerates. This is used to make sure the function searches for a matching frame in the correct location. Expect some slowdown when used.
Example with clip at 29.97fps and ref at 23.976fps: clip_num=30000, clip_den=1001, ref_num=24000, ref_den=1001
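The search position implied by these parameters comes down to proportional arithmetic: the same timestamp is converted from one framerate to the other. A minimal sketch of that mapping (the function name corresponding_frame is illustrative only, not part of vs_align's API, and vs_align's internal implementation may differ):

```python
from fractions import Fraction

def corresponding_frame(ref_frame, clip_num, clip_den, ref_num, ref_den):
    """Map a frame index in ref to the nearest frame index in clip,
    given both clips' framerates as numerator/denominator pairs."""
    clip_fps = Fraction(clip_num, clip_den)  # e.g. 30000/1001 = 29.97 fps
    ref_fps = Fraction(ref_num, ref_den)     # e.g. 24000/1001 = 23.976 fps
    # Timestamp of the ref frame is ref_frame / ref_fps; the clip frame
    # at that timestamp is the timestamp multiplied by clip_fps.
    return round(ref_frame * clip_fps / ref_fps)

# ref frame 100 at 23.976 fps corresponds to clip frame 125 at 29.97 fps
print(corresponding_frame(100, 30000, 1001, 24000, 1001))  # -> 125
```

The temporal radius tr is then searched around this position rather than around the raw ref frame number.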

device (optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu". Only affects Precision 3, as the others do not have GPU support.

debug (optional)
Overlays computed difference scores for all frames within the temporal radius, and the chosen best match, directly onto the frame.

Caution

Performance Considerations: High res frame matching is very slow. For Precision 2 and 3 it is recommended to downscale clip and ref to around 360p and use a high res out clip instead. Both are still very effective at this resolution and far better than Precision 1.
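The recommendation above might look like this in a script. The resize kernel and the exact target resolution are just examples, and clip and ref are assumed to be loaded already in the same format and dimensions:

```python
import vapoursynth as vs
import vs_align

core = vs.core

# Downscale both clips to ~360p for fast frame matching...
clip_small = core.resize.Bicubic(clip, width=480, height=360)
ref_small = core.resize.Bicubic(ref, width=480, height=360)

# ...but copy the matched frames from the original high-res clip.
out = vs_align.temporal(clip_small, ref_small, out=clip, tr=20, precision=3)
```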

Tip

Frame Matching Quality: Even Precision 3 needs the clips to look somewhat similar. You will make it easier and get better results by prefiltering to make ref as close to clip as possible. For example:

  • If one clip is cropped, crop the other too so they match. Always crop black bars.
  • If one clip is brighter than the other, you want to make them roughly match.
  • If one clip has crushed blacks, you want to crush the other too.
  • If one clip is black & white and the other is in color, you want to make them both black & white.
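For example, if one source is black & white, you could run the matching on grayscale versions of both clips while still outputting the original color frames. This sketch uses a resize-based format conversion (the matrix and format choices are examples; clip and ref are assumed loaded):

```python
import vapoursynth as vs
import vs_align

core = vs.core

# Convert both clips to the same grayscale format before matching.
clip_gray = core.resize.Bicubic(clip, format=vs.GRAY8, matrix_s="709")
ref_gray = core.resize.Bicubic(ref, format=vs.GRAY8, matrix_s="709")

# Match on the grayscale pair, copy output frames from the original clip.
out = vs_align.temporal(clip_gray, ref_gray, out=clip, tr=20)
```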

Benchmarks

Spatial Alignment

Hardware     Precision   FPS at 720x480   FPS at 1440x1080
RTX 4090     1           ~25 fps          ~22 fps
RTX 4090     2           ~18 fps          ~14 fps
RTX 4090     3           ~12 fps          ~7 fps
RTX 4090     4           ~7 fps           ~2.5 fps

Temporal Alignment

Hardware      Precision   TR   Resolution   FPS
Ryzen 5900X   1           20   1440x1080    ~200 fps
Ryzen 5900X   2           20   480x360      ~4 fps
RTX 4090      3           20   480x360      ~19 fps

Depending on the GPU, Precision 3 can now be faster than 2.


Acknowledgements

Spatial Alignment uses code based on RIFE by hzwer.
Temporal Alignment uses code based on decimatch by po5 and IQA-PyTorch by chaofengc, proposed in the paper TOPIQ by Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan and Weisi Lin.