Video Alignment and Synchronization for VapourSynth

Useful when two sources are available and you would like to combine them in certain ways that only become possible once they are perfectly aligned and synchronized. Examples include doing a color transfer, patching black-crushed areas, transferring textures, creating a paired dataset, combining high-res Blu-ray chroma with better DVD luma, and more.

Requirements

  • PyTorch with CUDA
  • pip install numpy
  • pip install timm (optional, only for Temporal Alignment Precision 3)
  • julek-plugin (optional, only for Temporal Alignment Precision 2)

Setup

Put the entire vs_align folder into your VapourSynth scripts folder,
or install via pip: pip install git+https://github.com/pifroggi/vs_align.git


Spatial Alignment

Aligns and removes distortions by warping a frame towards a reference frame. See this collection of Comparisons and this one for Mask Usage.

import vs_align
clip = vs_align.spatial(clip, ref, mask=None, precision=3, iterations=1, lq_input=False, device="cuda")

clip
Misaligned clip. Must be in RGB format.

ref
Reference clip that misaligned clip will be aligned to. Output will have these dimensions. Must be in RGB format.

mask (optional)
Use a mask clip to exclude areas (in white) from warping, like for example a watermark or text. Masked areas will instead be warped like the surrounding pixels. Can be a static single frame or a moving mask.
Can be any format and dimensions.

precision
Speed/quality tradeoff in the range 1-4, with higher values giving finer, more stable alignment down to the subpixel level. Higher is slower and requires more VRAM. 2 or 3 works well in most cases.

iterations (optional)
More iterations can fix larger misalignments (greater than about 50 pixels), but are slower. Not needed in most cases. If the misalignment is roughly consistent, a manual shift/crop is recommended over increasing this.

lq_input (optional)
Enables better handling for low-quality input clips. When set to True, general shapes are prioritized over high-frequency details like noise, grain, or compression artifacts by averaging the warping across a small area. This also fixes an issue sometimes noticeable in 2D animation, where lines can get slightly thicker or thinner due to warping.

device (optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu". This will be very slow on CPU.

Tip

While this is pretty good at aligning very different-looking clips (see comparisons), you will make it easier and get better results by prefiltering to make ref as close to clip as possible. For example:

  • If clip is cropped, crop ref too so they roughly match. Always crop black bars.
  • If clip is much brighter than ref, make ref brighter too.
  • If the misalignment is larger than around 50 pixels, shift it manually so they roughly align.
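The prefiltering steps above might look something like the following sketch. The crop amounts, gain value, and shift offset are placeholder numbers you would tune for your own sources, and clip and ref are assumed to be loaded already in RGB format:

```python
import vapoursynth as vs
import vs_align

core = vs.core

# Crop black bars on ref so it roughly matches clip (values are examples).
ref = core.std.Crop(ref, top=60, bottom=60)

# If clip is much brighter than ref, brighten ref to roughly match.
ref = core.std.Expr(ref, "x 1.2 *")

# If the offset is larger than ~50 pixels, shift ref manually first,
# e.g. crop 60 pixels off the left and pad the right to keep dimensions.
ref = core.std.Crop(ref, left=60)
ref = core.std.AddBorders(ref, right=60)

aligned = vs_align.spatial(clip, ref, precision=3)
```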

Temporal Alignment

Synchronizes a clip with a reference clip by frame matching. It works by searching through a clip and finding the frame that most closely matches the reference clip frame. Sometimes also known as automatic frame remapping.

import vs_align
clip = vs_align.temporal(clip, ref, out=None, tr=20, precision=1, fallback=None, thresh=100.0, clip_num=None, clip_den=None, ref_num=None, ref_den=None, device="cuda", debug=False)

clip
Unsynced clip. Must have the same format and dimensions as ref.

ref
Reference clip that the unsynced clip will be synced to. Must have the same format and dimensions as clip.

out (optional)
Output clip from which matched frames are copied. By default, frames are matched and copied from clip. However, if providing an out clip, the script will still use clip and ref for frame matching but will copy the actual frames in the final output from out. A common use case is downscaling clip and ref for faster matching while preserving the original high res frames in the output. Can be any format and dimensions.

precision

Precision   Quality   Speed       Use Case                                                        Method
1           Worst     Very Fast   Clips look identical, frames are just in the wrong place.       PlaneStats
2           Better    Slow        Slight differences like compression, grain, halos.              Butteraugli
3           Best      Slow        Large differences like warping, colors, spatial misalignment.   TOPIQ

tr
Temporal radius: the number of frames to search forwards and backwards for a match. Higher is slower.

fallback (optional)
Fallback clip in case no good match is found. Must have the same format and dimensions as clip (or out if used).

thresh (optional)
Threshold for the fallback clip. If frames differ by more than this value, the fallback clip is used. Use debug=True to get an idea of the values; the ranges differ for each precision value. Does nothing if no fallback clip is set.

clip_num, clip_den, ref_num, ref_den (optional)
Framerate numerator and denominator for clip and ref. Only needed if clip and ref have different framerates. This is used to make sure the function searches for a matching frame in the correct location. Expect some slowdown when used.
Example with clip at 29.97fps and ref at 23.976fps: clip_num=30000, clip_den=1001, ref_num=24000, ref_den=1001
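The search position implied by these parameters comes down to proportional arithmetic: the same timestamp is converted from one framerate to the other. A minimal sketch of that mapping (the function name corresponding_frame is illustrative only, not part of vs_align's API, and vs_align's internal implementation may differ):

```python
from fractions import Fraction

def corresponding_frame(ref_frame, clip_num, clip_den, ref_num, ref_den):
    """Map a frame index in ref to the nearest frame index in clip,
    given both clips' framerates as numerator/denominator pairs."""
    clip_fps = Fraction(clip_num, clip_den)  # e.g. 30000/1001 = 29.97 fps
    ref_fps = Fraction(ref_num, ref_den)     # e.g. 24000/1001 = 23.976 fps
    # Timestamp of the ref frame is ref_frame / ref_fps; the clip frame
    # at that timestamp is the timestamp multiplied by clip_fps.
    return round(ref_frame * clip_fps / ref_fps)

# ref frame 100 at 23.976 fps corresponds to clip frame 125 at 29.97 fps
print(corresponding_frame(100, 30000, 1001, 24000, 1001))  # -> 125
```

The temporal radius tr is then searched around this position rather than around the raw ref frame number.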

device (optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu". Only affects Precision 3, as the others do not have GPU support.

debug (optional)
Overlays computed difference scores for all frames within the temporal radius, and the chosen best match, directly onto the frame.

Caution

Performance Considerations: High res frame matching is very slow. For Precision 2 and 3 it is recommended to downscale clip and ref to around 360p and use a high res out clip instead. Both are still very effective at this resolution and far better than Precision 1.
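The recommendation above might look like this in a script. The resize kernel and the exact target resolution are just examples, and clip and ref are assumed to be loaded already in the same format and dimensions:

```python
import vapoursynth as vs
import vs_align

core = vs.core

# Downscale both clips to ~360p for fast frame matching...
clip_small = core.resize.Bicubic(clip, width=480, height=360)
ref_small = core.resize.Bicubic(ref, width=480, height=360)

# ...but copy the matched frames from the original high-res clip.
out = vs_align.temporal(clip_small, ref_small, out=clip, tr=20, precision=3)
```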

Tip

Frame Matching Quality: Even Precision 3 needs the clips to look somewhat similar. You will make it easier and get better results by prefiltering to make ref as close to clip as possible. For example:

  • If one clip is cropped, crop the other too so they match. Always crop black bars.
  • If one clip is brighter than the other, you want to make them roughly match.
  • If one clip has crushed blacks, you want to crush the other too.
  • If one clip is black & white and the other is in color, you want to make them both black & white.
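For example, if one source is black & white, you could run the matching on grayscale versions of both clips while still outputting the original color frames. This sketch uses a resize-based format conversion (the matrix and format choices are examples; clip and ref are assumed loaded):

```python
import vapoursynth as vs
import vs_align

core = vs.core

# Convert both clips to the same grayscale format before matching.
clip_gray = core.resize.Bicubic(clip, format=vs.GRAY8, matrix_s="709")
ref_gray = core.resize.Bicubic(ref, format=vs.GRAY8, matrix_s="709")

# Match on the grayscale pair, copy output frames from the original clip.
out = vs_align.temporal(clip_gray, ref_gray, out=clip, tr=20)
```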

Benchmarks

Spatial Alignment

Hardware     Precision   FPS at 720x480   FPS at 1440x1080
RTX 4090     1           ~25 fps          ~22 fps
RTX 4090     2           ~18 fps          ~14 fps
RTX 4090     3           ~12 fps          ~7 fps
RTX 4090     4           ~7 fps           ~2.5 fps

Temporal Alignment

Hardware      Precision   TR   Resolution   FPS
Ryzen 5900X   1           20   1440x1080    ~200 fps
Ryzen 5900X   2           20   480x360      ~4 fps
RTX 4090      3           20   480x360      ~19 fps

Depending on the GPU, Precision 3 can now be faster than 2.


Acknowledgements

Spatial Alignment uses code based on RIFE by hzwer.
Temporal Alignment uses code based on decimatch by po5 and IQA-PyTorch by chaofengc, proposed in the paper TOPIQ by Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan and Weisi Lin.