Useful when two sources are available and you would like to combine them in certain ways, which only becomes possible once they are perfectly aligned and synchronized. For example: doing a color transfer, patching black-crushed areas, transferring textures, creating a paired dataset, combining high-res Blu-ray chroma with better DVD luma, or similar.
- pytorch with cuda
- numpy (`pip install numpy`)
- timm (`pip install timm`) (optional, only for Temporal Alignment Precision 3)
- julek-plugin (optional, only for Temporal Alignment Precision 2)
Put the entire vs_align folder into your vapoursynth scripts folder.
Or install via pip: pip install git+https://github.com/pifroggi/vs_align.git
Aligns and removes distortions by warping a frame towards a reference frame. See this collection of Comparisons and this one for Mask Usage.
import vs_align
clip = vs_align.spatial(clip, ref, mask=None, precision=3, iterations=1, lq_input=False, device="cuda")
clip
Misaligned clip. Must be in RGB format.
ref
Reference clip that misaligned clip will be aligned to. Output will have these dimensions. Must be in RGB format.
mask
(optional)
Use a mask clip to exclude areas (in white) from warping, for example a watermark or text. Masked areas will instead be warped like the surrounding pixels. Can be a static single frame or a moving mask.
Can be any format and dimensions.
precision
Speed/quality tradeoff in the range 1-4; higher means finer, more stable alignment down to a subpixel level, but is slower and requires more VRAM. 2 or 3 works great in most cases.
iterations
(optional)
Higher iterations can fix larger misalignments (more than about 50 pixels), but are slower. Not needed in most cases. If the misalignment is roughly consistent, a manual shift/crop is recommended over increasing this.
lq_input
(optional)
Enables better handling for low-quality input clips. When set to True general shapes are prioritized over high-frequency details like noise, grain, or compression artifacts by averaging the warping across a small area. Also fixes an issue sometimes noticeable in 2D animation, where lines can get slightly thicker/thinner due to warping.
device
(optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu". This will be very slow on CPU.
Tip
While this is pretty good at aligning very different looking clips (see comparisons), you will make it easier and get better results by prefiltering to make ref as close to clip as possible. For example:
- If clip is cropped, crop ref too so they roughly match. Always crop black bars.
- If clip is much brighter than ref, make ref brighter too.
- If the misalignment is larger than around 50 pixels, shift it manually so they roughly align.
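A sketch of such prefiltering, assuming clip and ref are already loaded as in the usage example above (the crop values and gamma here are hypothetical placeholders, not recommendations):

```python
# hypothetical prefilter before spatial alignment:
# crop black bars on ref and roughly match brightness (placeholder values)
ref = ref.std.Crop(top=60, bottom=60)
ref = ref.std.Levels(gamma=1.2)  # brighten ref if clip is brighter
clip = vs_align.spatial(clip, ref, precision=3)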
Synchronizes a clip with a reference clip by frame matching. It works by searching through a clip and finding the frame that most closely matches the reference clip frame. Sometimes also known as automatic frame remapping.
import vs_align
clip = vs_align.temporal(clip, ref, out=None, tr=20, precision=1, fallback=None, thresh=100.0, clip_num=None, clip_den=None, ref_num=None, ref_den=None, device="cuda", debug=False)
clip
Unsynced clip. Must be the same format and dimensions as ref.
ref
Reference clip that the unsynced clip will be synced to. Must be the same format and dimensions as clip.
out
(optional)
Output clip from which matched frames are copied. By default, frames are matched and copied from clip. If an out clip is provided, the script still uses clip and ref for frame matching, but copies the frames for the final output from out. A common use case is downscaling clip and ref for faster matching while preserving the original high-res frames in the output. Can be any format and dimensions.
precision
| Precision | Quality | Speed | Use case | Method |
|---|---|---|---|---|
| 1 | Worst | Very fast | Clips look identical; frames are just in the wrong place. | PlaneStats |
| 2 | Better | Slow | Slight differences like compression, grain, halos. | Butteraugli |
| 3 | Best | Slow | Large differences like warping, colors, spatial misalignment. | TOPIQ |
tr
Temporal radius determines how many frames to search forwards and backwards for a match. Higher is slower.
fallback
(optional)
Fallback clip in case no good match is found. Must have the same format and dimensions as clip (or out if used).
thresh
(optional)
Threshold for the fallback clip. If the best match differs more than this value, the fallback clip is used instead. Use debug=True to get an idea of typical values; the ranges differ for each precision. Does nothing if no fallback clip is set.
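Conceptually, the search within tr and the fallback threshold behave like this toy sketch. Here frames are stand-in numbers and the difference score is a plain absolute distance; the real scores come from PlaneStats, Butteraugli, or TOPIQ, and the function name is illustrative:

```python
def best_match(ref_frame, clip_frames, n, tr, thresh=None, fallback=None):
    # search the window [n - tr, n + tr] for the lowest difference score
    lo, hi = max(0, n - tr), min(len(clip_frames) - 1, n + tr)
    idx = min(range(lo, hi + 1), key=lambda i: abs(clip_frames[i] - ref_frame))
    score = abs(clip_frames[idx] - ref_frame)
    if fallback is not None and thresh is not None and score > thresh:
        return fallback  # no good enough match within the radius
    return clip_frames[idx]

best_match(40, [10, 20, 30, 40, 50], n=1, tr=2)                # → 40
best_match(99, [10, 20, 30], n=1, tr=1, thresh=5, fallback=0)  # → 0
```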
clip_num, clip_den, ref_num, ref_den
(optional)
Numerator and denominator of the framerates for clip and ref. Only needed if clip and ref have different framerates; this is used to make sure the function searches for a matching frame in the correct location. Some slowdown when used.
Example with clip at 29.97fps and ref at 23.976fps: clip_num=30000, clip_den=1001, ref_num=24000, ref_den=1001
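The index mapping presumably works like the following sketch (the function name and rounding are illustrative, not the actual internals): a ref frame index is converted to a timestamp and back into a clip frame index, which centers the search window.

```python
from fractions import Fraction

def search_center(ref_n, clip_num, clip_den, ref_num, ref_den):
    # timestamp of ref frame n is ref_n / ref_fps; the equivalent clip
    # index is that timestamp multiplied by clip_fps
    clip_fps = Fraction(clip_num, clip_den)
    ref_fps = Fraction(ref_num, ref_den)
    return round(ref_n * clip_fps / ref_fps)

search_center(100, clip_num=30000, clip_den=1001, ref_num=24000, ref_den=1001)  # → 125
```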
device
(optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu". Only affects Precision 3, as the others do not have GPU support.
debug
(optional)
Overlays computed difference scores for all frames within the temporal radius, and the chosen best match, directly onto the frame.
Caution
Performance Considerations: High res frame matching is very slow. For Precision 2 and 3 it is recommended to downscale clip and ref to around 360p and use a high res out clip instead. Both are still very effective at this resolution and far better than Precision 1.
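A sketch of that recommendation, assuming clip and ref are already loaded (the scaler and resolution are just examples):

```python
# downscale both clips for matching only; output frames come from the
# full-resolution clip via the out parameter
clip_small = clip.resize.Bicubic(480, 360)
ref_small = ref.resize.Bicubic(480, 360)
result = vs_align.temporal(clip_small, ref_small, out=clip, tr=20, precision=3)
```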
Tip
Frame Matching Quality: Even Precision 3 needs the clips to look somewhat similar. You will make it easier and get better results by prefiltering to make ref as close to clip as possible. For example:
- If one clip is cropped, crop the other too so they match. Always crop black bars.
- If one clip is brighter than the other, you want to make them roughly match.
- If one clip has crushed blacks, you want to crush the other too.
- If one clip is black & white and the other is in color, you want to make them both black & white.
Spatial Alignment
| Hardware | Precision | FPS at 720x480 | FPS at 1440x1080 |
|---|---|---|---|
| RTX 4090 | 1 | ~25 fps | ~22 fps |
| RTX 4090 | 2 | ~18 fps | ~14 fps |
| RTX 4090 | 3 | ~12 fps | ~7 fps |
| RTX 4090 | 4 | ~7 fps | ~2.5 fps |
Temporal Alignment
| Hardware | Precision | TR | Resolution | FPS |
|---|---|---|---|---|
| Ryzen 5900X | 1 | 20 | 1440x1080 | ~200 fps |
| Ryzen 5900X | 2 | 20 | 480x360 | ~4 fps |
| RTX 4090 | 3 | 20 | 480x360 | ~19 fps |
Depending on the GPU, Precision 3 can be faster than Precision 2.
Spatial Alignment uses code based on RIFE by hzwer.
Temporal Alignment uses code based on decimatch by po5 and IQA-PyTorch by chaofengc, proposed in the paper TOPIQ by Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan and Weisi Lin.