VPT - Visual Prompt Tuning
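For a plain Vision Transformer (ViT) [19] with $N$ layers, an input image is divided into $m$ fixed-size patches $\{I_j \in \mathbb{R}^{3 \times h \times w} \mid j \in \mathbb{N}, 1 \le j \le m\}$, where $h$ and $w$ are the height and width of an image patch. Each patch is then first embedded into a $d$-dimensional latent space with positional encoding:

$$e_0^j = \mathrm{Embed}(I_j), \quad e_0^j \in \mathbb{R}^d, \quad j = 1, 2, \ldots, m. \tag{1}$$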
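The patch split and Eq. (1) can be sketched in NumPy. This is a minimal illustration, not ViT's reference code: it assumes `Embed` is a linear projection of each flattened patch plus an additive learned positional embedding (one common choice, equivalent to a stride-$h$ convolution), omits the [CLS] token, and uses random untrained weights. The function names `patchify` and `embed` are hypothetical.

```python
import numpy as np


def patchify(image: np.ndarray, h: int, w: int) -> np.ndarray:
    """Split a (3, H, W) image into m = (H // h) * (W // w) patches of shape (3, h, w)."""
    c, H, W = image.shape
    if H % h or W % w:
        raise ValueError("image height/width must be divisible by the patch size")
    # (c, H/h, h, W/w, w) -> (H/h, W/w, c, h, w): patches in row-major order
    grid = image.reshape(c, H // h, h, W // w, w).transpose(1, 3, 0, 2, 4)
    return grid.reshape(-1, c, h, w)  # (m, 3, h, w)


def embed(patches: np.ndarray, weight: np.ndarray, bias: np.ndarray,
          pos: np.ndarray) -> np.ndarray:
    """Hypothetical Embed(I_j): linear projection of each flattened patch plus a
    learned positional embedding. Row j of the result is e_0^j in R^d."""
    flat = patches.reshape(patches.shape[0], -1)  # (m, 3*h*w)
    return flat @ weight + bias + pos             # (m, d)


# Usage with random (untrained) weights: a 224x224 image with 16x16 patches.
rng = np.random.default_rng(0)
image = rng.standard_normal((3, 224, 224))
patches = patchify(image, 16, 16)  # m = 14 * 14 = 196
d = 768
weight = rng.standard_normal((3 * 16 * 16, d)) * 0.02
bias = np.zeros(d)
pos = rng.standard_normal((patches.shape[0], d)) * 0.02
E0 = embed(patches, weight, bias, pos)
print(patches.shape, E0.shape)  # (196, 3, 16, 16) (196, 768)
```

Flattening each $3 \times h \times w$ patch and multiplying by a $(3hw) \times d$ matrix is what makes every $e_0^j$ land in $\mathbb{R}^d$ regardless of patch size.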
Arch
VPT-Shallow
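In VPT-Shallow, as described in the VPT paper, $p$ learnable prompt vectors $P = \{p^k \in \mathbb{R}^d\}$ are inserted only into the input sequence of the first layer, between the [CLS] embedding $x_0$ and the patch embeddings $E_0$; the backbone stays frozen and only the prompts and the head are trained. The sketch below covers the forward pass only and assumes each frozen layer is a callable mapping a $(1 + p + m) \times d$ sequence to one of the same shape. `vpt_shallow_forward` and `make_toy_layer` are hypothetical names, and the toy layer is a stand-in, not a real attention block.

```python
import numpy as np


def vpt_shallow_forward(x0, E0, prompts, layers):
    """VPT-Shallow forward pass (sketch).

    x0: (d,) [CLS] embedding; E0: (m, d) patch embeddings;
    prompts: (p, d) learnable prompts P; layers: frozen layers L_1..L_N.
    """
    # Build [x_0, P, E_0]; the prompts enter only at the input of L_1.
    seq = np.concatenate([x0[None, :], prompts, E0], axis=0)
    for layer in layers:
        # Each frozen layer maps [x_{i-1}, Z_{i-1}, E_{i-1}] to [x_i, Z_i, E_i], with Z_0 = P.
        seq = layer(seq)
    return seq[0]  # x_N, which the classification head consumes


def make_toy_layer(rng, d):
    """Hypothetical stand-in for a frozen Transformer layer. Adding the mean over
    tokens mixes information across positions, roughly as attention would."""
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    return lambda seq: np.tanh(seq @ W + seq.mean(axis=0, keepdims=True))


rng = np.random.default_rng(0)
d, m, p, N = 8, 4, 2, 3
layers = [make_toy_layer(rng, d) for _ in range(N)]
x0, E0 = rng.standard_normal(d), rng.standard_normal((m, d))
prompts = rng.standard_normal((p, d)) * 0.02  # trainable (together with the head)
xN = vpt_shallow_forward(x0, E0, prompts, layers)
print(xN.shape)  # (8,)
```

Because the prompts sit in the sequence, they affect $x_N$ only through token mixing inside the frozen layers; with identity layers the output would simply be $x_0$.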
VPT-Deep
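VPT-Deep, per the VPT paper, introduces a separate prompt set $P_{i-1}$ at the input of every layer $L_i$, i.e. $[x_i, \_, E_i] = L_i([x_{i-1}, P_{i-1}, E_{i-1}])$, so the outputs at the prompt positions are discarded and each layer receives fresh prompts. The sketch below assumes each frozen layer is a shape-preserving callable on the token sequence; `vpt_deep_forward` and `make_toy_layer` are hypothetical names, and the toy layer is a stand-in, not a real Transformer layer.

```python
import numpy as np


def vpt_deep_forward(x0, E0, deep_prompts, layers):
    """VPT-Deep forward pass (sketch).

    deep_prompts[i - 1] is the (p, d) prompt set P_{i-1} fed to layer L_i.
    """
    if len(deep_prompts) != len(layers):
        raise ValueError("need one prompt set per layer")
    x, E = x0, E0
    for P, layer in zip(deep_prompts, layers):
        # [x_i, _, E_i] = L_i([x_{i-1}, P_{i-1}, E_{i-1}])
        seq = layer(np.concatenate([x[None, :], P, E], axis=0))
        # Drop the prompt outputs; the next layer receives its own fresh prompts.
        x, E = seq[0], seq[1 + P.shape[0]:]
    return x  # x_N, fed to the classification head


def make_toy_layer(rng, d):
    """Hypothetical stand-in for a frozen Transformer layer (token mixing via the mean)."""
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    return lambda seq: np.tanh(seq @ W + seq.mean(axis=0, keepdims=True))


rng = np.random.default_rng(0)
d, m, p, N = 8, 4, 2, 3
layers = [make_toy_layer(rng, d) for _ in range(N)]
x0, E0 = rng.standard_normal(d), rng.standard_normal((m, d))
deep_prompts = [rng.standard_normal((p, d)) * 0.02 for _ in range(N)]  # N * p * d trainable values
xN = vpt_deep_forward(x0, E0, deep_prompts, layers)
print(xN.shape)  # (8,)
```

The trade-off is parameter count: VPT-Deep trains $N \times p$ prompt vectors instead of $p$, in exchange for injecting task-specific tokens at every depth.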
Reference