We also encountered this problem recently. We found that when using Vicuna v1.5 as the LLM and SigLIP-Large as the ViT, the model does not converge: the final loss of the pretraining stage stays around 2.3–2.5, which degrades the final model to poor performance. After switching to Qwen2-7B as the LLM, the non-convergence problem disappeared and performance was better than with Vicuna v1.5-7B. Something may go wrong when training on Vicuna v1.5, but we have not found the true reason for it.
Hi, I adopted this Resampler module in LLaVA without slicing and replaced the vision encoder from CLIP with SigLIP, but the loss does not converge.
Any thoughts on this?
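For reference, the Resampler in question is a Perceiver-style module: a fixed set of learnable queries cross-attends to the vision encoder's patch embeddings, compressing them to a fixed number of tokens before they enter the LLM. A minimal sketch below, assuming illustrative dimensions (the `kv_dim`, `embed_dim`, and head counts are placeholders, not the repo's actual config):

```python
import torch
import torch.nn as nn

class Resampler(nn.Module):
    """Perceiver-style resampler sketch: learnable queries cross-attend
    to vision features, producing a fixed number of output tokens."""

    def __init__(self, num_queries=64, embed_dim=1024, num_heads=8, kv_dim=1024):
        super().__init__()
        # Fixed set of learnable query tokens.
        self.query = nn.Parameter(torch.zeros(num_queries, embed_dim))
        nn.init.trunc_normal_(self.query, std=0.02)
        # Project vision features (kv_dim) into the attention dimension.
        self.kv_proj = nn.Linear(kv_dim, embed_dim, bias=False)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ln_q = nn.LayerNorm(embed_dim)
        self.ln_kv = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # x: (batch, num_patches, kv_dim) patch embeddings from the ViT.
        kv = self.ln_kv(self.kv_proj(x))
        q = self.ln_q(self.query).unsqueeze(0).expand(x.size(0), -1, -1)
        out, _ = self.attn(q, kv, kv)
        # Output length is num_queries regardless of num_patches.
        return out

resampler = Resampler()
feats = torch.randn(2, 576, 1024)  # e.g. 24x24 patch grid
tokens = resampler(feats)
print(tokens.shape)  # torch.Size([2, 64, 1024])
```

One consequence worth checking when debugging non-convergence: the query/projection weights are trained from scratch, so their initialization and learning rate can interact badly with a frozen or differently-scaled vision backbone such as SigLIP, whose feature statistics differ from CLIP's.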