I tested on the 3D-OVS data. It achieves similar results (IoU) on the sofa scene.
But the performance on the other scenes is not as good as expected.
I noticed that the language feature images are well trained, so the cause is probably the settings in the eval code, such as the threshold and the kernel size. Does this mean we need to tune these settings manually to achieve the best results?
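To make the question concrete, here is a minimal sketch of what tuning these two settings by sweep could look like. It is not the repository's eval code: `box_smooth`, `iou`, and `sweep` are hypothetical helper names, the relevancy map is assumed to be a 2D array scaled to [0, 1], and a plain mean filter stands in for whatever smoothing the eval script actually applies.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_smooth(relevancy, kernel_size):
    """Mean-filter a 2D map with a square kernel, using edge padding."""
    h, w = relevancy.shape
    pad = kernel_size // 2
    padded = np.pad(relevancy, pad, mode="edge")
    windows = sliding_window_view(padded, (kernel_size, kernel_size))
    # Crop so that even kernel sizes also return the input shape.
    return windows.mean(axis=(2, 3))[:h, :w]


def iou(pred_mask, gt_mask):
    """Intersection over union of two boolean masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred_mask, gt_mask).sum() / union)


def sweep(relevancy, gt_mask, thresholds, kernel_sizes):
    """Try every (threshold, kernel_size) pair and report the IoU of each."""
    results = []
    for k in kernel_sizes:
        smoothed = box_smooth(relevancy, k)  # smooth once per kernel size
        for t in thresholds:
            results.append((iou(smoothed > t, gt_mask), t, k))
    best = max(results, key=lambda r: r[0])
    return best, results
```

Calling `sweep(relevancy, gt_mask, [0.3, 0.4, 0.5], [1, 15, 31])` returns the best `(iou, threshold, kernel_size)` triple plus the full list of results. Settings picked this way against the evaluation ground truth will overstate the IoU, so a sweep like this is mainly useful for checking how sensitive each scene is to the two parameters.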
Below is a sample from the bench scene, including the language feature image, the ground truth, and the predicted mask. Do you have any suggestions?