Inconsistent Bounding Box Results with OWLv2 Image-Guided Detection #463

Open

HuzefaAnver opened this issue on Sep 6, 2024 · 0 comments

Hey, I have been trying to label my data with OWLv2. I have tried two different scripts for the same task: one without post-processing, which gives good results, and one with post-processing, because I want correct annotations on my original image, which is (704, 576). The preprocessing step automatically resizes the image to (960, 960). I have tried varying the thresholds, e.g. threshold=0.98, nms_threshold=1.0.

I just want correct annotations at the original image size of (704, 576).
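
For reference, the resize can be confirmed by inspecting the processor output (a quick check, assuming the processor and target_image from the snippets below):

print(processor(images=target_image, return_tensors="pt").pixel_values.shape)
# torch.Size([1, 3, 960, 960]) for google/owlv2-base-patch16-ensemble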

The code without post-processing (gives a perfect bounding box at image size (960, 960)):

import numpy as np
import torch
from PIL import Image
from scipy.special import expit as sigmoid

# processor, model, get_preprocessed_image and query_embedding are
# defined in earlier steps (notebook-style image-guided detection)
target_image = Image.open(image_path)

target_pixel_values = processor(images=target_image, return_tensors="pt").pixel_values
unnormalized_target_image = get_preprocessed_image(target_pixel_values)

with torch.no_grad():
    feature_map = model.image_embedder(target_pixel_values)[0]

b, h, w, d = feature_map.shape
target_boxes = model.box_predictor(
    feature_map.reshape(b, h * w, d), feature_map=feature_map
)

target_class_predictions = model.class_predictor(
    feature_map.reshape(b, h * w, d),
    torch.tensor(query_embedding[None, None, ...]),  # [batch, queries, d]
)[0]

target_boxes = np.array(target_boxes[0].detach())
target_logits = np.array(target_class_predictions[0].detach())
top_ind = np.argmax(target_logits[:, 0], axis=0)  # best-scoring patch for the query
score = sigmoid(target_logits[top_ind, 0])
top_boxes = target_boxes[top_ind]
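
To recover the box on the original (704, 576) image from this output, a minimal sketch follows. It assumes top_boxes holds (cx, cy, w, h) coordinates normalized to the padded square input, since OWLv2 pads the shorter side to a square before resizing to (960, 960); pad_size and box_on_original are names introduced here for illustration:

orig_w, orig_h = target_image.size            # (704, 576)
pad_size = max(orig_w, orig_h)                # side length of the padded square

# scale normalized center-format coords to padded-square pixels
cx, cy, bw, bh = top_boxes * pad_size
x0, y0 = cx - bw / 2, cy - bh / 2
x1, y1 = cx + bw / 2, cy + bh / 2

# padding is appended on the bottom/right only, so clipping to the
# original extent yields coordinates on the (704, 576) image
box_on_original = [max(x0, 0.0), max(y0, 0.0),
                   min(x1, float(orig_w)), min(y1, float(orig_h))]
print(box_on_original)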

Correct result (screenshot):
The code with post-processing (the result is inaccurate at both image sizes):

import torch
from PIL import Image

from transformers import Owlv2Processor, Owlv2ForObjectDetection

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

source_image = Image.open('./source_image.jpg')
target_image = Image.open('./all_images/2024-08-19-114707271_1000032$1$0$21_1-00000.jpg')

# thresholds belong to post-processing, not to the processor call
inputs = processor(images=target_image, query_images=source_image, return_tensors="pt")

with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)

target_sizes = torch.Tensor([target_image.size[::-1]])  # (height, width)
results = processor.post_process_image_guided_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.98, nms_threshold=1.0
)

boxes, scores = results[0]["boxes"], results[0]["scores"]  # single image in the batch
for box, score in zip(boxes, scores):
    box = [round(i, 2) for i in box.tolist()]
    print(f"Detected similar object with confidence {round(score.item(), 3)} at location {box}")

Wrong result (screenshot):
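
If the inaccuracy comes from the padding step (the processor pads (704, 576) to a 704×704 square before resizing to 960×960), one workaround worth trying, offered as a sketch rather than confirmed library behavior, is to pass the padded square size as target_sizes:

orig_w, orig_h = target_image.size
pad_size = max(orig_w, orig_h)
# use the padded square dimensions instead of the raw image size
target_sizes = torch.tensor([[pad_size, pad_size]])
results = processor.post_process_image_guided_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.98, nms_threshold=1.0
)
# boxes should then land in original-image pixel coordinates, because the
# padding sits only below and to the right of the (704, 576) content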
