
Incorrect Precision and Recall Metrics #38

Open
YoungseokOh opened this issue Sep 5, 2024 · 2 comments

@YoungseokOh

YoungseokOh commented Sep 5, 2024

Hi,

I believe there's an issue with the get_confidence_list() function.

When I used your pre-trained model, I couldn't achieve the same performance metrics as you reported.

I think the else clause should be removed, because it appends values that should not be counted.

The function should only handle cases where the prediction is a true positive and matches well with the ground truth (GT).

If a prediction does not match a ground truth well, the function should not append 0 to the same list (true_positive_list).

Appending 0 incorrectly includes non-matching predictions in the calculation, which skews the precision and recall computation.

I will await your reply.

Thanks
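
To make the suggestion concrete, here is a minimal sketch of the matching loop with the else clause removed (the names match_gt_with_preds, predictions, prediction_matched, and true_positive_list are taken from the snippet @ihoofans quotes later in this thread; this only illustrates the proposal, not a verified fix):

for ground_truth in ground_truths:
    idx = match_gt_with_preds(ground_truth, predictions, match_labels)
    if idx >= 0:
        # keep only the confidence of a prediction that matches this ground truth
        prediction_matched[idx] = True
        true_positive_list.append(predictions[idx][0])
    # proposed change: no else branch, so a ground truth without a matching
    # prediction adds nothing to true_positive_list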

@Teoge
Owner

Teoge commented Sep 9, 2024

For each ground truth, the algorithm attempts to find a matching prediction. If no prediction matches the ground truth (indicating that the model failed to detect the object), the algorithm appends a 0 to the true_positive_list.

When calculating precision and recall, a threshold is used to divide the true_positive_list into true positives and false negatives:

false_negatives = bisect.bisect_left(true_positive_list, thresh)
true_positives = len(true_positive_list) - false_negatives

As long as the threshold is greater than 0, these appended 0s in the true_positive_list are always counted as false negatives, which is the correct behavior. Ignoring them would undercount false negatives and therefore overestimate recall.
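
To illustrate that split with a self-contained toy example (the confidence values are made up, the list holds one entry per ground truth as described above, and it is assumed to be sorted, which bisect requires):

import bisect

# One entry per ground truth: its matched prediction's confidence,
# or 0.0 if no prediction matched it (a missed detection).
true_positive_list = sorted([0.0, 0.0, 0.55, 0.72, 0.91])
thresh = 0.5

false_negatives = bisect.bisect_left(true_positive_list, thresh)  # 2 (the two 0.0 entries)
true_positives = len(true_positive_list) - false_negatives        # 3
recall = true_positives / len(true_positive_list)                 # 3 / 5 = 0.6

Dropping the two 0.0 entries instead would give false_negatives = 0 and recall = 1.0 for this toy case, which is exactly the overestimation described above.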

@ihoofans

for i in range(num_samples):
    ground_truths = ground_truths_list[i]
    predictions = predictions_list[i]
    prediction_matched = [False] * len(predictions)
    for ground_truth in ground_truths:
        idx = match_gt_with_preds(ground_truth, predictions, match_labels)
        if idx >= 0:
            prediction_matched[idx] = True
            true_positive_list.append(predictions[idx][0])
        else:
            true_positive_list.append(.0)
    for idx, pred_matched in enumerate(prediction_matched):
        if not pred_matched:
            false_positive_list.append(predictions[idx][0])
This seems a bit strange to me. In my opinion, each prediction should be judged against the ground truths, rather than matching each ground truth against the predictions; true_positive_list should then have the same length as the predictions, and the ranking would be computed afterwards from the confidence threshold. This may just be an immature idea on my part, or I may have misunderstood the code. I hope the author can reply. Thanks!
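
For comparison, a minimal sketch of the prediction-centric matching this comment describes, where true_positive_list has one entry per prediction (the helper match_pred_with_gts and the function name are hypothetical, mirroring match_gt_with_preds in the quoted code; this is only an illustration of the suggestion, not the repository's implementation):

def collect_confidences_per_prediction(ground_truths_list, predictions_list, match_labels):
    true_positive_list = []
    false_positive_list = []
    for ground_truths, predictions in zip(ground_truths_list, predictions_list):
        for prediction in predictions:
            # hypothetical helper: returns the index of a matching ground truth, or -1
            idx = match_pred_with_gts(prediction, ground_truths, match_labels)
            if idx >= 0:
                true_positive_list.append(prediction[0])
            else:
                false_positive_list.append(prediction[0])
    return true_positive_list, false_positive_list

Note that with this scheme a ground truth that no prediction matches never appears in either list, so missed detections would have to be counted separately when computing recall.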
