Filter "Text should not be present" (block change detection when text exists) - Empty diff triggers change detection #2548

Closed
Noki opened this issue Aug 5, 2024 · 20 comments · Fixed by #2709
Labels: bug (Something isn't working), triage

Comments

@Noki

Noki commented Aug 5, 2024

Describe the bug

I sometimes get change detection notifications even though the diff is empty. My notification body setting looks like this:

{{watch_title}} / {{watch_url}} had a change.<br><br>

see diff: {{diff_url}}<br>
---diff---<br>
{{diff}}<br>
---diff---<br>

Whenever it happens the notification looks like this:

[screenshot]

[...] had a change.<br><br>

see diff: http://diskstation:3003/diff/040867f4-9a48-4a24-b4e1-9f6b2b84b626<br>
---diff---<br>
<br>
---diff---<br>

In addition I can't download the latest HTML snapshot:

[screenshot]

Version

v0.46.02

To Reproduce

Can't reproduce. I can't share the check, as it contains sensitive information.
Best guess: an internal error is not handled properly. It might have to do with the "Remove elements" and "Block change-detection while text matches" features, both of which I use for this check. Maybe a blocked change detection still stores data, and the following run is incorrectly compared against it.

I found the following in the docker logfile:

[screenshot]

It clearly states that the hash is different.

[screenshot]

Expected behavior

Never send out notifications in case of an empty diff. Do not detect a change in this case.
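A minimal sketch of the requested guard, assuming the rendered `{{diff}}` text is available as a plain string just before the notification is sent. `has_visible_diff` and `maybe_notify` are hypothetical helpers, not part of changedetection.io, and treating `<br>` tags as empty is an assumption for HTML-formatted notifications.

```python
import re


def has_visible_diff(diff_text: str) -> bool:
    """Return True if the diff has at least one visible character.

    Hypothetical helper: whitespace (spaces, tabs, newlines) and
    <br> tags (assumed HTML formatting) do not count as content.
    """
    cleaned = re.sub(r"<br\s*/?>", "", diff_text, flags=re.IGNORECASE)
    return cleaned.strip() != ""


def maybe_notify(diff_text: str, send) -> bool:
    """Call send(diff_text) only when the diff has visible content."""
    if not has_visible_diff(diff_text):
        return False
    send(diff_text)
    return True


sent = []
maybe_notify("\t\n<br>", sent.append)   # skipped: nothing visible
maybe_notify("+ new row", sent.append)  # sent
```

Suppressing at send time only hides the symptom; the change would still be recorded, so the root cause has to be fixed in change detection itself.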

@petersg83

Do you have any text to ignore in the diff?
I have the same issue, and my intuition is that a change in the ignored text triggered the notification when it shouldn't have. That would cause an empty notification whenever only the ignored text changed.

@Noki
Author

Noki commented Aug 8, 2024

The ignore text is indeed the diff. I'm checking a service which produces unstable results. Sometimes it responds with a result list; sometimes it can't access its database and responds with an error message, e.g. "No results available". That error message is exactly the text I ignore, as I am only interested in real changes to the results.

@SansGuidon

It's quite annoying: I have hundreds of watches and have configured texts to be ignored globally and per watch. I would love an option, or more control over what kind of diff triggers a change.

Note that I'm following changes on my changedetection instance through RSS feed.

@dgtlmoon
Owner

@SansGuidon

It's quite annoying: I have hundreds of watches and have configured texts to be ignored globally and per watch. I would love an option, or more control over what kind of diff triggers a change.

Can you be more specific? Are you saying the global ignore doesn't work? What exactly is the problem you are seeing?

@dgtlmoon
Owner

dgtlmoon commented Aug 19, 2024

@Noki

The ignore text is indeed the diff.

Yes, that is 100% correct: it SHOULD be in the diff. "Ignore text" only ignores the text when detecting whether something changed; of course, the application should still show you the whole difference.
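To illustrate the distinction, here is a simplified model (not changedetection.io's actual code): the checksum that decides "changed or not" is computed over the text with ignored lines removed, while the diff shown to the user is computed over the full text. It assumes case-insensitive substring matching per line; the real matcher may differ (for example, by supporting regular expressions). All function names are hypothetical.

```python
import difflib
import hashlib


def strip_ignored_lines(text: str, ignore: list[str]) -> str:
    """Drop every line that contains any ignore phrase (case-insensitive)."""
    needles = [word.lower() for word in ignore]
    kept = [line for line in text.splitlines()
            if not any(n in line.lower() for n in needles)]
    return "\n".join(kept)


def checksum(text: str, ignore: list[str]) -> str:
    """MD5 of the filtered text: this is what decides 'changed or not'."""
    return hashlib.md5(strip_ignored_lines(text, ignore).encode("utf-8")).hexdigest()


def full_diff(old: str, new: str) -> list[str]:
    """Added/removed lines over the unfiltered text; ignored lines still show."""
    return [line for line in difflib.unified_diff(
                old.splitlines(), new.splitlines(), lineterm="")
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]


old = "Row 1\nRow 2"
new = "Row 1\nRow 2\nNo results available"
ignore = ["No results available"]

changed = checksum(old, ignore) != checksum(new, ignore)
print(changed)               # False: only an ignored line differs
print(full_diff(old, new))   # ['+No results available']
```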

@SansGuidon

SansGuidon commented Aug 19, 2024

@dgtlmoon

@SansGuidon

It's quite annoying: I have hundreds of watches and have configured texts to be ignored globally and per watch. I would love an option, or more control over what kind of diff triggers a change.

Can you be more specific? Are you saying the global ignore doesn't work? What exactly is the problem you are seeing?

I get notified about changes both through the RSS feed and through the Changedetection UI.
However, the diffs show that nothing has changed, because the changes are indeed ignored globally or per watch.

In the RSS feed, the entry has empty content (just a link).
In the Changedetection UI nothing interesting is shown; of course, I can click "show current snapshot" and then I see the texts that were ignored.

So the issue is that although the ignore text feature works, we are still notified about changes even though there are none to show.

@Noki
Author

Noki commented Aug 19, 2024

@Noki

The ignore text is indeed the diff.

Yes, that is 100% correct: it SHOULD be in the diff. "Ignore text" only ignores the text when detecting whether something changed; of course, the application should still show you the whole difference.

Ok. So how do I avoid triggering a change in this case? Any suggestions?

@dgtlmoon
Owner

@Noki

Ok. So how do I avoid triggering a change in this case? Any suggestions?

Add the text that might change, and which you want change detection to ignore, to the various ignore-text boxes in the [Edit] tab of the watch. Use the "preview" to visualise what would be ignored (it will highlight ignored lines in grey).

Did you find that part in the UI? Especially the last part?

@Noki
Author

Noki commented Aug 19, 2024

@dgtlmoon

I don't want to ignore text. I want to block change detection completely. I am using this option, but it does not work properly:

[screenshot]

The markup of the HTML is quite simple so a Text comparison should be fine.

[screenshot]

I added both texts exactly as is and they should match and stop the change detection.

Note: the last time I had the error I described was on August 9th, so I can't really tell whether it is fixed now or whether the website I am monitoring is just more stable these days. I will get back to you on this if it happens again.

@dgtlmoon
Owner

dgtlmoon commented Aug 19, 2024

@Noki click [Diff] in the main list, then [Show current snapshot] in the top left corner. You should see that the text (I assume it's in your blacked-out box) still shows in the diff because it's part of the snapshot; can you confirm? Also hit [recheck].

@Noki
Author

Noki commented Oct 1, 2024

@dgtlmoon I just created a test case for you to reproduce my issue.

The test page https://home.tobias-schwarz.com/changedetection/ switches every 3 minutes between showing a table and showing a "No Results Found" message. While the message is shown, change detection should be blocked; the result, however, is a continuous stream of alerts with empty diffs.

The following is the test description from the backup JSON:

        "945511ee-0556-4725-9f1c-2177c099b940": {
            "body": "",
            "browser_steps": [
                {
                    "operation": null,
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                },
                {
                    "operation": "Choose one",
                    "selector": "",
                    "optional_value": ""
                }
            ],
            "browser_steps_last_error_step": null,
            "check_count": 5,
            "check_unique_lines": false,
            "consecutive_filter_failures": 0,
            "date_created": 1727786400,
            "extract_text": [],
            "extract_title_as_title": false,
            "fetch_backend": "html_webdriver",
            "fetch_time": 32.777,
            "filter_failure_notification_send": true,
            "filter_text_added": true,
            "filter_text_removed": true,
            "filter_text_replaced": true,
            "follow_price_changes": true,
            "has_ldjson_price_data": false,
            "headers": {},
            "ignore_text": [],
            "in_stock_only": true,
            "include_filters": [
                "#results"
            ],
            "last_checked": 1727787126,
            "last_error": false,
            "last_viewed": 0,
            "method": "GET",
            "notification_alert_count": 1,
            "notification_body": "",
            "notification_format": "System default",
            "notification_muted": false,
            "notification_screenshot": false,
            "notification_title": "",
            "notification_urls": [],
            "paused": false,
            "previous_md5": "84bd141ac2b48c2dfb2094e964a57df6",
            "previous_md5_before_filters": "0a00044134dfd8cb4d1c176e15627838",
            "processor": "text_json_diff",
            "price_change_threshold_percent": null,
            "proxy": null,
            "remote_server_reply": "cloudflare",
            "sort_text_alphabetically": false,
            "subtractive_selectors": [],
            "tag": "",
            "tags": [
                "6d228dfc-da55-443f-aae7-2d582de99dda"
            ],
            "text_should_not_be_present": [
                "No Results Found"
            ],
            "time_between_check": {
                "weeks": null,
                "days": null,
                "hours": null,
                "minutes": 1,
                "seconds": null
            },
            "time_between_check_use_default": false,
            "title": "Test Changedetection",
            "track_ldjson_price_data": null,
            "trigger_text": [],
            "url": "https://home.tobias-schwarz.com/changedetection/",
            "uuid": "945511ee-0556-4725-9f1c-2177c099b940",
            "webdriver_delay": null,
            "webdriver_js_execute_code": "",
            "ignore_status_codes": false,
            "save_button": true,
            "last_notification_error": false,
            "content_type": "text/html; charset=utf-8",
            "last_check_status": 200
        }

I hope this helps you reproduce the issue. Expected behaviour: a change should only be detected when the table is shown and one of its values changes.

@Noki
Author

Noki commented Oct 10, 2024

@dgtlmoon can you reproduce the issue with my test case?

@dgtlmoon
Owner

@Noki actually, while refactoring over at #2691 I discovered that in some cases a tab character (\t) could have caused it too; it wasn't being filtered out.
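Why a stray tab matters: an MD5 over the raw text changes when only whitespace changes, while the rendered diff of such a change looks empty. A minimal sketch, assuming whitespace should be collapsed before hashing; `normalise` and `md5` are hypothetical helpers, not the code from #2691.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Collapse runs of spaces/tabs, strip each line, drop blank lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def md5(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


before = "Result A\nResult B"
after = "Result A\t\nResult B"   # identical apart from a trailing tab

print(md5(before) == md5(after))                        # False: raw hashes differ
print(md5(normalise(before)) == md5(normalise(after)))  # True: normalised hashes match
```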

@Noki
Author

Noki commented Oct 12, 2024

@dgtlmoon The issue still persists with v0.47.03. I just activated and ran the test from above again and got identical output of empty diffs.

@dgtlmoon
Owner

@Noki thanks for the info! Hmm, so it's like: if "No Results Found" exists, then block/ignore.

Then it switches back to the version with the table and no "No Results Found".

I can't reproduce it on my end yet.

@dgtlmoon
Owner

Update: yes, I can reproduce it :)

@dgtlmoon
Owner

dgtlmoon commented Oct 14, 2024

I'm wondering if the bug is that it stores the MD5 checksum of the content as the new checksum (used for detecting the change) even when there's a "block when text.." rule in place. It sure looks like it:

    text_should_not_be_present = watch.get('text_should_not_be_present', [])
    if len(text_should_not_be_present):
        # If anything matched, then we should block a change from happening
        result = html_tools.strip_ignore_text(content=str(stripped_text_from_html),
                                              wordlist=text_should_not_be_present,
                                              mode="line numbers")
        if result:
            blocked = True
    # The main thing that all this at the moment comes down to :)
    if watch.get('previous_md5') != fetched_md5:
        changed_detected = True
    # Looks like something changed, but did it match all the rules?
    if blocked:
        changed_detected = False
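The hypothesis can be checked with a small simulation of the test page from this thread, which alternates between a results table and "No Results Found". This is a simplified model, not the actual patch: `check` is a hypothetical function, the watch is a plain dict reusing the field names from the snippet above, and "text should not be present" is modelled as a case-insensitive substring test. The only difference between the two runs is whether `previous_md5` is updated when the check is blocked.

```python
import hashlib


def check(watch: dict, text: str, update_md5_when_blocked: bool) -> bool:
    """Return True if a change is reported for this fetch of `text`."""
    fetched_md5 = hashlib.md5(text.encode("utf-8")).hexdigest()
    blocked = any(phrase.lower() in text.lower()
                  for phrase in watch["text_should_not_be_present"])

    changed_detected = watch.get("previous_md5") != fetched_md5
    if blocked:
        changed_detected = False

    # Suspected bug: remembering the blocked page's checksum anyway means
    # the next unblocked fetch is compared against the error page.
    if not blocked or update_md5_when_blocked:
        watch["previous_md5"] = fetched_md5
    return changed_detected


TABLE = "Name | Value\nfoo | 1"
NO_RESULTS = "No Results Found"
pages = [TABLE, NO_RESULTS, TABLE, NO_RESULTS, TABLE]

for buggy in (True, False):
    watch = {"text_should_not_be_present": ["No Results Found"]}
    alerts = [check(watch, page, update_md5_when_blocked=buggy) for page in pages]
    print("buggy" if buggy else "fixed", alerts)
# buggy [True, False, True, False, True]
# fixed [True, False, False, False, False]
```

In this model the first `True` is just the initial fetch. In the buggy run, every return to the unchanged table is reported as a change; presumably the diff is then rendered against the last saved table snapshot, which would explain the empty diffs reported in this thread.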

dgtlmoon changed the title from "Empty diff triggers change detection" to "Filter "Text should not be present" (block change detection when text exists) - Empty diff triggers change detection" on Oct 14, 2024
dgtlmoon added the bug (Something isn't working) label on Oct 14, 2024
dgtlmoon added a commit that referenced this issue on Oct 14, 2024
@dgtlmoon
Owner

@Noki can you try the :dev docker image tag?

@Noki
Author

Noki commented Oct 15, 2024

@dgtlmoon my test is stable with :dev. Looking forward to the official release. Thanks for fixing this one, it was quite annoying. Btw.: I think an option to export/import a single test as JSON would be really helpful for bug reports, as anybody could share their test that way and make it reproducible for you.
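Until such an option exists, a single watch can be pulled out of a backup by hand. A minimal sketch, assuming the watches are stored as a JSON object keyed by UUID, as in the excerpt posted earlier in this thread; where that object sits inside the real backup file is not shown here, so the `watches` dict is passed in directly. `export_watch` and `import_watch` are hypothetical helpers, and blanking `notification_urls` and `headers` is an illustrative choice for sharing in bug reports.

```python
import json
import uuid


def export_watch(watches: dict, watch_uuid: str,
                 redact=("notification_urls", "headers")) -> str:
    """Serialise one watch to JSON, blanking fields that may hold secrets."""
    watch = dict(watches[watch_uuid])  # shallow copy: the original is untouched
    for key in redact:
        if key in watch:
            watch[key] = type(watch[key])()  # empty value of the same type, e.g. [] or {}
    return json.dumps({watch_uuid: watch}, indent=4, sort_keys=True)


def import_watch(watches: dict, exported: str) -> str:
    """Add an exported watch under a fresh UUID and return that UUID."""
    data = json.loads(exported)
    if len(data) != 1:
        raise ValueError("expected exactly one watch")
    watch = next(iter(data.values()))
    new_uuid = str(uuid.uuid4())
    watch["uuid"] = new_uuid
    watches[new_uuid] = watch
    return new_uuid
```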

@dgtlmoon
Owner

dgtlmoon commented Oct 15, 2024

Btw.: I think an option to export/import a single test as JSON would be really helpful for bug reports, as anybody could share their test that way and make it reproducible for you.

PR: #2605

4 participants