Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Filter for folders in cleanup old preview job #48581

Merged
merged 1 commit into from
Jan 27, 2025

Conversation

hammer065
Copy link
Contributor

Summary

When running OC\Preview\BackgroundCleanupJob, the main iteration loop in run() expects a folder,
however, getOldPreviewLocations() currently does not filter by mimetype
and therefore can yield a non-folder entry which causes an Exception when constructing the Folder impl.
Filtering for httpd/unix-directory, as getNewPreviewLocations() already does, fixes this issue.

Checklist

@solracsf solracsf added the 3. to review Waiting for reviews label Oct 5, 2024
@solracsf solracsf added this to the Nextcloud 31 milestone Oct 5, 2024
@kesselb
Copy link
Contributor

kesselb commented Oct 7, 2024

Same as #46072?

Copy link
Contributor

Hello there,
Thank you so much for taking the time and effort to create a pull request to our Nextcloud project.

We hope that the review process is going smooth and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process.

Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6

Thank you for contributing to Nextcloud and we hope to hear from you soon!

(If you believe you should not receive this message, you can add yourself to the blocklist.)

@blizzz blizzz mentioned this pull request Jan 8, 2025
This was referenced Jan 14, 2025
@Altahrim Altahrim mentioned this pull request Jan 21, 2025
@hammer065
Copy link
Contributor Author

hammer065 commented Jan 21, 2025

As the above mentioned PR seems to be staled, I rebased and refined this one.

It now also incorporates the second improvement made in 0c7eeec for the job also ignoring non-numeric named directories.
However, 0c7eeec did this inefficiently by needing to cast all filecache row IDs to a string, which then also results in needing to perform expensive string comparisons for the JOIN.
The same goal is still achieved by further filtering the small amount of entries for this appid of type directory to match against a trivial RegExp.
As an expression to match against a RegExp was not present yet, I implemented it for this PR.

Requesting a review @artonge @kesselb @solracsf

@Altahrim Altahrim mentioned this pull request Jan 23, 2025
@hammer065
Copy link
Contributor Author

Additionally tagging @Altahrim, @blizzz and @nickvergessen for requesting a review.

@solracsf
Copy link
Member

Can you please explain why this PR is better than #46072 ?

@hammer065
Copy link
Contributor Author

hammer065 commented Jan 23, 2025

Can you please explain why this PR is better than #46072 ?

@solracsf Sure thing:

  1. The other PR seems to be inactive.
  2. As I explained in fix: Filter for folders in cleanup old preview job #48581 (comment), the changes in the other PR contain an majorly inefficient query change, which results in casting the numeric ID for all files/folders on the Nextcloud instance to a string to then be string-compared against a (to be expected) small list of files, fully bypassing any index, slowing down the periodic BackgroundCleanupJob.
  3. This PR contains tests for the incorporated changes.

@solracsf solracsf requested review from a team, ArtificialOwl, yemkareems, come-nc, artonge and provokateurin and removed request for a team January 23, 2025 08:32
Copy link
Contributor

@come-nc come-nc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This REGEX thing is a bit scary but I suppose if the tests pass on all DB we should be fine?

Copy link
Contributor

@artonge artonge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the regexp part actually needed? What kind of directory were selected when they should not be?

@hammer065
Copy link
Contributor Author

Is the regexp part actually needed? What kind of directory were selected when they should not be?

@artonge The RegExp is only needed for the following reason:
When JOINing the folder names of the previews folder (representing the filecache IDs of the associated images) against the filecache table, we cast the folder name to an integer.
This is fine in normal circumstances, as an untampered, vanilla instance of Nextcloud only creates and stores folders with only numeric names in this directory.

However, what this (and the above mentioned PR) try to fix/implement is a fix to the issue where a user wants, for whatever reason, create arbitrary files/folders there, for example a CACHEDIR.TAG file usually done to exclude the contents of this directory from backups.

The first change already filters out any file, as the query filters by the httpd/unix-directory MIME-type.
However, #46072 also implemented a second change which would also support arbitrary, user-created folders inside this directory by addressing the case when a non-numeric folder name then is supposed to be casted to an integer, which might be undefined/inconsistent behavior for/across some DB engines.
However, as explained in #48581 (comment), my approach prevents the costly cast of all file-IDs to a string by just filtering out all non-numeric folder names using the RegExp approach implemented in this PR, as there is otherwise no available better method to check if a SQL VARCHAR is an integer, especially across all (supported) DB engines.

I personally do not see the use-case for supporting arbitrary folders in this directory, which is why I initially did not implement this, however I wanted to reimplement the added support for this from the other PR, while not massively tanking its performance while doing so.

tl;dr: It is needed, if we also want to support arbitrary, user-created folders in the preview appdata directory, otherwise not, which would massively simplify this PR (as it was in its previous form).

@come-nc
Copy link
Contributor

come-nc commented Jan 23, 2025

@hammer065 Thank you for your work on this and explanations.
What we were wondering when discussing this issue with @artonge is that the join on (int)name = fileid should already filter out non-numeric folder names. But there may be corner cases depending on DB servers, I remember that cast can behave quite weird in some situations.
Like '1.3' could be cast to 1 and not be filtered out on sqlite I think.

But that’s quite a niche cornercase, and the code after should survive. Worst case scenario I think it will delete a folder in the preview folder which should not be there in the first place.

So all in all I would vote for the simpler version of the PR without the regex usage. Maybe we can keep the regex implementation the queryBuilder though, it may prove useful some day.

@artonge
Copy link
Contributor

artonge commented Jan 23, 2025

Thanks for the detailed summary @hammer065. On my side, I indeed don't see the point in introducing such logic for a hypothetical edge case.

my approach prevents the costly cast of all file-IDs to a string by just filtering out all non-numeric folder names using the RegExp

I would especially be against, as "all non-numeric folder names" would probably be only one.

@artonge
Copy link
Contributor

artonge commented Jan 23, 2025

Maybe we can keep the regex implementation the queryBuilder though, it may prove useful some day.

Not sure about the performance of such operator. Unless it is fast, having it in by default might lead to being it use without knowing better and causing performance issues. If we don't have a use case, let's not keep it.

@hammer065
Copy link
Contributor Author

Please let me know what the end position is after your (internal) discussions, so I can adapt the PR accordingly.

My personal stances are to both not support arbitrary folders, just files, in the previews directory and not to keep the RegEx code.

@come-nc
Copy link
Contributor

come-nc commented Jan 27, 2025

Please let me know what the end position is after your (internal) discussions, so I can adapt the PR accordingly.

My personal stances are to both not support arbitrary folders, just files, in the previews directory and not to keep the RegEx code.

We agree, please remove regex related code and keep the changes minimal, it will also be easier to backport.

Fixes nextcloud#35936.
When running `OC\Preview\BackgroundCleanupJob`, the main iteration loop
in `run()` expects a folder, however, `getOldPreviewLocations()`
currently does not filter by mimetype and therefore can yield a
non-folder entry which causes an Exception when constructing the Folder
impl.
Filtering for `httpd/unix-directory`, as `getNewPreviewLocations()`
already does, fixes this issue.

Signed-off-by: Dario Mehlich <[email protected]>
@hammer065
Copy link
Contributor Author

We agree, please remove regex related code and keep the changes minimal, it will also be easier to backport.

@come-nc Done, you can review this now (finally) final commit :D

@come-nc come-nc added 4. to release Ready to be released and/or waiting for tests to finish and removed 3. to review Waiting for reviews labels Jan 27, 2025
@come-nc come-nc modified the milestones: Nextcloud 31, Nextcloud 32 Jan 27, 2025
@come-nc
Copy link
Contributor

come-nc commented Jan 27, 2025

/backport to stable31

@come-nc
Copy link
Contributor

come-nc commented Jan 27, 2025

/backport to stable30

@come-nc
Copy link
Contributor

come-nc commented Jan 27, 2025

/backport to stable29

@AndyScherzinger AndyScherzinger merged commit b70654a into nextcloud:master Jan 27, 2025
176 of 179 checks passed
Copy link

welcome bot commented Jan 27, 2025

Thanks for your first pull request and welcome to the community! Feel free to keep them coming! If you are looking for issues to tackle then have a look at this selection: https://github.com/nextcloud/server/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22

@come-nc
Copy link
Contributor

come-nc commented Jan 27, 2025

@hammer065 Thanks a lot for your patience and team work here!

Would you be interested in a Guest account on our Nextcloud Talk instance where we host a dev chat?

@hammer065
Copy link
Contributor Author

Would you be interested in a Guest account on our Nextcloud Talk instance where we host a dev chat?
@come-nc Sure thing, would definitely speed up the discussion process for any potential future PRs :D
Can this be linked/Is this linkable to my account on my own Nextcloud instance or would it be a separate account all together?

@come-nc
Copy link
Contributor

come-nc commented Jan 28, 2025

Would you be interested in a Guest account on our Nextcloud Talk instance where we host a dev chat?
@come-nc

Sure thing, would definitely speed up the discussion process for any potential future PRs :D
Can this be linked/Is this linkable to my account on my own Nextcloud instance or would it be a separate account all together?

I invited you on the email used for the commit sign-off.
It’s a guest account using the guests app, it’s not linkable to a federated account. I think we did have federation enabled for Talk on our instance, but I’m not sure whether it’s enabled for community chat yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4. to release Ready to be released and/or waiting for tests to finish feedback-requested
Projects
None yet
6 participants