Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Unaccent with dandiset search filter #2142

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Conversation

jjnesbitt
Copy link
Member

@jjnesbitt jjnesbitt commented Jan 17, 2025

Closes #1941

This change normalized accent characters when searching dandisets. For example (from that issue), the words Buzsaki and Buzsáki would now resolve to the same word in search.

It seems the introduction of this change has no real impact on the query performance. In fact, perhaps due to now using a less complex search function overall (since we're no longer using the DRF SearchFilter class), the performance seems to be ever so slightly better.

Postgres docs for the unaccent extension: https://www.postgresql.org/docs/current/unaccent.html
Django docs for unaccent: https://docs.djangoproject.com/en/5.1/ref/contrib/postgres/lookups/#unaccent

Of note, we can't simply use the __unaccent__ lookup without any other changes, as is shown in the Django docs, because the metadata field is a JSONField, and that lookup only works on Charfield and Textfield.

@kabilar
Copy link
Member

kabilar commented Jan 17, 2025

Thanks @jjnesbitt. cc @bendichter

@jjnesbitt
Copy link
Member Author

Just realized I never wrote any tests for this. I'll do that.

@waxlamp waxlamp marked this pull request as draft January 21, 2025 23:41
Comment on lines +126 to +128
param = param.replace('\x00', '') # strip null characters

return param # noqa: RET504
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the noqa directive here, why not just follow the recommendation to eliminate the unnecessary assignment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is parsing the query param, and performing a number of actions on it. In this case, it's just that one .replace call, but realistically, you could see us adding to it in the future. It just felt a bit awkward/arbitrary to just return param.replace('\x00', '') 🤷‍♂️ .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just go with ruff's recommendation here. "We might change the code in the future" doesn't seem compelling enough to override its style advice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Searching for names with accented characters.
3 participants