-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Unaccent
with dandiset search filter
#2142
base: master
Are you sure you want to change the base?
Conversation
Thanks @jjnesbitt. cc @bendichter |
Just realized I never wrote any tests for this. I'll do that. |
param = param.replace('\x00', '') # strip null characters | ||
|
||
return param # noqa: RET504 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the noqa directive here, why not just follow the recommendation to eliminate the unnecessary assignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is parsing the query param, and performing a number of actions on it. In this case, it's just that one .replace
call, but realistically, you could see us adding to it in the future. It just felt a bit awkward/arbitrary to just return param.replace('\x00', '')
🤷♂️ .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just go with ruff's recommendation here. "We might change the code in the future" doesn't seem compelling enough to override its style advice.
Closes #1941
This change normalized accent characters when searching dandisets. For example (from that issue), the words
Buzsaki
andBuzsáki
would now resolve to the same word in search.It seems the introduction of this change has no real impact on the query performance. In fact, perhaps due to now using a less complex search function overall (since we're no longer using the DRF
SearchFilter
class), the performance seems to be ever so slightly better.Postgres docs for the
unaccent
extension: https://www.postgresql.org/docs/current/unaccent.htmlDjango docs for
unaccent
: https://docs.djangoproject.com/en/5.1/ref/contrib/postgres/lookups/#unaccentOf note, we can't simply use the
__unaccent__
lookup without any other changes, as is shown in the Django docs, because themetadata
field is aJSONField
, and that lookup only works onCharfield
andTextfield
.