Use `Unaccent` with dandiset search filter #2142

jjnesbitt · 2025-01-17T18:41:43Z

Closes #1941

This change normalizes accented characters when searching dandisets. For example (from that issue), the words Buzsaki and Buzsáki now resolve to the same word in search.
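To make the intended behavior concrete, here is a minimal pure-Python sketch of accent-insensitive matching. It only approximates the idea: the PR relies on the Postgres `unaccent` extension, which uses its own rules dictionary, not on `unicodedata`. The helper names `strip_accents` and `matches` are hypothetical and do not appear in the PR.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after decomposing characters (illustration only)."""
    # NFKD splits 'á' into 'a' plus a combining acute accent.
    decomposed = unicodedata.normalize('NFKD', text)
    # Keep every character that is not a combining mark.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, candidate: str) -> bool:
    """Case- and accent-insensitive substring match (hypothetical helper)."""
    return strip_accents(query).casefold() in strip_accents(candidate).casefold()


if __name__ == '__main__':
    print(strip_accents('Buzsáki'))
    print(matches('buzsaki', 'György Buzsáki'))
```

With this normalization applied to both the search term and the stored text, the accented and unaccented spellings compare equal, which is the effect the search filter is after.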

It seems the introduction of this change has no real impact on the query performance. In fact, perhaps due to now using a less complex search function overall (since we're no longer using the DRF SearchFilter class), the performance seems to be ever so slightly better.

Postgres docs for the unaccent extension: https://www.postgresql.org/docs/current/unaccent.html
Django docs for unaccent: https://docs.djangoproject.com/en/5.1/ref/contrib/postgres/lookups/#unaccent

Of note, we can't simply use the `__unaccent` lookup without any other changes, as shown in the Django docs, because the metadata field is a `JSONField`, and that lookup only works on `CharField` and `TextField`.

kabilar · 2025-01-17T19:18:57Z

Thanks @jjnesbitt. cc @bendichter

jjnesbitt · 2025-01-20T22:12:48Z

Just realized I never wrote any tests for this. I'll do that.

waxlamp · 2025-01-21T23:51:38Z

dandiapi/api/views/dandiset.py

+        param = param.replace('\x00', '')  # strip null characters
+
+        return param  # noqa: RET504


Instead of the noqa directive here, why not just follow the recommendation to eliminate the unnecessary assignment?

jjnesbitt · 2025-01-22

This function parses the query param and performs a number of actions on it. Right now that's just the one `.replace` call, but realistically we could add more in the future. It felt a bit awkward/arbitrary to just `return param.replace('\x00', '')` 🤷‍♂️.

waxlamp · 2025-01-22

I think just go with ruff's recommendation here. "We might change the code in the future" doesn't seem compelling enough to override its style advice.
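For reference, a minimal sketch of what following ruff's RET504 advice would look like here. The function name `parse_search_param` and its signature are assumptions for illustration; the diff excerpt above shows only the function's last two lines.

```python
def parse_search_param(param: str) -> str:
    """Hypothetical stand-in for the query-param parsing helper in this diff."""
    # Return the cleaned value directly instead of reassigning `param`
    # and returning it on the next line, so no noqa directive is needed.
    return param.replace('\x00', '')  # strip null characters


if __name__ == '__main__':
    print(repr(parse_search_param('buz\x00saki')))
```

If more cleanup steps are added later, the intermediate assignments can be reintroduced at that point without tripping the rule, since only the final assign-then-return pair is flagged.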

jjnesbitt added 2 commits January 17, 2025 12:54

Add migration to activate the unaccent extension

a68e567

Use Unaccent with dandiset search filter

7288786

jjnesbitt requested review from waxlamp and mvandenburgh January 17, 2025 18:41

waxlamp marked this pull request as draft January 21, 2025 23:41

waxlamp reviewed Jan 21, 2025

