-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What should we do with former judges? #34
Comments
On topic: I'd prefer a combination of the first two points, i.e. you give the ability to include former judges via a query parameter, and if you do so, the datetime column provides information on when exactly the person was removed. Slightly off-topic: Does the 'former_judges' parameter still work as intended in your latest API release? I cannot seem to find any differences in the API response if I make a request with 'true' or 'false'. To illustrate, I created a Python function to request all judges at this moment in time using your API. Since I'm experimenting with the data, I figured it would be better if I request all judges once and then store them locally. This is my code: def query(search_query: str, limit: int = 100, page: int = 0) -> OpenRechtspraakResponse:
res = requests.get("http://openrechtspraak.nl/api/v1/person", params={
'offset': limit * page,
'limit': limit,
'former_judges': 'true', <--- the parameter
'q': search_query
})
res.raise_for_status()
json: OpenRechtspraakResponse = res.json()
return json
def all_judges():
_all_judges: list[Judge] = []
judge_count = query('%')['count']
# [0, 1, 2, ..., 45]
pages_to_fetch = [limit // 100 for limit in range(0, judge_count, 100)]
for page in pages_to_fetch:
subset = query('%', 100, page)
for judge in subset['data']:
_all_judges.append(judge)
return _all_judges In the code snippet the parameter is set to 'true', though if I set it to 'false' I get the same count of judges in the response. |
That's weird. I tried it myself, and I did get a different count. Additionally, we've got multiple unit tests for this query parameter that test this. My hypothesis is that there is an ordering issue: we order by last name, but there are several judges with the same last name. In my script below, you'll see that I increment the offset by 75 instead of the limit. This prevents the ordering bug from happening. I checked whether your script has a bug by changing _all_judges to a set, adding the id of each judge to the set and printing the length of the set after every page. If you do this, you'll see an unexpected length after a few pages: I'll open up a PR to apply a definitive fix (order by last name AND id). I hope this will be deployed next week. I'll ping you once it's up; could you then check whether this fixes your problem? My script: import requests
offset = 0
limit = 100
ids = set()
while True:
r = requests.get("https://openrechtspraak.nl/api/v1/person",
params={"limit": limit, "offset": offset, "former_judges": "true"})
r.raise_for_status()
data = r.json().get("data")
if len(data) == 0:
break
ids |= {i.get("id") for i in data}
print(len(ids))
offset += 75
print(ids, len(ids)) |
I've included the field |
For sure. Looking forward to the ping! |
@ThijmenDam this fix is now online. Can you verify whether this fixes your problem? |
@Jonakemon yes will do! I'll let you know as soon as I have time, probably tonight or tomorrow night. |
Hi @Jonakemon! Thanks for the effort you put in this week. Using the code snippet I posted above, these are my results when fetching all judges: Parameter Thus, the parameter does not seem to make a difference. Within the set of 4591 judges, 24 judges now contain the new property Random example: {
'first_name': None,
'gender': 'male',
'id': '56d1821d-1a2f-491b-81c0-6d0ea89c287b',
'initials': 'K.',
'last_name': 'Meijde',
'professional_details': [{'function': 'Raadsheer',
'id': 'af0e7be2-7631-442e-9b40-2370312a936b',
'organisation': "Gerechtshof 's-Hertogenbosch"}],
'rechtspraak_id': 'uluiy2DjE4JH29e9O7nH5rxmzOsyxQIz',
'removed_from_rechtspraak_at': '2023-05-26T01:53:05.230291',
'titles': 'dhr. mr.',
'toon_naam': 'dhr. mr. K. van der Meijde',
'toon_naam_kort': 'K. van der Meijde'
} However, despite the |
Hi @ThijmenDam, using my own code snippet, I do get a difference in judges: 4591 vs 4573 (that's 18, not 24). I did notice that we broke the count field which causes your script to be confused. This will be deployed somewhere next week. I also added quite some extra tests to prevent something like this from breaking again. Regarding finding the example you posted, I don't think this judge is removed. When I query the API, I don't get the same result: https://openrechtspraak.nl/api/v1/person?q=Meijde. When exactly did you query the API? We rescrape the dataset every night. You might have queried the API at the same time when we did the rescrape. It's an edge case, but I can't think of another reason why that could have happened. |
@ThijmenDam this is now (definitively) fixed! 🎉 If you visit the API with and without query parameter, you'll also see a difference in count! Please open up a new issue if it's not resolved or if you encounter other bugs. Thanks for helping us out by reporting it! :) |
Former judges are judges whose profile is not available anymore on https://namenlijst.rechtspraak.nl. @siccovansas and I discussed the following solution:
removed_from_rechtspraak_at
on the Person model that indicates when the person was removed from the Namenlijst.removed_from_rechtspraak_at
is set), we shouldn't include the person by default in the search api. If a query param is set, then former judges can be included.removed_from_rechtspraak_at
should be set if the namenlijst web page returns an invalid status code.removed_from_rechtspraak_at
is set, but the page becomes accessible again during a future scrape,removed_from_rechtspraak_at
should be reset to None. Persons that haveremoved_from_rechtspraak_at
set should be rescraped periodically.The text was updated successfully, but these errors were encountered: