What should we do with former judges? #34

Jonakemon · 2022-05-06T13:33:37Z

Former judges are judges whose profile is not available anymore on https://namenlijst.rechtspraak.nl. @siccovansas and I discussed the following solution:

Datetime column called removed_from_rechtspraak_at on the Person model that indicates when the person was removed from the Namenlijst.
If a Person is removed (removed_from_rechtspraak_at is set), we shouldn't include the person by default in the search api. If a query param is set, then former judges can be included.
In the person enrich scraper, removed_from_rechtspraak_at should be set if the namenlijst web page returns an invalid status code.
If removed_from_rechtspraak_at is set, but the page becomes accessible again during a future scrape, removed_from_rechtspraak_at should be reset to None. Persons that have removed_from_rechtspraak_at set should be rescraped periodically.
In Searchfield text update #23 a toggle should be added to be able to filter on former judges.

The text was updated successfully, but these errors were encountered:

ThijmenDam · 2023-05-17T17:55:57Z

On topic: I'd prefer a combination of the first two points, i.e. you give the ability to include former judges via a query parameter, and if you do so, the datetime column provides information on when exactly the person was removed.

Slightly off-topic:

Does the 'former_judges' parameter still work as intended in your latest API release? I cannot seem to find any differences in the API response if I make a request with 'true' or 'false'.

To illustrate, I created a Python function to request all judges at this moment in time using your API. Since I'm experimenting with the data, I figured it would be better if I request all judges once and then store them locally.

This is my code:

def query(search_query: str, limit: int = 100, page: int = 0) -> OpenRechtspraakResponse:

    res = requests.get("http://openrechtspraak.nl/api/v1/person", params={
        'offset': limit * page,
        'limit': limit,
        'former_judges': 'true',   <--- the parameter
        'q': search_query
    })

    res.raise_for_status()
    json: OpenRechtspraakResponse = res.json()

    return json


def all_judges():
    _all_judges: list[Judge] = []
    judge_count = query('%')['count']

    # [0, 1, 2, ..., 45]
    pages_to_fetch = [limit // 100 for limit in range(0, judge_count, 100)]

    for page in pages_to_fetch:
        subset = query('%', 100, page)
        for judge in subset['data']:
            _all_judges.append(judge)

    return _all_judges

In the code snippet the parameter is set to 'true', though if I set it to 'false' I get the same count of judges in the response.

Jonakemon · 2023-05-21T19:04:17Z

That's weird. I tried it myself, and I did get a different count. Additionally, we've got multiple unit tests for this query parameter that test this.

My hypothesis is that there is an ordering issue: we order by last name, but there are several judges with the same last name. In my script below, you'll see that I increment the offset by 75 instead of the limit. This prevents the ordering bug from happening. I checked whether your script has a bug by changing _all_judges to a set, adding the id of each judge to the set and printing the length of the set after every page. If you do this, you'll see an unexpected length after a few pages: 100, 200, 300, 400, 497, 596, 695, ....

I'll open up a PR to apply a definitive fix (order by last name AND id). I hope this will be deployed next week. I'll ping you once it's up; could you then check whether this fixes your problem?

My script:

import requests

offset = 0
limit = 100
ids = set()

while True:
    r = requests.get("https://openrechtspraak.nl/api/v1/person",
                     params={"limit": limit, "offset": offset, "former_judges": "true"})
    r.raise_for_status()
    data = r.json().get("data")

    if len(data) == 0:
        break

    ids |= {i.get("id") for i in data}
    print(len(ids))
    offset += 75

print(ids, len(ids))

Jonakemon · 2023-05-21T19:15:54Z

I've included the field removed_from_rechtspraak_at in the API response. Will hopefully also be deployed next week.

ThijmenDam · 2023-05-22T18:53:36Z

I'll open up a PR to apply a definitive fix (order by last name AND id). I hope this will be deployed next week. I'll ping you once it's up; could you then check whether this fixes your problem?

For sure. Looking forward to the ping!

Jonakemon · 2023-05-23T11:23:32Z

@ThijmenDam this fix is now online. Can you verify whether this fixes your problem?

ThijmenDam · 2023-05-24T07:36:03Z

@Jonakemon yes will do! I'll let you know as soon as I have time, probably tonight or tomorrow night.

ThijmenDam · 2023-05-26T22:12:33Z

Hi @Jonakemon! Thanks for the effort you put in this week.

Using the code snippet I posted above, these are my results when fetching all judges:

Parameter former_judges set to true: fetched 4591 judges.
Parameter former_judges set to false: fetched 4591 judges.

Thus, the parameter does not seem to make a difference.

Within the set of 4591 judges, 24 judges now contain the new property removed_from_rechtspraak_at. That's great :-)

Random example:

 {
  'first_name': None,
  'gender': 'male',
  'id': '56d1821d-1a2f-491b-81c0-6d0ea89c287b',
  'initials': 'K.',
  'last_name': 'Meijde',
  'professional_details': [{'function': 'Raadsheer',
    'id': 'af0e7be2-7631-442e-9b40-2370312a936b',
    'organisation': "Gerechtshof 's-Hertogenbosch"}],
  'rechtspraak_id': 'uluiy2DjE4JH29e9O7nH5rxmzOsyxQIz',
  'removed_from_rechtspraak_at': '2023-05-26T01:53:05.230291',
  'titles': 'dhr. mr.',
  'toon_naam': 'dhr. mr. K. van der Meijde',
  'toon_naam_kort': 'K. van der Meijde'
}

However, despite the removed_from_rechtspraak_at property being set for this particular employee, it seems that he is still part of the namenlijst: https://namenlijst.rechtspraak.nl/#!/details/Y1ALw9lal-qFDMwDZ8oN_bxmzOsyxQIz

Jonakemon · 2023-05-27T12:42:22Z

Hi @ThijmenDam, using my own code snippet, I do get a difference in judges: 4591 vs 4573 (that's 18, not 24). I did notice that we broke the count field which causes your script to be confused. This will be deployed somewhere next week. I also added quite some extra tests to prevent something like this from breaking again.

Regarding finding the example you posted, I don't think this judge is removed. When I query the API, I don't get the same result: https://openrechtspraak.nl/api/v1/person?q=Meijde.

When exactly did you query the API? We rescrape the dataset every night. You might have queried the API at the same time when we did the rescrape. It's an edge case, but I can't think of another reason why that could have happened.

Jonakemon · 2023-05-30T20:19:30Z

@ThijmenDam this is now (definitively) fixed! 🎉 If you visit the API with and without query parameter, you'll also see a difference in count!

Please open up a new issue if it's not resolved or if you encounter other bugs. Thanks for helping us out by reporting it! :)

Jonakemon added a commit that referenced this issue May 21, 2023

Fixes ordering bug described in issue #34

cb166d1

Jonakemon mentioned this issue May 21, 2023

Quality of life improvements and bug fixes #56

Merged

siccovansas closed this as completed in #56 May 23, 2023

Jonakemon mentioned this issue May 27, 2023

Extend test suite and fix uuid related error #60

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What should we do with former judges? #34

What should we do with former judges? #34

Jonakemon commented May 6, 2022

ThijmenDam commented May 17, 2023 •

edited

Loading

Jonakemon commented May 21, 2023

Jonakemon commented May 21, 2023

ThijmenDam commented May 22, 2023

Jonakemon commented May 23, 2023

ThijmenDam commented May 24, 2023

ThijmenDam commented May 26, 2023 •

edited

Loading

Jonakemon commented May 27, 2023

Jonakemon commented May 30, 2023

What should we do with former judges? #34

What should we do with former judges? #34

Comments

Jonakemon commented May 6, 2022

ThijmenDam commented May 17, 2023 • edited Loading

Jonakemon commented May 21, 2023

Jonakemon commented May 21, 2023

ThijmenDam commented May 22, 2023

Jonakemon commented May 23, 2023

ThijmenDam commented May 24, 2023

ThijmenDam commented May 26, 2023 • edited Loading

Jonakemon commented May 27, 2023

Jonakemon commented May 30, 2023

ThijmenDam commented May 17, 2023 •

edited

Loading

ThijmenDam commented May 26, 2023 •

edited

Loading