Refactored retrieval of last API call timestamps to improve performance. #156
base: master
- Changed from using an aggregation pipeline on `get_timeseries_db` to a loop fetching data from `get_profile_db`.
- New approach iterates over `uuid_list`, fetching profile data and extracting `last_call_ts` for each user.
- Simplifies logic, avoids heavy aggregation, and reduces database load. Results in a significant performance improvement.
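The per-user loop described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `FakeProfileCollection` is a hypothetical in-memory stand-in for the collection returned by `get_profile_db`, `get_last_call_timestamps` is a made-up helper name, and the shape of the returned entries is assumed.

```python
from uuid import uuid4


class FakeProfileCollection:
    """Hypothetical in-memory stand-in for the profile collection."""

    def __init__(self, docs):
        self._docs = list(docs)

    def find_one(self, query):
        # Supports only exact-match queries, which is all this sketch needs.
        for doc in self._docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None


def get_last_call_timestamps(profile_db, uuid_list):
    """Collect last_call_ts for each user that has one (assumed output shape)."""
    results = []
    for user_id in uuid_list:
        # One small lookup per user instead of one aggregation over the timeseries.
        profile = profile_db.find_one({"user_id": user_id})
        if profile is None:
            continue  # no profile stored for this user
        last_call_ts = profile.get("last_call_ts")
        if last_call_ts is not None:
            results.append({"user_id": user_id, "last_call_ts": last_call_ts})
    return results


u1, u2, u3 = uuid4(), uuid4(), uuid4()
profiles = FakeProfileCollection([
    {"user_id": u1, "last_call_ts": 1735840702.0},
    {"user_id": u2},  # profile exists, but no API call recorded yet
])
timestamps = get_last_call_timestamps(profiles, [u1, u2, u3])
```

In this sketch, users without a profile or without a recorded `last_call_ts` are skipped rather than reported with a null timestamp; whether the real code does the same is an assumption.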
@TeachMeTW I would like some more technical detail in the benefits and impact. I am not a general audience 😄 Can you please clarify:
@shankari Here's a breakdown of the clarification you requested:
Old
Screen.Recording.2025-01-02.at.10.18.22.AM.mov
New
Screen.Recording.2025-01-02.at.10.19.27.AM.mov
Takeaways
As the recordings show, the new version is much faster. I still need to test my hypothesis above: that loading a specific uuid and updating it would classify that user as an active user.
@TeachMeTW did you look through the code to check if there are any other sites that are making direct DB calls, or calls to the timeseries for profile data? When we finish a particular performance improvement, I want to get it done fully and not over multiple weeks. Each improvement is already carefully scoped to be small and self-contained.
Hypothesis verified. First I ran Next I ran Lastly, I ran
to simulate an API call.
On the home page, I found only one Mongo query: the active_users card, which this PR addresses. Previously in
However, in not_excluded_uuid_query = {'user_id': {'$nin': [UUID(uuid) for uuid in excluded_uuids]}} In
…get. Reduced the bloat of operations by leaving only one for loop that does the same thing.
Description
This PR refactors the retrieval of last API call timestamps to significantly improve performance. The previous implementation relied on a complex aggregation pipeline over the timeseries DB; it has been replaced with a simpler, iterative lookup against the profile DB, in line with previous enhancements.
Benefits
Impact
The changes result in faster execution and lower computational costs, particularly for scenarios with large user datasets like openaccess.
Testing
Active Users
Points of concern