Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate production ThreadDict AttributeErrors #10341

Open
jimchamp opened this issue Jan 15, 2025 · 3 comments
Open

Investigate production ThreadDict AttributeErrors #10341

jimchamp opened this issue Jan 15, 2025 · 3 comments
Labels
Lead: @jimchamp Issues overseen by Jim (Front-end Lead, BookNotes) [managed] Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] Needs: Investigation This issue/PR needs a root-cause analysis to determine a solution. [managed] Needs: Staff / Internal Reviewed a PR but don't have merge powers? Use this. Priority: 2 Important, as time permits. [managed] Type: Bug Something isn't working. [managed]

Comments

@jimchamp
Copy link
Collaborator

Problem

Sentry has logged a large number of AttributeErrors, with messages like:

  • 'ThreadedDict' object has no attribute 'site' (examples)
  • 'ThreadedDict' object has no attribute 'features'(examples: 1 2 3)

These occur when we try to access values in web.ctx (like web.ctx.site or web.ctx.features) that are seemingly not set.

Reproducing the bug

  1. Go to ...
  2. Do ...
  • Expected behavior:
  • Actual behavior:

Context

  • Browser (Chrome, Safari, Firefox, etc):
  • OS (Windows, Mac, etc):
  • Logged in (Y/N):
  • Environment (prod, dev, local): prod

Breakdown

It seems that their are a few classes of ThreadedDict AttributeErrors, each happening in different code paths. I think the first step should be to gather more information about each class of error, then determine how to proceed. There may be a general solution that gracefully handles all such errors, or we may need to fix each of these separately.

Requirements Checklist

  • [ ]

Related files

Stakeholders


Instructions for Contributors

  • Please run these commands to ensure your repository is up to date before creating a new branch to work on this issue and each time after pushing code to Github, because the pre-commit bot may add commits to your PRs upstream.
@jimchamp jimchamp added Type: Bug Something isn't working. [managed] Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Lead Needs: Staff / Internal Reviewed a PR but don't have merge powers? Use this. Needs: Investigation This issue/PR needs a root-cause analysis to determine a solution. [managed] labels Jan 15, 2025
@jimchamp
Copy link
Collaborator Author

jimchamp commented Jan 15, 2025

Here are some details about these errors:

Accessed Value Events Details
web.ctx.site errors Occurs in our get_by_username() calls, here
web.ctx.features errors Seems to occur whenever the Infogami path_processor tries raises a non-exception app.notfound object. Suspect that this may be happening here under certain conditions. More research needed.
web.ctx.features errors Occurs when we fetch the stats for the homepage, here.
web.ctx.features errors Seems to occur when a features.get_enabled() call occurs as the interpreter is shutting down.

@cdrini
Copy link
Collaborator

cdrini commented Jan 15, 2025

Hmm, my hypothesis is that this might be something to do with caching. Our cache generally repopulates in the background on a separate thread, I believe. E.g. the homepage will be cached for 5 minutes, and if someone hits the homepage when the cached copy has expired, they will be served the old copy, and a thread will spin up to render/cache the updated homepage. When this happens, the version of the web framework running on the other thread is kind of a dummy version.

We try to "bake" in any user-specific info we can (eg things like web.ctx.lang) into the cache key to avoid cache collisions on things that affect rendering, and copy it from web.ctx so that it gets put in the right spot. This is done e.g. by the caching_prethread . It stores lang in the key, and then also copies the lang into the new thread.

With the rest of the web.ctx globals, I think that's what delegate.fake_load() tries to set up in a new thread -- in a sort of simplified way, as the name implies. This is the method that should be setting up ctx.features and ctx.site .

But then also note that anywhere we use a cache, we should also make sure to call the fake_load() to get globals set up for that specific off-the-main-thread cache.

I believe that's how all this is wired up, but this code is rather old, so it might be doing things I haven't foreseen!

@cdrini
Copy link
Collaborator

cdrini commented Jan 15, 2025

Oh also note this I think specifically applies to cache.memcache_memoize , not all our implementations of caching/memcache/memoizing (of which there are a few in the codebase 😅 )

@mekarpeles mekarpeles added Lead: @jimchamp Issues overseen by Jim (Front-end Lead, BookNotes) [managed] Priority: 2 Important, as time permits. [managed] and removed Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Lead labels Jan 16, 2025
@mekarpeles mekarpeles added this to the Sprint 2025-02 milestone Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Lead: @jimchamp Issues overseen by Jim (Front-end Lead, BookNotes) [managed] Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] Needs: Investigation This issue/PR needs a root-cause analysis to determine a solution. [managed] Needs: Staff / Internal Reviewed a PR but don't have merge powers? Use this. Priority: 2 Important, as time permits. [managed] Type: Bug Something isn't working. [managed]
Projects
None yet
Development

No branches or pull requests

3 participants