Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stats are missing from the homepage #10318

Closed
RayBB opened this issue Jan 12, 2025 · 17 comments · Fixed by #10353
Closed

Stats are missing from the homepage #10318

RayBB opened this issue Jan 12, 2025 · 17 comments · Fixed by #10353
Assignees
Labels
Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] Priority: 4 An issue, but should be worked on when no other pressing work can be done. [managed] Type: Bug Something isn't working. [managed]

Comments

@RayBB
Copy link
Collaborator

RayBB commented Jan 12, 2025

Problem

Originally posted in Slack here

The stats about users/edits/etc are missing from the homepage.

Could be related to #10179

Reproducing the bug

  1. Go to ...
  2. Do ...
  • Expected behavior:
  • Actual behavior:

Context

  • Browser (Chrome, Safari, Firefox, etc):
  • OS (Windows, Mac, etc):
  • Logged in (Y/N):
  • Environment (prod, dev, local): prod

Breakdown

Requirements Checklist

  • [ ]

Related files

Stakeholders


Instructions for Contributors

  • Please run these commands to ensure your repository is up to date before creating a new branch to work on this issue and each time after pushing code to Github, because the pre-commit bot may add commits to your PRs upstream.
@RayBB RayBB added Type: Bug Something isn't working. [managed] Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Lead labels Jan 12, 2025
@mekarpeles
Copy link
Member

mekarpeles commented Jan 12, 2025

It may need a recache 🤔 or it may be related to #10206 (crons for stats seem to be running both on ol-home0 baremetal and cron container...)

@mekarpeles mekarpeles added Priority: 4 An issue, but should be worked on when no other pressing work can be done. [managed] Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] and removed Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Lead labels Jan 12, 2025
@techy4shri
Copy link
Contributor

techy4shri commented Jan 14, 2025

Is it different on the deployed website? Because I am running the local development (with full setup)and can see the stats:
image
Unless this is about some other stats?

@RayBB
Copy link
Collaborator Author

RayBB commented Jan 14, 2025

Stats like those should be showing on the openlibrary.org site not just local host

@techy4shri
Copy link
Contributor

I mean, I see those are under templates folder, and the inspect shows stats for local and then it doesn't for live so perhaps that could be a reason? Still looking at folders and files though.

@cdrini
Copy link
Collaborator

cdrini commented Jan 14, 2025

Relevant error in sentry:

AttributeError: 'ThreadedDict' object has no attribute 'features'

image

@schu96
Copy link
Contributor

schu96 commented Jan 14, 2025

Hm that's interesting, I did not expect that to be the source of the issue.

@RayBB @cdrini Could I be assigned this issue? The fix that I have off the top of my head would be something along the lines of

if "features" in web.ctx:
  stats = admin.get_stats(use_mock_data=("dev" in web.ctx.features))
else: 
  stats = admin.get_stats()

@RayBB
Copy link
Collaborator Author

RayBB commented Jan 15, 2025

@schu96 assigned

@RayBB
Copy link
Collaborator Author

RayBB commented Jan 15, 2025

They now seem to be missing for me on localhost too

@schu96
Copy link
Contributor

schu96 commented Jan 16, 2025

@RayBB Are they still missing for you on localhost? I've had that happen to me after I left my dev environment on overnight.

It seems like there are other parts across the codebase that are seeing ThreadDict attribute issues for web.ctx.site and web.ctx.features (#10341). I think it would be best to wait and see how that issue is resolved since it could fix this homepage stat issue as well

@RayBB
Copy link
Collaborator Author

RayBB commented Jan 16, 2025

Still missing. This might be a tricky one

@schu96
Copy link
Contributor

schu96 commented Jan 16, 2025

I just ran into this issue now after booting up docker and letting it sit for ~30 minutes. Restarting the memcached container brought it back though, which ties closely to Drini's comment about caching being the potential culprit

@github-actions github-actions bot added the Needs: Response Issues which require feedback from lead label Jan 17, 2025
@mekarpeles
Copy link
Member

mekarpeles commented Jan 17, 2025

This (stats, blog posts) is a general problem that is making our sentry monitoring extremely noisy:

As is kenly noted, here

# Because of caching, memcache will call `get_homepage` on another thread! So we
# need a way to carry some information to that computation on the other thread.
# We do that by using a python closure. The outer function is executed on the main
# thread, so all the web.* stuff is correct. The inner function is executed on the
# other thread, so all the web.* stuff will be dummy.
def caching_prethread():
# web.ctx.lang is undefined on the new thread, so need to transfer it over
lang = web.ctx.lang
def main():
# Leaving this in since this is a bit strange, but you can see it clearly
# in action with this debug line:
# web.debug(f'XXXXXXXXXXX web.ctx.lang={web.ctx.get("lang")}; {lang=}')
delegate.fakeload()
web.ctx.lang = lang
return main
:

We have defined many functions (that we try to cache) that expect web.ctx to be accessible but won't be when passed to cache to be run on another thread.

What is happening

In the first request (where no key exists in cache), the initial computation (I believe) is performed within the context of the thread we're on which contains web.ctx. However, memcache then takes this function and stores it away so it can be re-run once the timeout is reached and when (i.e. every time) this recompute is performed, we'll hit an AttributeError that web.ctx doesn't exist within the version of this function forwarded to memcache's thread(s).

It's tempting to say we can fix this by using our delegate.fakeload() pattern, which basically initializes a dummy web.ctx variable. However, this doesn't work if the function we need cached relies on an actual value of web.ctx.

Solution(s)

  1. Prevent memcache from recomputing: We can ask cache to save the value in memcached for functions which require web.ctx until timeout but not recompute them (using memcache threads). Instead, when the timeout is hit, we destroy the value and the function.

  2. Use closures to give threads access to relevant web.ctx vars: Another option is if we can create e.g. a closure (similar to @cdrini's prethreading pattern), such that the necessary web.ctx variables or functions are retained and available within the memcached threads. There are some function that we want to cache that wrap functions which, when run uncached, definitely need web.ctx variables (e.g. web.ctx.site for making db calls) and so there may be cases where delegate.fakeload() may be necessary to initialize a web.ctx object and then we'd hydrate it using the values available to us through the closure(s).

@mekarpeles
Copy link
Member

mekarpeles commented Jan 17, 2025

This specific example is happening 45k times a week (one of our highest volume errors):
In

def get_homepage():
try:
stats = admin.get_stats(use_mock_data=("dev" in web.ctx.features))
except Exception:
logger.error("Error in getting stats", exc_info=True)
stats = None
blog_posts = get_blog_feeds()
# render template should be setting ctx.cssfile
# but because get_homepage is cached, this doesn't happen
# during subsequent called
page = render_template("home/index", stats=stats, blog_posts=blog_posts)
# Convert to a dict so it can be cached
return dict(page)
, we can possible resolve the problem with the following work-around:

def get_homepage(dev=False):  # <-- now takes `dev`
    try:
        stats = admin.get_stats(use_mock_data=dev)
    except Exception:
        logger.error("Error in getting stats", exc_info=True)
        stats = None
    blog_posts = get_blog_feeds()
    ...

def get_cached_homepage():
    mc = cache.memcache_memoize(
        get_homepage, key, timeout=five_minutes, prethread=caching_prethread()
    )
    dev = "dev" in web.ctx.features
    page = mc(dev=dev)  # now the cached function has the answer, doesn't need web.ctx

@mekarpeles
Copy link
Member

mekarpeles commented Jan 17, 2025

@schu96 apologies if I'm stepping on your feet -- given how often this error is occurring (introducing noise into other performance issues we're trying to investigate), I'd like to quickly test this fix as a patch-deploy (to see if works as a pattern for reducing sentry errors) in our prod environment (and open a draft PR if that's alright)

mekarpeles added a commit that referenced this issue Jan 17, 2025
Moves `("dev" in web.ctx.features)` check to a `devmode` function param so `web.ctx` is no longer needed within our cached function.

While the initial computer of the `get_homepage()` function works fine (because `web.ctx` is in scope) the `web.ctx` `ThreadedDict` may no longer be accessible once passed to a memcache thread for it's subsequent re-compute after timeout hit)
@mekarpeles
Copy link
Member

mekarpeles commented Jan 17, 2025

Image

With #10353 deployed, stats appear on testing.openlibrary.org for staff

mekarpeles added a commit that referenced this issue Jan 17, 2025
* move web.ctx check outside of cached function #10318

Moves `("dev" in web.ctx.features)` check to a `devmode` function param so `web.ctx` is no longer needed within our cached function.

While the initial computer of the `get_homepage()` function works fine (because `web.ctx` is in scope) the `web.ctx` `ThreadedDict` may no longer be accessible once passed to a memcache thread for it's subsequent re-compute after timeout hit)

---------

Co-authored-by: Drini Cami <[email protected]>
@schu96
Copy link
Contributor

schu96 commented Jan 17, 2025

@schu96 apologies if I'm stepping on your feet -- given how often this error is occurring (introducing noise into other performance issues we're trying to investigate), I'd like to quickly test this fix as a patch-deploy (to see if works as a pattern for reducing sentry errors) in our prod environment (and open a draft PR if that's alright)

No worries, I'm glad to see that the fix was relatively straightforward!

@mekarpeles mekarpeles removed the Needs: Response Issues which require feedback from lead label Jan 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] Priority: 4 An issue, but should be worked on when no other pressing work can be done. [managed] Type: Bug Something isn't working. [managed]
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants