Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transurlvania is not thread safe #3

Open
djoume opened this issue Jan 20, 2011 · 10 comments
Open

transurlvania is not thread safe #3

djoume opened this issue Jan 20, 2011 · 10 comments

Comments

@djoume
Copy link

djoume commented Jan 20, 2011

On one of my site, while browsing the site in English, the user would sometimes end up on the French site after clicking on a link.
It turns out that some links on the page were pointing to the French site while others links were pointing to the English site (all links are using a simple reverse(urlname) and we're using language in path).

The site was configured to use multiple WSGI threads, switching to a single thread per process fixed the issue.

The following test illustrate the issue:

class MultilangThreadingTestCase(TestCase):
def test_urlresolver_is_thread_safe(self):
resolved_urls = {}
class URLTester(threading.Thread):
def init(self, id):
threading.Thread.init(self)
self.language = settings.LANGUAGES[id % 2][0]
self.id = id

        def run(self):
            time.sleep(0.05)
            translation.activate(self.language)
            clear_url_caches()
            time.sleep(0.05)
            actual_url = reverse(landing)
            should_be = '/%s/garfield/' % self.language
            resolved_urls[self.id] = (actual_url, should_be)

    t1 = URLTester(1)
    t2 = URLTester(2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    for v in resolved_urls.values():
        self.assertEqual(v[0], v[1])

The test fails because in django the URL resolver cache is a global variable. The URLCacheResetMiddleware provided by transurlvania only solves the issue for non-concurrent requests (when only 1 thread is handling a request at a time).

The only fix I can think of involve patching django (making the resolver cache a thread local variable for instance) but I don't see how this could be justified to the django developers until transurlvania is part of core.

So in the meantime it might be worth mentioning it in the README :-)

@djoume
Copy link
Author

djoume commented Jan 20, 2011

Actually I might have a fix. django uses get_urlconf() in django.core.urlresolvers to get a thread specific urlconf name, the urlconf name to use for a given thread can also be set via set_urlconf().

What would you think of updating LangInPathMiddleware so that it will call:

urlresolvers.set_urlconf('urls_%s' % request.LANGUAGE_CODE)

This will require the user to create the following symlinks:

urls_en.py -> urls.py
urls_fr.py -> urls.py

Which is not very nice, maybe we can make this transparent for the user by playing with sys.modules.

I believe making this change will not only make transurlvania more thread safe but it will also improve the performance since we would not need the URLCacheResetMiddleware anymore.

@greatestape
Copy link
Contributor

I don't like the notion of requiring users to set up symlinks, but if there's a way to take care of this behind the scenes, this solution might be best.

@djoume
Copy link
Author

djoume commented Jan 20, 2011

http://code.djangoproject.com/changeset/15232

This change might means that we do not need to use different url name, I'll test when I get a chance.

@greatestape
Copy link
Contributor

I don't think that changeset relates to this issue. It looks like it's just refactoring the way urlresolver's existing thread-safety code is implemented. The relevant global variable for us is _resolver_cache, I think, and that's just a non-thread-safe global variable.

@greatestape
Copy link
Contributor

Looking at the Django code, I think one stopgap might be to use _resolver_cache to our advantage. Instead of clearing it after every request, inject an instance of MultilangRegexURLResolver into the cache for the project's ROOT_URLCONF module during initial module loading.

Then it will be our class's reverse method that gets called, and we can ensure that it does a lookup based on the current active language.

@djoume
Copy link
Author

djoume commented Jan 20, 2011

This code is making the global urlconf a thread local variable, that's why I think it could be relevant.

@greatestape
Copy link
Contributor

I think the global urlconf var is already thread-safe, but it doesn't have to do with the threading issue you've highlighted.

I think this is the problematic chain of events:

  • call reverse
  • reverse looks up the urlconf it should use. This is a string representing the location of the python module to be used as the root urlconf.
  • reverse calls get_resolver, passing in that urlconf
  • get_resolver constructs a RegexURLResolver object for the requested urlconf and returns it
  • get_resolver is wrapped in memoize so it only constructs the resolver for a given urlconf once, returning the value cached in _resolver_cache each time thereafter
  • the reset middleware calls clear_url_cache, emptying _resolver_cache

So, if two separate threads handle requests simultaneaously, one will set up the URLRegexResolver and the other one will just get the cached copy.

We could use the thread-safety of get_urlconf() to solve this if we could somehow use unique urlconf values for each language, but since the urlconf value is a path to a python module, we would either have to:

  • provide separate modules using symlinks as mentioned above
  • hack sys.module as mentioned above
  • or do something crazy where we subclass str so that we can have two objects that will act as different keys in the _resolver_cache dictionary but will act as the same string when used to load the actual urlconf module and build the resolver.

@sneilan
Copy link

sneilan commented May 12, 2011

Has anyone been able to resolve this issue without reducing the number of threads per apache process?

@sepow
Copy link

sepow commented Nov 8, 2011

Is this still an issue?

@greatestape
Copy link
Contributor

Yes, as far as I know. I haven't considered it a deal breaker since I'd always heard that Django wasn't guaranteed to be thread-safe anyway.

Patches welcome, of course. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants