Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add doc to explain multithreading #1154

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

jokester
Copy link

@jokester jokester commented Sep 17, 2024

Warning for interested

This PR likely requires major change before merged, if it ever will. The patterns mentioned should mostly work for current Streamlit, but the official team has the decide to how to introduce them.

📚 Context

Surprising many people are confused by multithreading in Streamlit, like in #87 / streamlit/streamlit#1326 / streamlit/streamlit#8490 .

This PR tries to explain the underneath Streamlit-specific things, and provide multithreading patterns to start with.

🧠 Description of Changes

  • Add a page to explain Streamlit-created and user-created threads, and how to use them together.

Revised:

see the new file

Current:

(I didn't find document explaining threads and ScriptRunContext, except in the code)

💥 Impact

Size:

  • Small
  • Not small

🌐 References

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@jokester jokester changed the title add doc to explain non-stramlit threads add doc to explain multithreading Sep 18, 2024
@jokester jokester marked this pull request as ready for review September 18, 2024 14:15
@jokester
Copy link
Author

I'm happy to hear feedback about content or format 🙏

@jokester
Copy link
Author

CI fails but I can't find the error message. The links like https://app.netlify.com/sites/streamlit-docs/deploys/66eaa4a1b1871d000843e854 only says "A fatal rendering error has occurred"

@sfc-gh-dmatthews
Copy link
Contributor

Hi @jokester. Thanks for contributing. The site won't build and the page won't be viewable without correctly updating site navigation. Check out the readme file for docs, specifically Add pages to the menu.

I'm going over items for the next release right now, so it might take me a few weeks to look at this more closely. Multithreading has been on my list for a while, so thanks for taking a stab at it. I'll circle back here when I get the next release fleshed out. 😄


### 2. Expose context object to custom thread

Alternatively, one can let a custom thread have access to the `ScriptRunContext` attached to ScriptThread. This pattern is also used by Streamlit standard widgets like [st.spinner](https://github.com/streamlit/streamlit/blob/develop/lib/streamlit/elements/spinner.py).
Copy link
Author

@jokester jokester Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I saw people are already going this way (in various GH issues), I'm not really sure about this pattern

  1. it exposes internal object to page writers

  2. it is less guaranteed. I don't know enough to say the probability where a ScriptRunContext suffices. The other pattern just looks "safer" to me because it assumes less from Streamlit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Officially, adding your own threads isn't supported. We don't want to include unsupported patterns in the main concepts area. This will likely go out as a Knowledge Base (KB) article. There is a subset of this information that can live in concepts, but it will need to be very carefully separated. For now, we can move this page to the KB so that we can get it published faster and I'll likely follow up with moving some of it back into concepts. (I still haven't read through any of it yet, so I still expect a few weeks before I get to this. Just a heads up about what will likely happen.)

In the longer run, the plan is to properly support multithreading and async tasks, but that work isn't currently on the schedule this quarter or next, so it wouldn't likely happen until next year at the earliest.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I don't really want to promote the hack either.

When you come back, feel free to change or move things or ask me to do so 👍🏽

@jokester
Copy link
Author

@sfc-gh-dmatthews
Copy link
Contributor

Sorry for the delay. I've been working on this this week. I think instead of the pseudo code to explain the threading structure, an illustration might be better. I have a much more detailed illustration I want to put into a contributing wiki-type document, but for this guide, I want to make sure it's as accessible to Python developers as possible (who might not ever touch a Tornado server to deal with WebSockets). I'm currently thinking of something like this to just schematically show script threads tied to script runs and that they have to communicate both to the front end and the server.

image

I should have the PR updated with a first pass by early next week. 😄


## `streamlit.errors.NoSessionContext`

Many Streamlit commands, including `st.session_state`, expect to be called from a script thread. When Streamlit is running as expected, such commands use the `ScriptRunContext` attached to the script thread to ensure they work within the intended session and update the correct user's view. When those Streamlit commands can't find any `ScriptRunContext`, they raise a `streamlit.errors.NoSessionContext` exception. Depending on your logger settings, you may also see a console message identifying a thread by name and warning, "missing ScriptRunContext!"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this in fact be "All Streamlit commands?" @lukasmasuch

Copy link
Collaborator

@lukasmasuch lukasmasuch Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not 100% sure. I believe a lot of commands will not raise a NoSessionContext if there is no ScriptRunContext. This is to support bare execution (execute a Streamlit app script with pure python). But calling these commands with the context won't do anything. But I believe all Streamlit commands require a ScriptRunContext to be fully functional.

<Warning>

- This is not officially supported and may change in a future version of Streamlit.
- This may not work with all Streamlit commands.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know exhaustively which commands these might be, or should this just be a generic warning to cover the possibilities?

Comment on lines +159 to +174
delays = [5, 4, 3, 2, 1]
result_containers = []
for i, delay in enumerate(delays):
st.header(f"Thread {i}")
result_containers.append(st.container())

threads = [
WorkerThread(delay, container)
for delay, container in zip(delays, result_containers)
]
for thread in threads:
add_script_run_ctx(thread, get_script_run_ctx())
thread.start()

for thread in threads:
thread.join()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Commenting for engineering review later in the week when people are back from the holidays):

Should we be storing the threads in Session State and running a check for threads at the top of the script? And/or disabling widgets when threading? Adding try-except to the Streamlit commands in the threads? Although the script works like this, if a user reruns the app before the page finishes loading, that'd be an issue. Hence this is very fragile, right?


```python
import streamlit as st
from streamlit.runtime.scriptrunner import add_script_run_ctx, get_script_run_ctx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Before we release this, we might want to double-check if it might be better to expose get_script_run_ctx and add_script_run_ctx in a slightly less internal namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's any hope of doing so relatively quickly/easily, that'd be great!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's relatively quickly/easily. Probably needs some discussion with the product (cc @jrieke). I think in the best case, we can semi-officially expose it to a stable namespace (e.g., streamlit.multithreading) and also put some metrics tracking on these methods. But that probably takes a bit of decision time. It is probably fine in the meantime to add a warning that these methods are purely internal and can change/break with any version update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have a warning included in the section: https://deploy-preview-1154--streamlit-docs.netlify.app/develop/concepts/design/multithreading#option-2-expose-scriptruncontext-to-the-thread

I'll bring this up in office hours to see if there are any other concerns before publishing. My biggest question is if I should include a little more careful handling of the exposed thread context. The warning states that custom threads should not outlive the script thread from whence they came, but the example doesn't actually do enough to prevent that since it does nothing to handle an interrupted script run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitely needs a bit of discussion and thought about how we do that API. I know Joshua wanted to do it but I never really deeply looked into multithreading so far, so it doesn't really make sense to do something ad-hoc right now. I think it's fine mentioning the internal API in a guide but we should definitely put in a disclaimer that it's an internal, unstable API and will change in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants