Disable the LSP on crash #679
base: main
Conversation
See slack comments, hopefully we can talk about it in the morning before merging
async fn request(&self, request: LspRequest) -> RequestResponse {
    if LSP_HAS_CRASHED.load(Ordering::Acquire) {
I think fn notify() should probably know about LSP_HAS_CRASHED too?
I chose to touch as few things as possible and it seemed fine not to change notify
We've decided to do

if LSP_HAS_CRASHED {
    return;
}

in notify() as well, to be safe, and to not relay the notification to the main loop when it might be in a bad state (after a crash).
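The guard discussed above could be sketched as follows. This is a minimal, self-contained illustration, not the actual diff: the `LspRequest`/`RequestResponse` types are placeholders, and only the crash-flag names and the `Acquire`/`Release` orderings are taken from the snippets in this thread.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Global crash flag, as in the diff under review.
static LSP_HAS_CRASHED: AtomicBool = AtomicBool::new(false);

// Placeholder types standing in for the real `LspRequest`/`RequestResponse`.
struct LspRequest;

#[derive(Debug)]
enum RequestResponse {
    Disabled,
    Ok(String),
}

fn request(_request: LspRequest) -> RequestResponse {
    if LSP_HAS_CRASHED.load(Ordering::Acquire) {
        // Respond with the `Disabled` variant instead of touching the
        // (possibly corrupted) main loop state.
        return RequestResponse::Disabled;
    }
    RequestResponse::Ok(String::from("handled"))
}

fn notify(_notification: LspRequest) {
    // Same guard in `notify()`: don't relay notifications to the main loop
    // once it might be in a bad state.
    if LSP_HAS_CRASHED.load(Ordering::Acquire) {
        return;
    }
    // ... forward the notification to the main loop ...
}

fn main() {
    assert!(matches!(request(LspRequest), RequestResponse::Ok(_)));
    LSP_HAS_CRASHED.store(true, Ordering::Release);
    assert!(matches!(request(LspRequest), RequestResponse::Disabled));
    notify(LspRequest); // returns early once the flag is set
}
```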
Co-authored-by: Davis Vaughan <[email protected]>
To be on the safe side I decided to keep the crash flag that disables request handlers, because our internal notification races with incoming messages from the client, so it's possible the main loop will tick again after we detect a crash. Also, I realised we already have the infrastructure to show notifications via Jupyter, so I now do that. The downside is that this requires us to go through an
@@ -334,6 +334,7 @@ pub(crate) fn handle_statement_range(
     params: StatementRangeParams,
     state: &WorldState,
 ) -> anyhow::Result<Option<StatementRangeResponse>> {
+    panic!("oh no");
Please remove this 😬
/// Tower-LSP's client
client: Client,
Remove client from here again, no longer needed?
// Once the LSP has crashed all requests respond with an error. This is a
// workaround, ideally we'd properly shut down the LSP from the server. The
// `Disabled` enum variant is an indicator of this state. We could have just
// created an anyhow error passed through the `Result` variant but that would
// flood the LSP logs with irrelevant backtraces.
Maybe update this message to mention that we do force a shutdown? AFAICT this is us being defensive in case there are any other things happening after we've shut down that will no longer be able to communicate with the client.
@@ -122,18 +168,26 @@ pub(crate) enum LspResponse {

#[derive(Debug)]
struct Backend {
    shutdown_tx: tokio::sync::mpsc::Sender<()>,
A comment on what this is for would be nice.
 let init = |client: Client| {
-    let state = GlobalState::new(client);
+    let state = GlobalState::new(client.clone());
Suggested change:
-    let state = GlobalState::new(client.clone());
+    let state = GlobalState::new(client);

When you remove client from Backend, we can also go back to this.
@@ -397,17 +399,47 @@ impl GlobalState {
    /// * - `response`: The response wrapped in a `anyhow::Result`. Errors are logged.
Update comment here
    into_lsp_response: impl FnOnce(T) -> LspResponse,
) -> anyhow::Result<()> {
    let mut crashed = false;

    let response = std::panic::catch_unwind(std::panic::AssertUnwindSafe(response))
I'm not entirely certain why we can AssertUnwindSafe here, maybe a comment?
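For background on that question: std::panic::catch_unwind() only accepts closures that are UnwindSafe, and closures capturing mutable references are not, because a panic could leave the referenced data with broken invariants. AssertUnwindSafe is the standard library's escape hatch for promising that we accept that risk. A standalone illustration (unrelated to the handler code in this PR):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

fn main() {
    let mut count = 0;

    // A closure capturing `&mut count` is not `UnwindSafe`, so a plain
    // `catch_unwind(|| ...)` would not compile here. `AssertUnwindSafe`
    // asserts that we accept seeing `count` in whatever state it was left
    // in if the closure panics.
    let result = catch_unwind(AssertUnwindSafe(|| {
        count += 1;
        panic!("boom");
    }));

    assert!(result.is_err());
    // The mutation made before the panic is still visible. In this PR's
    // case, the crash flag ensures such possibly-inconsistent state is
    // never used again, which is what makes the assertion defensible.
    assert_eq!(count, 1);
}
```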
The goal of this PR is to prevent the LSP from ever crashing the R session and losing the user's state. I've moved all LSP request handlers behind a catch_unwind(), basically a try for panics. When an LSP handler panics, we detect it, report it, and flip our state to crashed. Once crashed, all LSP handlers respond with an error. This causes some chatter in the LSP logs, but I made sure these errors do not carry a backtrace to avoid flooding the logs.

Ideally we'd shut down the LSP entirely and forcefully disconnect from the client. Unfortunately a server-initiated shutdown is not supported by the LSP protocol, and tower-lsp does not give us the tools to do this.
Alternatively, we could send a notification to the client that the LSP has crashed, and the client could then initiate a shutdown. I chose not to go that route to avoid having to deal with synchronisation issues and having to make changes to both the client and the server.
For context, this is a temporary workaround. Once the LSP lives in Air, a crash will never be a big deal for the user. Most of the time they will not be aware of it, since VS Code / Positron silently restarts crashed servers (unless they crash too many times in a short period, in which case the user is notified and the server is no longer restarted).
Here is a screencast of what happens when the LSP crashes:
Screen.Recording.2025-01-28.at.10.52.34.mov
The user is notified of the crash and asked to send a report with the logs.
Note that the relevant backtrace is sent by our panic hook to the kernel logs rather than the LSP logs. The backtrace in the LSP logs is unlikely to be helpful.