Disable the LSP on crash #679

Open · lionel- wants to merge 10 commits into main

Conversation

@lionel- (Contributor) commented on Jan 28, 2025

The goal of this PR is to prevent the LSP from ever crashing the R session and losing user state. I've moved all LSP request handlers behind a catch_unwind(), essentially a try for panics. When an LSP handler panics, we detect it, report it, and flip our state to crashed. Once crashed, all LSP handlers respond with an error. This causes some chatter in the LSP logs, but I made sure these errors do not carry a backtrace, to avoid flooding the logs.
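
Roughly, the guard looks like this (a minimal sketch, not the PR's exact code; the guard_handler wrapper and the error messages are illustrative, while LSP_HAS_CRASHED is the flag referenced in the review below):

use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

// Global crash flag, flipped on the first handler panic
static LSP_HAS_CRASHED: AtomicBool = AtomicBool::new(false);

// Hypothetical wrapper around an LSP request handler
fn guard_handler<T>(handler: impl FnOnce() -> anyhow::Result<T>) -> anyhow::Result<T> {
    // Once crashed, every handler responds with a backtrace-free error
    if LSP_HAS_CRASHED.load(Ordering::Acquire) {
        return Err(anyhow::anyhow!("The LSP has crashed and is disabled"));
    }

    match catch_unwind(AssertUnwindSafe(handler)) {
        Ok(response) => response,
        Err(_panic) => {
            // Detect the panic and flip the state to crashed
            LSP_HAS_CRASHED.store(true, Ordering::Release);
            Err(anyhow::anyhow!("The LSP has crashed"))
        },
    }
}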

Ideally we'd shut down the LSP entirely and forcefully disconnect from the client. Unfortunately, a server-initiated shutdown is not supported by the LSP protocol, and tower-lsp does not give us the tools to do this ourselves.

Alternatively, we could send a notification to the client that the LSP has crashed, and the client could then initiate a shutdown. I chose not to go that route to avoid dealing with synchronisation issues and having to make changes to both the client and the server.

For context, this is a temporary workaround. Once the LSP lives in Air, a crash will no longer be a big deal for the user. Most of the time they won't even be aware of it, since VS Code / Positron silently restarts crashed servers (unless a server crashes too many times in a short period, in which case the user is notified and it is no longer restarted).

Here is a screencast of what happens when the LSP crashes:

Screen.Recording.2025-01-28.at.10.52.34.mov

The user is notified of the crash and asked to send a report with the logs.

Note that the relevant backtrace is sent by our panic hook to the kernel logs rather than the LSP logs. The backtrace in the LSP logs is unlikely to be helpful.
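
For illustration, a hook along these lines would do it (a sketch, assuming the log crate as the kernel-log sink; the actual hook in ark may differ):

use std::panic;

pub fn install_panic_hook() {
    let previous = panic::take_hook();

    panic::set_hook(Box::new(move |info| {
        // Capture the backtrace at the panic site, where it is meaningful,
        // and route it to the kernel logs rather than the LSP logs
        let backtrace = std::backtrace::Backtrace::force_capture();
        log::error!("LSP handler panicked: {info}\n{backtrace}");

        previous(info);
    }));
}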

@lionel- requested a review from DavisVaughan on Jan 28, 2025, 10:29
@DavisVaughan (Contributor) left a comment


See Slack comments; hopefully we can talk about it in the morning before merging.

Comment on lines +170 to +171
async fn request(&self, request: LspRequest) -> RequestResponse {
if LSP_HAS_CRASHED.load(Ordering::Acquire) {
@DavisVaughan (Contributor):

I think fn notify( should probably know about LSP_HAS_CRASHED too?

@lionel- (Author):

I chose to touch as few things as possible and it seemed fine not to change notify

@DavisVaughan (Contributor):

We've decided to do

if LSP_HAS_CRASHED.load(Ordering::Acquire) {
    return;
}

in notify() as well to be safe, and to not relay the notification to the main loop when it might be in a bad state (after a crash)
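
i.e. something along these lines (a sketch; the rest of the body of notify() is elided):

async fn notify(&self, notification: LspNotification) {
    // Don't relay notifications to the main loop once it may be in a
    // bad state following a crash
    if LSP_HAS_CRASHED.load(Ordering::Acquire) {
        return;
    }

    // ... relay `notification` to the main loop as before ...
}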

crates/ark/src/lsp/backend.rs (outdated; resolved)
@lionel- (Author) commented on Jan 29, 2025

Davis pointed out that the serve().await is not really blocking (not sure how I missed that 😬), so we are able to fully shut down the LSP by waking up a select. That's nice because it removes a source of potential problems and prevents any further log messages in the LSP output channel.
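
The shape of that select is roughly the following (a sketch; run_until_shutdown is hypothetical, serve stands for tower-lsp's serve() future, and the channel is the shutdown_tx/shutdown_rx pair added in this PR):

async fn run_until_shutdown(
    serve: impl std::future::Future<Output = ()>,
    mut shutdown_rx: tokio::sync::mpsc::Receiver<()>,
) {
    tokio::select! {
        // Normal path: run the LSP until the client disconnects
        _ = serve => {},
        // Crash path: a panicked handler sends on `shutdown_tx`, waking
        // this branch; dropping the serve future disconnects the client
        _ = shutdown_rx.recv() => {},
    }
}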

To be on the safe side, I decided to keep the crash flag that disables request handlers: our internal notification races with incoming messages from the client, so it's possible the main loop will tick again after we detect a crash.

Also, I realised we already have the infrastructure to show notifications via Jupyter, so I now do that. The downside is that this requires going through an r_task() to send the notification (avoiding that would be possible but would require a non-trivial amount of plumbing). The upside is that we leave the LSP out of this, which seems safer since we are shutting down.

@@ -334,6 +334,7 @@ pub(crate) fn handle_statement_range(
params: StatementRangeParams,
state: &WorldState,
) -> anyhow::Result<Option<StatementRangeResponse>> {
panic!("oh no");
@DavisVaughan (Contributor):

Please remove this 😬

Comment on lines +176 to +177
/// Tower-LSP's client
client: Client,
@DavisVaughan (Contributor) commented on Jan 29, 2025


Remove client from here again, no longer needed?

Comment on lines +55 to +59
// Once the LSP has crashed all requests respond with an error. This is a
// workaround, ideally we'd properly shut down the LSP from the server. The
// `Disabled` enum variant is an indicator of this state. We could have just
// created an anyhow error passed through the `Result` variant but that would
// flood the LSP logs with irrelevant backtraces.
@DavisVaughan (Contributor):

Maybe update this message to mention that we do force a shutdown? AFAICT this is us being defensive in case anything else happens after we've shut down that would no longer be able to communicate with the client.
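
Presumably the Disabled variant lives on a response enum along these lines (an assumption based on the quoted comment; the payload of Result is illustrative):

enum RequestResponse {
    // The LSP has crashed; the request was not processed
    Disabled,
    // Normal handler outcome; anyhow errors here get logged with backtraces
    Result(anyhow::Result<LspResponse>),
}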

@@ -122,18 +168,26 @@ pub(crate) enum LspResponse {

#[derive(Debug)]
struct Backend {
shutdown_tx: tokio::sync::mpsc::Sender<()>,
@DavisVaughan (Contributor):

A comment on what this is for would be nice.
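
For instance, something like this (a suggested doc comment, describing the behaviour discussed elsewhere in this PR, not existing code):

#[derive(Debug)]
struct Backend {
    /// Sending `()` on this channel wakes the `select!` in the server's
    /// entry point, which drops the `serve()` future and disconnects from
    /// the client. Used to force a shutdown after a crash.
    shutdown_tx: tokio::sync::mpsc::Sender<()>,
}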

let init = |client: Client| {
let state = GlobalState::new(client);
let state = GlobalState::new(client.clone());
@DavisVaughan (Contributor):

Suggested change
let state = GlobalState::new(client.clone());
let state = GlobalState::new(client);

When you remove client from Backend, this can also go back to the version without the clone().

@@ -397,17 +399,47 @@ impl GlobalState {
/// * - `response`: The response wrapped in an `anyhow::Result`. Errors are logged.
@DavisVaughan (Contributor):

Update comment here

into_lsp_response: impl FnOnce(T) -> LspResponse,
) -> anyhow::Result<()> {
let mut crashed = false;

let response = std::panic::catch_unwind(std::panic::AssertUnwindSafe(response))
@DavisVaughan (Contributor):

I'm not entirely certain why we can AssertUnwindSafe here, maybe a comment?
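
For context, catch_unwind() requires its closure to be UnwindSafe, which captures of &mut data are not, and AssertUnwindSafe opts out of that check. A self-contained illustration of the pattern (not this codebase's code):

use std::panic::{catch_unwind, AssertUnwindSafe};

fn call_handler(state: &mut Vec<i32>) -> Result<usize, String> {
    // Without `AssertUnwindSafe` this would not compile: the closure
    // captures `&mut state`, which is not `UnwindSafe`
    let result = catch_unwind(AssertUnwindSafe(|| {
        state.push(1);
        state.len()
    }));

    // The assertion is arguably sound in ark because after a panic the
    // crash flag disables all further handlers, so state left inconsistent
    // by the unwind is never observed again (an assumption, per the PR)
    result.map_err(|_| String::from("handler panicked"))
}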
