Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tree-sitter approach to expand across semicolons #644

Conversation

DavisVaughan
Copy link
Contributor

@DavisVaughan DavisVaughan commented Nov 26, 2024

Alternative approach using tree-sitter rather than the R parser, so it will eventually be fairly easy to move to a standalone LSP

Joint work with @lionel-

Comment on lines +189 to +230
/// Assuming `node` is the first node on a line, `expand_across_semicolons()`
/// checks to see if there are any other non-comment nodes after `node` that
/// share its line number. If there are, that means the nodes are separated by
/// a `;`, and that we should expand the range to also include the node after
/// the `;`.
fn expand_range_across_semicolons(mut node: Node) -> tree_sitter::Range {
let start_byte = node.start_byte();
let start_point = node.start_position();

let mut end_byte = node.end_byte();
let mut end_point = node.end_position();

// We know `node` is at the start of a line, but it's possible the node
// ends with a `;` and needs to be extended
while let Some(next) = node.next_sibling() {
let next_start_point = next.start_position();

if end_point.row != next_start_point.row {
// Next sibling is on a different row, we are safe
break;
}
if next.is_comment() {
// Next sibling is a trailing comment, we are safe
break;
}

// Next sibling is on the same line as us, must be separated
// by a semicolon. Extend end of range to include next sibling.
end_byte = next.end_byte();
end_point = next.end_position();

// Update ending `node` and recheck (i.e. `1; 2; 3`)
node = next;
}

tree_sitter::Range {
start_byte,
end_byte,
start_point,
end_point,
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the magic we use to "expand" the range of the node we typically get back to also include support for trailing ;

It's basically: "if there is a sibling node on the same line as us, expand the range to include that sibling node"

Comment on lines 714 to +723
let node = find_statement_range_node(&root, cursor.unwrap().row).unwrap();
let range = expand_range_across_semicolons(node);

assert_eq!(
node.start_position(),
range.start_point,
sel_start.unwrap(),
"Failed on test {original}"
);
assert_eq!(
node.end_position(),
range.end_point,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweaked the test function so we can use statement_range_test() even for the new tests

Comment on lines -1102 to 1099
fn test_if_statements_without_braces_can_run_individual_expressions() {
fn test_if_statements_without_braces_should_run_the_whole_if_statement() {
statement_range_test(
"
<<if (@a > b)
1 + 1>>",
);

// TODO: This should run the whole if statement because there are no braces
statement_range_test(
"
if (a > b)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lionel- I tweaked based on our discussions

Comment on lines -1135 to +1129
// TODO: I'm not exactly sure what this should run, but this seems strange
// TODO: This should run the whole if statement because there are no braces
statement_range_test(
"
if (a > b)
<<1 + 1>> else if @(b > c)
2 + 2 else 4 + 4
<<1 + 1 else if @(b > c)
2 + 2 else 4 + 4>>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make these extremely weird edge cases a little "worse" because previously we at least selected a complete expression before, but as you can see from the previous TODO we didn't really know the right thing to return here.

We now thing the right thing to return is unambiguously the entire top level if statement, so I've updated the TODO and I'm not really caring too much about the current result.

I don't anticipate it affecting anyone in practice really

@@ -1436,119 +1428,96 @@ test_that('stuff', {
#[test]
fn test_multiple_expressions_on_one_line() {
// https://github.com/posit-dev/positron/issues/4317

// Can't use `statement_range_test()` because it revolves around finding nodes not ranges
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can now use this for all these new tests!

Comment on lines -1497 to +1483
// FIXME: Should select up to 2
let doc = Document::new(
// Selects everything
statement_range_test(
"
{
@<<{
1
}; 2
",
None,
}; 2>>
",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@DavisVaughan DavisVaughan requested a review from lionel- November 26, 2024 19:08
@DavisVaughan DavisVaughan merged commit 472359c into bugfix/statement-semi-colons Dec 9, 2024
6 checks passed
@DavisVaughan DavisVaughan deleted the bugfix/statement-semi-colons-only-tree-sitter branch December 9, 2024 15:31
@github-actions github-actions bot locked and limited conversation to collaborators Dec 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants