Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor FormattedVector for faster formatting of S3 objects #646

Merged
merged 2 commits into from
Dec 10, 2024

Conversation

dfalbel
Copy link
Contributor

@dfalbel dfalbel commented Nov 27, 2024

The variables pane uses the FormattedVector API to format vector elements to be displayed. One important assumption that we do when using this API is that it's lazy, in the sense that we create an iterator over the elements of a vector and we only pay the formatting price for the values we actually iterated.

This is true for atomic vectors, for which we iterate using R's C API eg, extracting values with REAL_ELT and formattinng them in Rust. However, this is not true for any vector that has a class (except factors), in this case we need to use the format function, and we were actually formatting the entire vector before being able to iterate over the formatted values.
This is very costly if we need for format very large vectors, as when we expand the data.frame in posit-dev/positron#3628 (comment)

This PR refactors the FormattedVector API, with the main change being:

  • We no longer format the entire vector upon construction of the FormattedVector
  • Instead, we'll only format the vector when creating the iter()
  • We provide a few different iterators: iter(), iter_n(), column_iter(), column_iter_n(). Those suffixed with n will only format the requested part of the vector.

As a consequence, now creating the iter() returns a Result, so we need to adapt in a few places.
Now, most of the times in the variables pane we'll use iter_n() to only format the first N elements, that are necessary
to create the display value in the variables pane.

@dfalbel dfalbel requested a review from lionel- November 27, 2024 15:33
@dfalbel dfalbel force-pushed the bugfix/refactor-formatted-vector branch from 15f6f34 to 897cc9f Compare November 27, 2024 18:32
@dfalbel dfalbel force-pushed the bugfix/refactor-formatted-vector branch from 897cc9f to 19fe643 Compare December 5, 2024 16:49
/// Returns an iterator over the first `n` elements of a vector.
/// Should be used when the vector is potentially large and you won't need to
/// iterate over the entire vector.
pub fn iter_n(&self, n: usize) -> anyhow::Result<FormattedVectorIter> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe iter_take?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that!

/// If `indices` is `None`, the iterator will iterate over all elements of the vector.
/// If `indices` is `Some`, the iterator will iterate over the elements at the specified indices.
/// The indices must be valid and in bounds as we do no bounds checking.
fn new_unchecked(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is missing a Safety comment to explain what is being unchecked exactly and what are the failure modes (panics, R errors?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some notices here: e862651

@dfalbel dfalbel merged commit 3b8ff61 into main Dec 10, 2024
6 checks passed
@dfalbel dfalbel deleted the bugfix/refactor-formatted-vector branch December 10, 2024 17:01
@github-actions github-actions bot locked and limited conversation to collaborators Dec 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants