Arrow IPC serializers and parsers #968

Open
wants to merge 5 commits into
base: main

Conversation

JosiahParry
Contributor

This PR adds support for the Arrow IPC format with an IPC serializer and parser.

PR task list:

  • Update NEWS
  • Add tests
  • Update documentation with devtools::document()

@CLAassistant

CLAassistant commented Dec 17, 2024

CLA assistant check
All committers have signed the CLA.

@JosiahParry
Contributor Author

Please let me know if there is anything needed for this PR.

@schloerke schloerke requested a review from gadenbuie January 10, 2025 18:20
NEWS.md Outdated
@@ -491,6 +491,17 @@ parser_feather <- function(...) {
})
}

#' @describeIn parsers Arrow IPC parser. See [arrow::read_ipc_stream()] for more details.
#' @export
parser_arrow_ipc <- function(...) {
Member

@gadenbuie gadenbuie Jan 13, 2025

I'm wondering if this should be named parser_arrow_ipc_stream() or something similar? From the Arrow R package documentation for read_ipc_stream ("Read Arrow IPC stream format"), it seems that there are two IPC formats, "stream" and "file":

Apache Arrow defines two formats for serializing data for interprocess communication (IPC): a "stream" format and a "file" format, known as Feather.

Since the "file" format is synonymous with "feather" (which already has parser_feather()), my takeaway is that "stream" is an important aspect we should include in the name. I also like trying to match the naming of the underlying functions (arrow::{read,write}_ipc_stream()) while staying within plumber's naming patterns.
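
To make the distinction concrete, here is a minimal sketch (not code from this PR) that writes the same data frame once in the "stream" format and once in the "file" (Feather) format, then reads each back with its matching reader. It assumes the arrow package is installed; the data frame, the temp file names, and the file extensions are made up for illustration.

```r
# Sketch only: assumes the arrow package is installed.
library(arrow)

# Hypothetical example data, not taken from the PR.
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# "stream" format: the pair this parser/serializer would wrap.
stream_path <- tempfile(fileext = ".arrows")
write_ipc_stream(df, stream_path)
from_stream <- read_ipc_stream(stream_path)

# "file" format (Feather): the pair already covered by parser_feather().
file_path <- tempfile(fileext = ".arrow")
write_feather(df, file_path)
from_file <- read_feather(file_path)

# Both round trips should carry the same data; only the container differs.
same_data <- all.equal(as.data.frame(from_stream), as.data.frame(from_file))
```

Each format has its own reader/writer pair in arrow, which is the reason for spelling out "stream" in the plumber function name.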

Contributor Author

Gotcha! That's a good point. I believe, though, that Feather specifically refers to the V1 format, and V2 is the IPC file format, though I think the terminology is a bit vague there.

So, to be crystal clear before merge:

  • Change parser_arrow_ipc() to parser_arrow_ipc_stream()

Member

@gadenbuie gadenbuie Jan 13, 2025

Yeah, it sounds like there's some room for confusion around which variant of IPC this is, and parser_arrow_ipc_stream() would clear that up, so I'm in favor of making that change before merging. Otherwise the PR looks great, thanks @JosiahParry!

4 participants