Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #14: Refactoring ray portable runner #18

Merged
merged 6 commits into from
Jun 16, 2022

Conversation

pabloem
Copy link
Collaborator

@pabloem pabloem commented Jun 11, 2022

Making the code more readable and more efficient.

Also adding support for SDF-initiated checkpointing

  • This is when a user DoFn returns before being completely finished to checkpoint current work.

@pabloem
Copy link
Collaborator Author

pabloem commented Jun 13, 2022

r: @iasoon ? : )

fyi @pdames @jjyao @wilsonwang371

@pabloem
Copy link
Collaborator Author

pabloem commented Jun 13, 2022

I realize that CI is broken. Should we fix it as part of this PR? or as a follow-up PR?

@pabloem pabloem marked this pull request as ready for review June 14, 2022 04:54
@pabloem pabloem force-pushed the refactor-rayportablerunner branch 3 times, most recently from 21f68b3 to a5dff00 Compare June 14, 2022 04:59
@pabloem pabloem force-pushed the refactor-rayportablerunner branch from a5dff00 to c6a36ca Compare June 14, 2022 05:08
@pabloem
Copy link
Collaborator Author

pabloem commented Jun 14, 2022

I made all the portability runner tests run : )

@pabloem pabloem changed the title [WIP] Refactoring ray portable runner Fixes #14: Refactoring ray portable runner Jun 14, 2022
Copy link
Member

@iasoon iasoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
I can't comment on all code equally well, but I feel this version is generally cleaner indeed 👍

return self.transform_to_buffer_coder, self.data_output, self.stage_timers

def setup(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this method is only called after initialization of the ContextManager?

(transform_id, id)
] = timer_family_spec.timer_family_coder_id

def __reduce__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit / general remark: It's a bit unfortunate that we keep running into the serialization issue, and sometimes solve it by using a custom __reduce__, sometimes by registering a custom serializer (ray.util.register_serializer), and sometimes manually (SerializeToString / FromString).
It would be good if we could enable serialization for all the protobuf components in one central place - I'm not sure how that could be done though, as ray.util.register_serializer would have to be called on every ray worker that transmits protobuf objects. Maybe something to discuss?

@pabloem
Copy link
Collaborator Author

pabloem commented Jun 15, 2022

TODO: Rename immutable context managers to *Context

@iasoon iasoon mentioned this pull request Jun 15, 2022
@pabloem
Copy link
Collaborator Author

pabloem commented Jun 15, 2022

I created #28 and #29 to track fixing the comments in the PR. Maybe then we can merge to unblock other reviews? : )

@iasoon
Copy link
Member

iasoon commented Jun 16, 2022

Sounds good to me :)

@iasoon iasoon merged commit f1b8fde into ray-project:master Jun 16, 2022
@pabloem pabloem deleted the refactor-rayportablerunner branch June 21, 2022 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants