[Feature] DynamicSupervisor please #42

Open · sclee15 opened this issue Feb 4, 2023 · 17 comments
Labels: help wanted (Extra attention is needed)

sclee15 commented Feb 4, 2023

Hello.

I think Gleam's typed OTP and its concept of subject is great.

But it would be better to have some kind of DynamicSupervisor that allows me to spawn workers as I need them.

lpil (Member) commented Feb 5, 2023

Sounds great!

lpil added the help wanted label on Feb 5, 2023
arnarg (Contributor) commented Jul 20, 2023

I'm also interested in this.

In terms of API design, do you see any value in having a separate supervisor and dynamic supervisor (AFAICT this is the case in Elixir), or should there just be a single supervisor type that happens to be dynamic? If you want it to be static, you simply don't add more children to it. I'm personally leaning towards the option of just having a single type that is dynamic.

Also, should the supervisor still have an init function to set up initial children, or do you just create a supervisor and start adding children to it dynamically?

Option 1

pub fn main() {
  let assert Ok(sup) = supervisor.start(fn(children) {
    children
    |> add(worker(database.start))
    |> add(worker(monitoring.start))
    |> add(worker(web.start))
  })

  // Something happens in between

  let assert Ok(runner) = supervisor.add_child(sup, worker(runner.start))
}

Option 2

pub fn main() {
  let assert Ok(sup) = supervisor.start()
  let assert Ok(db) = supervisor.add_child(sup, worker(database.start))
  let assert Ok(mon) = supervisor.add_child(sup, worker(monitoring.start))
  let assert Ok(web) = supervisor.add_child(sup, worker(web.start))

  // Something happens in between

  let assert Ok(runner) = supervisor.add_child(sup, worker(runner.start))
}

I see the value in not breaking the existing API, but I also find option 2 to be a bit simpler as an API.

lpil (Member) commented Jul 20, 2023

With option 2 how does it restart the children when one dies?

arnarg (Contributor) commented Jul 20, 2023

I haven't really looked into the current implementation but I'm assuming it will need to keep some kind of list of children.

Does it currently call the init function every time a child dies?

lpil (Member) commented Jul 20, 2023

Nope, the current supervisor implements the rest_for_one strategy. I think we likely need supervisors that implement all the different strategies, and possibly some other patterns that may be useful given Gleam OTP's lack of process naming.

arnarg (Contributor) commented Jul 21, 2023

Oh I see, to be honest I wasn't too familiar with all the different strategies.

I guess rest_for_one was the most logical to start with so you can control the arguments down the chain (so you don't have an old reference to a subject belonging to a process that already died)?

So some thoughts:

  • rest_for_one
    • The init function will be kept (no matter which strategy is picked). This is the only place to set up the arguments.
    • All dynamic children added afterwards will be put at the back of the "chain" (as if they were simply last in the init function). So they're always restarted if any child from the init function dies.
      • Possibly the dynamic children alone could be configured as one_for_one or one_for_all, so if any of the dynamic children die either only that one or all of them are restarted, while the initial ones are unaffected.
  • one_for_all
    • This one can pretty much be treated as rest_for_one except that everything is restarted from scratch, right?
  • one_for_one
    • This one is simple except for argument passing.
    • Should there be a function to set the argument for the supervisor to pass to children, settable externally (i.e. supervisor.set_arguments(sup, MyArguments(...)))? A sketch follows this list.
      • How does this affect already-running children?
    • Should the supervisor maintain a registry that children can (optionally) use to register themselves by name, and which is passed to children?
      • This would need to send an update to all children whenever it changes, which is not optimal.
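
To make the set_arguments idea concrete, here is a minimal sketch; none of these functions exist yet and every name in it is hypothetical:

// Hypothetical API; nothing here exists yet.
pub fn example(sup, db) {
  // Replace the argument the supervisor will pass to any children it starts
  // from now on.
  supervisor.set_arguments(sup, MyArguments(db: db))

  // This child is started with the new argument. The open question above is
  // what happens to children that were already running with the old one.
  let assert Ok(_child) = supervisor.add_child(sup, worker(job.start))
}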

Or would it be preferred to have distinct static and dynamic supervisors?

lpil (Member) commented Jul 22, 2023

I don't think we could safely restart any dynamically added children as the initial state that was used to create them is not controlled by the supervisor.

Take a web server that does some background processing as an example. It could have a web server process, a database connection process, and dynamically added worker processes.

If there were a failure that caused them all to be restarted, the web application and database connection processes would be initialised correctly, but any worker processes restarted using their original initial state would hold references to the no-longer-existing database connection process, and so would always fail. This would eventually result in too much restart intensity and the entire supervisor would fail.
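
To make that concrete, here is a sketch reusing the hypothetical add_child API from the options above (none of this exists, and job is a placeholder module):

// Hypothetical API; `job` is a placeholder module.
pub fn add_background_worker(sup, db) {
  // `db` is the subject of the database connection process that is alive
  // right now. If the whole tree is restarted, database.start produces a new
  // subject, but all the supervisor can do for this child is re-run the
  // closure below, which still captures the old, dead `db`. The restarted
  // worker therefore fails every time, until the restart intensity is
  // exceeded and the supervisor itself gives up.
  supervisor.add_child(sup, worker(fn(_) { job.start(db) }))
}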

arnarg (Contributor) commented Jul 22, 2023

I see. Given my limited experience using supervisors I might not be the best person to come up with designs for this 😅.

Would a more typical use case add the dynamic supervisor as a child of a static one so that the whole dynamic supervisor is restarted if any of its dependencies crash (web server, database, ...)?

If so, then two distinct types of supervisors might make more sense.

guillheu commented

I definitely could make use of a dynamic supervisor.

From what I've read, my understanding is that the rest_for_one strategy makes sense for the current supervisor since the init state of each worker depends on the state returned by the worker initialized right before it. In other words, each child worker's start returns the init state for its immediate "younger sibling". Since that init state might therefore change when the older sibling restarts, the younger sibling should also be restarted, for example if each worker needs the subject of its immediate "older sibling".
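
For reference, the chaining with the current gleam/otp/supervisor module looks roughly like this (database and web are placeholder modules, and the signatures are approximated from that module, so treat it as a sketch):

import gleam/otp/supervisor.{add, returning, worker}

pub fn start() {
  supervisor.start(fn(children) {
    children
    // The database worker is started with the supervisor's initial argument,
    // and `returning` turns its subject into the argument for the next child.
    |> add(
      worker(database.start)
      |> returning(fn(_initial, db_subject) { db_subject }),
    )
    // web.start is called with the database subject. If the database worker
    // is restarted its subject changes, so this child must be restarted too,
    // which is exactly what rest_for_one does.
    |> add(worker(web.start))
  })
}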

For a dynamic supervisor though, I feel like it'd be easier to use a one-for-one strategy. This would have the limitation of not allowing workers to pass state to each other. Obviously it wouldn't work in the example @lpil gave, where multiple workers depend on each other, but I feel like it'd be better than nothing.

Am I understanding the situation correctly?

lpil (Member) commented Nov 19, 2024

Yup that's right. "Dynamic supervisor" is what Elixir folks call a supervisor with the one_for_one strategy.

guillheu commented

I've been thinking about how to implement a dynamic supervisor. I'm trying to start from the current OTP supervisor and change things as I go. Feel free to correct my misunderstandings and point me in the right direction.

  • The init function for the dynamic supervisor should not start any children. Instead, it's the loop that should initialize children through a new message variant, something like ChildAdded.
  • I understand that actor.start_spec automatically links the process of a new actor to its calling process, which is why process.trap_exits and process.selecting_trapped_exits can be used in the init function without explicitly linking the child process to the supervisor.
  • The returning type argument of the ChildSpec type is no longer useful as previously intended, since the "next" child does not need its return value. However, a function that executes after initialization is still useful: for instance, I've been using the supervisor's ChildSpec returning function to register the new child's subject in a subject registry (using Chip). A configurable function that runs after initialization and gives access to the new worker's subject is a good thing, but it should be called something else, maybe finalize instead of returning. It would also return Nil instead of the returning type argument, which can then be removed from ChildSpec.
  • Similarly, the Spec for the dynamic supervisor does not need a return type argument.
  • I don't think the Children or Starter types are necessary. AFAICT they're needed for the rest_for_one strategy, but don't serve a purpose for one_for_one.

Here's an overview:

pub opaque type Message(argument, children_message) {
  ChildAdded(child: ChildSpec(children_message, argument))
  Exit(process.ExitMessage)
  RetryRestart(Pid)
}

type State {
  State(
    restarts: IntensityTracker,
    // starter: Starter(a),
    retry_restarts: Subject(Pid),
  )
}

pub type Spec(argument) {
  Spec(
    argument: argument,
    max_frequency: Int,
    frequency_period: Int,
    init: fn(argument) -> Nil,
  )
}

pub opaque type ChildSpec(msg, argument) {
  ChildSpec(
    start: fn(argument) -> Result(Subject(msg), StartError),
    finalize: fn(argument, Subject(msg)) -> Nil,
  )
}

fn loop(
  message: Message(argument, children_message),
  state: State,
) -> actor.Next(Message(argument, children_message), State) {
  case message {
    Exit(exit_message) -> todo  //handle_exit(exit_message.pid, state)
    RetryRestart(pid) -> todo   //handle_exit(pid, state)
    ChildAdded(child) -> todo
  }
}

fn init(
  spec: Spec(argument),
) -> actor.InitResult(State, Message(argument, children_message)) {
  // Create a subject so that we can asynchronously retry restarting when we
  // fail to bring an exited child back up
  let retry = process.new_subject()

  // Trap exits so that we get a message when a child crashes
  process.trap_exits(True)

  // Combine selectors
  let selector =
    process.new_selector()
    |> process.selecting(retry, RetryRestart)
    |> process.selecting_trapped_exits(Exit)

  todo
}

pub fn worker(
  start: fn(argument) -> Result(Subject(msg), StartError),
  finalize: fn(argument, Subject(msg)) -> Nil,
) -> ChildSpec(msg, argument) {
  ChildSpec(start: start, finalize: finalize)
}

One issue with this design is that all the workers under the same dynamic supervisor must return a subject with the same message type. I don't know if this is the same in Elixir, but I was forced to specify the ChildSpec message type in the dynamic supervisor's Message type because of the ChildAdded variant.

lpil (Member) commented Nov 19, 2024

I think the correct way to go would be to use the Erlang supervisor module as a base, as it is well tested and known to work. That's what static_supervisor does; it is recommended over supervisor, which currently has several bugs.

That said, I've not looked into this much so I don't know what the API might be or what limitations might arise.
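
For anyone following along, static_supervisor usage currently looks roughly like this (start_database and start_web are placeholders, and the exact function names may differ between releases, so check the module docs):

import gleam/otp/static_supervisor as sup

pub fn start() {
  // Build a supervisor with the one_for_one strategy and two workers, then
  // start and link it to the current process.
  sup.new(sup.OneForOne)
  |> sup.add(sup.worker_child("database", start_database))
  |> sup.add(sup.worker_child("web", start_web))
  |> sup.start_link
}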

guillheu commented

I see.
That seems feasible at first glance; simple_one_for_one fits my use case at least.
I think I'm going to need a bit more time playing with the Erlang supervisor module before I can confidently contribute a dynamic_supervisor to gleam_otp.

guillheu commented Nov 21, 2024

I started working on the simple-one-for-one Erlang supervisor, but realized I was mostly just reusing static_supervisor, which makes sense. I decided to add the SimpleOneForOne strategy to the static_supervisor.
Because of how the simple-one-for-one strategy works, I also had to add a binding for the Erlang supervisor:start_child function. This means the static_supervisor isn't so static anymore. One thing led to another, and I ended up binding to what I think are the most important Erlang dynamic supervisor functions for all strategies:

  • start_child/2
  • terminate_child/2
  • restart_child/2
  • delete_child/2
  • count_children/1
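
For context, bindings along these lines can be declared with @external; this is only a sketch, using Dynamic for the loosely typed Erlang return values, and the branch linked below may model them differently:

import gleam/dynamic.{type Dynamic}
import gleam/erlang/process.{type Pid}

// supervisor:start_child/2. For a simple_one_for_one supervisor the second
// argument is a list of extra arguments appended to the child's start call.
@external(erlang, "supervisor", "start_child")
fn erlang_start_child(sup: Pid, extra_args: List(Dynamic)) -> Dynamic

// supervisor:terminate_child/2. For simple_one_for_one the child is
// identified by its Pid rather than by a child id.
@external(erlang, "supervisor", "terminate_child")
fn erlang_terminate_child(sup: Pid, child: Pid) -> Dynamic

// supervisor:count_children/1 returns a property list of counts.
@external(erlang, "supervisor", "count_children")
fn erlang_count_children(sup: Pid) -> Dynamic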

I essentially have a functional (but messy) dynamic supervisor using the Erlang supervisor module.
You can compare my work with the current otp main branch here main...guillheu:otp:simple-one-for-one-erlang

I haven't opened a PR yet because I think my work is very, very messy, and I'd like some feedback first.
Sorry if this is a big code review to ask for, feel free to ask for clarifications. I'll try to be responsive.

The simple_one_for_one strategy behaves so differently from the other three strategies that I'm thinking it might be preferable to have it in its own separate module.

ryanmiville (Contributor) commented

I made a dynamic_supervisor module based on simple_one_for_one with minimal differences from static_supervisor.

usage looks like this:

import gleam/otp/dynamic_supervisor as sup

pub fn start_supervisor() {
  let assert Ok(supervisor) =
    sup.new(sup.worker_child("worker", start_worker))
    |> sup.start_link

  let assert Ok(p1) = sup.start_child(supervisor, "1")
  let assert Ok(p2) = sup.start_child(supervisor, "2")
}

fn start_worker(args) -> Result(Pid, error) {
  // Start a worker process
  todo
}

If this feels like an appropriate addition, I'd be happy to open a PR.

Link to fork

guillheu commented

@ryanmiville Great minds think alike it seems. I opened a PR with a very similar take a few weeks ago. Still discussing with Louis regarding how to organize the API.

lpil (Member) commented Jan 21, 2025

There's not a discussion! I have been clear about what the design must be for a contribution to be accepted into this library.
