Fix #3076 parEvalMap resource scoping #3515
base: main
Conversation
Updates parEvalMap* and broadcastThrough to extend the resource scope past the channel/topic used to implement concurrency for these operators.
Make extendScopeTo cancellation safe (see typelevel#3474)
Awesome, thanks for digging in to a gnarly part of the library. Some questions below. Not sure if this is the right approach or not but definitely worth exploring.
Any idea why leasing the main scope is insufficient? Why do we need to walk the tree, manipulating acquire nodes?
Does this solution have a different problem, where the manipulated acquires happen in the source pull's scope tree instead of the scope tree of the downstream/inner pull? Also, does this PR have the result of flattening all scopes of the downstream pull into the scope of the parent pull at the time of …

To fix this, I think …
Unfortunately, I don't have a clear idea (I don't understand the scopes well enough). But the way it manifested was as follows. With @reardonj's initial approach, this would work fine:

```scala
fs2.Stream.bracket {
  IO.println("making resource").as("TempFile")
} { res =>
  IO.println(s"!! releasing resource: $res")
}
  .parEvalMap(2) { _ =>
    IO.println("using the resource")
  }
  .compile
  .drain
  .unsafeRunSync()
```

but it would start failing in scenarios like this (we have this in the added test):

```scala
Stream(1, 2, 3)
  .flatMap { i =>
    fs2.Stream.bracket {
      IO.println(s"making resource $i").as(s"TempFile $i")
    } { res =>
      IO.println(s"!! releasing resource: $res")
    }
  }
  .parEvalMap(2) { _ =>
    IO.println("using the resource")
  }
  .compile
  .drain
  .unsafeRunSync()
```
Yes, that's what is happening here. Otherwise the source doesn't know to keep scopes open until the downstream pulls are done.
```scala
Pull
  .eval(
    scope.acquireResource(
      poll =>
        if (p.cancelable) poll(p.resource)
        else p.resource,
      p.release
    )
  )
```

And this is the part that is the most "scary" to me, and feels like it's too simplistic (I'm struggling to reason about it): what would happen to intermediate brackets? Are there scenarios where intermediate brackets could be safely closed, but with this approach they will have to stay open until the whole thing finishes? What happens if the stream is "infinite"?..
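The worry about intermediate brackets can be made concrete with a toy model. This is a plain-Scala sketch (all names here are hypothetical, not fs2's real types): if every acquisition is rebound to one outer scope, releases that would normally run when an inner bracket closes are instead deferred until the outer scope closes.

```scala
// Toy model of pull scoping (hypothetical names; not fs2's real ADT).
// An acquisition normally releases when its enclosing scope closes;
// rebinding every acquisition to the root scope defers all releases
// until the very end -- the behaviour being questioned above.
object ScopeModel {
  final class Scope {
    private var finalizers: List[() => Unit] = Nil
    def register(f: () => Unit): Unit = { finalizers = f :: finalizers }
    def close(): Unit = { finalizers.foreach(_.apply()); finalizers = Nil }
  }

  private var log: List[String] = Nil

  private def use(scope: Scope, res: String): Unit = {
    log = s"acquire $res" :: log
    scope.register(() => log = s"release $res" :: log)
  }

  def run(rebindToRoot: Boolean): List[String] = {
    log = Nil
    val root = new Scope
    // two "inner brackets", each with its own scope
    for (i <- 1 to 2) {
      val inner = new Scope
      use(if (rebindToRoot) root else inner, s"r$i")
      log = s"use r$i" :: log
      inner.close() // inner bracket ends here
    }
    root.close()
    log.reverse
  }
}
```

With `rebindToRoot = false` each release interleaves with its use, as bracket scoping normally guarantees; with `rebindToRoot = true` both resources stay open until the root closes, which for an infinite stream would mean "forever".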
Yeah, I'd expect the current state of this PR to have a problem with streams like this:

```scala
source.prefetch.flatMap(x => Stream.resource(r1).flatMap(...) ++ s)
```

In this example, we need the resource acquired by …
Without … So it depends on where … Here I'd expect:

```scala
source.prefetch.flatMap(x => Stream.resource(r1).flatMap(...).parEvalMap(2)(...) ++ s)
```

But here:

```scala
source.prefetch.flatMap(x => (Stream.resource(r1).flatMap(...) ++ s).parEvalMap(2)(...))
```
```scala
Stream(1).covary[IO].prefetch.flatMap { x =>
  (
    fs2.Stream.bracket {
      IO.println("making resource").as("TempFile")
    } { res =>
      IO.println(s"!! releasing resource: $res")
    }
    ++ Stream("Not-a-resource")
  )
    .parEvalMap(2) { r =>
      IO.println(s"eval: $r")
    }
}
  .compile
  .drain
  .unsafeRunSync()
```

```scala
Stream(1).covary[IO].prefetch.flatMap { x =>
  fs2.Stream.bracket {
    IO.println("making resource").as("TempFile")
  } { res =>
    IO.println(s"!! releasing resource: $res")
  }
    .parEvalMap(2) { r =>
      IO.println(s"eval: $r")
    } ++
    Stream.eval(IO.println("after ++"))
}
  .compile
  .drain
  .unsafeRunSync()
```
My solution worked fine for singleton streams, but failed when the stream produced multiple resources (e.g. a TCP server). My understanding was that this happened because we captured the scope on that first pull, but new scopes are opened for each subsequent resource and those aren't being propagated. I wonder if the best 'fix' here is documenting more clearly when scopes are propagated vs. not. The whole setup feels very magical. As in: …
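The "captured the scope on the first pull" failure mode can be sketched with a toy lease model (hypothetical; fs2's real `Scope`/`Lease` machinery is more involved): a lease protects the one scope it was taken on, but scopes opened after the lease was taken are not covered by it.

```scala
// Toy sketch of why leasing only the scope seen on the *first* pull
// is not enough (hypothetical model; not fs2's real API).
object LeaseModel {
  final class Scope {
    private var leases = 0
    // Taking a lease prevents this scope's finalizers from running;
    // returns a cancel action that releases the lease.
    def lease(): () => Unit = { leases += 1; () => leases -= 1 }
    def isProtected: Boolean = leases > 0
  }

  def demo(): (Boolean, Boolean) = {
    // Lease the scope visible at the first pull...
    val first = new Scope
    val cancelLease = first.lease()
    // ...but a later resource opens a brand-new scope the lease never
    // saw, so its finalizer can still run under the downstream pull.
    val second = new Scope
    (first.isProtected, second.isProtected)
  }
}
```

The first scope is protected, the later one is not, matching the observation that singleton streams worked while multi-resource streams failed.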
This is a continuation of #3512
Original description
Updates `parEvalMap*`, `broadcastThrough` and `prefetch` to extend the resource scope past the channel/topic used to implement concurrency for these operators. In all cases the solution is the same: I have abstracted it into a new `extendScopeThrough` method, which works as `through` except it propagates the scope to the new stream. It appears we should be doing this for any stream combinator where the resulting stream is not directly derived from the current stream (e.g. any combinator that uses an internal buffer, channel, topic, or such).

I also had to fix the cancellation safety of `extendScopeTo` (see #3474). The StreamSuite "resource safety test 4" started failing after my changes, presumably because the scope started propagating far enough to actually get hit by cancellation.

At least `conflateChunks` is broken in the same way, but I'd like to validate that this solution is correct before I continue trying to chase down all the places the fix should be applied.
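The difference between `through` and an `extendScopeThrough`-style combinator can be sketched in miniature (hypothetical model; fs2's `Stream` is of course not a case class, and the real method threads a `Scope`, not a plain finalizer):

```scala
// Toy contrast between "through" and an "extendScopeThrough"-style
// combinator (hypothetical model, not fs2's real implementation).
final case class S(values: List[Int], onFinalize: () => Unit) {
  def close(): Unit = onFinalize()
}

object ScopeThrough {
  // through: the result is a *new* stream, so the source's finalizer
  // is not attached to it -- closing the result releases nothing.
  def through(s: S, f: List[Int] => List[Int]): S =
    S(f(s.values), () => ())

  // extendScopeThrough: same transformation, but the source's
  // finalizer travels with the result stream.
  def extendScopeThrough(s: S, f: List[Int] => List[Int]): S =
    S(f(s.values), s.onFinalize)
}
```

In the toy model, closing the `through` result never runs the source's finalizer, while closing the `extendScopeThrough` result does, which mirrors why channel/topic-backed combinators lose resource scoping without the fix.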
In this PR, the implementation of `extendScopeThrough` is changed (because @reardonj found that the original one was not handling more complicated cases correctly): instead of `extendScopeTo`, a new method is defined and used: `Pull.bindAcquireToScope`. `bindAcquireToScope` "walks" the tree of pulls and updates every `Pull.Acquire` to use `scope.acquireResource`, binding all acquisitions to the provided scope.

I can say I don't really know what I'm doing here :) - it was a desperate attempt to fix the issue. But it seems to have worked. Given that, this PR really needs a review by someone who knows how all this works (scopes, pulls, etc.).
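The "walks the tree of pulls" idea can be sketched on a drastically simplified pull ADT (hypothetical types; fs2's real `Pull` has many more constructors, including the `Translate` and `StepLeg` cases the author flags as hard to handle):

```scala
// Simplified pull ADT (hypothetical; fs2's real Pull is far richer).
sealed trait P
final case class Acquire(res: String, boundTo: Option[String]) extends P
final case class FlatMap(src: P, rest: P) extends P
final case class Output(v: Int) extends P

object BindAcquire {
  // Recursively rebind every Acquire node in the tree to the given
  // scope id, mirroring the idea described for bindAcquireToScope.
  def bind(p: P, scopeId: String): P = p match {
    case a: Acquire    => a.copy(boundTo = Some(scopeId))
    case FlatMap(s, r) => FlatMap(bind(s, scopeId), bind(r, scopeId))
    case o: Output     => o
  }
}
```

A `Translate`-like constructor would wrap an inner pull in a different effect type `G`, which is why recursing through it needs extra structure (such as a `MonadThrow[G]`) that isn't available here.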
There is one kind of `Pull` that I do not know how to process: `Pull.Translate`. At the very least, we don't have a `MonadThrow` for `G`, which is needed to call `bindAcquireToScope` recursively. `Pull.StepLeg` is another one that I don't know how to handle correctly.

TODO:
- `println`s
- `Pull.Translate`
- `Pull.StepLeg`
- `Outcome.Succeeded(Left(id))` returned from `scope.acquireResource`