Kafka sink to open fewer threads #431

Merged
merged 1 commit into from
Nov 19, 2024

istreeter
Contributor

No description provided.

@istreeter istreeter force-pushed the kafka-sink-fewer-threads branch from cd45a9d to e585e8e Compare November 6, 2024 07:38
@istreeter istreeter force-pushed the kafka-sink-fewer-threads branch from e585e8e to 6abd80b Compare November 6, 2024 07:52
@peel peel merged commit 9807f4a into develop Nov 19, 2024
3 checks passed
@peel peel deleted the kafka-sink-fewer-threads branch November 21, 2024 13:49
AlexBenny pushed a commit that referenced this pull request Jan 7, 2025
In #431 we improved performance of the Kafka
sink by calling `producer.send()` on a compute thread rather than on a
blocking thread.  This is a great improvement, provided that
`producer.send()` never blocks.

But `producer.send()` does in fact block if the producer needs to
re-fetch topic metadata.  This happens every 5 minutes by default, and
is configured by the Kafka setting `metadata.max.age.ms`.  If
`producer.send()` blocks on a compute thread, it can cause thread
starvation and degrade the collector's responsiveness to requests.
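
For reference, the refresh interval mentioned above is an ordinary producer
property. The following is a minimal sketch in plain Java (not the collector's
own code), assuming the setting is passed through `java.util.Properties`. The
class name, the helper `withMetadataMaxAge`, and the bootstrap address are
hypothetical placeholders; only the key `metadata.max.age.ms` and its 5 minute
default come from the text.

```java
import java.util.Properties;

public class ProducerMetadataConfig {

    // Hypothetical helper: builds producer properties with an explicit
    // metadata refresh interval, expressed in milliseconds.
    public static Properties withMetadataMaxAge(long maxAgeMs) {
        Properties props = new Properties();
        // Placeholder broker address, an assumption for illustration only.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // The Kafka setting named in the commit message.
        props.setProperty("metadata.max.age.ms", Long.toString(maxAgeMs));
        return props;
    }

    public static void main(String[] args) {
        // 300000 ms is the 5 minute default described above.
        Properties props = withMetadataMaxAge(300_000L);
        System.out.println(props.getProperty("metadata.max.age.ms"));
    }
}
```

Changing this value only changes how often the re-fetch happens; it does not
by itself move the blocking call off the compute thread.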

This PR changes the sink to execute `send()` on a dedicated thread. From
the point of view of the collector, it is ok if `send()` blocks on the
dedicated thread. This is better than running it inside `Sync[F].blocking`
(which is what we did before) because that tended to create a huge
number of threads under some circumstances.

In theory this shouldn't negatively affect performance much, even though
it's a single thread, because most of the time `send()` does not block;
and when it does block (i.e. once per 5 minutes by default), it is ok for
the other events to get enqueued as a backlog of Callables on the thread.
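
The idea of a single dedicated thread with a backlog of Callables can be
sketched as follows. This is an illustration in plain Java using only
`java.util.concurrent`, not the collector's actual implementation. The class
`DedicatedSendThread`, the thread name `kafka-send`, and `fakeSend` (a stand-in
for `producer.send()` that optionally sleeps to imitate a metadata re-fetch)
are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DedicatedSendThread {

    // One long-lived thread runs every (potentially blocking) send.
    private final ExecutorService sendExecutor =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "kafka-send");
            t.setDaemon(true);
            return t;
        });

    // Hypothetical stand-in for producer.send(): optionally blocks, then
    // reports which thread it ran on.
    private static String fakeSend(String event, long blockMs) throws InterruptedException {
        if (blockMs > 0) {
            Thread.sleep(blockMs); // imitates a blocking metadata re-fetch
        }
        return event + "@" + Thread.currentThread().getName();
    }

    // Enqueues the send as a Callable; the calling thread returns immediately.
    public Future<String> submit(String event, long blockMs) {
        Callable<String> task = () -> fakeSend(event, blockMs);
        return sendExecutor.submit(task);
    }

    public void shutdown() {
        sendExecutor.shutdown();
    }

    // Submits n events, the first of which blocks, and collects the results
    // in submission order.
    public static List<String> demo(int n) throws Exception {
        DedicatedSendThread sink = new DedicatedSendThread();
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            futures.add(sink.submit("event-" + i, i == 0 ? 50 : 0));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get());
        }
        sink.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        demo(3).forEach(System.out::println);
    }
}
```

In this sketch the first event blocks for 50 ms while the later events wait in
the executor's queue, and every send runs on the same `kafka-send` thread, so
no additional threads are created while one send is blocked.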

Kafka sink use a dedicated thread for potentially blocking send: Amendment 1