Release/3.3.0 #437
Conversation
Introduces version updates to third-party components, along with the refreshed SLULA license 1.1.
Previously it was possible to send arbitrarily large payloads to the collector. Now we add two mechanisms for handling large requests. First, `networking.maxPayloadSize` (defaulting to 1048576 bytes = 1 MB): above this size the collector will stream the input and generate a SizeViolation bad event. Second, `networking.dropPayloadSize` (defaulting to 2097152 bytes = 2 MB): above this size the collector will return HTTP 413 and will not attempt to process the request. The two mechanisms go hand in hand, allowing for a configuration that is safe but tolerant.

If a request exceeds `sink.maxBytes`, the collector will attempt to split it into separate events. However, if the payload is bigger than `maxPayloadSize`, parsing the body is risky and, in case of an attack, may result in failures; the collector will not attempt to parse the request and split it up, but will generate a SizeViolation bad event instead. If the payload exceeds `dropPayloadSize`, the request will not be handled at all and no bad event will be emitted; the collector responds with HTTP 413.
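For illustration, here is roughly how the two thresholds could look in the collector's HOCON configuration. The exact nesting (in particular the top-level `collector` block) is an assumption based on the setting paths named above; the values shown are the defaults from this PR:

```hocon
collector {
  networking {
    # Payloads above this size (bytes) are not parsed or split; the collector
    # streams the input and emits a SizeViolation bad event. Default: 1 MB.
    maxPayloadSize = 1048576

    # Payloads above this size (bytes) are rejected outright with HTTP 413,
    # and no bad event is emitted. Default: 2 MB.
    dropPayloadSize = 2097152
  }
}
```

Keeping `dropPayloadSize` above `maxPayloadSize` gives the tolerant middle band described above: oversized but not abusive requests still produce a bad event for observability, while extreme payloads are refused before any processing happens.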
In #431 we improved the performance of the Kafka sink by calling `producer.send()` on a compute thread rather than a blocking thread. This is a great improvement as long as `producer.send()` never blocks. But `producer.send()` does in fact block when the producer needs to re-fetch topic metadata, which happens every 5 minutes by default and is configured by the Kafka setting `metadata.max.age.ms`. If `producer.send()` blocks on a compute thread, it can potentially cause thread starvation and negatively impact the collector's responsiveness to requests.

This PR changes the sink to execute `send()` on a dedicated thread. From the point of view of the collector, it is OK if `send()` blocks on the dedicated thread. This is better than running it inside `Sync[F].blocking` (which is what we did before), because that tended to create a huge number of threads under some circumstances. In theory this shouldn't hurt performance much, even though it is a single thread: most of the time `send()` does not block, and when it does block (i.e. once per 5 minutes) it is fine for other events to queue up as a backlog of Callables on that thread.

Kafka sink: use a dedicated thread for potentially blocking send (Amendment 1)
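As a rough sketch of the dedicated-thread pattern (illustrative names only, not the collector's actual code, and assuming cats-effect, which the collector uses):

```scala
import java.util.concurrent.{ExecutorService, Executors}

import cats.effect.{IO, Resource}
import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

// A single dedicated thread runs producer.send(), so the rare blocking call
// (the metadata refresh every metadata.max.age.ms) never stalls a compute thread.
object DedicatedSendThread {

  val executorResource: Resource[IO, ExecutorService] =
    Resource.make(IO(Executors.newSingleThreadExecutor()))(es => IO(es.shutdown()))

  def send(
    es: ExecutorService,
    producer: KafkaProducer[String, Array[Byte]],
    record: ProducerRecord[String, Array[Byte]]
  ): IO[Unit] =
    IO.async_[Unit] { cb =>
      // Each send is enqueued on the dedicated thread. If send() blocks,
      // subsequent events simply back up in the executor's internal queue
      // until the metadata fetch completes.
      es.execute { () =>
        try {
          producer.send(
            record,
            new Callback {
              def onCompletion(md: RecordMetadata, e: Exception): Unit =
                if (e == null) cb(Right(())) else cb(Left(e))
            }
          )
          ()
        } catch { case e: Throwable => cb(Left(e)) }
      }
    }
}
```

The single-thread executor is the point of the design: its internal task queue is what absorbs the backlog of pending sends while a metadata refresh blocks `send()`, instead of spawning ever more threads the way `Sync[F].blocking` did.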
### Security
- Add limit for payload allowed sizes [snowplow/stream-collector-private#1]
- Security and license updates [snowplow/stream-collector-private#1]
We probably don’t need these links?
I'd like to keep changes to the content to a minimum, as it will be hard to manage both origins if the content differs. Unless strictly required, I would keep them there.