Release/3.3.0 #437

Merged
merged 5 commits into master from release/3.3.0 on Jan 7, 2025

Conversation

AlexBenny
Contributor

No description provided.

peel and others added 5 commits January 7, 2025 13:06
Introduces version updates to third-party components, along with the refreshed SLULA license 1.1.
Previously it was possible to send arbitrarily large payloads to the collector.
Now we add two mechanisms for handling large requests.
First, `networking.maxPayloadSize` (defaulting to 1048576 bytes = 1 MB) will stream the
input and generate a SizeViolation bad event.
Second, `networking.dropPayloadSize` (defaulting to 2097152 bytes = 2 MB) will return
HTTP 413 and will not attempt to process the request.

The two mechanisms work hand in hand, allowing for a safe but tolerant
configuration.
If requests exceed `sink.maxBytes`, the collector will attempt to split them into
separate events. However, if the payload is bigger than `maxPayloadSize`, parsing the
body is risky and, in case of an attack, may result in failures. The collector
will not attempt to parse the request and split it up, but will generate a
SizeViolation bad event. If the payload exceeds `dropPayloadSize`, the request
will not be handled at all and no bad event is emitted; instead, HTTP 413 is returned.
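
For illustration, here is a minimal Scala sketch of how the two thresholds are meant to interact for a request body of a given size. The names (`PayloadDecision`, `decide`) are hypothetical and not the collector's actual code; only the configuration keys and defaults come from the description above.

```scala
// Hypothetical sketch of the decision order described above.
sealed trait PayloadDecision
case object ProcessNormally   extends PayloadDecision // may still be split per sink.maxBytes
case object EmitSizeViolation extends PayloadDecision // streamed, not parsed; bad event emitted
case object RejectWith413     extends PayloadDecision // dropped outright, no bad event

def decide(
  contentLength: Long,
  maxPayloadSize: Long = 1048576L,  // networking.maxPayloadSize, 1 MB
  dropPayloadSize: Long = 2097152L  // networking.dropPayloadSize, 2 MB
): PayloadDecision =
  if (contentLength > dropPayloadSize) RejectWith413
  else if (contentLength > maxPayloadSize) EmitSizeViolation
  else ProcessNormally
```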
In #431 we improved performance of the Kafka
sink by calling `producer.send()` on a compute thread rather than on a blocking
thread. This is a great improvement, provided it is always true that
`producer.send()` never blocks.

But `producer.send()` does in fact block if the producer needs to
re-fetch topic metadata. This happens every 5 minutes by default, and
is configured by the Kafka setting `metadata.max.age.ms`. If
`producer.send()` blocks on a compute thread, it can potentially
cause thread starvation and negatively impact the collector's
responsiveness to requests.

This PR changes the collector to execute `send()` on a dedicated thread. From the
point of view of the collector, it is ok if `send()` blocks on the
dedicated thread. It is better than running it inside `Sync[F].blocking`
(which is what we did before), because that tended to create a huge
number of threads under some circumstances.

In theory this shouldn't negatively affect performance much, even though
it's a single thread, because most of the time `send()` does not block;
and when it does block (i.e. once every 5 minutes), it is ok for the
other events to get enqueued as a backlog of Callables on that thread.
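
As a rough illustration, here is a Scala sketch of the dedicated-thread approach, assuming cats-effect 3 and the Kafka Java client; the helper names (`sendEc`, `sendOnDedicatedThread`) are hypothetical and not the collector's actual code.

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import cats.effect.{Async, Resource, Sync}
import cats.syntax.all._
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// A single-thread ExecutionContext dedicated to producer.send(): if send()
// blocks on a metadata refresh, later sends queue behind it instead of
// starving the compute pool.
def sendEc[F[_]: Sync]: Resource[F, ExecutionContext] =
  Resource
    .make(Sync[F].delay(Executors.newSingleThreadExecutor()))(es => Sync[F].delay(es.shutdown()))
    .map(ExecutionContext.fromExecutorService)

def sendOnDedicatedThread[F[_]: Async](
  producer: KafkaProducer[String, Array[Byte]],
  record: ProducerRecord[String, Array[Byte]],
  ec: ExecutionContext
): F[Unit] =
  // evalOn shifts only this step onto the dedicated thread; the rest of the
  // request handling stays on the compute pool.
  Async[F].evalOn(Async[F].delay(producer.send(record)).void, ec)
```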

Kafka sink uses a dedicated thread for potentially blocking send: Amendment 1

### Security
- Add limit for allowed payload sizes [snowplow/stream-collector-private#1]
- Security and license updates [snowplow/stream-collector-private#1]
Contributor


We probably don’t need these links?

Contributor Author


I'd like to keep the changes to the content minimal, as it will be hard to manage both origins if the content differs. Unless strictly required, I would keep them there.

@AlexBenny AlexBenny merged commit 0af99f7 into master Jan 7, 2025
3 checks passed
@AlexBenny AlexBenny deleted the release/3.3.0 branch January 7, 2025 15:40