Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kinesis source improvements for larger deployments #99

Merged
merged 1 commit into from
Nov 26, 2024

Conversation

istreeter
Copy link
Contributor

@istreeter istreeter commented Nov 25, 2024

Three improvements getting ready for using common-streams on larger pods or with more shards.

  1. Configurable max leases to steal at one time. The KCL default is 1. In order to avoid latency during scale up/down and pod-ration, we want the app to be quick to acquire shard-leases to process. With bigger instances we tend to have more shard-leases per instance, so we increase how aggressivley it acquires leases.
  2. Set KCL maxPendingProcessRecordsInput to the minimum allowed (1). This is a precaution to avoid potentials OOMs in case a single pod subscribes to a very large number of shards. It shouldn't have a negative performance impact, because common-streams apps tend to manage their own pre-fetching. In fact, this makes the Kinesis source more similar to the other Sources which don't have in-built pre-fetching.
  3. Fixes a bug in which latency was set by the latest record in a batch, not the earliest.

@istreeter istreeter changed the base branch from main to develop November 25, 2024 11:00
@istreeter istreeter force-pushed the kinesis-source-improvements branch 2 times, most recently from 52c5345 to b779ec3 Compare November 25, 2024 11:12
Two improvements getting ready for using common-streams on larger pods
or with more shards.

1. Configurable max leases to steal at one time. The KCL default is 1.
   In order to avoid latency during scale up/down and pod-ration, we
   want the app to be quick to acquire shard-leases to process. With
   bigger instances we tend to have more shard-leases per instance, so
   we increase how aggressivley it acquires leases.
2. Set KCL `maxPendingProcessRecordsInput` to the minimum allowed (1).
   This is a precaution to avoid potentials OOMs in case a single pod
   subscribes to a very large number of shards. It shouldn't have a
   negative performance impact, because common-streams apps tend to
   manage their own pre-fetching. In fact, this makes the Kinesis source
   more similar to the other Sources which don't have in-built
   pre-fetching.
3. Fixes a bug in which latency was set by the latest record in a batch,
   not the earliest.
@istreeter istreeter force-pushed the kinesis-source-improvements branch from b779ec3 to f07cc4b Compare November 26, 2024 09:23
@istreeter istreeter changed the base branch from develop to main November 26, 2024 09:26
@istreeter istreeter changed the base branch from main to develop November 26, 2024 09:26
@@ -10,6 +10,7 @@ snowplow.defaults: {
maxRecords: 1000
}
leaseDuration: "10 seconds"
maxLeasesToStealAtOneTimeFactor: 2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious that this is a decimal, you can't have a fraction of a lease surely? Doesn't seem like a likely oversight - so I'm guessing there's a reason I don't see?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This number is first multiplied by the number of processors. It is helpful to make it a decimal, e.g. maybe we want a 8-core instance to steal 2 leases at one time, so we set this value to 0.25.

I've been using this pattern in lots of places recently. It's a good way to make the app auto-guess a good configuration, instead of putting the burden on the user to configure it appropriately for its vertical size. And I consistently put Factor as a suffix of params that get multiplied by num processors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see - interesting. But nuanced, but I get it. It's cores*this = n instances. I assume it rounds up to the nearest CEIL, since it's max... Cool, thanks for explaining!

@@ -16,6 +16,26 @@ import java.net.URI
import java.time.Instant
import scala.concurrent.duration.FiniteDuration

/**
* Config to be supplied from the app's hocon
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the addition of in-code documentation for the other parameters too

Copy link
Contributor

@colmsnowplow colmsnowplow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@istreeter istreeter merged commit 38aaf95 into develop Nov 26, 2024
7 checks passed
@istreeter istreeter deleted the kinesis-source-improvements branch November 26, 2024 13:21
istreeter added a commit to snowplow-incubator/snowflake-loader that referenced this pull request Nov 26, 2024
See snowplow-incubator/common-streams#99 for the relevant change

This library upgrade brings improvements to the Kinesis source, which
should help on vertically larger instances.
istreeter added a commit to snowplow-incubator/snowflake-loader that referenced this pull request Nov 26, 2024
See snowplow-incubator/common-streams#99 for the relevant change

This library upgrade brings improvements to the Kinesis source, which
should help on vertically larger instances.
istreeter added a commit to snowplow-incubator/snowflake-loader that referenced this pull request Nov 26, 2024
* Bump common-streams to 0.9.0

See snowplow-incubator/common-streams#99 for the relevant change

This library upgrade brings improvements to the Kinesis source, which
should help on vertically larger instances.

* Bump snowflake-ingest-sdk to 3.0.0
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Dec 31, 2024
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Jan 3, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
istreeter added a commit to snowplow-incubator/snowplow-bigquery-loader that referenced this pull request Jan 3, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Jan 10, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
istreeter added a commit to snowplow-incubator/snowplow-bigquery-loader that referenced this pull request Jan 14, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
istreeter added a commit to snowplow-incubator/snowplow-bigquery-loader that referenced this pull request Jan 14, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants