[chore][pkg/stanza] Fix the bug that the log emitter might hang when the receiver retries indefinitely (open-telemetry#37159)

#### Description

I was exploring options for applying backpressure to the pipeline when the exporter fails. Inspired by open-telemetry#29410 (comment), I realized that I could enable `retry_on_failure` on the receiver side and have it retry indefinitely by setting `max_elapsed_time` to 0.

```yaml
receivers:
  filelog:
    include: [ input.log ]
    retry_on_failure:
      enabled: true
      max_elapsed_time: 0
```

With this config, the consumer blocks in the `ConsumeLogs` func in `consumerretry` when the exporter fails to consume the logs:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/12551d324375bd0c4647a8cdc7bd0f8c435c1034/internal/coreinternal/consumerretry/logs.go#L35

The `flusher()` func of the `LogEmitter` starts a loop and calls the `consumerFunc` with `context.Background()`. When `ConsumeLogs` is blocked by the retry, there is no way to cancel the retry, so the `LogEmitter` hangs when I try to shut down the collector.

In this PR, I created a ctx in the `Start` func, which is cancelled later in the `Shutdown` func. The ctx is passed to the flusher and used for the flush on every `flushInterval`. However, I have to swap it with another ctx with a timeout during shutdown to flush the remaining batch one last time. That's the best approach I can think of for now, and I'm open to other suggestions.

---------

Signed-off-by: Mengnan Gong <[email protected]>
Co-authored-by: Daniel Jaglowski <[email protected]>
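For illustration, below is a minimal Go sketch of the cancellation pattern described above. It is not the actual `pkg/stanza` implementation; the `emitter` type, `consumerFunc`, `flushInterval`, and the 5-second shutdown timeout are placeholder assumptions. The point is that the flusher loop uses a context created in `Start`, `Shutdown` cancels that context to unblock an in-flight retry, and a separate short-lived timeout context is used for the final flush.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// emitter is a simplified stand-in for the LogEmitter described above.
type emitter struct {
	consumerFunc  func(ctx context.Context, batch []string) error
	flushInterval time.Duration

	cancel context.CancelFunc
	wg     sync.WaitGroup

	mu    sync.Mutex
	batch []string
}

// Start creates a cancellable context and hands it to the flusher loop,
// so a flush blocked inside an indefinite retry can later be interrupted.
func (e *emitter) Start() {
	ctx, cancel := context.WithCancel(context.Background())
	e.cancel = cancel

	e.wg.Add(1)
	go e.flusher(ctx)
}

// flusher flushes the pending batch on every flushInterval tick, using the
// context created in Start instead of context.Background().
func (e *emitter) flusher(ctx context.Context) {
	defer e.wg.Done()
	ticker := time.NewTicker(e.flushInterval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			e.flush(ctx)
		}
	}
}

// flush drains the current batch and hands it to the consumer.
func (e *emitter) flush(ctx context.Context) {
	e.mu.Lock()
	batch := e.batch
	e.batch = nil
	e.mu.Unlock()

	if len(batch) > 0 {
		// Errors are ignored here to keep the sketch short.
		_ = e.consumerFunc(ctx, batch)
	}
}

// Shutdown cancels the flusher's context (unblocking any indefinite retry),
// waits for the loop to exit, then flushes the remaining batch one last time
// with a separate context that has a bounded timeout.
func (e *emitter) Shutdown() {
	if e.cancel != nil {
		e.cancel()
	}
	e.wg.Wait()

	flushCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	e.flush(flushCtx)
}
```

The swap to a timeout context in `Shutdown` is what lets the last batch still be attempted after the long-lived context is cancelled, while keeping shutdown itself bounded.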