Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poison messages handling #1040

Open
saguiitay opened this issue Feb 11, 2024 · 6 comments
Open

Poison messages handling #1040

saguiitay opened this issue Feb 11, 2024 · 6 comments
Labels

Comments

@saguiitay
Copy link
Contributor

As a rokochet of issue #1039 I've noticed that the message is being handled indefinitely.
I see the following traces in my logs:

PoisonMessageDetected: d8bdcf4d-ef40-4501-a99b-6e72309644ee: Message [TaskScheduled#1] with ID 6128efd4-8904-494d-9a2d-d2d8d01edbe7 has been dequeued 79 times and is now considered poison: {"Account":"...","TaskHub":"Services","...":"TaskScheduled","TaskEventId":1,"MessageId":"6128efd4-8904-494d-9a2d-d2d8d01edbe7","InstanceId":"d8bdcf4d-ef40-4501-a99b-6e72309644ee","ExecutionId":"b27db20915cf4e5db6cad3589dde88df","PartitionId":"...-workitems","DequeueCount":79}
AbandoningMessage: d8bdcf4d-ef40-4501-a99b-6e72309644ee: Abandoning [TaskScheduled#1] message back to ...-workitems and setting a visibility delay of 600ms: {"Account":"...","TaskHub":"...","EventType":"TaskScheduled","TaskEventId":1,"MessageId":"6128efd4-8904-494d-9a2d-d2d8d01edbe7","InstanceId":"d8bdcf4d-ef40-4501-a99b-6e72309644ee","ExecutionId":"b27db20915cf4e5db6cad3589dde88df","PartitionId":"services-workitems","SequenceNumber":208,"PopReceipt":"AgAAAAMAAAAAAAAAxVkNR+xc2gE=","VisibilityTimeoutSeconds":600}

Side note: the message say setting a visibility delay of 600ms, while the code actually delays for 600s (not ms).

Notice the DequeueCount value of 79 (I've seen values much higher). Perhaps there should be a setting that controls the maximum number of dequeues attempts, after which the message should just be disposed?

@cgillum @davidmrdavid

@cgillum
Copy link
Member

cgillum commented Feb 29, 2024

I think it makes sense to expose this as a setting. The reason that we allow it to keep going by default is to avoid data loss and allow users a chance to fix the root cause, but it makes sense to allow overriding this behavior.

@sig5
Copy link

sig5 commented Sep 24, 2024

I'd like to work on this.

@cgillum
Copy link
Member

cgillum commented Sep 25, 2024

@davidmrdavid you worked on a solution for this recently, right? My memory is fuzzy, but is your implementation broadly available and would it qualify for closing this issue?

@davidmrdavid
Copy link
Collaborator

@cgillum: I have an implementation for DTFx.AS v1, that's been recently superceded by @bachuv's re-implementation in DTFx.AS v2. My implementation has only been made available in private packages and has not been merged yet. The main blocker for @bachuv's PR today is for us to follow up on the action items we discussion after our internal design review: mostly removing some changes in DTFx.Core, and such.

I'm definitely interested in getting this moving again, but the group has been busy. I think at the very minimum what we should do is to in the open the remaining items for @bachuv's PR, and if the community wants to contribute those items, then I think we'd welcome it. @sig5 if you ping this thread again later this week, I can make sure to collect those remaining work items in case those are of interest/ you'd like to contribute to them. Does that sound like a good plan?

@sig5
Copy link

sig5 commented Sep 27, 2024

Yes, that sounds good @davidmrdavid ! Looking forward to contribute 😺.

@davidmrdavid
Copy link
Collaborator

Just linking the poison message handling PR here: #1130

I know I owe you some details here, @sig5. Let me try to first get some clarity on the next steps, it's unclear to me if someone from the core team is already working on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants