-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add JQ filter #375
Add JQ filter #375
Conversation
ff7829b
to
2d2f513
Compare
Adding JQ filter, which allows us to...filter messages based on a configured JQ command. Similar to already existing JQ mapper, but JQ filter requires output of a command to have boolean type. As there is some shared logic between the mapper and the new filter, I extracted common stuff to `jq_common.go` module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the implementation!
Still thinking whether we should crash, if the output is not boolean, or not..
Does the type of the output depend only on the jq command?
What i mean is: assuming same kind of input, can the same jq command have different type of output? If not, i.e. a jq command applied on snowplow data will always be boolean/not-boolean, then i think that crashing would also make sense.
What do you think?
I think in practice if we get non-boolean output once, it means we'd get it for ANY input data. Not like, it's ok for this event but it fails for the other batch of events. In such case it's a full disaster and we'd generate lots of (unrecoverable?) bad data :/ So...now I think we should crash instead and signal in logs there is a problem with JQ filter configuration |
I think it's one to mull over. One thought I have is that the user provides the thing that causes the problem. The same pattern is handled elsewhere in the app by sending the data to invalid - if the user, for example, divides by 0 in a JS script, we treat that as invalid. But I need to think about it a bit I think |
// This is where actual filtering is implemented, based on a JQ command output. | ||
func filterOutput(jqOutput transform.JqCommandOutput) transform.TransformationFunction { | ||
return func(message *models.Message, interState interface{}) (*models.Message, *models.Message, *models.Message, interface{}) { | ||
shouldKeepMessage, isBoolean := jqOutput.(bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using isBoolean
instead of the standard ok
is a wise move
@pondzix @adatzer I think it's best to keep the current behaviour - if we get a non-boolean treat it as invalid data, and send it to the failure target, don't crash. I'm not 100% sure it's the best answer but it's an existing pattern, so in the name of progressing things, let's just go with that decision. If we encounter problems with it we can always release a change. Happy with that? |
jira ref: PDP-1463
Adding JQ filter, which allows us to...filter messages based on a configured JQ command. Similar to already existing JQ mapper, but JQ filter requires output of a command to have boolean type.
As there is some shared logic between the mapper and the new filter, I extracted common stuff to
jq_common.go
module.