Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JQ filter #375

Merged
merged 1 commit into from
Oct 11, 2024
Merged

Add JQ filter #375

merged 1 commit into from
Oct 11, 2024

Conversation

pondzix
Copy link
Contributor

@pondzix pondzix commented Oct 9, 2024

jira ref: PDP-1463

Adding JQ filter, which allows us to...filter messages based on a configured JQ command. Similar to already existing JQ mapper, but JQ filter requires output of a command to have boolean type.

As there is some shared logic between the mapper and the new filter, I extracted common stuff to jq_common.go module.

@pondzix pondzix force-pushed the jq_filter branch 5 times, most recently from ff7829b to 2d2f513 Compare October 9, 2024 13:04
Adding JQ filter, which allows us to...filter messages based on a configured JQ command. Similar to already existing JQ mapper, but JQ filter requires output of a command to have boolean type.

As there is some shared logic between the mapper and the new filter, I extracted common stuff to `jq_common.go` module.
Copy link
Contributor

@adatzer adatzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the implementation!
Still thinking whether we should crash, if the output is not boolean, or not..
Does the type of the output depend only on the jq command?
What i mean is: assuming same kind of input, can the same jq command have different type of output? If not, i.e. a jq command applied on snowplow data will always be boolean/not-boolean, then i think that crashing would also make sense.
What do you think?

@pondzix
Copy link
Contributor Author

pondzix commented Oct 10, 2024

@adatzer

I think in practice if we get non-boolean output once, it means we'd get it for ANY input data. Not like, it's ok for this event but it fails for the other batch of events. In such case it's a full disaster and we'd generate lots of (unrecoverable?) bad data :/

So...now I think we should crash instead and signal in logs there is a problem with JQ filter configuration

@colmsnowplow
Copy link
Collaborator

I think it's one to mull over.

One thought I have is that the user provides the thing that causes the problem. The same pattern is handled elsewhere in the app by sending the data to invalid - if the user, for example, divides by 0 in a JS script, we treat that as invalid.

But I need to think about it a bit I think

// This is where actual filtering is implemented, based on a JQ command output.
func filterOutput(jqOutput transform.JqCommandOutput) transform.TransformationFunction {
return func(message *models.Message, interState interface{}) (*models.Message, *models.Message, *models.Message, interface{}) {
shouldKeepMessage, isBoolean := jqOutput.(bool)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using isBoolean instead of the standard ok is a wise move

@colmsnowplow
Copy link
Collaborator

@pondzix @adatzer I think it's best to keep the current behaviour - if we get a non-boolean treat it as invalid data, and send it to the failure target, don't crash.

I'm not 100% sure it's the best answer but it's an existing pattern, so in the name of progressing things, let's just go with that decision. If we encounter problems with it we can always release a change.

Happy with that?

@pondzix pondzix merged commit cc30121 into develop Oct 11, 2024
1 check passed
@pondzix pondzix deleted the jq_filter branch October 11, 2024 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants