refactor: json output vol2 #281

Open: wants to merge 2 commits into main

@chudicek chudicek commented Jan 7, 2025

Json Output Redo

This PR changes the JSON output format of the scan command (--output-format=json), mostly as discussed in PR #239.

Changes

meta field

The meta field is now an object, rather than an array of arrays.

Old format:

"meta": [
	[
		"meta_1",
		"value_1"
	]
]

New format:

"meta": {
	"meta_1": "value_1"
}

This maps more appropriately to the idea of the meta fields being a key-value collection.
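For consumers that still hold output in the old shape, the conversion can be sketched in Python as below. This is only an illustration: the helper name `meta_pairs_to_object` is hypothetical and not part of yara-x, and it assumes each old entry is a two-element `[key, value]` array as in the example above.

```python
def meta_pairs_to_object(pairs):
    """Turn the old array-of-pairs meta into the new key-value object.

    Assumes every element is a two-item [key, value] sequence.
    If a key occurs more than once, the last value wins, because a
    dict (like a JSON object consumed this way) keeps one value per key.
    """
    return dict(pairs)


old_meta = [["meta_1", "value_1"]]
new_meta = meta_pairs_to_object(old_meta)
print(new_meta)
```

Note that the pair form can represent a repeated key while the object form cannot, so a consumer relying on repeated keys would need to handle that separately.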

Top-level json

The top-level JSON value is now an object that holds the hits in one of its fields, rather than a bare array of hits.

Old format:

[
	{
		// hit object
	}
]

New format:

{
	// some extra fields
	"hits": [
		{
			// hit object
		}
	]
}

The benefit of the new representation is that it allows us to include extra information about the scan itself, which is useful for archiving the outputs of the scans (see version field).
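A consumer that has to read both archived (array) and new (object) outputs could branch on the top-level type. The following is a minimal sketch, not code from this PR; `extract_hits` is a hypothetical helper, and it only unwraps the top level without reconciling the differing shapes of the hit objects themselves.

```python
import json


def extract_hits(document):
    """Return the list of hits from either top-level layout.

    Old layout: the document itself is the array of hits.
    New layout: the document is an object with a "hits" field.
    """
    if isinstance(document, list):
        return document
    if isinstance(document, dict):
        # Treat a missing "hits" field as "no hits" (an assumption).
        return document.get("hits", [])
    raise TypeError("expected a JSON array or object at the top level")


old_text = '[{"path": "sample.bin"}]'
new_text = '{"hits": [{"rule": "rule_that_matched", "file": "sample.bin"}]}'
print(len(extract_hits(json.loads(old_text))))
print(len(extract_hits(json.loads(new_text))))
```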

Version field

The top-level JSON object holds a version field with the version of yara-x that produced the output.

{
	"version": "<the yara-x version (from Cargo.toml)>"
}

This is one of the possible extra pieces of information useful when archiving the output of the scan.
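As an illustration of the archiving use case, stored outputs could be bucketed by the version that produced them. This is a sketch under assumptions: `group_by_version` is a hypothetical helper, the version strings in the usage are placeholders, and outputs without a version field (for example old array-style outputs) are grouped under `None`.

```python
from collections import defaultdict


def group_by_version(outputs):
    """Group parsed scan outputs by their top-level "version" field."""
    groups = defaultdict(list)
    for output in outputs:
        # Old-format outputs are arrays and carry no version information.
        version = output.get("version") if isinstance(output, dict) else None
        groups[version].append(output)
    return dict(groups)


archived = [
    {"version": "1.0.0", "hits": []},  # placeholder version string
    [],                                # old array-style output
    {"version": "1.0.0", "hits": []},
]
for version, items in group_by_version(archived).items():
    print(version, len(items))
```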

Denormalized hit objects

Each hit object now carries both the path of the scanned file and the rule it matched, so there is one hit per matching file and rule pair instead of one entry per file with a nested list of rules.

Old format:

[
	{
		"path": "path to the scanned file",
		"rules": [
			{
				"identifier": "rule_that_matched"
				// ... meta, tags, strings
			}
		]
	}
]

New format:

{
	"hits": [
		{
			"rule": "rule_that_matched",
			"file": "path to the scanned file"
			// ... meta, tags, strings
		}
	]
}

The reasoning behind this is backwards compatibility: our tooling already relies on this format, which is the one we originally proposed in 00ccc34.
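The relationship between the two layouts can be sketched as a flattening step over the old per-file entries. This is an illustrative Python sketch, not the implementation in this PR; `denormalize` is a hypothetical name, and the remaining rule fields (meta, tags, strings) are copied over unchanged.

```python
def denormalize(old_output):
    """Flatten old per-file entries into one hit per (file, rule) pair."""
    hits = []
    for entry in old_output:
        for rule in entry.get("rules", []):
            hit = {"rule": rule["identifier"], "file": entry["path"]}
            # Carry over the other rule fields (meta, tags, strings) as-is.
            for key, value in rule.items():
                if key != "identifier":
                    hit[key] = value
            hits.append(hit)
    return {"hits": hits}


old_output = [
    {
        "path": "path to the scanned file",
        "rules": [{"identifier": "rule_that_matched", "tags": []}],
    }
]
print(denormalize(old_output))
```

A file that matched several rules yields several hits with the same "file" value, which is the denormalization the section title refers to.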


Should there be any further questions about the reasoning behind any of the changes, I would be happy to discuss them in this PR.

chudicek changed the title from "Json output vol2" to "refactor: json output vol2" on Jan 7, 2025