Filter for ConfigExtractor to Improve Performance #299
Hmm... most extractors within ConfigExtractor should have a YARA rule associated with them to trigger the extractor. That being said, some extractors may not have any rules associated with them, so they attempt brute-force analysis (the assumption is that they perform checks at runtime to determine whether the file is relevant to the extractor). If you have any shareable samples that resulted in long processing times, I would be curious to test them and push PRs back to the maintainers 😁
I think the bottleneck I see with ConfigExtractor is that many file objects are queued for processing by the service that should not be queued in the first place. I understand there are filters inside the service, but I'm thinking of a feature that prevents junk from being queued at all. The test file I have been using produces 16 objects that then get queued for this service. Almost all of them are PE components; the only exceptions are two files: the installer and the one file in the NSIS archive overlay. I'd even allow the uninstaller to be considered a third. What I want to prevent is sending a bunch of sections and other chunks carved from a PE to the config extraction service in the first place. Test file: c453b20437d728f5c6f0133bc3709ac24a0edb964304724bfbe62fa65ba77b1d
I think there is a general issue with ConfigExtractor performance. I disabled the service by default and use it only when I suspect it may be helpful.
You can verify it afterwards by looking at the results: if there is an empty result, then those files were processed. It's a good question, however, whether the extractor configs really expect any file type to be a possible configuration (though I could imagine this). If not, maybe the accepted file types should be limited to executables?
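To illustrate what limiting the accepted file types could look like, here is a minimal sketch assuming each object carries a type string such as "executable/windows/pe32" and that acceptance is decided by one accept and one reject regular expression. Both patterns are hypothetical examples (the reject pattern pretends no extractor handles Linux binaries), not the service's actual configuration.

```python
import re

# Hypothetical patterns, for illustration only.
ACCEPTS = re.compile(r"(executable|document)/.*")  # executables and documents
REJECTS = re.compile(r"executable/linux/.*")       # e.g. if no extractor targets ELF


def is_accepted(file_type: str) -> bool:
    """Accept a type only if it fully matches ACCEPTS and does not match REJECTS."""
    return (
        ACCEPTS.fullmatch(file_type) is not None
        and REJECTS.fullmatch(file_type) is None
    )
```

Using a reject pattern on top of the accept pattern keeps the accept list broad while still carving out specific types that are known to be useless for the service.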
I need to dig into everything happening inside the ConfigExtractor (CEx) service to make a complete recommendation or PR, but in general I could see a benefit in doing some decision-making outside the service before an object is queued for processing. I sketched out a diagram that should help explain what I am thinking.

After a sample is processed, there are three general categories of objects downstream of the processing: the input object, whole-object children, and fragment children. To illustrate the difference with the NSIS installer test file above: the three identical executables in the archive overlay are whole-object children, while the array of PE components, such as sections, produced by the PE analysis are fragment children. The service that produces these files knows what they are and should mark them accordingly. Fragment children should never be queued for ConfigExtractor in the first place, so I have colored them red.

I have split ConfigExtractor into three flavors: Targeted, YOLO, and Brute Force. All of them can run from the exact same container, deployed with configuration options that change, enable, or disable processing as appropriate, so the result is effectively three service flavors running separately. The Targeted flavor would only process files that are known a priori to belong to a malware family handled by code in the service. The YOLO flavor is a more generalist configuration that handles any executable or document. The Brute Force flavor would do its thing on every object sent to it. Depending on the use case, a user can enable or disable any of these three flavors.

All objects would be processed and file-type identified, except the parent, which would already have a type. They would all also go through YARA scanning to get tags. Based on the file typing, some of the resulting files would be sent to YOLO; based on the YARA tagging, some would be sent to Targeted; and optionally, everything can be sent to Brute Force.
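The routing described above can be sketched as a single decision function. This is a minimal illustration, not an existing dispatcher feature: ObjectKind, AnalyzedObject, route, KNOWN_FAMILIES, and YOLO_TYPE_PREFIXES are hypothetical names, and the family tags and type prefixes are placeholders. The sketch reads "everything can be sent to Brute Force" as every non-fragment object, since fragments should never be queued.

```python
from dataclasses import dataclass, field
from enum import Enum


class ObjectKind(Enum):
    INPUT = "input"              # the submitted sample itself
    WHOLE_CHILD = "whole_child"  # complete files, e.g. executables in an archive overlay
    FRAGMENT = "fragment"        # carved pieces, e.g. PE sections


@dataclass
class AnalyzedObject:
    sha256: str
    kind: ObjectKind                  # assumed to be set by the service that produced it
    file_type: str                    # hypothetical type string, e.g. "executable/windows/pe32"
    yara_tags: set[str] = field(default_factory=set)


# Hypothetical: families the Targeted flavor has extractor code for.
KNOWN_FAMILIES = {"family_a", "family_b"}
# Hypothetical: type prefixes the YOLO flavor accepts.
YOLO_TYPE_PREFIXES = ("executable/", "document/")


def route(obj: AnalyzedObject, enabled: set[str]) -> list[str]:
    """Return the ConfigExtractor flavors this object should be queued to."""
    if obj.kind is ObjectKind.FRAGMENT:
        return []  # fragment children are never queued
    flavors = []
    if "targeted" in enabled and obj.yara_tags & KNOWN_FAMILIES:
        flavors.append("targeted")
    if "yolo" in enabled and obj.file_type.startswith(YOLO_TYPE_PREFIXES):
        flavors.append("yolo")
    if "brute_force" in enabled:
        flavors.append("brute_force")
    return flavors
```

Because each flavor is switched on through the `enabled` set, a deployment that only wants Targeted and YOLO simply omits "brute_force", which mirrors the idea of running the same container under different configurations.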
This sounds reasonable, and more specific configuration sounds good, but I'd suggest first confirming that ConfigExtractor is slow on every file (or have you done that already?). If one file took 9 minutes to process and the rest were rejected immediately, filtering them out earlier would gain us almost nothing.
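The point about where the time goes can be checked with simple arithmetic: early filtering can only save the time the service currently spends on the files that would be filtered. A minimal sketch with hypothetical timings (one slow file and 15 fragments rejected in half a second each); these are not measurements from the test file.

```python
def filtering_savings(per_file_seconds: dict[str, float], filtered: set[str]) -> tuple[float, float]:
    """Return (seconds saved, fraction of total service time saved)."""
    total = sum(per_file_seconds.values())
    saved = sum(t for name, t in per_file_seconds.items() if name in filtered)
    return saved, (saved / total if total else 0.0)


# Hypothetical timings: one slow file plus 15 quickly rejected fragments.
timings = {"slow.exe": 540.0, **{f"fragment_{i}": 0.5 for i in range(15)}}
fragments = {name for name in timings if name.startswith("fragment_")}
saved, fraction = filtering_savings(timings, fragments)
print(f"saved {saved:.1f}s ({fraction:.1%})")  # saved 7.5s (1.4%)
```

If instead each fragment took tens of seconds, the same function would show most of the time being saved, which is why measuring per-file times first matters.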
No, not yet. I need to do some deeper analysis on this problem. I am making an educated guess based on the number of files that were queued for processing in this service compared with the number of objects shown as child objects from the test file I submitted in the UI. I have a bunch of projects going at the moment, but I will dive into this more completely soon.
One way this could be done is to limit the file acceptance to That being said, based on our usage of the service, we've seen hits for files that match the pattern:
We do kind of have something like this, but not at the service level: it's handled by the underlying library, and it only really handles the targeted use case (self-declared by the extractor through YARA rules) or brute force (where the expectation is that the extractor will handle things at runtime, quitting early or not):
YOLO is what I would describe as an extractor that either uses a YARA rule to loosely target anything that resembles an executable or an acceptable document based on its magic bytes, or makes that determination at runtime by performing those validity checks before any attempt at config extraction (basically a smarter brute force).
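A minimal sketch of how the three tiers could coexist in one selection step: targeted extractors run only on a YARA rule hit, brute-force extractors always run, and a YOLO extractor runs only if a cheap magic-byte check passes. Extractor, mode, select_extractors, and yolo_candidate are hypothetical names, not the underlying library's API. The magic values are the standard MZ/PE, OLE2 compound file, ZIP (which also covers OOXML documents), and PDF signatures.

```python
from dataclasses import dataclass
from typing import Optional

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy Office)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP container (also OOXML documents)
PDF_MAGIC = b"%PDF-"


def looks_like_pe(data: bytes) -> bool:
    """MZ header whose e_lfanew field points at a 'PE\\0\\0' signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    e_lfanew = int.from_bytes(data[0x3C:0x40], "little")
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def yolo_candidate(data: bytes) -> bool:
    """Cheap validity check done before any config extraction attempt."""
    return (
        looks_like_pe(data)
        or data.startswith(OLE_MAGIC)
        or data.startswith(ZIP_MAGIC)
        or data.startswith(PDF_MAGIC)
    )


@dataclass(frozen=True)
class Extractor:
    name: str
    mode: str                        # hypothetical: "targeted", "yolo" or "brute_force"
    yara_rule: Optional[str] = None  # only meaningful for "targeted"


def select_extractors(extractors: list[Extractor], rule_hits: set[str], data: bytes) -> list[str]:
    """Names of the extractors worth running on this file."""
    is_candidate = yolo_candidate(data)  # computed once, shared by all YOLO extractors
    selected = []
    for ext in extractors:
        if ext.mode == "targeted" and ext.yara_rule in rule_hits:
            selected.append(ext.name)
        elif ext.mode == "yolo" and is_candidate:
            selected.append(ext.name)
        elif ext.mode == "brute_force":
            selected.append(ext.name)
    return selected
```

A PE section carved out of a larger binary usually does not start with an MZ header, so a check like `looks_like_pe` would already keep most fragment children away from YOLO-style extractors.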
Running a similar test in our production system with all service categories, we were able to complete processing of all 17 files (root + children) in ~90s. I wonder whether the additional time on your deployment comes from the system having to scale up the number of service instances in response to the service's backlog. On our deployment, we have the Performing some custom filtering before the service is tricky, as it would need to involve scheduling in the dispatcher, and it would make the service depend on the results of another service from a previous stage, whereas services should maintain their independence where they can.
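To make the scale-up hypothesis concrete, here is a rough back-of-the-envelope model of how long a service queue takes to drain. It assumes every file takes the same time, instances work in parallel, and all instances become available together after one fixed delay; `drain_time` and all numbers are hypothetical, not measurements from either deployment.

```python
import math


def drain_time(n_files: int, per_file_s: float, instances: int, scale_up_delay_s: float = 0.0) -> float:
    """Rough time to empty a service queue under the uniform-time assumption above."""
    if instances < 1:
        raise ValueError("need at least one instance")
    # Files are processed in "waves" of `instances` files each.
    return scale_up_delay_s + math.ceil(n_files / instances) * per_file_s


print(drain_time(16, 30.0, instances=1))                          # 480.0
print(drain_time(16, 30.0, instances=4, scale_up_delay_s=60.0))   # 180.0
```

Under these assumptions, the same 16-file backlog drains several times faster with a few instances already running than with a single instance, even after paying a scale-up delay.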
Is your feature request related to a problem? Please describe.
All files, even file fragments from PE component extraction, are sent to ConfigExtractor. There should be a filter so that only files that could potentially yield a config are sent to this service. I understand that this may increase the burden of maintaining YARA rules or other filtering methods for identifying which files could have a config extracted in the first place.
Describe the solution you'd like
YARA rules or other detection methods for identifying malware families that ConfigExtractor can actually handle
Additional context
I submitted a test file, a basic PE NSIS installer with three identical PE files in its archive. On a MicroK8s appliance deployment, all of the processing was nearly instantaneous except for ConfigExtractor: the submission took about 10 minutes in total while 16 files, including file fragments, were queued for ConfigExtractor, and ConfigExtractor processing accounted for the remaining ~9 minutes until processing was fully complete.