Config to lower level of heuristics EXTRACT.28 and EXTRACT.13 #245

Closed
kam193 opened this issue Aug 12, 2024 · 2 comments
Labels: accepted, enhancement, service-extract


kam193 commented Aug 12, 2024

Is your feature request related to a problem? Please describe.
When analysing Python packages, I often come across false positives from these two heuristics:

  • EXTRACT.28 (Extractable _RDATA section found)
  • EXTRACT.13 (Single Executable Inside Archive File)

Apparently it is not uncommon for Python packages compiled for Windows to have executable parts in _RDATA. I have also come across multiple DLLs triggering this heuristic (e.g. fe27c4c07c0cfbb2ee28c8409e5a8db89d86c6c2d76c6e3b79ab31979138b215).

The second situation is not uncommon either. In addition, the heuristic seems to have trouble handling binary files from which an executable was extracted: I've noticed it is sometimes triggered when an exe is extracted from another exe, or when code has been decompiled from a PYC file (but not always):

[image]

(__decompiled_source.py comes from my service, Extractor didn't see it)

Describe the solution you'd like

  • A config option to make EXTRACT.28 and EXTRACT.13 informational.
  • Improvements to EXTRACT.13 so that it is triggered only when an executable is extracted from an archive, not from another executable or from decompiled code.
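
The two requests above could look roughly like the sketch below. Everything in it is hypothetical: the function names, the `informational_ids` option, the type strings and the placeholder score for EXTRACT.13 are illustrative assumptions, not the Extract service's actual code or config schema.

```python
from dataclasses import dataclass

# Hypothetical sketch: none of these names come from the Extract service.
INFORMATIONAL_SCORE = 0


@dataclass
class Heuristic:
    heur_id: int
    name: str
    score: int


def effective_score(heur: Heuristic, informational_ids: set[int]) -> int:
    """Return 0 for heuristics an administrator marked as informational."""
    if heur.heur_id in informational_ids:
        return INFORMATIONAL_SCORE
    return heur.score


def should_flag_single_executable(parent_type: str, extracted_types: list[str]) -> bool:
    """Trigger only when a single executable came out of an archive parent.

    The "archive/" and "executable/" prefixes are assumed type strings.
    """
    return (
        parent_type.startswith("archive/")
        and len(extracted_types) == 1
        and extracted_types[0].startswith("executable/")
    )


# 300 is the original EXTRACT.28 score mentioned in this thread;
# the EXTRACT.13 score is a placeholder, not the real value.
rdata = Heuristic(28, "Extractable _RDATA section found", 300)
single_exe = Heuristic(13, "Single Executable Inside Archive File", 100)

print(effective_score(rdata, informational_ids={13, 28}))
print(should_flag_single_executable("executable/windows/pe32", ["executable/windows/pe32"]))
```

With this shape, the heuristic still appears in the result (so the information is kept), but it stops contributing to the score when the option is set.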

Describe alternatives you've considered

  • An option to disable the heuristics entirely - but that would just remove information that could be useful.
  • Hardcoding a lower score for those heuristics - definitely not, as it would break use cases other than mine.
  • A generic option that lets an administrator override the default/maximum score of a heuristic - this could be interesting and would allow adjusting an AL instance to the local use case, but it requires much more work.

Additional context
The Extract service already has multiple options for adjusting some heuristics, but not these two.

kam193 added the assess and enhancement labels on Aug 12, 2024
gdesmar added the service-extract and accepted labels and removed the assess label on Nov 6, 2024

gdesmar commented Nov 7, 2024

This is another issue that I missed a while ago.
A couple of months ago we reduced the score of EXTRACT.28 from 300 to 200 points. That should stop it from causing a suspicious result section.
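
To illustrate why dropping from 300 to 200 matters, here is a minimal sketch. It assumes a result counts as suspicious once its score reaches 300; that threshold is consistent with the comment above but is not stated in it, and the names are made up for the example.

```python
# Assumption: a score of 300 or more is treated as suspicious.
SUSPICIOUS_THRESHOLD = 300

OLD_EXTRACT_28_SCORE = 300  # score before the change, per the comment above
NEW_EXTRACT_28_SCORE = 200  # score after the change, per the comment above


def is_suspicious(score: int) -> bool:
    """Return True when the score reaches the assumed suspicious threshold."""
    return score >= SUSPICIOUS_THRESHOLD


print(is_suspicious(OLD_EXTRACT_28_SCORE))
print(is_suspicious(NEW_EXTRACT_28_SCORE))
```

Under that assumption, EXTRACT.28 alone no longer pushes a result over the line, although it can still do so in combination with other heuristics.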
Regarding EXTRACT.13 (and possibly EXTRACT.16), I just updated the Extract service so that it no longer flags any Python script coming out of an executable. I also added a new structure for false positive detection based on file extension, just in case our identification of Python scripts is not perfect... 😅
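
A rough sketch of that idea, for readers who want to picture it. This is not the code that went into the Extract service: the function names, the extension set and the type strings ("code/python", "executable/...") are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical fallback list, used when type identification misses a Python script.
FALSE_POSITIVE_EXTENSIONS = {".py"}


def is_python_script(file_type: str, file_name: str) -> bool:
    """Identify a Python script by type first, then by file extension."""
    if file_type == "code/python":
        return True
    return PurePosixPath(file_name).suffix.lower() in FALSE_POSITIVE_EXTENSIONS


def should_flag_extracted(parent_type: str, file_type: str, file_name: str) -> bool:
    """Skip Python scripts that came out of an executable parent."""
    if parent_type.startswith("executable/") and is_python_script(file_type, file_name):
        return False
    return True


# A misidentified script is still skipped thanks to the extension fallback.
print(should_flag_extracted("executable/windows/pe64", "unknown", "__decompiled_source.py"))
```

The extension check only acts as a safety net; the type-based check stays the primary signal.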
You should be able to test v4.5.0.stable48 and see if that solves your issues! Thanks for the report!


kam193 commented Nov 8, 2024

Thank you! I think it should almost entirely solve my issue. In exchange, I've uploaded new misidentified files 😇 #284

kam193 closed this as completed on Nov 8, 2024