Support for ASAR Archives #210
Comments
From identify's defaults:
{"al_type": "archive/tar", "regex": r"^(GNU|POSIX) tar archive"},
{"al_type": "archive/ar", "regex": r"ar archive"},
{"al_type": "archive/vhd", "regex": r"^Microsoft Disk Image"},
The file's magic is "Electron ASAR archive, header length: 558428 bytes". That regex definitely needs to be anchored or deleted. I made a PR to only look for
Regarding 7z's error, do you know in more detail how the asar archive is created? From their github page they claim it
Regarding the wrong identification when your URL service is downloading that file: yes, you are totally right, the service does influence the identification if it's running as a privileged service. I assume it is. When a privileged service extracts a file and reuploads it, it writes directly to the file index. If you are unprivileged, you should go through service-server, which should already be up-to-date, so it would be a different problem. When you resubmit the file (I assume, by hash), it goes through the core, which redoes the identification as it would for a new file and updates the fileinfo in the file index with the new type. If you rebuild your service using the latest base, it should identify the file correctly the first time.
On a side note, my dev computer is using libmagic 5.39 while we are using libmagic 5.44 in our containers, and I do have some asar archive identify as
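To illustrate why the unanchored pattern is a problem, here is a minimal sketch using the three rules quoted above. It assumes the patterns are applied case-insensitively with `re.search` (the thread does not say how identify compiles them), and the `^`-anchored variant is purely illustrative, not the pattern from the PR. `first_match` and `ANCHORED_RULES` are hypothetical names.

```python
import re
from typing import Optional

# Magic string reported for the sample in this thread.
ASAR_MAGIC = "Electron ASAR archive, header length: 558428 bytes"

# The three rules quoted from identify's defaults.
RULES = [
    {"al_type": "archive/tar", "regex": r"^(GNU|POSIX) tar archive"},
    {"al_type": "archive/ar", "regex": r"ar archive"},
    {"al_type": "archive/vhd", "regex": r"^Microsoft Disk Image"},
]

# Illustrative only: every rule forced to match at the start of the string.
ANCHORED_RULES = [dict(r, regex="^" + r["regex"].lstrip("^")) for r in RULES]


def first_match(magic: str, rules: list) -> Optional[str]:
    """Return the al_type of the first rule whose regex matches, else None.

    Assumption: case-insensitive search, which may differ from identify.
    """
    for rule in rules:
        if re.search(rule["regex"], magic, re.IGNORECASE):
            return rule["al_type"]
    return None


print(first_match(ASAR_MAGIC, RULES))           # archive/ar
print(first_match(ASAR_MAGIC, ANCHORED_RULES))  # None
```

Under those assumptions, "ar archive" matches inside "ASAR archive", so the ASAR file is typed as archive/ar; once the pattern is anchored, the match disappears.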
Hey, thanks for the detailed analysis and for fixing the identification! First, I've uploaded the second file to VT. Both come from an info stealer which tried to replace a cryptocurrency wallet app with them. So, re: end goal, I just wanted to find out whether I can easily see what the real actions are. But you're right, extracting 2k files doesn't sound reasonable. Unfortunately, I don't have deeper knowledge of the ASAR format (yet). I think the best option currently is to stay with the identification update only. When I find time, I'll take a look at the format and craft a custom service for the extraction, most probably with the approach I use for bundled Python executables: trying to estimate where the interesting code should usually be, and leaving full extraction optional.
Thanks for the clarification about identification. This is indeed the case; #167 is still hitting my setup (although I have to check again, as I've recently fixed some networking issues (it's always DNS)). And indeed the libmagic was outdated... I had already rebuilt the service for AL 4.5.0, but my configuration had an overridden container image with a hard-coded older version 😂
FYI: I've created a simple service to extract ASAR. So far, the default filtering just omits
Someone from the community has tried to add it to JsJaws (CybercentreCanada/assemblyline-service-jsjaws/pull/726), but we have not followed up on it (and a few things should be improved before merging). From my understanding, it would give about the same result by only checking to extract when isfiles is True, since node_modules is always a folder?
I think so, although I don't know if node_modules is the only possible directory in the archive. I think the format doesn't exclude additional directories 🤔 I also provide an option to extract everything (likely not useful 😆), and I have in mind another option to supply a regex/key to extract selected other files. I don't want to exclude exporting node modules completely, as I didn't find a way to verify that the node packages weren't tampered with.
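For reference, the filtering idea discussed above could be sketched roughly like this. It is not the code of the service mentioned in this thread. It assumes the archive layout used by the electron/asar tool as I understand it: four little-endian uint32 values, a JSON header, then file data whose offsets are relative to 8 + header size. All function names and the `skip_dirs` parameter are hypothetical.

```python
import json
import struct
from typing import Dict, Iterator, Tuple


def read_asar_header(data: bytes) -> Tuple[dict, int]:
    """Return (header, base_offset) for an in-memory ASAR archive.

    Assumed layout: uint32 x4 (pickle size, header size, payload size,
    JSON length), followed by the JSON header string.
    """
    _, header_size, _, json_len = struct.unpack_from("<4I", data, 0)
    header = json.loads(data[16:16 + json_len].decode("utf-8"))
    return header, 8 + header_size


def iter_files(node: dict, prefix: str = "") -> Iterator[Tuple[str, dict]]:
    """Walk the header tree; entries with a "files" key are directories."""
    for name, entry in node.get("files", {}).items():
        path = f"{prefix}/{name}" if prefix else name
        if "files" in entry:
            yield from iter_files(entry, path)
        else:
            yield path, entry


def extract_selected(data: bytes, skip_dirs=("node_modules",)) -> Dict[str, bytes]:
    """Extract packed files, skipping anything under a directory in skip_dirs."""
    header, base = read_asar_header(data)
    extracted = {}
    for path, entry in iter_files(header):
        if any(part in skip_dirs for part in path.split("/")[:-1]):
            continue
        # Skip links and files stored outside the archive ("unpacked").
        if entry.get("unpacked") or "offset" not in entry:
            continue
        start = base + int(entry["offset"])  # offsets are stored as strings
        extracted[path] = data[start:start + entry["size"]]
    return extracted
```

Passing `skip_dirs=()` extracts everything, which keeps full extraction available as an option while the default leaves out node_modules at any depth.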
Is your feature request related to a problem? Please describe.
I came across ASAR archives, which are Electron-app archives https://www.electronjs.org/docs/latest/tutorial/asar-archives It looks like they are currently only partially supported.
Describe the solution you'd like
I would like stable support for ASAR in Extractor and clear file identification.
Describe alternatives you've considered
A separate service could be created, although ASAR really fits perfectly into Extractor. The trouble is that I haven't seen good non-JS extractors so far.
Additional context
I came across two files, and AssemblyLine behaves a little strangely for them.
0c1ddd33e630f4ac684880f0e673dfa84919272494c11da0f1ec05fb4f919ce8
This file was once identified as document/email, and on re-submit a few minutes later as archive/ar. The second time, Extractor tried to extract data but failed completely.
abe19b0964daf24cd82c6db59212fd7a61c4c8335dd4a32b8e55c7c05c17220d
This file was once identified as code/html, and then, after resubmitting twice, as archive/ar. On one re-submit, Extractor failed completely with the "pre-empted" error; on the second try some files were extracted, although 7zip reported some errors (I didn't find the exact error in the logs).
As I understand it, AR is not the fully correct identification, but it lets Extractor try. The wrong identifications occurred when the file was downloaded from the URL (in both cases using my service, where I had to set a specific User-Agent; can the service influence the identification?).