-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version 0.5 causes switch/case symbols to show up #8
Comments
Hi, Given all the bugs I've fixed over the past three months (see #6 for a partial history), the COFF exporter wasn't fit for purpose until version 0.5.0 ; I would not trust any results coming from version 0.4.0. This is not a use-case I have foreseen and I'm not sure it can be fully accommodated. This extension was developed with the idea that it's not only possible to delink executables, but that the result could also be interchangeable with the original object file as far as the linker is concerned. This means that the contents of the exported object file can differ, because there are multiple equivalent ways to encode its information. For example, comparing Link-time optimizations and relaxations are another source of possible discrepancies. One particularly tricky example on MIPS hoists instructions inside branch delay slots, requiring some migraine-inducing workarounds if those instructions happened to have HI16 relocations. I haven't seen anything remotely as drastic on x86, but I'm vaguely aware of peephole optimizations that can eliminate some indirections through the PLT or GOT at link time by rewriting instructions and relocation types. Going back to your initial issue, if the |
This is fine. Objdiff can be configured to ignore relocation differences. This allows it to paper over slight differences in relocation encoding that ultimately result in the same value (at the expense of potentially not seeing actual differences "hidden" by a relocation, but that's a tradeoff we can afford to take). And anyways the diffing doesn't need to be "perfect" - we're using it as a tool to guide our re-implementation towards having the correct assembly. We're currently using
Unfortunately, I tried both with and without "Give dynamic symbols external visibility", and it does not get rid of the "switchD" symbols. When it's enabled, I additionally get symbols for
That's be great 👀 |
This checkbox gives dynamic symbols external visibility instead of local. I assumed that the I've looked into Ghidra's symbol handling and dynamic symbols are defined as not having a I didn't really needed to deal with that problem before and it turns out Ghidra doesn't have a concept of symbol visibility I can piggyback. I think the best we can do here is to add an option where symbol names matching an user-supplied regex pattern will be considered local, on top of checking for dynamic symbols. Assuming symbol visibility is a solved problem, next step is the ability to generate section-relative relocations as stated above, so that local symbols can be trimmed from the exported object file's symbol table. That will require some refactorings first, mostly because currently relocations unapply themselves based on a predetermined named symbol and addend, so there's no mechanism to handle unnamed section-relative relocations with a different addend.
That's probably because version 0.4.0 trimmed local symbols that were not used by any relocations. Combined with the trimming of PC-relative relocations whose source and destination addresses lie within the same contiguous selected address range, that might explain why they don't show up in that version (aside from the bugginess of the COFF exporter back then). The local symbol trimming was lost when I recently refactored and cleaned up the generic parts of symbol handling out of the exporters, so that symbol name mangling could be properly handled with the symbol name preference feature. The way the local symbol trimming was done was rather hacky and it wasn't that useful in practice. I'd rather not reintroduce it, which is why I'm preferring section-relative relocations to accommodate your use-case. TL;DR Implementing regex-based symbol visibility and section-relative relocations should eliminate the unwanted symbols from your exported object files. The first part should be straightforward to implement, the second part will require some refactorings to pull off. |
Rather than a regex-based solution, could we perhaps have an option to only export symbols in the executable section if they have a matching Function? This should get rid of the goto labels/switchcase, while leaving the symbols for all the functions and the data. It's obviously not a perfect heuristic (some programs might store data in the .text section/segment), but I feel like it should work decently enough for our use-cases. |
Hmm... We can check if a function contains the address of a symbol. If so, then the symbol can be made local unless it's at the entrypoint. That should get rid of all the labels within functions and I don't see a downside that would cause problems later on. As for the switch data labels, the problem remains. I can't use your suggested heuristic because of the sheer scope of this extension. It supports two object file formats, two ISAs and to my knowledge was tested with artifacts from three different platforms (Linux, Windows, PlayStation) spanning 25 years. That's a very delicate balancing act to pull off and it will only get worse as time goes on. From experience, taking shortcuts inside this code base never ends well, it's unusually exacting. That's why I'm looking for solutions which not only work for one use-case, but are correct for all use-cases. Unless there's a better proposal, an option for a regex that matches local symbols (by default set to |
I've pushed some commits on master to add options that should give these symbols static visibility. Can you check if this catches all the symbols you want gone? Next step will be to make their relocations section-relative and trim them from the symbol table, but it'll require some refactorings first to make it happen. |
Hey, thanks for the great extension!
I'm currently working on using this along with objdiff in my decompilation project th06 to enable a better diffing workflow.
I have something working great with delinker version 0.4. However, in version 0.5, the delinker now contains a bunch of extra symbols that add a ton of noise, and don't seem to be of great value. In particular
Delinker 0.4:
Delinker 0.5:
Those symbols seem to be inserted automatically by ghidra during switch/case analysis.
Would it make sense perhaps to detect them and not include them in the symbol table of the generated object file?
The text was updated successfully, but these errors were encountered: