Add relocation `R_RISCV_RELAX_NORVC` for linker relaxation. #116

Nelson1225 · 2019-09-23T07:35:59Z

Dear all,

According to this discussion:
riscv-collab/riscv-gnu-toolchain#445

I think this can be divided into two different problems:

Current linker may relax instruction sequence to the compressed one for the norvc code.
Alignment issue when both rvc and norvc code present.

For now, GNU ld checks the ELF e_flag of each input object, and then decide whether or not to do the rvc relaxations. Consider the following case:

.option rvc # Enable rvc
call Label # The first call
...
.option norvc # Disable rvc
call Label # The second call
...

The ELF e_flag's EF_RISCV_RVC of the object file is set to true if the rvc is enabled somewhere, even if we disable the rvc in another place. Therefore, linker will convert the two call to 2-byte compressed jal, but the second c.jal isn't allowed since the .option norvc is set. Checking the ELF e_flag is a file level (or object level) method, so we need another checking method to solve the first problem when both rvc and norvc code present in the same object.

Add a new relocation R_RISCV_RELAX_NORVC seemed the only solution. The instruction sequence paired with R_RISCV_RELAX indicating that linker can relax this sequence to the shorter one, including the compressed instruction. The R_RISCV_RELAX_NORVC is same as R_RISCV_RELAX except that the sequence can't be relaxed to the compressed instruction.

Even if the first problem is resolved, there still have some cases cause the second alignment problem, so I think that is another issue. This change of elf-psabi try to solve the first problem, not the second one :)

Thanks
Best Regards
Nelson

kito-cheng · 2019-09-23T07:39:10Z

@asb @jim-wilson @palmer-dabbelt @aswaterman

jim-wilson · 2019-09-24T04:16:30Z

The general concept looks correct to me.

It mentions instruction pairs, but that is really more a property of R_RISCV_CALL than R_RISCV_RELAX. R_RISCV_RELAX only applies to the one previous relocation, and that relocation only applies to an instruction pair if it is R_RISCV_CALL. The other relocs only apply to one instruction. For instance, if we have a lui/addi sequence, then each instruction has its own reloc and each reloc has its own R_RISCV_RELAX (or R_RISCV_RELAX_NORVC).

marceg · 2019-09-24T05:00:22Z

Consider perhaps implementing something like Xtensa's property tables (see Xtensa port in BFD). They're an extra ELF section with {vma, size, flags} triplets that identify various properties of sequences of code or data. (The Xtensa equivalent of norvc is XTENSA_PROP_INSN_NO_DENSITY.)
Extremely useful for this sort of thing, including required alignment of various subblocks of a section, which allows the linker to relax correctly in the presence of .align directives. And a lot of the linker work is already done. Relocations, on the other hand, target a specific instruction or word/address.

jim-wilson · 2019-09-24T18:09:43Z

ARM has mapping symbols. Every time you switch from one type of output to another, gas emits a mapping symbol, and a mapping symbol can indicate that the following block is arm code, thumb code, or data. This is used by the disassembler. Arm and thumb are different overlapping ISAs so you always have to know which mode you are in. Also, ARM has constant pools that can appear in the middle of functions, which we don't want to disassemble as code. These aren't used by the linker though. The only bfd support is to make sure that strip doesn't remove the mapping symbols.

RISC-V doesn't have this problem as we don't have two different overlapping ISAs that can be linked together, and we don't have constant pools in the middle of functions. But maybe we will have some of these problems later.

The xtensa property tables do look interesting. This is a much more general solution than the ARM solution. But it looks like a lot of code both in the assembler and in bfd to make this work. This does not look like an easy approach. I don't see anything being done by the linker, other than the fact it will copy the sections around for us. There is still a lot of hard work being done in the xtensa bfd port to read the property tables and parse them and use them. With a property table entry created for each frag, this also looks like it will increase object code size and maybe executable size if we copy the info over to executables.

We have two assemblers/linkers (clang/lld and GNU as/ld). Property tables look like too much work to solve the simple problem we have right now.

kito-cheng · 2019-09-26T14:59:05Z

Forgot to ping @MaskRay @PkmX , our important friends on LLD side :P

marceg · 2019-09-26T16:57:56Z

Property tables add a bit to the ELF file, but they're never loaded in target memory so the size hasn't been an issue. Mapping symbols might be able to do the same thing (and probably take a little more space in the ELF file). Perhaps the most useful and most complicated benefit is the ability of the linker to shrink and expand sections when it applies relocations, instead of having to pad NOPs -- which requires handling alignment directives properly. It definitely helps code size and performance. However doing this correctly is a bit tricky, and sometimes unintuitive: for example, shrinking section A followed by an align directive then section B, might increase the distance between a branch and its target across A and B, causing it to go out of range. I haven't looked at the code lately, perhaps that's some of the complexity you were seeing (or perhaps it's more in ld than bfd); and indeed no need for constant pool handling code (not expecting a single function to span more than 4 GB :).

Is it currently possible to reliably identify what parts of a RISC-V ELF section are code vs data? In particular when .align directives are present in the code (whether for performance, cache alignment, or otherwise). The ability to know exactly what parts of the ELF file contains code vs data has been particularly helpful for profiling tools and code transformation tools.

PkmX · 2019-09-26T18:29:16Z

Well, the concept of R_RISCV_RELAX_NORVC seems sound to me.

@marceg Would you explain how property tables would help in those situations? RISC-V kind of workaround the cross-section relaxation with alignment issue by considering the maximum alignment when trying to determine if a relocation can be relaxed.

marceg · 2019-09-26T18:47:43Z

They identify .align directives within a section. My example of sections A and B is more accurately blocks A and B within the same section, as relative branches (conditional branches in particular, which have limited reach) don't generally cross section boundaries. It's not possible generally to grow or shrink an instruction or instruction sequence (such as for a call, given a relocation) without information about any .align directive following these instructions in the same section. Such directives are not very common, so if the linker currently shrinks/expands code based on RELAX_RVC, it may be lucky in that it usually isn't an issue, but it might not be correct. Mid-section .align directives happen a lot more in embedded code than Linux-ish code, I suspect. Or to align branch targets for performance (for those, there's a performance loss but no correctness issue).

Granted such embedded .align were common in Xtensa due to its requirement to 4-byte align PC-relative call targets. However, there were several other cases (cache-line alignment within a function; vector-table alignment, for example regularly-spaced sets of vectors with variable code in each, with relocations that affect their size; jump tables; others too far to remember).

jim-wilson · 2019-09-26T18:51:41Z

For RISC-V, we currently don't mix code and data in the same section, so there is no problem telling them apart. This would change if we ever implemented constant pools in the text section.

PkmX · 2019-09-26T19:10:25Z

The way RISC-V handles .align is that it emits alignment - 2 bytes of nop and the linker runs an extra pass after relaxation to remove those nops to the correct alignment. Since during the whole relaxation + alignment process offsets within a single section only decrease monotonically, those relocations cannot go out of range. Of course, this comes with a cost of some missed relaxation opportunities, since emitting alignment - 2 bytes of nop pretty much forces the linker to compute offsets using the worst case scenario.

marceg · 2019-09-26T21:06:26Z

Interesting. Effectively the object file is not aligned until the linker fixes it. (No easy direct use of .o's as observed in the wild.) I'm not sure how that plays with vector section sizing checks. I suppose you haven't seen a lot of .org in assembly code. (Yet :)

RISC-V's PC-relative accesses have a +/- 2 GB range (you pretty much have to use AUIPC), so literal pools can probably safely be after the whole text section, and thus always be in a separate section, assuming code never reaches 2 GB in size (so far, a fair assumption). Perhaps more likely are the weird things sometimes done in hand-written assembly, especially for the embedded/OS space.

Nelson1225 · 2019-09-30T04:43:32Z

Dear Jim, Marc, Kito and PkmX,

Thanks for your suggestion and reply :)

Currently, it seems that add a new relocation R_RISCV_RELAX_NORVC is the more direct and simple way to resolve the problem that linker may relax instruction sequence to the compressed one for the norvc code. Just like Jim said, we would use the more appropriate method if we ever implemented the constant pools in the text section.

I have rewritten the description of the new relocation R_RISCV_RELAX_NORVC. Please see the details in the newest commit.

Thanks again
Best Regards
Nelson

jrtc27 · 2019-10-21T09:27:56Z

riscv-elf.md

+expands to the following assembly and relocation:
+
+```
+   auipc ra, 0              # R_RISCV_CALL (symbol1), R_RISCV_RELAX (symbol1)


R_RISCV_RELAX has zero symbol and addend (printed as *ABS* by objdump), as all the information it needs is in the other relocation for the same offset. I notice however the existing text about R_RISCV_RELAX gets this wrong too.

Yes, you are right. How about use R_RISCV_RELAX only rather than attach a symbol after it?

For example:
auipc ra, 0 # R_RISCV_CALL (symbol1), R_RISCV_RELAX
jalr ra, ra, 0

riscv-elf.md

Nelson1225 · 2019-11-01T09:56:46Z

Dear jrtc27,

Sorry for the delayed response, and thank you so much for your review and suggestions :)
I have fixed the grammatical issues and modified some descriptions according to your suggestions.

Thanks again
Best Regards
Nelson

kito-cheng · 2019-12-04T12:59:55Z

@MaskRay let me know if you think it's LGTM to you :)

asb · 2022-08-01T20:22:03Z

I feel it would be worth finding some kind of resolution to this for 1.0.

Given the time constraints, the path to doing so would probably take the form of a note for implementers in the linker relaxation section noting the potential unresolvable interaction of alignment directives and options to disable RVC support for regions of code. @jrtc27 @kito-cheng would do you think?

Nelson1225 · 2022-08-02T02:16:52Z

BTW, if we could generate the mapping symbols with ISA, then we probably don't need to add RELAX_NORVC relocation anymore, #196. But the disadvantage is that we need to check the symbol table before relaxing, that will take extra link times.

MaskRay · 2022-08-02T04:21:07Z

Thanks for the explanation. Adding R_RISCV_RELAX_NORVC makes sense to me.

kito-cheng · 2022-08-02T13:12:46Z

@Nelson1225 could you rebase that and sending PoC patch to mailing list?

MaskRay · 2022-08-09T18:23:41Z

Ping for rebase :)

jrtc27 · 2022-08-09T18:35:03Z

My concern is this is specific to RVC and the same issue will arise when dealing with other extensions in future whose instructions the linker might want to use in general. Given R_RISCV_RELAX currently has no symbol associated with it, just the type, I would propose using dummy symbols for the ISA string, and would suggest using the same encoding for them as the mapping symbols (then you can reuse the same symbols, saving space and making the implementation easy, and thinking forward to things like CHERI-RISC-V the mapping symbols will need to include the encoding mode (we'd do $c rather than $x to mirror what Arm did for Morello) as the encoding mode is an orthogonal concept to the list of extensions so this ensures that information is still available. A symbol index of 0 would continue to mean what it does today. Using the symbol for this also has the advantage of backwards-compatibility, since I believe both binutils and LLD totally ignore the symbol for R_RISCV_RELAX, rather than a new relocation which would be a hard error unless you have a new enough linker.

kito-cheng · 2022-08-10T02:28:28Z

TL;DR: I think @jrtc27's proposal is good idea and I support that.

@Nelson1225 and me has some offline discussion about the possibility of using mapping symbol instead of adding R_RISCV_RELAX_NORVC to resolve this issue, but we thought that might complicated the linker implementation, because linker need to scan the R_RISCV_RELAX located in which region, but I think @jrtc27 just give a good idea to resolve this issue! let R_RISCV_RELAX to bind with a dummy symbol sounds simple and more scalable solution for future, we don't need any new relocation R_RISCV_RELAX_NORVYYY for future extensions.

Edit: Remove sentence refer to Zc*, I should have double check with that before posting.

Nelson1225 · 2022-08-23T08:00:09Z

I agree with @jrtc27, refer the mapping symbol to R_RISCV_RELAX should work, and also resolve the linker relax time problem which mentioned by @kito-cheng above. I should be free that can try to make prototype PoC patches (including the mapping symbols with ISA string) recently for this in binutils.

Nelson1225 · 2022-09-30T09:29:27Z

[PATCH 1/2] RISC-V: Output mapping symbols with ISA string.
https://sourceware.org/pipermail/binutils/2022-September/123192.html

[PATCH 2/2] RISC-V: Refer mapping symbol to R_RISCV_RELAX for rvc relaxations.
https://sourceware.org/pipermail/binutils/2022-September/123191.html

Here are Jessica's proposal, work as expected, pass regressions of riscv-gnu-toolchain.

RISC-V Psabi pr116, riscv-non-isa/riscv-elf-psabi-doc#116 bfd/ * elfnn-riscv.c (_bfd_riscv_enable_rvc): New function to check the mapping symbol with architecture string in the R_RISCV_RELAX, to see if rvc is enabled or not. If we don't find any mapping symbols, then just check the elf header flag as usual. (_bfd_riscv_relax_call): Updated since above change. (_bfd_riscv_relax_lui): Likewise. gas/ * config/tc-riscv.c (append_insn): Store $x+arch in tc_fix_data if exsit. (md_apply_fix): Refer the $x+arch from tc_fix_data into R_RISCV_RELAX, these only for the relocations used to do rvc relaxation. Besides, set addend of R_RISCV_RELAX (fx_offset) to zero. * config/tc-riscv.h (TC_FIX_TYPE): Defined to store $x+arch in each fixup only for risc-v target. (struct riscv_fixup_type): Likewise. (TC_INIT_FIX_DATA): Likewise. * testsuite/gas/riscv/mapping-relax.d: New testcase. * testsuite/gas/riscv/mapping-relax.s: Likewise. ld/ * testsuite/ld-riscv-elf/c-relax.d: New testcase for specific rvc relaxation. * testsuite/ld-riscv-elf/c-relax.s: Likewise. * testsuite/ld-riscv-elf/ld-riscv-elf.exp: Updated.

kito-cheng · 2023-07-19T13:55:02Z

@Nelson1225 could you create another PR to contain @jrtc27's proposal? I think you already has PoC for a while

jrtc27 reviewed Oct 21, 2019

View reviewed changes

Add relocation R_RISCV_RELAX_NORVC for linker relaxation.

d55f45d

Nelson1225 mentioned this pull request Dec 2, 2019

Encoding of HINT instructions riscv-collab/riscv-gnu-toolchain#552

Closed

jim-wilson mentioned this pull request Mar 4, 2021

relaxation alignment issues when both rvc and norvc code present riscv-collab/riscv-gnu-toolchain#445

Open

nick-knight mentioned this pull request Apr 11, 2023

Linker relaxation ignores .option norvc riscv-non-isa/riscv-asm-manual#84

Open

kito-cheng mentioned this pull request Jul 19, 2023

Add R_RISCV_ALIGN_DOWN #392

Open

Nelson1225 closed this by deleting the head repository Jul 20, 2023

Nelson1225 mentioned this pull request Jul 21, 2023

Refer mapping symbol into R_RISCV_RELAX for rvc/norvc relaxations. #393

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add relocation `R_RISCV_RELAX_NORVC` for linker relaxation. #116

Add relocation `R_RISCV_RELAX_NORVC` for linker relaxation. #116

Nelson1225 commented Sep 23, 2019 •

edited

Loading

kito-cheng commented Sep 23, 2019

jim-wilson commented Sep 24, 2019

marceg commented Sep 24, 2019

jim-wilson commented Sep 24, 2019

kito-cheng commented Sep 26, 2019

marceg commented Sep 26, 2019

PkmX commented Sep 26, 2019

marceg commented Sep 26, 2019

jim-wilson commented Sep 26, 2019

PkmX commented Sep 26, 2019

marceg commented Sep 26, 2019

Nelson1225 commented Sep 30, 2019

jrtc27 Oct 21, 2019

Nelson1225 Nov 1, 2019

Nelson1225 commented Nov 1, 2019

kito-cheng commented Dec 4, 2019

asb commented Aug 1, 2022

Nelson1225 commented Aug 2, 2022

MaskRay commented Aug 2, 2022

kito-cheng commented Aug 2, 2022

MaskRay commented Aug 9, 2022

jrtc27 commented Aug 9, 2022

kito-cheng commented Aug 10, 2022 •

edited

Loading

Nelson1225 commented Aug 23, 2022 •

edited

Loading

Nelson1225 commented Sep 30, 2022

kito-cheng commented Jul 19, 2023

Add relocation R_RISCV_RELAX_NORVC for linker relaxation. #116

Add relocation R_RISCV_RELAX_NORVC for linker relaxation. #116

Conversation

Nelson1225 commented Sep 23, 2019 • edited Loading

kito-cheng commented Sep 23, 2019

jim-wilson commented Sep 24, 2019

marceg commented Sep 24, 2019

jim-wilson commented Sep 24, 2019

kito-cheng commented Sep 26, 2019

marceg commented Sep 26, 2019

PkmX commented Sep 26, 2019

marceg commented Sep 26, 2019

jim-wilson commented Sep 26, 2019

PkmX commented Sep 26, 2019

marceg commented Sep 26, 2019

Nelson1225 commented Sep 30, 2019

jrtc27 Oct 21, 2019

Choose a reason for hiding this comment

Nelson1225 Nov 1, 2019

Choose a reason for hiding this comment

Nelson1225 commented Nov 1, 2019

kito-cheng commented Dec 4, 2019

asb commented Aug 1, 2022

Nelson1225 commented Aug 2, 2022

MaskRay commented Aug 2, 2022

kito-cheng commented Aug 2, 2022

MaskRay commented Aug 9, 2022

jrtc27 commented Aug 9, 2022

kito-cheng commented Aug 10, 2022 • edited Loading

Nelson1225 commented Aug 23, 2022 • edited Loading

Nelson1225 commented Sep 30, 2022

kito-cheng commented Jul 19, 2023

Add relocation `R_RISCV_RELAX_NORVC` for linker relaxation. #116

Add relocation `R_RISCV_RELAX_NORVC` for linker relaxation. #116

Nelson1225 commented Sep 23, 2019 •

edited

Loading

kito-cheng commented Aug 10, 2022 •

edited

Loading

Nelson1225 commented Aug 23, 2022 •

edited

Loading