Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM verification #356

Open
wants to merge 38 commits into
base: main
Choose a base branch
from
Open

LLVM verification #356

wants to merge 38 commits into from

Conversation

AFOliveira
Copy link
Collaborator

Following up on #258.

I've been doing work on what @lenary proposed as a first approach, I think this is an ok first mock-up. Still a WIP since many instructions still have bugs, but if you have any comments or recommendations, please LMK :).

I did not add LLVM as a submodule yet because it may be easier licensing wise to just point at it in the script? Usage is python3 riscv_parser.py <tablegen_json_file> <arch_inst_directory>"). I've also used jq to enhance readibility on the output of llvm-tblgen -I llvm/include -I llvm/lib/Target/RISCV llvm/lib/Target/RISCV/RISCV.td --dump-json -o <path-to-json-output>.

Copy link
Collaborator

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good start, and is heading in the right direction. Some immediate comments before I look again in the next few days.

Is there anything specific you want feedback on?

Which instructions are you running into issues with so far?

Another overall thought I have is: if this is a test/validation suite, could this be structured using pytest? For instance, you could parse the YAML and the JSON to match up instruction descriptions, and use that to create parameterized fixtures (one instance per matched description) - the advantage of this is that you get all of the niceness of pytest's asserts, and pytest's test suite reports, without having to reimplement all the testcase management. This also makes it easier to split the test code that checks the encoding matches from tests that check the assembly strings match (for example, but we can think of others).

ext/auto-inst/parsing.py Outdated Show resolved Hide resolved
ext/auto-inst/parsing.py Outdated Show resolved Hide resolved
ext/auto-inst/parsing.py Outdated Show resolved Hide resolved
ext/auto-inst/parsing.py Outdated Show resolved Hide resolved
@AFOliveira
Copy link
Collaborator Author

I think this is a good start, and is heading in the right direction. Some immediate comments before I look again in the next few days.

Thanks for your feedback!

Is there anything specific you want feedback on?

Not yet, the point was just some general considerations about the initial approach.

Which instructions are you running into issues with so far?

I still didnt find a pattern, but I'll try to fix this soon and if I run into any issue I'm not able to solve by myself, I'll try to bring it up, thanks!

Another overall thought I have is: if this is a test/validation suite, could this be structured using pytest? For instance, you could parse the YAML and the JSON to match up instruction descriptions, and use that to create parameterized fixtures (one instance per matched description) - the advantage of this is that you get all of the niceness of pytest's asserts, and pytest's test suite reports, without having to reimplement all the testcase management. This also makes it easier to split the test code that checks the encoding matches from tests that check the assembly strings match (for example, but we can think of others).

I'll take a look into pytest and see how to port this, thanks for the suggestion!

@AFOliveira
Copy link
Collaborator Author

AFOliveira commented Dec 19, 2024

current known bugs are:

  • Can't parse yaml complex imm encodings (i.e. 31|7|30-25|11-8)
  • Some instructions need to be fixed in the UDB, since this already caught some errors.
  • Need to split pytest into different unit tests rather than one big test.
  • Use ASMString from JSON.

I'm working on both now.

@lenary
Copy link
Collaborator

lenary commented Dec 19, 2024

Somewhere, a __pycache__ and *.pyc is missing from a gitignore. :)

instr_name = instr_name.lower().strip()

# Search through all entries in json_data
for key, value in json_data.items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the !instanceof metadata to only search through things with an encoding, I pointed this out in the comment about the initial approach. This will save you iterating through aliases (and other data like isel patterns)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see !instanceof follows this structure, but I'm finding it hard to understand how to use it for the parsing purposes, can you please further explain?

{
  "!instanceof": {
    "AES_1Arg_Intrinsic": [
      "int_arm_neon_aesimc",
      "int_arm_neon_aesmc"
    ],
    "AES_2Arg_Intrinsic": [
      "int_arm_neon_aesd",
      "int_arm_neon_aese"
    ],
    "ALUW_rr": [
      "ADDW",
      "ADD_UW",
      "DIVUW",
      "DIVW",
      "MULW",
      "PACKW",
      "REMUW",
      "REMW",
      "ROLW",
      "RORW",
      "SH1ADD_UW",
      "SH2ADD_UW",
      "SH3ADD_UW",
      "SLLW",
      "SRAW",
      "SRLW",
      "SUBW"
    ],
    "ALU_ri": [
      "ADDI",
      "ANDI",
      "ORI",
      "SLTI",
      "SLTIU",
      "XORI"
    ],

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, let's assume you parsed the whole json object, into a python variable called json.

json["!instanceof"] is a map, from tablegen classes, to a list of definition names that are instances of that class (or its sub-classes). These definition names appear in the top-level json.

You would use code like the following:

for def_name in json["!instanceof"]["RVInstCommon"]:
  def_data = json[def_name]
  ...

This saves you having to look at all the tablegen data that is not an instruction (so an alias or a pattern or CSR or something).

Note you'll still have to look at isPseudo and isCodeGenOnly and potentially exclude items where one or both of those is true.

@AFOliveira
Copy link
Collaborator Author

Unit testing is now working.

TODO:
Solve complex immediates
Add the optimization @lenary proposed on code structure
Add this to the validation process

This was referenced Dec 23, 2024
Signed-off-by: Afonso Oliveira <[email protected]>
@AFOliveira
Copy link
Collaborator Author

Can I insert LLVM as a submodule or would this be another licensing problem?

@AFOliveira
Copy link
Collaborator Author

AFOliveira commented Dec 23, 2024

This is functional. Things missing is how to address LLVM and adding it the test to Rakefile. I will also see how I can optimize code, e.g. what @lenary proposed on one review.

@lenary
Copy link
Collaborator

lenary commented Dec 23, 2024

Can I insert LLVM as a submodule or would this be another licensing problem?

I highly suggest using environment variables to point to LLVM, and skipping these tests if those environment variables are not set.

Signed-off-by: Afonso Oliveira <[email protected]>
Signed-off-by: Afonso Oliveira <[email protected]>
Signed-off-by: Afonso Oliveira <[email protected]>
Signed-off-by: Afonso Oliveira <[email protected]>
@AFOliveira
Copy link
Collaborator Author

@dhower-qc as on out last call was talked, I added LLVM to CI with caching so it doesn't always have to build the table, For local runs, I still kept it optional, it only runs if the user have the file present, is it ok this way?

Signed-off-by: Afonso Oliveira <[email protected]>
@AFOliveira AFOliveira force-pushed the AFOliveira/LLVM branch 6 times, most recently from f2d419f to 40af15d Compare January 20, 2025 11:40
Signed-off-by: Afonso Oliveira <[email protected]>
@AFOliveira AFOliveira force-pushed the AFOliveira/LLVM branch 6 times, most recently from 6ca01a7 to e2a5b73 Compare January 20, 2025 13:13
Signed-off-by: Afonso Oliveira <[email protected]>
@AFOliveira
Copy link
Collaborator Author

Sorry for the mess with force pushes, but I had some trouble finding what I was messing up with GH actions caching. However, I believe this PR is ready for merge after #428 is solved and after @dhower-qc and @lenary reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants