Add --fdump-tree=FILE option #108

Draft
wants to merge 10 commits into
base: gcos4gnucobol-3.x
from

Conversation

lefessan
Member

@lefessan lefessan commented Jul 14, 2023

Dump the AST/tree to a file, for debugging purposes.

The file dump_ast_gen.c is generated by an external tool directly from tree.h; it can then be edited manually when the types are modified, until the tool is run again later to clean it up.

@lefessan lefessan marked this pull request as draft July 14, 2023 23:13
@lefessan lefessan force-pushed the z-2023-07-12-print-tree branch from 48dae44 to e668b19 Compare July 14, 2023 23:14
@codecov-commenter

codecov-commenter commented Jul 14, 2023

Codecov Report

Merging #108 (00faae3) into gcos4gnucobol-3.x (6b44051) will decrease coverage by 1.38%.
The diff coverage is 0.70%.

❗ Current head 00faae3 differs from pull request most recent head 41819d1. Consider uploading reports for the commit 41819d1 to get more accurate results


@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x     #108      +/-   ##
=====================================================
- Coverage              65.39%   64.02%   -1.38%     
=====================================================
  Files                     32       33       +1     
  Lines                  58797    60063    +1266     
  Branches               15492    16176     +684     
=====================================================
+ Hits                   38449    38454       +5     
- Misses                 14362    15621    +1259     
- Partials                5986     5988       +2     
Impacted Files      Coverage Δ
cobc/codegen.c      75.33% <ø> (ø)
cobc/dump_tree.c    0.00% <0.00%> (ø)
cobc/parser.y       68.55% <0.00%> (ø)
cobc/cobc.c         72.02% <6.25%> (-0.29%) ⬇️
cobc/flag.def       100.00% <100.00%> (ø)
cobc/tree.c         74.09% <100.00%> (+0.01%) ⬆️
cobc/typeck.c       64.81% <100.00%> (ø)


@lefessan lefessan force-pushed the z-2023-07-12-print-tree branch 3 times, most recently from 16ea769 to 4f413a2 Compare July 15, 2023 09:48
@GitMensch
Collaborator

GitMensch commented Jul 15, 2023 via email

const char* string_of_cb_usage (enum cb_usage x)
{
switch (x){
case CB_USAGE_BINARY: return "CB_USAGE_BINARY"; /* 0 */
Collaborator

Couldn't these and similar functions be "stringified" using our existing macros? A switch-case would still be needed, but if the source is a macro, the stringification can simply be defined as an additional macro instead of a full switch-case; otherwise a macro can still be used so that each case reduces to CASE_STRING(some-enum).

Note: you don't need to include the numeric values (which may well change later on in any case).
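A minimal sketch of the CASE_STRING idea from the comment above. The enum here is a hypothetical stand-in for the real one in cobc/tree.h, and the macro definition is an assumption about what such a helper could look like, not the project's existing macro.

```c
#include <string.h>

/* Hypothetical stand-in for an enum from cobc/tree.h;
   only a few members are listed for illustration. */
enum demo_usage {
    DEMO_USAGE_BINARY,
    DEMO_USAGE_BIT,
    DEMO_USAGE_DISPLAY
};

/* Expands to a case label that returns the enumerator's own name.
   The '#' operator stringifies the token exactly as written. */
#define CASE_STRING(x) case x: return #x

static const char *
string_of_demo_usage (enum demo_usage x)
{
    switch (x) {
    CASE_STRING (DEMO_USAGE_BINARY);
    CASE_STRING (DEMO_USAGE_BIT);
    CASE_STRING (DEMO_USAGE_DISPLAY);
    default: return "unknown";
    }
}
```

With this shape, adding an enumerator costs one line, and no numeric value has to be repeated in a comment.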

@lefessan lefessan force-pushed the z-2023-07-12-print-tree branch from 4f413a2 to 610905c Compare July 15, 2023 21:33
@lefessan
Member Author

I changed the options to -fdump-tree and -fdump-tree-flags, which should be close enough to what GCC developers expect.

I ran the whole testsuite with both formats (using environment variables so as not to modify the scripts), and I could validate both the OCaml and JSON files.

I am not sure that adding a test to the testsuite would be a good idea: (1) the output depends on the internal types in tree.h, and (2) even for small examples, the output is pretty big, a few thousand lines...

The OCaml format is actually the standard syntax for basic types in the OCaml language (https://ocaml.org). Since I almost always program in OCaml (except for GnuCOBOL :-) ), this format is easier for me than JSON, as I can easily write tools to simplify/filter the information in the AST.

For example, the JSON format (with flags +Ai):

{
   "type_": "cb_program",
   "address_": " 0x559ec2ee75f8",
   "common": {
      "tag": { "type_": "cb_tag", "value_": "PROGRAM" },
      "source_file": "test.cob"
   },
   "program_name": "DOUBLE-OPEN",
   "program_id": "DOUBLE__OPEN",
   "orig_program_id": "DOUBLE-OPEN",
   "word_table": { "type_": "TODO", "value_": "struct cb_word **" },
   "local_include": { "type_": "TODO", "value_": "struct local_filename*" },
   "entry_list": [
      {
         "type_": "cb_list",
         "address_": " 0x559ec2f45138",
         "purpose": {
            "type_": "cb_label",
            "address_": " 0x559ec2f44f78",
            "common": {
               "tag": { "type_": "cb_tag", "value_": "LABEL" },
               "source_file": "test.cob",
               "source_line": 39
            },
            "name": "DOUBLE__OPEN",
            "orig_name": "DOUBLE-OPEN",
            "xref": { "type_": "TODO", "value_": "struct cb_xref" },
...

whereas OCaml format looks like:

RECORD [
   "type_", STRING "cb_program";
   "address_", POINTER 0x55acb46095f8L;
   "common", RECORD [
      "tag", CONSTR ("cb_tag", STRING "PROGRAM");
      "source_file", STRING "test.cob";
   ];
   "program_name", STRING "DOUBLE-OPEN";
   "program_id", STRING "DOUBLE__OPEN";
   "orig_program_id", STRING "DOUBLE-OPEN";
   "word_table", CONSTR ("TODO", STRING "struct cb_word **");
   "local_include", CONSTR ("TODO", STRING "struct local_filename*");
   "entry_list", LIST [
      RECORD [
         "type_", STRING "cb_list";
         "address_", POINTER 0x55acb4667138L;
         "purpose", RECORD [
            "type_", STRING "cb_label";
            "address_", POINTER 0x55acb4666f78L;
            "common", RECORD [
               "tag", CONSTR ("cb_tag", STRING "LABEL");
               "source_file", STRING "test.cob";
               "source_line", INT( 39);
            ];
            "name", STRING "DOUBLE__OPEN";
            "orig_name", STRING "DOUBLE-OPEN";
            "xref", CONSTR ("TODO", STRING "struct cb_xref");
...

@lefessan lefessan force-pushed the z-2023-07-12-print-tree branch from 610905c to d3294aa Compare July 15, 2023 21:55
@lefessan lefessan marked this pull request as ready for review July 15, 2023 21:55
@lefessan
Member Author

Note that there are still some TODOs in the code. There are many fields in these data structures, and it's not clear which of them should be printed.

Also, it might be interesting at some point to have "levels" of importance for fields, so that, for example, only important fields would be printed at level 1, then less important fields at level 2, and so on... However, such levels should probably be decided after experimenting with using these outputs for real debugging.
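The "levels" idea above could be sketched roughly as follows. Everything here is hypothetical: the descriptor struct, the function name, and any level assigned to a field are illustrations only, since the comment explicitly leaves the real levels to be decided after experimentation.

```c
#include <stdio.h>

/* Hypothetical field descriptor: level 1 is the most important. */
struct demo_field {
    const char *name;
    int         level;
};

/* Print the names of all fields whose level is at most max_level,
   one per line, and return how many were printed. */
static int
print_fields_up_to (FILE *out, const struct demo_field *fields,
                    int nfields, int max_level)
{
    int i;
    int printed = 0;

    for (i = 0; i < nfields; i++) {
        if (fields[i].level <= max_level) {
            fprintf (out, "%s\n", fields[i].name);
            printed++;
        }
    }
    return printed;
}
```

A table-driven filter like this keeps the level assignment in data, so it could be tuned without touching the printing code.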

@lefessan lefessan force-pushed the z-2023-07-12-print-tree branch 2 times, most recently from 00faae3 to 41819d1 Compare July 15, 2023 22:04
@lefessan lefessan changed the title add --output-tree=FILE option add --fdump-tree=FILE option Jul 15, 2023
@lefessan lefessan force-pushed the z-2023-07-12-print-tree branch from 41819d1 to 266808b Compare July 16, 2023 09:22
@lefessan lefessan marked this pull request as draft January 31, 2024 08:34
@lefessan lefessan changed the title add --fdump-tree=FILE option WIP: add --fdump-tree=FILE option Jan 31, 2024
May be useful for debugging code generation.

Use the extension of FILE to choose the format of output:
* .ml : OCaml code format
* any other: JSON format (output validated by jsonlint-php)
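The extension rule in this commit message can be sketched as below. The enum and function names are assumptions made for illustration; the identifiers actually used in the PR may differ.

```c
#include <string.h>

/* Hypothetical names for the two output formats. */
enum dump_format { DUMP_FORMAT_JSON, DUMP_FORMAT_OCAML };

/* Pick the output format from the file name: a trailing ".ml"
   selects OCaml, anything else falls back to JSON. */
static enum dump_format
dump_format_of_filename (const char *filename)
{
    size_t len = strlen (filename);

    if (len >= 3 && strcmp (filename + len - 3, ".ml") == 0) {
        return DUMP_FORMAT_OCAML;
    }
    return DUMP_FORMAT_JSON;
}
```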
@lefessan lefessan force-pushed the z-2023-07-12-print-tree branch from 266808b to f47a04a Compare October 22, 2024 15:12
Collaborator

@GitMensch GitMensch left a comment

I guess the build_literal changes are part of a broken merge; in that case you'll want to revert them (at least with an extra cleanup commit).

To ensure that at least something is tested you definitely want to add some testcases, possibly in used_binaries.at

cobc/Makefile.am Outdated
Collaborator

this should be dropped

Member Author

I just put the ChangeLog of the branch outside of the ChangeLog, because it creates a conflict every time I rebase it. I will keep it that way until the work is finished.

cobc/ChangeLog Outdated
Collaborator

Please merge that file into the ChangeLog once; everything in it is fine.

cobc/ChangeLog Outdated
Comment on lines 4 to 13
* cobc.c, dump_tree.c: new options to dump the AST in text format:
--dump-tree=<file>, and --dump-tree-flags=<flags>. Format is
either OCaml (for files with .ml extension) or JSON. If file
ends with '/', then it is expected to be a directory and the
file will be generated with the program id as name. Flags are
'+/-' for enable/disable, 'c' for cb_common, 'l' for locations,
't' for types, 'p' for pointers, 'i' for indentation, 'n' for
newlines, 'A' for all infos, 'O' for OCaml format, 'J' for
JSON format, 'm' for message. Env variables COB_DUMP_TREE and
COB_DUMP_TREE_FLAGS can also be used to set these flags.
Collaborator

this misses the new files and their inclusion in the Makefile.am
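The directory rule from the ChangeLog entry above (a target ending with '/' is a directory, and the program id becomes the file name) can be sketched like this. The function name and signature are hypothetical, and snprintf is a C99 function, so a C89-only build would need a different approach.

```c
#include <stdio.h>
#include <string.h>

/* Build the dump file path into buf (capacity size).
   If target ends with '/', it is treated as a directory and the
   program id is used as the file name; otherwise target is used
   unchanged. Returns 0 on success, -1 if buf is too small. */
static int
dump_tree_path (char *buf, size_t size,
                const char *target, const char *program_id)
{
    size_t len = strlen (target);
    int n;

    if (len > 0 && target[len - 1] == '/') {
        n = snprintf (buf, size, "%s%s", target, program_id);
    } else {
        n = snprintf (buf, size, "%s", target);
    }
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```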

cobc/Makefile.am Outdated
Collaborator

As this file is generated, it probably should not be included in VCS but rather added to the git/svn ignore list (not sure yet).

Member Author

As it is generated by a "non standard" tool, I am still wondering what to do. Adding a permanent dependency on the tool is probably a bad idea; I am leaning towards making this feature a configure-time option, disabled by default and only available on demand (with a specific workflow in the GitHub CI to check whether it still works).

cobc/dump_tree.c Outdated
cobc/Makefile.am Outdated
.PHONY: update-dump-tree

update-dump-tree:
ctypes2json -I $(top_srcdir)/cobc -I $(top_srcdir)/lib > $(top_srcdir)/cobc/dump_tree_gen.c
Collaborator

The ctypes2json tool should probably get an entry in the HACKING file.

@GitMensch
Collaborator

GitMensch commented Oct 24, 2024 via email

@lefessan
Member Author

That's strange: if you keep it inserted "down where it is" until there are possibly more changes, then there should be no conflict, as no other commits add to the changelog further down.

I see. For me, it has to be inserted at the point where it is merged, i.e. the ChangeLog before it should correspond to what is in the code at the point where the work is done, and not when it was started, no?

@GitMensch
Collaborator

That depends. If most of the old work is unchanged and "old", then I personally would have one "old" and one "new" entry.
If it is totally new like this I'd likely move the Changelog entry up - but only directly before the actual merge (so no conflict in between).

Collaborator

@GitMensch GitMensch left a comment

A bunch of issues from the last iteration have been solved and cleaned up correctly; looking forward to the test.

cobc/Makefile.am Outdated
cobc/cobc.c Outdated
cobc/dump_tree.c Outdated
cobc/dump_tree.c Outdated
cobc/dump_tree.c Outdated
cobc/dump_tree.c Outdated
cobc/dump_tree.c Outdated
Comment on lines 479 to 480
case '+': sign = 1; break;
case '-': sign = 0; break;
Collaborator

these flags duplicate the filename option and should therefore be dropped here and in the flag.def help

Member Author

The goal is to be able to enable/disable flags in the same definition, for example +z-i (add zero-value fields and disable indent)
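A sketch of how a spec such as +z-i could be applied, building on the two sign cases shown in the hunk above. The bit masks and the function name are hypothetical, only three of the flag letters are modelled, and silently ignoring unknown characters is an assumption of this sketch.

```c
/* Hypothetical bit masks for three of the flag letters. */
#define DUMP_FLAG_LOCATIONS   (1u << 0)  /* 'l' */
#define DUMP_FLAG_INDENT      (1u << 1)  /* 'i' */
#define DUMP_FLAG_ZERO_FIELDS (1u << 2)  /* 'z' */

/* Apply a spec such as "+z-i" to flags and return the result.
   '+' and '-' switch between enabling and disabling the letters
   that follow; letters before any sign are enabled. */
static unsigned int
apply_dump_flags (unsigned int flags, const char *spec)
{
    int sign = 1;
    unsigned int bit;

    for (; *spec != '\0'; spec++) {
        bit = 0;
        switch (*spec) {
        case '+': sign = 1; break;
        case '-': sign = 0; break;
        case 'l': bit = DUMP_FLAG_LOCATIONS; break;
        case 'i': bit = DUMP_FLAG_INDENT; break;
        case 'z': bit = DUMP_FLAG_ZERO_FIELDS; break;
        default: break;  /* unknown characters are ignored here */
        }
        if (sign) {
            flags |= bit;
        } else {
            flags &= ~bit;
        }
    }
    return flags;
}
```

Because the sign is sticky, one string can both enable and disable flags, which is the point made in the reply above.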

Collaborator

ah, that's reasonable of course, improved docs and tests will help me to understand that

cobc/flag.def Outdated
Comment on lines 290 to 292
_(" -fdump-tree-flags=<flags> set flags used when dumping the parsed AST\n"
" '+/-' to enable/disable, 'l' for locations,\n"
" 't' for types, 'p' for pointers, 'i' for indent."))
Collaborator

this misses some of the flags

Member Author

Most flags have disappeared, as it does not print to JSON/OCaml anymore, but to a human readable format.

Collaborator

Do you consider the "no JSON/OCaml" temporary or is the idea to leave that as-is? Just asking....

Member Author

Initially, I thought that the AST of GnuCOBOL was "clean", in the sense that it was an exact representation of the COBOL code. However, that is not the case: the parser already starts translating code to C (for example, replacing some nodes with direct calls to C functions), so the AST is actually half COBOL code, half translated code. Sending this AST to another program (through OCaml or JSON) is therefore less interesting. So, for now, I have given up on the idea of a correct JSON/OCaml encoding and am focusing instead on the ability to debug the compiler using a printed AST.

Yet, in the long term, I think it would be nice to move the translations done in the parser to the codegen and get a clean AST. At that point, it could be used by other programs and this output could be improved.

Collaborator

I totally agree that this separation should be clearer; to get there we'd likely move most of the included C code into a cb_tree, as it is currently implemented, but using a new type of node that is only converted to C in codegen.
... but then, nearly all of the C parts should be FUNC_CALLs, and most of those should be quite easy to separate and could be converted "on the fly", if wanted.

... but your human-readable AST dump may be ideal to show what is really in there and to consider adjustments that way.
I'm looking forward to some test cases showing what you consider to be bad in the AST. :-)

Member Author

Here is an example of the generated output for a small file:

writefile.txt

I am not really sure it is interesting to create a test, as we don't really expect the format to be kept backward compatible. We may just want to make a test to verify that a file is generated, but not check the file format itself.

cobc/dump_tree.c Outdated
cobc/tree.h Outdated
Collaborator

all changes to this file but the definition of the new function can be checked in upstream at any time

@GitMensch
Collaborator

Last set of errors from the CI:

In file included from ../../cobc/dump_ast.c:190:
../../cobc/dump_ast_gen.c: In function 'init_ptr_table':
../../cobc/dump_ast_gen.c:39:5: error: implicit declaration of function 'bzero' [-Wimplicit-function-declaration]
   39 |     bzero (ptr_table, ptr_table_size * sizeof(void*) );
      |     ^~~~~
mv -f .deps/replace.Tpo .deps/replace.Po
../../cobc/dump_ast_gen.c:39:5: warning: incompatible implicit declaration of built-in function 'bzero' [-Wbuiltin-declaration-mismatch]
../../cobc/dump_ast_gen.c: In function 'print_struct_cb_funcall':
../../cobc/dump_ast_gen.c:7826:40: warning: the comparison will always evaluate as 'true' for the address of 'argv' will never be NULL [-Waddress]
 7826 |     if( print_zero_fields || ptr->argv != NULL ){
      |                                        ^~
In file included from ../../cobc/dump_ast.c:37:
../../cobc/tree.h:1288:33: note: 'argv' declared here
 1288 |         cb_tree                 argv[CB_BUILD_FUNCALL_MAX];     /* Function arguments */
      |                                 ^~~~
../../cobc/dump_ast_gen.c: In function 'print_struct_cb_intrinsic':
../../cobc/dump_ast_gen.c:8420:58: warning: passing argument 2 of 'print_struct_cb_intrinsic_table_ptr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
 8420 |        print_struct_cb_intrinsic_table_ptr (indent+2, ptr->intr_tab);
      |                                                       ~~~^~~~~~~~~~
../../cobc/dump_ast_gen.c:7995:89: note: expected 'struct cb_intrinsic_table *' but argument is of type 'const struct cb_intrinsic_table *'
 7995 | static void print_struct_cb_intrinsic_table_ptr (int indent, struct cb_intrinsic_table *ptr){
      |                                                              ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
../../cobc/dump_ast.c: In function 'cb_dump_ast_to_file':
../../cobc/dump_ast.c:240:21: warning: unused variable 'len' [-Wunused-variable]
  240 |                 int len = strlen (filename);
      |                     ^~~
../../cobc/dump_ast_gen.c: At top level:
../../cobc/dump_ast_gen.c:2509:13: warning: 'print_struct_cob_prof_procedure_ptr' defined but not used [-Wunused-function]
 2509 | static void print_struct_cob_prof_procedure_ptr (int indent, struct cob_prof_procedure *ptr){
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../cobc/dump_ast_gen.c:2090:13: warning: 'print_struct_cb_call_xref_ptr' defined but not used [-Wunused-function]
 2090 | static void print_struct_cb_call_xref_ptr (int indent, struct cb_call_xref *ptr){
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../cobc/dump_ast_gen.c:2036:13: warning: 'print_struct_handler_struct_ptr' defined but not used [-Wunused-function]
 2036 | static void print_struct_handler_struct_ptr (int indent, struct handler_struct *ptr){
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../cobc/dump_ast_gen.c:1258:13: warning: 'print_struct_cb_xref_ptr' defined but not used [-Wunused-function]
 1258 | static void print_struct_cb_xref_ptr (int indent, struct cb_xref *ptr){
      |             ^~~~~~~~~~~~~~~~~~~~~~~~
../../cobc/dump_ast.c:162:13: warning: 'print_int_array' defined but not used [-Wunused-function]
  162 | static void print_int_array (int indent, int n, int *x)
      |             ^~~~~~~~~~~~~~~
make[3]: *** [Makefile:639: dump_ast.o] Error 1

The MSVC builds all fail because the new dump_ast files need to be added to build_windows.
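Two of the warnings in the log above (-Waddress on an array member and -Wdiscarded-qualifiers on a const pointer) have a common shape that a small sketch can show. The struct and function below are hypothetical, trimmed-down models and not the generated code itself.

```c
#include <stddef.h>

#define DEMO_FUNCALL_MAX 4  /* stand-in for CB_BUILD_FUNCALL_MAX */

/* Hypothetical model of a node that holds an inline array. */
struct demo_funcall {
    int         argc;
    const char *argv[DEMO_FUNCALL_MAX];
};

/* An array member always has a non-null address, so a test like
   `f->argv != NULL` is constant; test the elements instead.
   Taking a pointer to const also lets callers pass read-only data
   without discarding qualifiers. */
static int
count_printable_args (const struct demo_funcall *f, int print_zero_fields)
{
    int i;
    int n = 0;

    for (i = 0; i < f->argc && i < DEMO_FUNCALL_MAX; i++) {
        if (print_zero_fields || f->argv[i] != NULL) {
            n++;
        }
    }
    return n;
}
```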

Comment on lines +98 to +100
<None Include="..\..\cobc\dump_ast_gen.c">
<Filter>Header Files</Filter>
</None>
Collaborator

I'm confused - that is an auto-generated source (like parser.c) and therefore should be moved there and ClCompiled as well, no [that applies to all of those files under build_windows]?

Member Author

I don't really understand these files; I just grepped for other similar files (warning.def and typeck.c) and modified the files accordingly...

Collaborator

Why should we handle that as a header file? I guess we call static functions in here directly from dump_ast.gen.c?
If this is the case and there are not too many of them, then dropping the static attribute via sed and adding those "entry prototypes" to tree.h would work better.

Member Author

Maybe it should just be renamed as .def to be consistent with other included files that are not actually include files.

Comment on lines +39 to +40
memset (ptr_table, 0, ptr_table_size * sizeof(void*) );
memset (num_table, 0, ptr_table_size * sizeof(int) );
Collaborator

@GitMensch GitMensch Oct 25, 2024

Is this a manual change after generation? In that case you may want to define bzero as a macro instead of making that change by hand.

I guess we cannot generate this file using the tool you do for C89 compat (the one failing test in CI)?
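The macro suggestion above could look like the following sketch. The helper function is hypothetical and only mirrors the table reset from the hunk; the guard avoids redefining bzero on platforms whose headers already provide it as a macro.

```c
#include <string.h>

/* bzero() is a legacy BSD function that ISO C does not provide;
   mapping it onto memset() lets generated code compile unchanged. */
#ifndef bzero
#define bzero(ptr, size) memset ((ptr), 0, (size))
#endif

/* Hypothetical helper mirroring the table reset in the generated file. */
static void
clear_ptr_table (void **table, size_t count)
{
    bzero (table, count * sizeof (void *));
}
```

Keeping the generated call sites untouched means a later regeneration would not reintroduce the build failure.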

Member Author

The new file is generated by a new version of the tool (https://github.com/OCamlPro/ctypes-gen), but I guess most devs will edit it by hand, and the tool will only be used from time to time to get a clean version.

@lefessan lefessan changed the title WIP: add --fdump-tree=FILE option Add --fdump-tree=FILE option Nov 4, 2024
3 participants