After modifying PostgreSQLLexer.g4 and PostgreSQLParser.g4 to include LineComment and BlockComment in the commentstmt rule and removing -> channel(HIDDEN), why is the enterCommentstmt method in PostgreSQLParserBaseListener.java still not being executed? #4376

Open
drakshayanin opened this issue Jan 8, 2025 · 5 comments

Comments

@drakshayanin

I am working with PostgreSQLLexer.g4 and PostgreSQLParser.g4. I have generated the support files from these grammars, including PostgreSQLParserBaseListener.java. That file contains the enterCommentstmt method, which I have overridden, but the method is never executed.

@Override
public void enterCommentstmt(PostgreSQLParser.CommentstmtContext ctx) {}
After reviewing the Lexer file, I found the following definitions:

LineComment: '--' ~ [\r\n]* -> channel(HIDDEN);

BlockComment:
('/*' ('/'* BlockComment | ~ [/*] | '/'+ ~ [/*] | '*'+ ~ [/*])* '*'* '*/') -> channel(HIDDEN);
In PostgreSQLLexer.g4, I have removed the -> channel(HIDDEN) as follows:

LineComment: '--' ~ [\r\n]* ;

BlockComment:
('/*' ('/'* BlockComment | ~ [/*] | '/'+ ~ [/*] | '*'+ ~ [/*])* '*'* '*/');
In PostgreSQLParser.g4, I added LineComment and BlockComment to the commentstmt rule as shown below:

commentstmt
: LineComment
| BlockComment;
However, after making these changes, the enterCommentstmt method is still not being executed. How should I proceed?

@kaby76
Contributor

kaby76 commented Jan 8, 2025

You don't give the input. It's impossible to answer. Likely, your input does not parse, but we can't do anything without the input to "see" the parse tree.

@drakshayanin
Author

Input.sql:

-- Sample SELECT statement
SELECT id, name, salary
FROM employees
WHERE salary > 50000;

-- Sample INSERT statement
INSERT INTO employees (id, name, salary)
VALUES (1, 'John Doe', 55000);

@kaby76
Contributor

kaby76 commented Jan 8, 2025

The parse fails on line 2. The parse tree is incomplete.

I would recommend that you introduce a "comment" mode in the lexer. There is no way your changes can work.

Also, backing up a bit, you are trying to fit a square peg in a round hole. Why are you trying to extract comments through Antlr visitors or listeners? The comments are on the token stream as HIDDEN.
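To illustrate what "on the token stream as HIDDEN" means, here is a minimal, self-contained Java sketch. It does not use the ANTLR runtime: `SketchToken`, `HiddenCommentScan`, the channel constants, and the hand-built token list are all hypothetical stand-ins, and the tokenization of Input.sql is assumed rather than taken from the real lexer. The point is only that the parser consumes default-channel tokens, while the comment tokens remain in the full token list and can be read from there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an ANTLR token; not the runtime's Token interface.
final class SketchToken {
    static final int DEFAULT_CHANNEL = 0; // tokens handed to the parser
    static final int HIDDEN_CHANNEL = 1;  // tokens kept on the stream but skipped by the parser

    final String type;
    final String text;
    final int channel;

    SketchToken(String type, String text, int channel) {
        this.type = type;
        this.text = text;
        this.channel = channel;
    }
}

final class HiddenCommentScan {
    // Types of the tokens a parser would consume (default channel only).
    static List<String> parserVisibleTypes(List<SketchToken> tokens) {
        List<String> types = new ArrayList<>();
        for (SketchToken t : tokens) {
            if (t.channel == SketchToken.DEFAULT_CHANNEL) {
                types.add(t.type);
            }
        }
        return types;
    }

    // Text of every comment token on the stream, whatever its channel.
    static List<String> commentTexts(List<SketchToken> tokens) {
        List<String> texts = new ArrayList<>();
        for (SketchToken t : tokens) {
            if (t.type.equals("LineComment") || t.type.equals("BlockComment")) {
                texts.add(t.text);
            }
        }
        return texts;
    }

    // Hand-built tokens for the start of Input.sql (assumed tokenization).
    static List<SketchToken> sampleTokens() {
        return Arrays.asList(
            new SketchToken("LineComment", "-- Sample SELECT statement", SketchToken.HIDDEN_CHANNEL),
            new SketchToken("Newline", "\n", SketchToken.HIDDEN_CHANNEL),
            new SketchToken("SELECT", "SELECT", SketchToken.DEFAULT_CHANNEL),
            new SketchToken("Identifier", "id", SketchToken.DEFAULT_CHANNEL),
            new SketchToken("SEMI", ";", SketchToken.DEFAULT_CHANNEL));
    }

    public static void main(String[] args) {
        List<SketchToken> tokens = sampleTokens();
        System.out.println("parser sees: " + parserVisibleTypes(tokens));
        System.out.println("comments on stream: " + commentTexts(tokens));
    }
}
```

With the real runtime, the equivalent check would presumably compare each token's `getType()` against the generated `PostgreSQLLexer.LineComment` and `PostgreSQLLexer.BlockComment` constants instead of strings.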

@drakshayanin
Author

I need the code chunk so that the comments can also be read as data. I have modified the lexer file as shown below, but when I try to generate the support files, nothing is generated.

COMMENT_MODE: // A custom lexer mode for comments
{ // Switch to the COMMENT_MODE for comment handling
'--' ~[\r\n]* -> skip; // line comment (skip)
'/*' .*? '*/' -> skip; // block comment (skip)
};

LineComment: '--' ~ [\r\n]* -> pushMode(COMMENT_MODE));

BlockComment:
('/*' ('/'* BlockComment | ~ [/*] | '/'+ ~ [/*] | '*'+ ~ [/*])* '*'* '*/') -> pushMode(COMMENT_MODE)
;

@kaby76
Contributor

kaby76 commented Jan 10, 2025

It doesn't work because the input does not parse. (You should print out the parse tree, parse result, and tokens for your input.) You could fix this by inserting a semi-colon after the comment in the lexer mode to make the comment really look like a "statement", or redo the grammar even more to allow "comment statements" to not require a following semi-colon. But I would not do any of this. Antlr parse trees don't contain intertoken or "hidden" tokens. And changing the grammar to make comments work in the parse is going to introduce all sorts of problems.

Here's what you should do:

  • Leave the grammar unmodified.
  • Write a visitor or listener for the parse tree node you want to examine, and check the token stream directly for comments within the token interval that node covers.

For example, if you override the visitor or listener for stmtmulti, you can get the token index of the left-most leaf node of each stmt, or the token index of each SEMI. You'll have to write a recursive function (or visitor) to go down the tree and get this token index. Then use CommonTokenStream.getTokens().get(index) to get at the comment token(s) next to that index, checking the token type to make sure you get comments, not whitespace. You can then sync each comment up with the statement that follows it. Or just iterate over CommonTokenStream.getTokens() and look for LineComment and BlockComment.
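The lookup described above can be sketched in plain Java without the ANTLR runtime. Everything here is an illustrative assumption: `StreamToken`, `CommentAttacher`, and the hand-built token list are hypothetical, and whitespace is assumed to sit on the hidden channel alongside the comments. Given the index of a statement's first token, the helper walks left over off-channel tokens and keeps only the comments. In the real Java runtime, `ParserRuleContext.getStart().getTokenIndex()` should supply that index, and `BufferedTokenStream.getHiddenTokensToLeft` appears to perform a similar walk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an ANTLR token; not the runtime's Token interface.
final class StreamToken {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    final String type;
    final String text;
    final int channel;

    StreamToken(String type, String text, int channel) {
        this.type = type;
        this.text = text;
        this.channel = channel;
    }

    boolean isComment() {
        return type.equals("LineComment") || type.equals("BlockComment");
    }
}

final class CommentAttacher {
    // Collect the comments that sit directly before the token at startIndex,
    // stopping at the first default-channel token found while walking left.
    static List<String> commentsBefore(List<StreamToken> tokens, int startIndex) {
        List<String> found = new ArrayList<>();
        for (int i = startIndex - 1; i >= 0; i--) {
            StreamToken t = tokens.get(i);
            if (t.channel == StreamToken.DEFAULT_CHANNEL) {
                break; // reached the previous statement's tokens
            }
            if (t.isComment()) {
                found.add(t.text); // skip whitespace, keep comments
            }
        }
        Collections.reverse(found); // restore source order
        return found;
    }

    // Hand-built, abbreviated tokens for Input.sql (assumed tokenization).
    static List<StreamToken> sampleTokens() {
        int hidden = StreamToken.HIDDEN_CHANNEL;
        int visible = StreamToken.DEFAULT_CHANNEL;
        return Arrays.asList(
            new StreamToken("LineComment", "-- Sample SELECT statement", hidden), // 0
            new StreamToken("Newline", "\n", hidden),                             // 1
            new StreamToken("SELECT", "SELECT", visible),                         // 2
            new StreamToken("SEMI", ";", visible),                                // 3
            new StreamToken("Newline", "\n\n", hidden),                           // 4
            new StreamToken("LineComment", "-- Sample INSERT statement", hidden), // 5
            new StreamToken("Newline", "\n", hidden),                             // 6
            new StreamToken("INSERT", "INSERT", visible),                         // 7
            new StreamToken("SEMI", ";", visible));                               // 8
    }

    public static void main(String[] args) {
        List<StreamToken> tokens = sampleTokens();
        System.out.println(commentsBefore(tokens, 2)); // comments attached to the SELECT
        System.out.println(commentsBefore(tokens, 7)); // comments attached to the INSERT
    }
}
```

Walking left from the statement's first token, rather than scanning the whole list, keeps each comment tied to the statement it precedes, which is the pairing the listener would otherwise have to reconstruct.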
