Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix C++ port behavior divergence with reference implementation #136

Open
vityaman opened this issue Jul 28, 2024 · 4 comments
Open

Fix C++ port behavior divergence with reference implementation #136

vityaman opened this issue Jul 28, 2024 · 4 comments

Comments

@vityaman
Copy link
Contributor

Description

During rewriting test scenarios from TypeScript to C++ there was explored a behavior divergence in C++ port.

Also after adding GitHub Workflow for Java also failing test was detected.

Examples of that

Related issues and MRs

vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Jul 31, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Jul 31, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 1, 2024
mike-lischke pushed a commit that referenced this issue Aug 1, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 4, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 4, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 4, 2024
@vityaman
Copy link
Contributor Author

vityaman commented Aug 4, 2024

I found a bug.

When running tests using ctest (cd test && ctest) they passed, but running tests from a binary file ./expr/antlr4-c3-test-expr they fail. Then I explored that CTest runs all tests by sequentially running a binary file with a filter for an only test case, while running tests using a binary file leads to all test scenarios execution.

So, I decided that the problem was in some implicit dependency of test scenarios. I switched Expr MostSimpleSetup and TypicalSetup test cases and got test failure.

This is because of caching setsPerState[startState->stateNumber]. Ignoring the cache helped and made tests green.

@mike-lischke
Copy link
Owner

But disabling the cache will certainly make everything slower. The must be a difference in behavior of the cache in both TS and C++.

@vityaman
Copy link
Contributor Author

vityaman commented Aug 5, 2024

I did some more research. What I found.

The problem is that cache setsPerState is not idempotent. FollowSetsHolder contains the same FollowSetWithPath::path, FollowSetWithPath::intervals and combined, but different FollowSetWithPath::following.

How is that possible while procedures determineFollowSets -> collectFollowSets -> getFollowingTokens look quite isolated? Quite, but not entirely, because getFollowingTokens uses ignoredTokens in a such way that it will exclude an ignored token from following sequence.

In ExprTest::MostSimpleSetup and ExprTest::TypicalSetup test cases, we have different sets of CodeCompletionCore::ignoredTokens. As {ExprLexer::ID, ExprLexer::EQUAL} are not ignored at the first scenario, they are inside the following sequence. But in the second scenario we ignore them, and therefore we exclude them.

Based on this fact, I can say that current C++ port behavior is correct relatively to getFollowingTokens correctness, as it returns an empty following set, while TypeScript version returns non-empty only because of that setsPerState cache.

@mike-lischke, I propose to drop cache at CodeCompletionCore configuration update. It can be done by adding code to property setter or checking that configuration was not changed from the last collectCandidates (check hashCode and then equals).

If I could design a library differently, I would make everything as immutable as possible to avoid such bugs with caches and also improve performance by building data structures with higher level of specialization for a task. I don't see the advantage of having mutable ignoredTokens for example, as in the typical usage this field is not changed. For example, I prefer to instantiate this class as well as lexer, parser and other things once, and on a new input do reset on everything and rerun collectCandidates with different caretTokenIndex and context. But maybe I should grep over antlr-c3 usages over the GitHub and check how it is used by other people.

Also, I would like to say that this behavior of getFollowingTokensis strange. Because of ignoredTokens we can produce invalid following sequence, for example, {SET: [ID, NUMBER] } instead of {SET: [ID, ASSIGN, NUMBER]}. In suggestions system it is quite unexpected output. I think, it is more logical to return tokens until the first ignored one, for example {SET: [ID]}, or don't check for token ignorance at getFollowingTokens at all. Because I thought that ignoredTokens exists to exclude a token from that tokens: Map<Token, Following> keys.

@mike-lischke
Copy link
Owner

That will take some time for me to analyze and understand. Currently I have no bandwidth to dive deep into the c3 algorithm again.

vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 7, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 7, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 7, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 7, 2024
mike-lischke pushed a commit that referenced this issue Aug 8, 2024
mike-lischke pushed a commit that referenced this issue Aug 8, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 12, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 12, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 12, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 12, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 26, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 26, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 26, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 26, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 26, 2024
vityaman added a commit to vityaman/antlr4-c3 that referenced this issue Aug 26, 2024
mike-lischke pushed a commit that referenced this issue Aug 26, 2024
mike-lischke pushed a commit that referenced this issue Aug 26, 2024
mike-lischke pushed a commit that referenced this issue Aug 26, 2024
mike-lischke pushed a commit that referenced this issue Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants