Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't copy if should_copy is false #251

Closed
wants to merge 1 commit into from

Conversation

jumerckx
Copy link

@jumerckx jumerckx commented Oct 7, 2024

No description provided.

@gkronber
Copy link
Collaborator

gkronber commented Oct 7, 2024

Copying the nodes for the parent list is on purpose. (see #239)

@jumerckx
Copy link
Author

jumerckx commented Oct 7, 2024

Ah sorry, should've checked the blame.
My use-case is that I'm storing additional data linked to enodes outside of the egraph. This copy makes it such that the enodes stored in an eclass don't have the same objectid as the keys of memo.
I'm not yet seeing what problems this can cause. Alternatively, I should probably stop using an IdDict{VecExpr, T} change how I store this external data.

@gkronber
Copy link
Collaborator

gkronber commented Oct 8, 2024

Storing enodes outside of the egraph is problematic as enodes are canonicalized (mutated) after each rebuilding phase and equal enodes after canonicalization are deleted from the internal datastructures.

Probably you can change your implementation to use semantic analysis values (for eclasses)?

@jumerckx
Copy link
Author

jumerckx commented Oct 9, 2024

Storing enodes outside of the egraph is problematic as enodes are canonicalized (mutated) after each rebuilding phase and equal enodes after canonicalization are deleted from the internal datastructures.

In my case that's okay since canonicalized enodes are supposed to have the same external data as well.

Probably you can change your implementation to use semantic analysis values (for eclasses)?

The reason I didn't go this route from the start is because I also have enodes that are never added to the egraph for which I also want to store this data. But on second thought, it makes more sense to use an eclass analysis and handle these external enodes separately. Thanks for the advice!

I still don't really get what can go wrong with the egraph invariants when the copy is omitted, but feel free to close this pr if this is something that definitely shouldn't happen.

@0x0f0f0f
Copy link
Member

@jumerckx

I still don't really get what can go wrong with the egraph invariants when the copy is omitted, but feel free to close this pr if this is something that definitely shouldn't happen.

discussion was in #239 :)

@0x0f0f0f 0x0f0f0f closed this Oct 10, 2024
@jumerckx
Copy link
Author

@0x0f0f0f

discussion was in #239 :)

In that discussion it is said that enodes in memo and in nodes need to be different because in nodes they can change by canonicalization. However, every access to memo in the codebase is preceded by a call to canonicalize so I believe there's no real issue there? The keys in memo will simply also be canonicalized.

@gkronber
Copy link
Collaborator

@jumerckx, you might be right that the copy is not necessary. Could you please check whether the unit tests pass with check_memo and check_analysis set to true in SaturationParams?

The code is a bit tricky because canonicalization of nodes that contained in memo may have the effect that haskey(g.memo, n) is false (because of new hash values) even though memo still contains the key.

@jumerckx
Copy link
Author

Could you please check whether the unit tests pass with check_memo and check_analysis set to true in SaturationParams?

The tests pass with these flags set to true.

However, I now understand the issue, dictionary keys shouldn't get modified. So this pr would indeed introduce a bug 😅. I believe it might be possible to have a dict-like data structure specifically for vecexprs that doesn't include the eclasses in its hash and instead verifies these on lookup.
But the copy here is clearly much more pragmatic and since I've been able to solve my problem much more nicely with eclass Analysis I don't see a reason to further pursue this.

Thanks a lot for the feedback, learned a lot about Metatheory's internals!

@jumerckx jumerckx deleted the patch-1 branch October 11, 2024 11:12
@gkronber
Copy link
Collaborator

It's a bit of a mess. memo is used as a deduplication mechanism for enodes. The pseudo-code in the egg paper removes enodes from memo before canonicalization and then adds the canonical enodes again. In the current egg implementation, however, they seem to just ignore this and mutate memo keys.

I'm currently working on this again in #253 , trying to reduce the number of enode copying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants