Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make topological sort fully general. #1138

Merged
merged 3 commits into from
Feb 17, 2024
Merged

Make topological sort fully general. #1138

merged 3 commits into from
Feb 17, 2024

Conversation

porcuquine
Copy link
Contributor

This PR generalizes the topological sort used when computing the provenances. The previous code relied on limitations on possible DAG shape in the singly self-recursive queries implemented so far. These changes should be fully general and support any dependency structure recorded by the queries.

@porcuquine porcuquine requested a review from a team as a code owner February 15, 2024 21:13
Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see inline comments.

Comment on lines +579 to +584
if dep_count == 0 {
// Queries are ready if they have no missing dependencies.
// Initially, this will be the base cases -- which have no dependencies.
ready.insert(key);
}
(key, dep_count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a query is ready here, we still include (key, 0) in missing_dependency_counts...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, a ready query has no missing dependencies. This also means that no other queries (its formerly missing dependencies) will trigger modification of that stat in the future.

Comment on lines 632 to 640
for dependent in dependents {
if provenances.get(dependent).is_none() {
next.insert(dependent);
}
missing_dependency_counts
.entry(dependent)
.and_modify(|missing_count| {
*missing_count -= 1;
if *missing_count == 0 {
next.insert(dependent);
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... which means IIUC that here we may receive a missing_dependency_counts with some (key, 0) entry.
This would then underflow

                        .and_modify(|missing_count| {
                            *missing_count -= 1;
                            if *missing_count == 0 {
                                next.insert(dependent);
                            }
                        }

Is this the behavior we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we are looking up the missing dependency counts for the dependent of the query that is ready. In other words, after we process this query, then everything that depends on it is a little more ready.

Whenever missing_count actually reaches zero, that query becomes immediately ready. The situation you are worried about could (maybe?) happen only if a query depends on itself. However, this is forbidden and already leads to unprovability. Moreover, it can't actually happen because (besides being illegal and unprovable via the very provenances we are assembling here) such a query would never have become ready in the first place.

It's possible there is some other bug in this code, but I don't think your concern here represents a problem. The whole point is that we are constructing the provenances bottom-up as we topologically sort. This process also implies that each query will only ever be have its missing_count decremented before it is processed. Once that count reaches 0, it will never be affected again, because all the queries on which it depends have already been processed previously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does point out that there is a way for a query to not have its provenance created. Any cyclic computation will never become ready. This is good, since we would never be able to generate a proof. Therefore, we could add an extra check after computing all provenances. If any provenances are missing then we have detected a cyclic computation and can eagerly abort for clear reason. cc: @gabriel-barrett

Copy link
Contributor Author

@porcuquine porcuquine Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, the impossibility of generating a provenance for a cyclic computation is exactly the reason we force the prover to manifest provenance in the first place — to demonstrate the lack of such cycles. The benefit of the bottom-up topological sorting vs top-down is that it lets us detect the error condition explicitly, rather than enter into an infinite loop or wild-goose-chase that exhausts the stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the check here: 3f30a85. If an application manages to allow for and actually try to prove a cyclic computation, it will be caught. If this happens, the Lang supplying a Query enabling it is noncompliant.

Copy link
Contributor

@huitseeker huitseeker Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you're saying is you're unlikely to look up the query that would here provoke an underflow, especially since you end up adding an additional check for cyclic dependencies. In the current form of the code, that's probably right.

In order to prevent a future situation where an errant developer might edit this code without understanding all its implications, simply replacing

*missing_count -=1

by

*missing_count = missing_count.checked_sub(1).expect("ready queries should not be looked up at this stage")

1 could provide the local context you just offered, and avoids us having to one day look into a query never being processed because its missing_count has just become enormous.

Footnotes

  1. or some more enlightening message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@porcuquine porcuquine added this pull request to the merge queue Feb 17, 2024
Merged via the queue into main with commit f2d70f4 Feb 17, 2024
11 checks passed
@porcuquine porcuquine deleted the fix-topo branch February 17, 2024 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants