
Add Graphs as States #210

Open: wants to merge 55 commits into master
Conversation

@alip67 (Collaborator) commented Nov 6, 2024

Description:
Unlike the current States object that necessitates appending dummy states to batch trajectories of varying lengths, our approach aims to support Trajectories through a nested Batch object representation. The Data class in Torch Geometric represents the graph structure, while the Batch class, which encapsulates batching of Data objects and their efficient indexing, represents the GraphStates object.

The current implementation of Trajectories supports the indexing dimensions (num timesteps, num trajectories, state size). By using a nested Batch-of-Batch object to represent state trajectories, the indexing would inherently take the form (num trajectories, num timesteps, state size). This approach requires implementing logic within __getitem__() and __setitem__() to handle this indexing internally.
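For illustration only (hypothetical helper names, not the PR's actual API), a sketch in PyTorch Geometric of the representation described above: each state is a Data object, each trajectory's timesteps are collated into a Batch, and trajectories are kept as one Batch per trajectory, which gives the (num trajectories, num timesteps) indexing order.

import torch
from torch_geometric.data import Data, Batch

def make_state(num_nodes: int, num_edges: int, f1: int = 4, f2: int = 2) -> Data:
    # Hypothetical helper: one graph-structured state.
    return Data(
        x=torch.randn(num_nodes, f1),                         # node features, shape (N, F1)
        edge_index=torch.randint(num_nodes, (2, num_edges)),  # edges, shape (2, M)
        edge_attr=torch.randn(num_edges, f2),                 # edge features, shape (M, F2)
    )

# One trajectory = a Batch over its timesteps; a set of trajectories = one Batch per trajectory.
trajectory = Batch.from_data_list([make_state(3, 2), make_state(4, 3)])
trajectories = [trajectory, trajectory]

# Indexing follows (trajectory, timestep): recover the Data object of trajectory 0, timestep 1.
state = trajectories[0].get_example(1)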

To Do:
Compatibility check with the Trajectories and Transitions classes

@alip67 marked this pull request as draft on November 6, 2024 12:43
@alip67 marked this pull request as ready for review on November 6, 2024 12:50
@alip67 marked this pull request as draft on November 6, 2024 12:51
src/gfn/gym/graph_building.py (outdated review threads, resolved)
@saleml (Collaborator) commented Dec 6, 2024

Thank you @younik and @alip67 for this important PR. Is there a script we can play with to see the training of the environment you created?

@younik (Collaborator) commented Dec 6, 2024

> Thank you @younik and @alip67 for this important PR. Is there a script we can play with to see the training of the environment you created?

There are still some issues to fix that prevent it from running properly.
I am working on fixing them and will post sample code for using it.

@saleml (Collaborator) left a review:

Thanks for the big PR. Here are a few questions, comments, and suggestions.
I am unable to run the main script.

I don't know why CI isn't triggered in this PR.

I'd be happy to merge once the main issues are resolved and the smaller ones are defined as GitHub issues.

src/gfn/states.py (outdated, resolved)
pyproject.toml (outdated, resolved)
src/gfn/env.py (resolved)
src/gfn/env.py (outdated, resolved)
@@ -255,21 +257,22 @@ def _step(
)

new_sink_states_idx = actions.is_exit
new_states.tensor[new_sink_states_idx] = self.sf
sf_tensor = self.States.make_sink_states_tensor((new_sink_states_idx.sum(),))
Collaborator:

Curious about the reason for this change? Is it specific to GraphStates?

@younik (Collaborator), Jan 12, 2025:

The reason is how graphs are represented in the tensor, i.e.:

tensor = TensorDict({
    'node_features': ...,  # shape (N, F1)
    'edge_features': ...,  # shape (M, F2)
    'edge_index': ...,     # shape (2, M)
})

Notice that tensor[some_index] doesn't make sense and doesn't work. There is more complex behavior defined in GraphStates.__setitem__ to do it correctly.
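To make the point concrete, an illustrative sketch (hypothetical helper, not the PR's GraphStates code) of what selecting "graph i" from such a TensorDict involves: the leaves have graph-shaped leading dimensions (total nodes N, total edges M) rather than a batch dimension, so indexing means slicing by per-graph offsets and re-indexing edges.

import torch
from tensordict import TensorDict

def select_graph(td: TensorDict, node_ptr: torch.Tensor, edge_ptr: torch.Tensor, i: int) -> TensorDict:
    # node_ptr / edge_ptr are assumed cumulative node / edge counts per graph (PyG-style).
    n0, n1 = int(node_ptr[i]), int(node_ptr[i + 1])
    e0, e1 = int(edge_ptr[i]), int(edge_ptr[i + 1])
    return TensorDict(
        {
            "node_features": td["node_features"][n0:n1],
            "edge_features": td["edge_features"][e0:e1],
            "edge_index": td["edge_index"][:, e0:e1] - n0,  # re-index edges to be graph-local
        },
        batch_size=[],
    )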

Collaborator:

I think a comment would be worth adding here.

src/gfn/modules.py (outdated, resolved)
dists["action_type"] = CategoricalActionType(probs=action_type_probs)

edge_index_logits = module_output["edge_index"]
if states.tensor["node_feature"].shape[0] > 1 and torch.any(edge_index_logits != -float("inf")):
Collaborator:

This check is per batch, not per graph. Shouldn't you use batch_ptr to check each graph separately?

BTW, this might also need to handle masks for valid edges.
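As an illustration of the per-graph check being suggested (a sketch that assumes a PyG-style ptr tensor of cumulative node counts is available; names are hypothetical, not the PR's code):

import torch

def per_graph_has_multiple_nodes(batch_ptr: torch.Tensor) -> torch.Tensor:
    # batch_ptr has shape (B + 1,); batch_ptr[b + 1] - batch_ptr[b] is the node count of graph b.
    # Returns a (B,) boolean mask of graphs where an edge-adding action is even possible.
    return (batch_ptr[1:] - batch_ptr[:-1]) > 1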

@younik (Collaborator), Jan 14, 2025:

At the moment, all the nets return the same action type, so this works; however, you are right that this is not general.
I restrict to this case because when the number of nodes varies across the batch, the outputs of the nets vary as well (e.g. one graph would have edge-index probs of shape (N1, N1) and another (N2, N2)), which breaks batching.

I am not sure how to overcome this problem while remaining reasonably efficient.

> BTW this might also need to handle masks for valid edges

You're perfectly right; this is the reason CI is failing. See the other comment.

) * edge_index_probs + epsilon * uniform_dist_probs
dists["edge_index"] = CategoricalIndexes(probs=edge_index_probs)

dists["features"] = Normal(module_output["features"], temperature)
Collaborator:

Is there no need to mask invalid features here?

Collaborator:

What do you mean by invalid features?

In case the action is an EXIT action, these features are ignored... but otherwise, I don't see any invalid features(?)

@younik (Collaborator), Jan 14, 2025:

@saleml we may need to mask the invalid edge_index indeed.
At the moment:

  • The GraphBuilding env requires that the edge does not already exist (no multiple edges between the same pair of nodes), because that makes backward_step (removing the edge) easier.
  • There is no real masking for this, so the Estimator can sample an invalid action.

This is the reason the tests in CI are failing. The solutions I can see:

1) Add a mask in the estimator (the code you commented on here).
This is easy to do, but is it general enough to assume that multiple edges (between the same pair of nodes) are not allowed?
Also, I already wanted to raise that we have forward_mask and backward_mask in States while also having the isActionValid method. This is a repetition of code (and now we would add another repetition here).

2) Improve the forward & backward masks in States.
The current implementation only masks the type of action (e.g. if there are no nodes, you cannot add an edge).
This is a bit more complicated to code but avoids the code repetition of the above solution.

3) Allow GraphBuilding to have multiple identical edges.
We would then need to use the edge features to decide which edge to remove in the backward step.

What do you think?

@saleml (Collaborator), Jan 15, 2025:

I suggest adding basic edge masking in the estimator for now to fix the CI, but we should create a follow-up issue to properly consolidate all action validation logic into the States class. This will eliminate code duplication and provide a single source of truth for action validity. The States class should be responsible for providing comprehensive masks that cover both action types and edge validity.
This approach:

  • Gets CI passing quickly
  • Acknowledges the technical debt
  • Sets a clear path forward
  • Keeps the current PR scope manageable

So:

Short-term fix (for this PR):

  • Go with solution 1 (add masking in the estimator) as a temporary fix to get CI passing
  • Add a TODO/issue comment indicating this is temporary
  • Document the current limitation in the docstring

Long-term solution (next PR):

  • Consolidate all action validation logic into the States class (solution 2)
  • The States class should provide comprehensive masks that include:
    1. Action type validity (current implementation)
    2. Edge validity (currently missing)
    3. Any other environment-specific constraints

LMK what you think.
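For concreteness, a minimal sketch of the short-term fix proposed above (solution 1: mask in the estimator), using hypothetical names and assuming a dense per-graph edge-logits tensor; this is not the PR's actual estimator code:

import torch

def mask_existing_edges(edge_logits: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    # edge_logits: (N, N) logits over candidate edges of a single graph with N nodes.
    # edge_index:  (2, M) edges already present in that graph.
    # Setting the logits of existing edges and self-loops to -inf means the
    # Categorical over edge indices can never propose an invalid "add edge" action.
    masked = edge_logits.clone()
    masked[edge_index[0], edge_index[1]] = -float("inf")
    n = masked.shape[0]
    masked[torch.arange(n), torch.arange(n)] = -float("inf")
    return masked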

Collaborator:

Sounds good!

Actually, if I make the States masks comprehensive, then I should be able to just use them in the estimator for masking (so solution 2).

I will do it soon!

Collaborator:

I disagree that we should fix it here (i.e., in the Estimator), or if we do so, we should implement masking properly in the States class before the release of V2, because this is a clear design-pattern violation.

There is no problem in principle with adding multiple edges to/from the same node (in, say, a multi-attribute graph), but that would make things much more complex, and I think we can safely avoid that complexity here, as single edges to/from the same nodes will cover a lot of AI-for-Science applications in the near term.

@younik (Collaborator), Jan 17, 2025:

Yes, I am trying to implement the complete forward/backward mask in States.

I have the following problem:
in general, the number of nodes of each graph can vary. The mask for edge_index is (B, N, N), where N is the number of nodes; however, the number of nodes varies across the batch...

Two possible solutions:

  1. Use (B, N, N), with N the total number of nodes (not in one graph, but across the graphs in the batch). This is more memory-consuming, but general (see the sketch below).
  2. Enforce the number of nodes to be the same across the batch, which means enforcing the action to be the same across the batch (you cannot add a node to one graph while adding an edge to another).

cc @saleml @josephdviviano
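A minimal sketch of option 1 above (a dense (B, N, N) mask with N the total node count across the batch), assuming a PyG-style ptr tensor and a per-edge graph id; names are illustrative, not the PR's API:

import torch

def dense_edge_mask(batch_ptr: torch.Tensor, edge_index: torch.Tensor, edge_batch: torch.Tensor) -> torch.Tensor:
    # Entry [b, i, j] is True iff adding edge i -> j to graph b is valid:
    # both nodes belong to graph b, i != j, and the edge is not already present.
    # batch_ptr:  (B + 1,) cumulative node counts (PyG-style ptr).
    # edge_index: (2, M) existing edges in global node indexing.
    # edge_batch: (M,) graph id of each existing edge.
    B, n_total = batch_ptr.shape[0] - 1, int(batch_ptr[-1])
    mask = torch.zeros(B, n_total, n_total, dtype=torch.bool)
    for b in range(B):
        lo, hi = int(batch_ptr[b]), int(batch_ptr[b + 1])
        mask[b, lo:hi, lo:hi] = True          # only this graph's nodes are addressable
    idx = torch.arange(n_total)
    mask[:, idx, idx] = False                 # forbid self-loops
    mask[edge_batch, edge_index[0], edge_index[1]] = False  # forbid duplicate edges
    return mask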

src/gfn/modules.py (resolved)

return torch.cat([action_type, edge_actions], dim=-1)

class RingGraphBuilding(GraphBuilding):
Collaborator:

I cannot run this file. I get this error:
ValueError: batch size was not specified when creating the TensorDict instance and it could not be retrieved from source.

Can you run it?
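For reference, the error message suggests a tensordict version where batch_size must be passed explicitly at construction; a minimal illustration of that pattern (not the PR's code):

import torch
from tensordict import TensorDict

# Graph leaves have ragged leading dims (total nodes N, total edges M), so the
# batch size cannot be inferred from them; passing batch_size=[] makes it explicit.
td = TensorDict(
    {
        "node_features": torch.zeros(7, 4),                 # (N, F1)
        "edge_features": torch.zeros(5, 3),                 # (M, F2)
        "edge_index": torch.zeros(2, 5, dtype=torch.long),  # (2, M)
    },
    batch_size=[],
)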

Collaborator:

Yes, I can run it...
Which version of TensorDict are you using? I have 0.6.2

Collaborator:

ERROR: Could not find a version that satisfies the requirement tensordict==0.6.2 (from versions: 0.0.1a0, 0.0.1b0, 0.0.1rc0, 0.0.2a0, 0.0.2b0, 0.0.3, 0.1.0, 0.1.1, 0.1.2, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2)
ERROR: No matching distribution found for tensordict==0.6.2

Although there is a 0.6.2 release in their GitHub repo. I will try to debug this.

@younik mentioned this pull request on Jan 11, 2025
@younik (Collaborator) commented Jan 14, 2025

It looks like GitHub is down; I will rerun CI tomorrow, but the checks are green locally.

@josephdviviano (Collaborator) left a review:

First pass at a review. Thanks so much for your amazing work. I am going to mess around with the test cases and environment next to get a better understanding of how things function together. In the meantime, I have some questions.

pyproject.toml (outdated, resolved)
src/gfn/actions.py (resolved review threads)

"""
self.s0 = s0.to(device_str)
self.features_dim = s0["node_feature"].shape[-1]
self.sf = sf
Collaborator:

Perhaps we could have a special NoneTensorDict GraphState which acts like None but passes the relevant checks?

self.check_output_dim(out)
self._output_dim_is_checked = True

assert out.shape[-1] == 1
Collaborator:

This also seems like a much harder constraint.

Collaborator:

Why? The expected_output_dim is 1 in this class, so it seems the same to me (actually softer as I don't check the dtype).


src/gfn/samplers.py (resolved)
src/gfn/states.py (resolved)
src/gfn/utils/distributions.py (resolved)
pyproject.toml (resolved)
@saleml (Collaborator) commented Jan 28, 2025

The code runs, thanks for the last change.

A few suggestions/questions:

  • If I understand correctly, the goal of the script is to sample ring graphs. I expect that during training, the proportion of sampled graphs that are rings gets higher over time. Is it possible to validate that during the training loop? For example, every 10/50/100/whatever iterations, generate N graphs and check which ones are rings (see the sketch after this list).
  • Could you add short descriptions of the functions you are dealing with? For example, the state_evaluator function seems to be central to the code, so it is important to know what it does.
  • Could you merge master into this? Note that this might lead to pyright issues. You can solve most of them, but some, such as gflownet.loss(), you can ignore using the same comment used to ignore pyright issues in other scripts.
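A minimal sketch of such a validation check (hypothetical helper; it assumes a directed ring where each edge is stored once in edge_index, so adjust if the environment stores both directions):

import torch

def is_ring(edge_index: torch.Tensor, num_nodes: int) -> bool:
    # A ring on n nodes has exactly n edges, every node has in-degree 1 and
    # out-degree 1, and following successors from node 0 visits all nodes
    # before coming back to the start.
    if num_nodes < 2 or edge_index.shape[1] != num_nodes:
        return False
    src, dst = edge_index
    if not (torch.all(torch.bincount(src, minlength=num_nodes) == 1)
            and torch.all(torch.bincount(dst, minlength=num_nodes) == 1)):
        return False
    successor = torch.empty(num_nodes, dtype=torch.long)
    successor[src] = dst
    node, visited = 0, 0
    while visited < num_nodes:
        node = int(successor[node])
        visited += 1
        if node == 0:
            break
    return node == 0 and visited == num_nodes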
