
Tutorial 7 Bug In D^{-1/2} A D^{-1/2} Math? #154

Open
johanos opened this issue Jan 8, 2025 · 0 comments
Labels: bug (Something isn't working)
johanos commented Jan 8, 2025

Tutorial: 7

Describe the bug
I think there's a math error for this claim "Finally, to take the average instead of summing, we calculate the matrix 𝐷̂ which is a diagonal matrix with 𝐷𝑖𝑖 denoting the number of neighbors node 𝑖 has."

I read this and couldn't see how that operation yields an average. Writing out the entries, I get

$$\left( D^{-1/2}AD^{-1/2}\right)_{ij} = \frac{A_{ij}}{\sqrt{D_{ii}} \sqrt{D_{jj}}}$$
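A quick way to see this numerically (a sketch using the tutorial's example adjacency matrix, nothing beyond standard PyTorch calls) is to compare the row sums of $D^{-1}A$, which is a true row-wise average, against those of $D^{-1/2}AD^{-1/2}$:

```python
import torch

# Tutorial 7's example adjacency matrix (self-loops already added)
A = torch.tensor([[1., 1., 0., 0.],
                  [1., 1., 1., 1.],
                  [0., 1., 1., 1.],
                  [0., 1., 1., 1.]])

deg = A.sum(dim=-1)                    # D_ii = number of neighbours of node i
D_inv = torch.diag(1.0 / deg)          # D^{-1}
D_inv_sqrt = torch.diag(deg ** -0.5)   # D^{-1/2}

row_avg = D_inv @ A                    # rows sum to exactly 1: a true average
sym_norm = D_inv_sqrt @ A @ D_inv_sqrt # rows generally do NOT sum to 1

print(row_avg.sum(dim=-1))   # tensor([1., 1., 1., 1.])
print(sym_norm.sum(dim=-1))  # first entry is 1/2 + 1/(2*sqrt(2)) ~= 0.8536
```

So the symmetric normalization preserves the "equal nodes give equal outputs" behaviour, but its rows are not convex combinations of the neighbours.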

which prompted me to implement the math directly, instead of just averaging the rows, to check:

import torch
import torch.nn as nn

class GCNLayer(nn.Module):

    def __init__(self, c_in, c_out):
        super().__init__()
        self.projection = nn.Linear(c_in, c_out)

    def forward(self, node_feats, adj_matrix):
        """
        Inputs:
            node_feats - Tensor with node features of shape [batch_size, num_nodes, c_in]
            adj_matrix - Batch of adjacency matrices of the graph. If there is an edge from i to j, adj_matrix[b,i,j]=1 else 0.
                         Supports directed edges via non-symmetric matrices. Assumes the identity connections have already been added.
                         Shape: [batch_size, num_nodes, num_nodes]
        """
        # Num neighbours = number of incoming edges
        num_neighbours = adj_matrix.sum(dim=-1, keepdim=True)

        # Avoid division by zero for isolated nodes
        num_neighbours = torch.clamp(num_neighbours, min=1)
        d_inv_sqrt = torch.pow(num_neighbours, -0.5)

        # Build the diagonal matrix D^{-1/2}: broadcasting the identity
        # [num_nodes, num_nodes] against d_inv_sqrt [batch, num_nodes, 1]
        # scales row i of the identity by D_ii^{-1/2}
        identity = torch.eye(num_neighbours.shape[1], device=node_feats.device)
        d_inv_sqrt_matrix = identity * d_inv_sqrt

        # Perform D^{-1/2} adj_matrix D^{-1/2}
        adj_matrix = torch.matmul(d_inv_sqrt_matrix, adj_matrix)
        adj_matrix = torch.matmul(adj_matrix, d_inv_sqrt_matrix)

        node_feats = self.projection(node_feats)
        node_feats = torch.bmm(adj_matrix, node_feats)

        return node_feats

which yielded results that still line up with the claims presented afterwards: in particular, the last two nodes still produce identical outputs, because they have exactly the same set of neighbours and the same degree.

Adjacency matrix
 tensor([[[1., 1., 0., 0.],
         [1., 1., 1., 1.],
         [0., 1., 1., 1.],
         [0., 1., 1., 1.]]])
Input features
 tensor([[[0., 1.],
         [2., 3.],
         [4., 5.],
         [6., 7.]]])
Output features
 tensor([[[0.7071, 1.5607],
         [3.3868, 4.5677],
         [3.9107, 4.8660], <- this node output is the same 
         [3.9107, 4.8660]]]) <- as this node, which was the point of the example. 
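That equality can be checked independently of any learned weights (a sketch: the symmetrically normalized rows for the last two nodes are identical, so their aggregated features must coincide before the projection is even applied). Interestingly, the aggregation alone already reproduces the values above, which suggests the posted numbers came from an identity projection:

```python
import torch

A = torch.tensor([[[1., 1., 0., 0.],
                   [1., 1., 1., 1.],
                   [0., 1., 1., 1.],
                   [0., 1., 1., 1.]]])
feats = torch.tensor([[[0., 1.],
                       [2., 3.],
                       [4., 5.],
                       [6., 7.]]])

deg = A.sum(dim=-1, keepdim=True)            # [1, 4, 1]
d_inv_sqrt = deg ** -0.5
# D^{-1/2} A D^{-1/2} via broadcasting: scale row i by D_ii^{-1/2}, column j by D_jj^{-1/2}
norm_A = d_inv_sqrt * A * d_inv_sqrt.transpose(1, 2)

out = torch.bmm(norm_A, feats)               # aggregation only, no projection
print(out[0, 2])  # tensor([3.9107, 4.8660])
print(out[0, 3])  # tensor([3.9107, 4.8660]) -- identical row in norm_A
```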

So I guess the main point is that the 'averaging' claim is not quite accurate: each entry is scaled by a pairwise factor involving both endpoints' degrees,

$$\left( D^{-1/2} A D^{-1/2} \right)_{ij} = D_{ii}^{-1/2} \, A_{ij} \, D_{jj}^{-1/2},$$

which only reduces to a plain row-wise average $A_{ij} / D_{ii}$ when all nodes have the same degree. At least, that's my intuition of the situation.

Runtime environment (please complete the following information):
Colab - GPU

@johanos johanos added the bug Something isn't working label Jan 8, 2025