Hi,
In your paper, one of the important points you make is that your method is O(n) whereas Relation nets are O(n^2).
In fact, if we write both architectures in a (very) simplified way (and ignore certain complications such as softmax), we can see that RMNs are indeed a slight variation on Relation Nets, one which avoids this quadratic cost:
Relation net: r(X) = sum_i(sum_j(f_1(f_2(x_i), f_3(x_j))))
For some f_1, f_2, f_3.
(In the case of the paper, f_2 and f_3 are the identity, and f_1 is an MLP on the concatenation of the inputs, which is additionally parameterized by p)
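To make the O(n^2) cost concrete, here is a minimal numpy sketch of that pattern (just an illustrative toy, not the paper's actual model; `f_1` below is only a stand-in for the MLP):

```python
import numpy as np

def f_1(a, b):
    # Stand-in for the MLP applied to the concatenation [x_i; x_j].
    return np.tanh(np.concatenate([a, b]))

def relation_net(X):
    """Relation-net pattern: r(X) = sum_i sum_j f_1(x_i, x_j), with f_2 = f_3 = identity."""
    n = X.shape[0]
    r = 0.0
    for i in range(n):
        for j in range(n):
            r = r + f_1(X[i], X[j])  # f_1 is evaluated n^2 times
    return r
```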
FMN (2-stage): r(X) = sum_i(g_1(g_2(x_i), sum_j(g_3(x_j)))), i.e. we move the inner sum inside the bracket so it can be computed once and re-used for all i.
For some g_1, g_2, g_3.
(In this case g_3 is attention with parameter p, g_1 is another attention, and g_2 is identity)
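And a matching toy sketch of the two-stage form (again, `g_1` and `g_3` are just placeholders standing in for the two attention stages, with `g_2` the identity); the only point is that the inner sum is computed once and shared across all i:

```python
import numpy as np

def g_3(b):
    return np.tanh(b)       # placeholder for the first attention stage

def g_1(a, s):
    return np.tanh(a + s)   # placeholder for the second attention stage

def fmn_two_stage(X):
    """Two-stage pattern: r(X) = sum_i g_1(x_i, sum_j g_3(x_j)), with g_2 = identity."""
    s = sum(g_3(x_j) for x_j in X)          # inner sum: computed once, O(n)
    return sum(g_1(x_i, s) for x_i in X)    # outer sum: another O(n), so O(n) overall
```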
Note: in both cases we can change the function r(X) into a transformation of the individual elements, rather than a representation of the set, by simply not taking the outer sum. This allows us to stack multiple layers on top of each other, as you do with your n-stage FMN.
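For example, reusing the placeholder `g_1`/`g_3` above, the per-element version and its stacked variant would look something like:

```python
import numpy as np

def g_3(b):    return np.tanh(b)       # same placeholders as in the sketch above
def g_1(a, s): return np.tanh(a + s)

def fmn_layer(X):
    # Drop the outer sum: each element is transformed in the context of the
    # whole set, so the layer maps (n, d) -> (n, d) and can be stacked.
    s = sum(g_3(x_j) for x_j in X)                 # shared inner sum, O(n)
    return np.stack([g_1(x_i, s) for x_i in X])

def fmn_n_stage(X, n_layers=3):
    for _ in range(n_layers):
        X = fmn_layer(X)
    return X
```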
While this similarity might seem superficial, since very different functions are used for f_1, f_2, f_3 and g_1, g_2, g_3, it turns out that many other architectures also fall into this pattern. In particular, the self-attention used in https://arxiv.org/abs/1706.03762 is a better comparison to your architecture, as it also relies on attention mechanisms.
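For reference, single-head dot-product self-attention from that paper can be written in the same element-wise form (a rough numpy sketch; `W_q`, `W_k`, `W_v` are the learned projections):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # Each output is a weighted sum over all j, where the weight itself
    # depends on the pair (i, j), i.e. a pairwise pattern built from attention.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])          # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over j
    return weights @ V                              # one output per element
```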
There have also been a few pieces of work that fall into this second category, in particular https://arxiv.org/abs/1703.06114 . I've also written a bit about these variants (as well as relation nets) here. Some of the notes in these might be useful, and it may be interesting to compare your architecture to theirs on their problems (or try theirs on yours), since they ultimately solve the same problem but with quite different architectures.
I hope that all made sense. I have some block diagrams of various architectures that might make things clearer, although they don't really show much on the conceptual level.
Best wishes,
Will