Hi,
In your paper, one of the important points you make is that your method is O(n) whereas Relation nets are O(n^2).
In fact, if we write both architectures in a (very) simplified way (and ignore certain complications such as softmax), we can see that RMNs are indeed a slight variation on Relation Nets, one which avoids this quadratic cost:
Relation net: r(X) = sum_i(sum_j(f_1(f_2(x_i), f_3(x_j))))
For some f_1, f_2, f_3.
(In the case of the paper, f_2 and f_3 are the identity, and f_1 is an MLP on the concatenation of the inputs, which is additionally parameterized by p)
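To make the O(n^2) cost concrete, here is a minimal numpy sketch of that pattern (just an illustrative toy, not the paper's actual model; `f_1` below is only a stand-in for the MLP):

```python
import numpy as np

def f_1(a, b):
    # Stand-in for the MLP applied to the concatenation [x_i; x_j].
    return np.tanh(np.concatenate([a, b]))

def relation_net(X):
    """Relation-net pattern: r(X) = sum_i sum_j f_1(x_i, x_j), with f_2 = f_3 = identity."""
    n = X.shape[0]
    r = 0.0
    for i in range(n):
        for j in range(n):
            r = r + f_1(X[i], X[j])  # f_1 is evaluated n^2 times
    return r
```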
FMN (2-stage): r(X) = sum_i(g_1(g_2(x_i), sum_j(g_3(x_j)))), i.e. we move the inner sum inside the bracket so it can be computed once and re-used for all i.
For some g_1, g_2, g_3.
(In this case g_3 is attention with parameter p, g_1 is another attention, and g_2 is identity)
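And a matching toy sketch of the two-stage form (again, `g_1` and `g_3` are just placeholders standing in for the two attention stages, with `g_2` the identity); the only point is that the inner sum is computed once and shared across all i:

```python
import numpy as np

def g_3(b):
    return np.tanh(b)       # placeholder for the first attention stage

def g_1(a, s):
    return np.tanh(a + s)   # placeholder for the second attention stage

def fmn_two_stage(X):
    """Two-stage pattern: r(X) = sum_i g_1(x_i, sum_j g_3(x_j)), with g_2 = identity."""
    s = sum(g_3(x_j) for x_j in X)          # inner sum: computed once, O(n)
    return sum(g_1(x_i, s) for x_i in X)    # outer sum: another O(n), so O(n) overall
```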
Note: in both cases we can change the function r(X) into a transformation of the individual elements, rather than a representation of the set, by simply not taking the outer sum. This allows us to stack multiple layers on top of each other, as you do with your n-stage FMN.
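For example, reusing the placeholder `g_1`/`g_3` above, the per-element version and its stacked variant would look something like:

```python
import numpy as np

def g_3(b):    return np.tanh(b)       # same placeholders as in the sketch above
def g_1(a, s): return np.tanh(a + s)

def fmn_layer(X):
    # Drop the outer sum: each element is transformed in the context of the
    # whole set, so the layer maps (n, d) -> (n, d) and can be stacked.
    s = sum(g_3(x_j) for x_j in X)                 # shared inner sum, O(n)
    return np.stack([g_1(x_i, s) for x_i in X])

def fmn_n_stage(X, n_layers=3):
    for _ in range(n_layers):
        X = fmn_layer(X)
    return X
```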
While this similarity might seem superficial, since very different functions are used for f_1, f_2, f_3 and g_1, g_2, g_3, it turns out that many other architectures also fall into this pattern. In particular, the self-attention used in https://arxiv.org/abs/1706.03762 is a better comparison to your architecture, as it also relies on attention mechanisms.
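For reference, single-head dot-product self-attention from that paper can be written in the same element-wise form (a rough numpy sketch; `W_q`, `W_k`, `W_v` are the learned projections):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # Each output is a weighted sum over all j, where the weight itself
    # depends on the pair (i, j), i.e. a pairwise pattern built from attention.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])          # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over j
    return weights @ V                              # one output per element
```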
There have also been a few pieces of work that fall into this second category, in particular https://arxiv.org/abs/1703.06114 . I've also written a bit about these variants (as well as relation nets) here. Some of the notes in these might be useful, and it may be interesting to compare your architecture to theirs on their problems (or try theirs on yours), since they ultimately solve the same problem but with quite different architectures.
I hope that all made sense. I have some block diagrams of various architectures that might make things clearer, although they don't really show much on the conceptual level.
Best wishes,
Will