Positional encoding in RadianceNet #2

Closed
shrubb opened this issue Sep 30, 2021 · 7 comments

Comments


shrubb commented Sep 30, 2021

Hi, and thanks a lot for the implementation!

embed_multires: -1
embed_multires_view: -1

I was wondering why we are not using positional encoding here and are instead feeding raw 3D coordinates and view directions? Especially since IDR doesn't do this: its defaults are 6 and 4... 🤔

I tried changing these from -1 to 6 and/or 4, and training collapses or at least goes much slower... To me, this seems extremely weird!
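
(For reference, embed_multires and embed_multires_view set the number of frequency bands of a NeRF-style positional encoding, with -1 meaning the raw input is passed through unchanged. A minimal sketch of that encoding, assuming the standard NeRF formulation and using illustrative names rather than the repo's exact Embedder API:)

```python
import torch

def positional_encoding(x, multires):
    """NeRF-style frequency embedding (illustrative sketch, not the repo's exact Embedder).
    multires = -1 keeps the raw input; otherwise sin/cos at `multires` frequency bands are appended."""
    if multires == -1:
        return x                                    # raw input, e.g. dim = 3
    freqs = 2.0 ** torch.arange(multires)           # 1, 2, 4, ..., 2^(multires - 1)
    parts = [x] + [fn(x * f) for f in freqs for fn in (torch.sin, torch.cos)]
    return torch.cat(parts, dim=-1)                 # dim = 3 + 3 * 2 * multires (39 for 6, 27 for 4)
```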


ventusff commented Oct 1, 2021

Hi @shrubb,
Yes, I agree that embed_multires_view should be 4 in every case.
Sorry that this wasn't configured carefully before; in my experience, embedding the radiance network's inputs doesn't seem to have a noticeable influence.

However, the raw 3D coordinates may still be fed directly, as in IDR, to respect the official implementation's choice. But of course, feeding embedded input may lead to better results.

As for the training speed, in my test:

volsdf.yaml

  • only setting embed_multires_view=4:
  ....
  (radiance_net): RadianceNet(
    (embed_fn): Identity()
    (embed_fn_view): Embedder()
    (layers): ModuleList(
      (0): DenseLayer(
        in_features=289, out_features=256, bias=True
        (activation): ReLU(inplace=True)
      )
  ...
  0%|           | 97/100000 [00:25<6:06:50,  4.54it/s, loss_img=0.135, loss_total=0.137, lr=0.000499]
  • setting 6&4:
  (radiance_net): RadianceNet(
    (embed_fn): Embedder()
    (embed_fn_view): Embedder()
    (layers): ModuleList(
      (0): DenseLayer(
        in_features=325, out_features=256, bias=True
        (activation): ReLU(inplace=True)
      )
  ...
  0%|             | 107/100000 [00:28<6:07:23,  4.53it/s, loss_img=0.16, loss_total=0.164, lr=0.000499]
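
(The in_features values above follow from the embedding widths: assuming the IDR-style radiance input, i.e. the concatenation of x, the (possibly embedded) view direction, the surface normal and a 256-d geometry feature, the numbers work out as below. embed_dim is just an illustrative helper, not part of the repo.)

```python
def embed_dim(multires):
    # Output width of the frequency embedding for a 3-d input (-1 = identity).
    return 3 if multires == -1 else 3 + 3 * 2 * multires

assert 3 + embed_dim(4) + 3 + 256 == 289             # raw x, embedded view, normal, geometry feature
assert embed_dim(6) + embed_dim(4) + 3 + 256 == 325  # embedded x, embedded view, normal, geometry feature
```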

volsdf_nerfpp_blended.yaml

You may notice that the number of training iterations rises from 100k to 200k, which is the main reason for the increase in training time.

  • original -1&-1 setting:
    0%|           | 131/200000 [00:34<14:43:28,  3.77it/s, loss_img=0.182, loss_total=0.185, lr=0.0005]
  • only setting embed_multires_view=4:
    0%|           | 209/200000 [00:52<13:54:47,  3.99it/s, loss_img=0.215, loss_total=0.22, lr=0.0005]
  • setting 6&4:
    0%|            | 121/200000 [00:32<14:50:09,  3.74it/s, loss_img=0.162, loss_total=0.163, lr=0.0005]


ventusff commented Oct 1, 2021

As for whether training collapses or not, I'm running training tests on BlendedMVS, to be continued...


shrubb commented Oct 1, 2021

Thanks for sharing your experience and especially for linking the IDR code! 🙏 Now it makes more sense.

By "slower" I actually meant convergence speed and training stability. Like, when I apply positional encodings to radiance net's inputs (pink graph), losses/metrics/parameters go crazy, while with your default config training is smooth and stable (blue graph). I'll continue to investigate this.
[screenshots: training curves and metrics, positional-encoding run (pink) vs. default config (blue)]


ventusff commented Oct 1, 2021

Hi @shrubb,
After running some training tests, I think I can offer a possible explanation, which also matches something I found earlier:

At the early training stage, the dominant representation branch needs to be the geometry_feature one, so that the network can quickly find initial clues about a roughly correct shape with which to render correct images.

If embedded 3D coordinates are fed to the radiance network instead of raw ones (dim=39 instead of dim=3), the representational capacity is split ambiguously between the branches: the network may assign more capacity to the radiance itself instead of learning a rough shape, leading to very slow shape convergence (or possibly no convergence at all).

That is to say, at early stages the dominant representation branch needs to be the first of the following three (see the sketch right after the list):

x -> embedder -> SDF -> geometry_feature -> radiance
x -> (embedder) -> radiance
v -> (embedder) -> radiance
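
A rough sketch of how these three branches meet at the radiance network (surface_net, radiance_net and the embed_* callables are illustrative names, not the repo's exact API):

```python
import torch

def forward_radiance(x, v, normal, surface_net, radiance_net, embed_x, embed_x_rad, embed_v_rad):
    # Branch 1: x -> embedder -> SDF -> geometry_feature -> radiance
    sdf, geometry_feature = surface_net(embed_x(x))
    # Branches 2 & 3: x and v reach the radiance net either raw or embedded,
    # depending on embed_multires / embed_multires_view (identity when -1).
    rgb = radiance_net(torch.cat([embed_x_rad(x), embed_v_rad(v), normal, geometry_feature], dim=-1))
    return sdf, rgb
```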

Embedding the location and view-direction inputs of the radiance network introduces larger gradients and "preempts" more of the gradient flow for the radiance net, "sharing" relatively less gradient with the surface network, as shown below:

[screenshots: gradient flow comparison between the two settings]
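
(One way to log this kind of comparison is to track per-module gradient norms right after the backward pass; a minimal sketch, with model.implicit_surface and model.radiance_net as illustrative attribute names:)

```python
import torch

def grad_norm(module):
    # Total L2 norm of the gradients of a module's parameters; call after loss.backward().
    norms = [p.grad.norm() for p in module.parameters() if p.grad is not None]
    return torch.norm(torch.stack(norms)) if norms else torch.tensor(0.0)

# e.g. inside the training loop, after loss.backward():
# print('surface grad:', grad_norm(model.implicit_surface).item(),
#       'radiance grad:', grad_norm(model.radiance_net).item())
```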

Practical comparison of the validation normal maps during training (you can see that no meaningful shape is learned in the latter two cases):

  • embed_x=-1 & embed_v=-1 @ 0k, 1k, 2k
    [normal-map renders]

  • embed_x=-1 & embed_v=4 @ 0k, 1k, 2k
    [normal-map renders]

  • embed_x=6 & embed_v=4 @ 0k, 1k, 2k
    [normal-map renders]


ventusff commented Oct 1, 2021

But still, in VolSDF even embedding only the view direction leads to no convergence. This is still weird, since it works fine with NeuS.

The VolSDF paper says nothing about whether the input of the radiance network is embedded or not. I'm looking forward to their official implementation for a code comparison.


shrubb commented Oct 1, 2021

Makes perfect sense, thank you for the insight and the experiments! 🎉

shrubb closed this as completed Oct 1, 2021

ventusff commented Oct 1, 2021

Glad to know it helps 😄

ventusff pinned this issue Oct 4, 2021