Positional encoding in RadianceNet #2

Closed
shrubb opened this issue Sep 30, 2021 · 7 comments

Comments


shrubb commented Sep 30, 2021

Hi, and thanks a lot for the implementation!

embed_multires: -1
embed_multires_view: -1

I was wondering why we are not using positional encoding here and are instead feeding raw 3D coordinates and view directions? Especially since IDR doesn't do this: its defaults are 6 and 4... 🤔

I tried changing these from -1 to 6 and/or 4, and training collapses or at least goes much slower... To me, this seems extremely weird!
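
(For reference, embed_multires and embed_multires_view set the number of frequency bands of a NeRF-style positional encoding, with -1 meaning the raw input is passed through unchanged. A minimal sketch of that encoding, assuming the standard NeRF formulation and using illustrative names rather than the repo's exact Embedder API:)

```python
import torch

def positional_encoding(x, multires):
    """NeRF-style frequency embedding (illustrative sketch, not the repo's exact Embedder).
    multires = -1 keeps the raw input; otherwise sin/cos at `multires` frequency bands are appended."""
    if multires == -1:
        return x                                    # raw input, e.g. dim = 3
    freqs = 2.0 ** torch.arange(multires)           # 1, 2, 4, ..., 2^(multires - 1)
    parts = [x] + [fn(x * f) for f in freqs for fn in (torch.sin, torch.cos)]
    return torch.cat(parts, dim=-1)                 # dim = 3 + 3 * 2 * multires (39 for 6, 27 for 4)
```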


ventusff commented Oct 1, 2021

Hi @shrubb,
Yes, I agree that embed_multires_view should be 4 in every case.
Sorry that this wasn't configured carefully before; in my experience, embedding the radiance network's inputs doesn't seem to have a noticeable influence.

However, the raw 3D coordinates may still be fed directly, as in IDR, to respect the official implementation's choice. But of course, feeding embedded input may lead to better results.

As for the training speed, in my test:

volsdf.yaml

  • only setting embed_multires_view=4:
  ....
  (radiance_net): RadianceNet(
    (embed_fn): Identity()
    (embed_fn_view): Embedder()
    (layers): ModuleList(
      (0): DenseLayer(
        in_features=289, out_features=256, bias=True
        (activation): ReLU(inplace=True)
      )
  ...
  0%|           | 97/100000 [00:25<6:06:50,  4.54it/s, loss_img=0.135, loss_total=0.137, lr=0.000499]
  • setting 6&4:
  (radiance_net): RadianceNet(
    (embed_fn): Embedder()
    (embed_fn_view): Embedder()
    (layers): ModuleList(
      (0): DenseLayer(
        in_features=325, out_features=256, bias=True
        (activation): ReLU(inplace=True)
      )
  ...
  0%|             | 107/100000 [00:28<6:07:23,  4.53it/s, loss_img=0.16, loss_total=0.164, lr=0.000499]
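
(The in_features values above follow from the embedding widths: assuming the IDR-style radiance input, i.e. the concatenation of x, the (possibly embedded) view direction, the surface normal and a 256-d geometry feature, the numbers work out as below. embed_dim is just an illustrative helper, not part of the repo.)

```python
def embed_dim(multires):
    # Output width of the frequency embedding for a 3-d input (-1 = identity).
    return 3 if multires == -1 else 3 + 3 * 2 * multires

assert 3 + embed_dim(4) + 3 + 256 == 289             # raw x, embedded view, normal, geometry feature
assert embed_dim(6) + embed_dim(4) + 3 + 256 == 325  # embedded x, embedded view, normal, geometry feature
```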

volsdf_nerfpp_blended.yaml

You may notice that the number of training iterations rises from 100k to 200k, which is the main reason for the increase in training time.

  • original -1&-1 setting:
    0%|           | 131/200000 [00:34<14:43:28,  3.77it/s, loss_img=0.182, loss_total=0.185, lr=0.0005]
  • only setting embed_multires_view=4:
    0%|           | 209/200000 [00:52<13:54:47,  3.99it/s, loss_img=0.215, loss_total=0.22, lr=0.0005]
  • setting 6&4:
    0%|            | 121/200000 [00:32<14:50:09,  3.74it/s, loss_img=0.162, loss_total=0.163, lr=0.0005]


ventusff commented Oct 1, 2021

As for whether training collapses or not, I'm running training tests on BlendedMVS, to be continued...


shrubb commented Oct 1, 2021

Thanks for sharing your experience and especially for linking the IDR code! 🙏 Now it makes more sense.

By "slower" I actually meant convergence speed and training stability. Like, when I apply positional encodings to radiance net's inputs (pink graph), losses/metrics/parameters go crazy, while with your default config training is smooth and stable (blue graph). I'll continue to investigate this.
[screenshots: training curves and metrics, positional-encoding run (pink) vs. default config (blue)]


ventusff commented Oct 1, 2021

Hi @shrubb,
After running some training tests, I think I can offer a possible explanation, which also matches something I found earlier:

At the early training stage, the dominant representation branch needs to be the geometry_feature one, so that the network can quickly find initial clues about a roughly correct shape with which to render correct images.

If embedded 3D coordinates are fed to the radiance network instead of raw ones (dim=39 instead of dim=3), the representational capacity is split ambiguously between the branches: the network may assign more capacity to the radiance itself instead of learning a rough shape, leading to very slow shape convergence (or possibly no convergence at all).

That is to say, at early stages the dominant representation branch needs to be the first of the following three (see the sketch right after the list):

x -> embedder -> SDF -> geometry_feature -> radiance
x -> (embedder) -> radiance
v -> (embedder) -> radiance
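
A rough sketch of how these three branches meet at the radiance network (surface_net, radiance_net and the embed_* callables are illustrative names, not the repo's exact API):

```python
import torch

def forward_radiance(x, v, normal, surface_net, radiance_net, embed_x, embed_x_rad, embed_v_rad):
    # Branch 1: x -> embedder -> SDF -> geometry_feature -> radiance
    sdf, geometry_feature = surface_net(embed_x(x))
    # Branches 2 & 3: x and v reach the radiance net either raw or embedded,
    # depending on embed_multires / embed_multires_view (identity when -1).
    rgb = radiance_net(torch.cat([embed_x_rad(x), embed_v_rad(v), normal, geometry_feature], dim=-1))
    return sdf, rgb
```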

Embedding the location and view-direction inputs of the radiance network introduces larger gradients and "preempts" more of the gradient flow for the radiance net, "sharing" relatively less gradient with the surface network, as shown below:

[screenshots: gradient flow comparison between the two settings]
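
(One way to log this kind of comparison is to track per-module gradient norms right after the backward pass; a minimal sketch, with model.implicit_surface and model.radiance_net as illustrative attribute names:)

```python
import torch

def grad_norm(module):
    # Total L2 norm of the gradients of a module's parameters; call after loss.backward().
    norms = [p.grad.norm() for p in module.parameters() if p.grad is not None]
    return torch.norm(torch.stack(norms)) if norms else torch.tensor(0.0)

# e.g. inside the training loop, after loss.backward():
# print('surface grad:', grad_norm(model.implicit_surface).item(),
#       'radiance grad:', grad_norm(model.radiance_net).item())
```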

Practical comparison of the validation normal maps during training (you can see that no meaningful shape is learned in the latter two cases):

  • embed_x=-1 & embed_v=-1 @ 0k, 1k, 2k
    [normal-map renders]

  • embed_x=-1 & embed_v=4 @ 0k, 1k, 2k
    [normal-map renders]

  • embed_x=6 & embed_v=4 @ 0k, 1k, 2k
    [normal-map renders]


ventusff commented Oct 1, 2021

But still, in VolSDF even embedding only the view direction leads to no convergence. This is still weird, since it works fine with NeuS.

The VolSDF paper says nothing about whether the input of the radiance network is embedded or not. I'm looking forward to their official implementation for a code comparison.


shrubb commented Oct 1, 2021

Makes perfect sense, thank you for the insight and the experiments! 🎉

shrubb closed this as completed Oct 1, 2021

ventusff commented Oct 1, 2021

Glad to know it helps 😄

ventusff pinned this issue Oct 4, 2021