Hey Matthew!

First of all, thanks for the awesome ideas and work.
I have a question about the way you're computing `pgm_natgrad`, specifically here. I'm copying the relevant lines:
```python
# this expression for pgm_natgrad drops a term that can be computed using
# the function autograd.misc.fixed_points.fixed_point
pgm_natgrad = -natgrad_scale / num_datapoints * \
    (flat(pgm_prior) + num_batches*flat(saved.stats) - flat(pgm_params))
```
If I understand correctly, the dropped term is this:
In the paper you mention that this term "is computed automatically as part of the backward pass for computing the gradients with respect to the other parameters".
Can you clarify why that term is dropped? Also, I don't understand the minus sign right at the beginning of the assignment, on line 33.
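For reference, here's a minimal sketch of how `autograd.misc.fixed_points.fixed_point` is used, adapted from autograd's own examples. The square-root iteration and the names `sqrt_iter`, `distance`, and `sqrt` are just illustrative stand-ins, not the actual SVAE inner optimization:

```python
import autograd.numpy as np
from autograd import grad
from autograd.misc.fixed_points import fixed_point

def sqrt_iter(a):
    # f(a) returns an iteration map x -> x - step*(x^2 - a),
    # whose fixed point is sqrt(a)
    return lambda x: x - 0.05 * (x**2 - a)

def distance(x, y):
    return np.abs(x - y)

def sqrt(a, guess=10.):
    # fixed_point runs the iteration to convergence and differentiates through
    # the converged solution with respect to `a` implicitly, rather than
    # unrolling the loop
    return fixed_point(sqrt_iter, a, guess, distance, 1e-6)

print(sqrt(2.))        # ~1.4142
print(grad(sqrt)(2.))  # ~0.3536, i.e. 1/(2*sqrt(2))
```

If I'm reading the comment right, the dropped term is the one that would require differentiating through that kind of fixed-point computation.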
Again, congrats on the awesome work!
On a side note, I think I spotted 2 errors in the paper:
In section 4.2 (and then again in the appendix), where you define \eta_x to be a partial local optimizer of the surrogate objective:
I believe this should be argmax, rather than argmin. Can you confirm?
In the second expression of proposition 4.2:
I think the gradient should be w.r.t. \theta, rather than x. Is that correct?
About the minus sign in the assignment, I think I figured it out: in the implementation you're minimizing `-objective` rather than maximizing `objective`, right?
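For concreteness, this is the pattern I mean, as a minimal self-contained sketch with a toy objective (the names `objective`, `neg_obj_grad`, and `opt_params` are hypothetical, and `adam` from `autograd.misc.optimizers` is just for illustration, not necessarily the optimizer the repo uses):

```python
import autograd.numpy as np
from autograd import grad
from autograd.misc.optimizers import adam

def objective(params, iter):
    # toy concave objective with its maximum at params == 1
    return -np.sum((params - 1.) ** 2)

# adam performs gradient descent, i.e. it minimizes, so to maximize `objective`
# we hand it the gradient of -objective; that negation is where the extra
# minus sign comes from
neg_obj_grad = grad(lambda params, iter: -objective(params, iter))

opt_params = adam(neg_obj_grad, np.zeros(3), num_iters=1000, step_size=0.05)
print(opt_params)  # approximately [1., 1., 1.]
```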