The gradient of h_estimate for computing hv #5

Open
peizhaoli05 opened this issue Jan 31, 2020 · 5 comments

@peizhaoli05

Thanks for the great code.

I changed the loss function to binary cross-entropy, but it runs really slowly as recursion_depth rises. Is it reasonable to call h_estimate.detach() before passing h_estimate into hv = hvp(loss, list(model.parameters()), h_estimate)? We shouldn't incorporate the gradient of h_estimate with respect to w inside hvp.

I am not sure about this and would appreciate it if you could take a look. Thank you.
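
For reference, as I understand it, what hvp should compute is the Hessian-vector product

$$
H v \;=\; \nabla_w\!\left( (\nabla_w L)^{\top} v \right), \qquad v \text{ held constant},
$$

so no gradient should flow back through $v$ (here $v$ = h_estimate), only through the model parameters $w$.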

@andrewsilva9

Hey @peizhaoli05,

I edited my last issue, but it was already closed, so I guess it isn't very visible. I actually do keep the h_init parameter to pass to the hvp() function. All told, my function looks like this:

    # v: list of gradients of the test loss w.r.t. the model parameters
    h_estimates = v.copy()
    h_init = v.copy()
    for i, samples in enumerate(train_loader):
        samples = torch.autograd.Variable(samples).to(device=DEVICE)
        # the model here returns the training loss on the batch directly
        loss_val = model(samples)
        # Hessian-vector product of the training loss with h_init
        hv = hvp(loss_val, list(model.parameters()), h_init)
        # update: h <- v + (1 - damp) * h - hv / scale
        h_estimates = [
            _v + (1 - damp) * h_estimate - _hv / scale
            for _v, h_estimate, _hv in zip(v, h_estimates, hv)]

with certain break conditions (after about 5000 samples I break out of the loop; the code above is also a bit simplified, because what I'm doing is not entirely straightforward).

As to your specific question, could we detach h_estimate when passing it in? I think so. I haven't really debugged or tested it, but there seems to be no need to bring the gradients of v or h_estimate into that function.
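
If you want to try it, something like this at the call site should do it (an untested sketch on my end, using the loop variables from my snippet above):

    # detach each tensor in the running estimate so the HVP graph only
    # tracks the model parameters, not the estimate itself
    hv = hvp(loss_val, list(model.parameters()),
             [h_est.detach() for h_est in h_estimates])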

@peizhaoli05
Author

Hi @andrewsilva9,

Thanks for your reply.

Actually, I think passing h_init into hvp is not the right way to compute hv according to the paper. When I change h_init to h_estimate (the author has since revised this part of the code), I find that we need .detach(), and h_estimate does not converge when I increase the recursion depth. So there is probably a mistake somewhere.
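
As I read the paper, the recursive estimator is

$$
\tilde{H}_{j}^{-1} v \;=\; v + \left( I - H \right) \tilde{H}_{j-1}^{-1} v ,
$$

i.e. each step multiplies the Hessian with the previous estimate $\tilde{H}_{j-1}^{-1} v$, not with $v$ itself, which is why I think the (detached) h_estimate rather than h_init should be passed to hvp.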

@andrewsilva9

If you leave it as h_init, does it converge? My current problem doesn't really allow me to track this kind of convergence.

@andrewsilva9

Okay @peizhaoli05, I looked back at the original implementation, and I see this here: https://github.com/kohpangwei/influence-release/blob/578bc458b4d7cc39ed7343b9b271a04b60c782b1/influence/hessians.py#L50

    math_ops.multiply(grad_elem, array_ops.stop_gradient(v_elem))

For me, h_estimates is a list, so I can't call .detach() on it directly. I could iterate through and detach each element early, but instead I'm just copying the original author's approach, so I've changed the line in hvp() to do this:

    for grad_elem, v_elem in zip(first_grads, v):
        if grad_elem is not None and v_elem is not None:
            # detach v_elem so no gradient flows through the estimate
            elemwise_products += torch.sum(grad_elem * v_elem.detach())
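
For context, this is roughly what the whole function then looks like (reconstructed from memory rather than copied verbatim, so treat it as a sketch):

    import torch
    from torch.autograd import grad

    def hvp(y, w, v):
        """Hessian-vector product grad_w((grad_w y)^T v), with v held constant."""
        if len(w) != len(v):
            raise ValueError("w and v must have the same length.")

        # first backward pass; create_graph=True so we can differentiate again
        first_grads = grad(y, w, retain_graph=True, create_graph=True)

        # inner product <grad, v>, detaching v so it acts like stop_gradient
        elemwise_products = 0
        for grad_elem, v_elem in zip(first_grads, v):
            if grad_elem is not None and v_elem is not None:
                elemwise_products += torch.sum(grad_elem * v_elem.detach())

        # second backward pass gives H v
        return grad(elemwise_products, w, create_graph=True)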

@nimarb
Owner

nimarb commented Feb 6, 2020

Hey, thanks a lot for the input. Unfortunately I'm currently busy with exams; I'll look into it in more depth at the end of February.
