The gradient of h_estimate for computing hv #5

Open
peizhaoli05 opened this issue Jan 31, 2020 · 5 comments

@peizhaoli05

Thanks for the great code.

I changed the loss function to binary cross-entropy, but it runs really slowly as recursion_depth rises. Is it reasonable to call h_estimate.detach() before passing h_estimate into hv = hvp(loss, list(model.parameters()), h_estimate)? We shouldn't incorporate the gradient of h_estimate with respect to w inside hvp.

I am not sure about this and would appreciate it if you could take a look. Thank you.
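
For reference, as I understand it, what hvp should compute is the Hessian-vector product

$$
H v \;=\; \nabla_w\!\left( (\nabla_w L)^{\top} v \right), \qquad v \text{ held constant},
$$

so no gradient should flow back through $v$ (here $v$ = h_estimate), only through the model parameters $w$.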

@andrewsilva9

Hey @peizhaoli05,

I edited my last issue, but it was already closed, so I guess it isn't very visible. I actually do keep the h_init parameter to pass to the hvp() function. All told, my function looks like this:

    # v: list of gradients of the test loss w.r.t. the model parameters
    h_estimates = v.copy()
    h_init = v.copy()
    for i, samples in enumerate(train_loader):
        samples = torch.autograd.Variable(samples).to(device=DEVICE)
        # the model here returns the training loss on the batch directly
        loss_val = model(samples)
        # Hessian-vector product of the training loss with h_init
        hv = hvp(loss_val, list(model.parameters()), h_init)
        # update: h <- v + (1 - damp) * h - hv / scale
        h_estimates = [
            _v + (1 - damp) * h_estimate - _hv / scale
            for _v, h_estimate, _hv in zip(v, h_estimates, hv)]

with certain break conditions (after about 5000 samples I break out of the loop; the code above is also a bit simplified, because what I'm doing is not entirely straightforward).

As to your specific question, could we detach h_estimate when passing it in? I think so. I haven't really debugged or tested it, but there seems to be no need to bring the gradients of v or h_estimate into that function.
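
If you want to try it, something like this at the call site should do it (an untested sketch on my end, using the loop variables from my snippet above):

    # detach each tensor in the running estimate so the HVP graph only
    # tracks the model parameters, not the estimate itself
    hv = hvp(loss_val, list(model.parameters()),
             [h_est.detach() for h_est in h_estimates])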

@peizhaoli05
Author

Hi @andrewsilva9,

Thanks for your reply.

Actually, I think passing h_init into hvp is not the right way to compute hv according to the paper. When I change h_init to h_estimate (the author has since revised this part of the code), I find that we need .detach(), and h_estimate does not converge when I increase the recursion depth. So there is probably a mistake somewhere.
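
As I read the paper, the recursive estimator is

$$
\tilde{H}_{j}^{-1} v \;=\; v + \left( I - H \right) \tilde{H}_{j-1}^{-1} v ,
$$

i.e. each step multiplies the Hessian with the previous estimate $\tilde{H}_{j-1}^{-1} v$, not with $v$ itself, which is why I think the (detached) h_estimate rather than h_init should be passed to hvp.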

@andrewsilva9

If you leave it as h_init, does it converge? My current problem doesn't really allow me to track this kind of convergence.

@andrewsilva9

Okay @peizhaoli05, I looked back at the original implementation, and I see this here: https://github.com/kohpangwei/influence-release/blob/578bc458b4d7cc39ed7343b9b271a04b60c782b1/influence/hessians.py#L50

    math_ops.multiply(grad_elem, array_ops.stop_gradient(v_elem))

For me, h_estimates is a list, so I can't call .detach() on it directly. I could iterate through and detach each element early, but instead I'm just copying the original author's approach, so I've changed the line in hvp() to do this:

    for grad_elem, v_elem in zip(first_grads, v):
        if grad_elem is not None and v_elem is not None:
            # detach v_elem so no gradient flows through the estimate
            elemwise_products += torch.sum(grad_elem * v_elem.detach())
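
For context, this is roughly what the whole function then looks like (reconstructed from memory rather than copied verbatim, so treat it as a sketch):

    import torch
    from torch.autograd import grad

    def hvp(y, w, v):
        """Hessian-vector product grad_w((grad_w y)^T v), with v held constant."""
        if len(w) != len(v):
            raise ValueError("w and v must have the same length.")

        # first backward pass; create_graph=True so we can differentiate again
        first_grads = grad(y, w, retain_graph=True, create_graph=True)

        # inner product <grad, v>, detaching v so it acts like stop_gradient
        elemwise_products = 0
        for grad_elem, v_elem in zip(first_grads, v):
            if grad_elem is not None and v_elem is not None:
                elemwise_products += torch.sum(grad_elem * v_elem.detach())

        # second backward pass gives H v
        return grad(elemwise_products, w, create_graph=True)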

@nimarb
Owner

nimarb commented Feb 6, 2020

Hey, thanks a lot for the input. Unfortunately I'm currently busy with exams; I'll look into it in more depth at the end of February.
