
GPU Load 5% avg #42

Open
ttoinou opened this issue Jul 17, 2016 · 3 comments
ttoinou commented Jul 17, 2016

Hi!

I'm on Windows 10 64-bit; I have Caffe compiled and style.py works. But using GPU-Z I can see that the GPU load oscillates between 0 and 20% (averaging around 5%) while my CPU is maxed out.

Is this the expected behavior? Do you see the same GPU load?

PS: I get this warning from Caffe's common.cpp; I don't know if it's relevant:

style.py:main:20:29:23.556 -- Starting style transfer.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0717 20:29:24.659956  4260 common.cpp:36] System entropy source not available, using fallback algorithm to generate seed instead.

fzliu (Owner) commented Jul 21, 2016

That is unfortunately due to the expensive CPU-to-GPU memory copies employed by the code in the master branch.

If you'd like a bit more speed (on par with the Torch implementations out there), try this version of Caffe: https://github.com/fzliu/caffe/tree/gram-layer, along with the gram-layer branch of this repository. That fork contains dpaiton's code for Gramian computation, which should speed things up significantly and make better use of the GPU.
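For reference, the Gramian that layer computes is just the matrix of channel-wise inner products of a layer's feature maps. A minimal NumPy sketch of that computation (illustrative only, not the actual Caffe layer code; the shapes are made up):

```python
import numpy as np

# A style layer's feature maps: C channels, each flattened to H*W positions.
C, H, W = 64, 32, 32
features = np.random.rand(C, H * W).astype(np.float32)

# The Gramian is the C x C matrix of channel-wise inner products.
# Doing it as a single matrix product is what lets the work stay on
# the device in the gram-layer branch (sketched here on the CPU).
gram = features @ features.T  # shape (C, C), symmetric
```

Keeping this product (and its gradient) inside the network avoids the round trip of feature maps back to host memory on every iteration, which is where the copy overhead comes from.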

ttoinou (Author) commented Jul 21, 2016

Hi, OK, I see. I'm wondering whether everything shouldn't be done in CUDA or C++ AMP. Python and Lua are good for prototyping, but if someone wants the algorithm to run really fast, doesn't every piece of code have to be ported to the GPU?

I'll try to compile the gram-layer Caffe and keep you up to date. Thank you for the code :) !

@Darwin2011

@fzliu I have tried what you suggested with the Gram layer, and the speedup is huge. Thank you!
But when I dig into it with the Python profiler, I find that the SciPy optimizer still costs quite a lot of time, even though NumPy is now compiled with multi-threading support.
Any suggestions about this performance issue?
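A minimal way to see the per-function breakdown is the standard library's cProfile; a sketch, where `transfer_step` is a hypothetical stand-in for the real loss/gradient callback that SciPy invokes each iteration:

```python
import cProfile
import io
import pstats

def transfer_step(x):
    # Hypothetical stand-in for the per-iteration loss/gradient
    # callback that scipy.optimize calls into Caffe.
    return sum(v * v for v in x)

# Profile many iterations to get a stable picture.
pr = cProfile.Profile()
pr.enable()
for _ in range(200):
    transfer_step(range(500))
pr.disable()

# Print the top entries sorted by cumulative time.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

Sorting by cumulative time makes it clear whether the wall-clock cost sits in the optimizer's own bookkeeping or inside the callback it drives.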

Thank you.
