Allow multiple devices to work on the same set of images #67
I can't help with the issue itself, but I recently asked for a similar feature on Snowshell's issue tracker and would like some insight into which specific CPU and GPU you are using. I'm getting a bit of push-back on the idea there, with comments along the lines of "most common modern GPUs would have finished all the images on their own before the CPU had even finished one", which flies right in the face of what you're reporting.

Also, your math is off: it would actually be 50% faster, not 33% (if the CPU were the same speed as the GPU, it would be 100% faster). Your CPU runs at 1/2 (i.e. 50%) the speed of your GPU, and the GPU itself is 2/2 (i.e. 100%), so the combined throughput is 1/2 + 2/2 = 3/2, i.e. 150%, which is 50% faster than 100%.
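To spell the arithmetic out (just a quick check, assuming the CPU really is exactly half the GPU's speed; the rates here are illustrative, not measured):

```python
# Throughput vs. wall-clock time for "CPU is half the speed of the GPU".
gpu_rate = 1.0           # images per unit time on the GPU
cpu_rate = 0.5           # images per unit time on the CPU

combined_rate = gpu_rate + cpu_rate              # 1.5
throughput_gain = combined_rate / gpu_rate - 1   # 0.5 -> 50% more images per unit time

# The same speedup expressed as time saved on a fixed batch:
time_gpu_only = 1.0 / gpu_rate
time_combined = 1.0 / combined_rate
time_saved = 1 - time_combined / time_gpu_only   # 0.333... -> batch finishes ~33% sooner

print(f"{throughput_gain:.0%} higher throughput, {time_saved:.0%} less wall-clock time")
```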
On topic: a temporary solution could be a simple Python script that splits the requested list of images into X parts (X being the number of hardware devices you have installed: iGPU, CPU, dGPU) and runs one converter process per part; see the sketch after this comment.

Off topic: whether one method is faster than the other depends heavily on the hardware used, but generally the GPU is faster. I'm sure even a 12-core 3900X would struggle against a midrange GPU. That is something like 89% faster; even if you had four of that CPU, it would still be slower.
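Something along these lines (a rough sketch only; I'm assuming the `-i`/`-o`/`-p` flags and the processor IDs below match your build of waifu2x-converter-cpp, so check `--list-processor` and `--help` first, since flags differ between forks):

```python
# Rough sketch of the "split the batch across devices" workaround.
# Assumptions to verify against your build: the converter accepts
# -i <input>, -o <output> and -p <processor id>, and the IDs below
# match what --list-processor reports on your machine.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CONVERTER = "waifu2x-converter-cpp"
PROCESSOR_IDS = [0, 1]  # e.g. 0 = dGPU, 1 = CPU; adjust per --list-processor

def convert_chunk(images, processor_id, out_dir):
    """Convert one device's share of the images, one file at a time."""
    for img in images:
        subprocess.run(
            [CONVERTER, "-p", str(processor_id),
             "-i", str(img), "-o", str(out_dir / img.name)],
            check=True,
        )

def main(in_dir, out_dir):
    images = sorted(Path(in_dir).glob("*.png"))
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    # Round-robin split: device k gets images k, k+X, k+2X, ...
    chunks = [images[i::len(PROCESSOR_IDS)] for i in range(len(PROCESSOR_IDS))]
    with ThreadPoolExecutor(max_workers=len(PROCESSOR_IDS)) as pool:
        jobs = [pool.submit(convert_chunk, chunk, pid, out)
                for pid, chunk in zip(PROCESSOR_IDS, chunks)]
        for job in jobs:
            job.result()  # surface any converter errors

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

An even round-robin split obviously ignores how much faster one device is than the other, which is exactly the imbalance the rest of this thread is about.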
On the contrary, a 3900X should perform around 3x as fast as your 6700k even factoring in your overclock, because although Zen 2's single-threaded IPC is comparable to Skylake, Ryzen's implementation of SMT is stronger by around 10%. That means a 3900X should complete the same task in around 76 seconds, which is "only" 3 times slower than your 2070 Super, so against a midrange GPU I would expect it to be more like twice as slow (if not less), which is what veikk0 says he experiences with his 8-core CPU. And even if waifu2x cannot scale linearly from 4c/8t to 12c/24t and finish a single image 3 times faster, it can still process three different images in the same 227 seconds (one image per four cores), and therefore still complete a batch conversion 3 times faster than an overclocked 6700k.

Now consider someone using one of the upcoming 24-core 3rd-gen Threadrippers. Even if they pair it with a 2080 Ti, that GPU is still not going to be twice as fast as a 2070 Super (Anandtech's 2080 Super review shows the 2080 Ti anywhere from 27% to 57% faster than the 2070 Super in compute workloads), while a 24-core Threadripper very well can be twice as fast as a 3900X: it will have the same 6-core-per-chiplet design but with 4 chiplets on a considerably larger package, which also helps with cooling and should let it maintain clocks just as high as a 3900X (much like previous-gen Threadripper clocked as high as previous-gen mainstream Ryzen). And that is without considering that a 32-core 3rd-gen Threadripper could happen as well (the same 8-core-per-chiplet design as the upcoming 3950X, but with 4 chiplets).

Nevertheless, I would imagine that being able to use the CPU and GPU together would be ideally suited to processors with integrated graphics, since those iGPUs fall much further behind a discrete GPU than the CPU cores fall behind a higher-end chip of the same architecture. Take a Ryzen 3850U + Vega 9 iGPU vs a Ryzen 2700 + Vega 56: the CPU in the 3850U might only be around 40% of the performance of a stock Ryzen 2700, but the Vega 9 iGPU will be more like 15-20% of a Vega 56. The difference would be even more dramatic with an Intel CPU + Intel iGPU, since their iGPUs are not exactly known for their performance.
Your silly long response could've been summarized as: "Maybe less so on the upcoming 24-core CPUs."
Sorry... I think I might just be too used to dealing with Reddit armchair experts who need everything broken down and explained before they'll believe I'm not spewing nonsense.
It's fine. I was just saying that a lot of people will only be running old i5s or i3s, probably even just in a laptop. dGPUs will easily outperform most of those CPUs and are hence a good option. And I even agreed that running all devices concurrently is a good idea, which is why this issue is not closed and has a temporary solution listed.
One idea I just had for the case where both GPU and CPU are used and one processor is substantially slower: once the faster processor has finished all other images and the slower one is still working, have the faster processor also start converting the same image the slower processor is currently on. Whichever processor finishes first wins, and the conversion still running on the other is automatically terminated and discarded.

For example, if your GPU is 5x faster than your CPU, so a single image takes 1 minute on the GPU but 5 minutes on the CPU, then even if the CPU is halfway through the final image when the GPU finishes all of its allocated images, it is still faster for the GPU to start converting that same image and cancel the CPU's conversion once the GPU is done. This method is also quite useful when converting only two or three images and one of your processors is fast enough to blast through the entire batch before the other has even finished one image.

Another key point is that this approach does not care whether the CPU or the GPU is the faster processor, which matters for server or offline-video-rendering style PCs that tend to have heavily multi-core CPUs and sometimes only a low-powered discrete GPU, if any. Also, there's that upcoming 64-core Threadripper 3990X (as if the current 32-core 3970X weren't enough of a beast).
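Roughly like this (again just a sketch, with the same assumed `-p`/`-i`/`-o` flags as in the split script above):

```python
# Sketch of "race the last image on both devices": the slow device is already
# converting `img`; start the same image on the fast device, keep whichever
# result lands first and terminate the other conversion.
# The -p/-i/-o flags are assumptions about the converter's CLI; check --help.
import subprocess
import time
from pathlib import Path

CONVERTER = "waifu2x-converter-cpp"

def start(img: Path, out: Path, processor_id: int) -> subprocess.Popen:
    """Launch a single conversion of `img` on one device, writing to `out`."""
    return subprocess.Popen(
        [CONVERTER, "-p", str(processor_id), "-i", str(img), "-o", str(out)]
    )

def race_last_image(img, final_out, slow_proc, slow_out, fast_processor_id):
    fast_out = final_out.with_name(final_out.name + ".race_tmp")
    fast_proc = start(img, fast_out, fast_processor_id)
    while True:
        if slow_proc.poll() is not None:   # slow device finished first after all
            fast_proc.terminate()
            fast_proc.wait()
            fast_out.unlink(missing_ok=True)   # discard the partial duplicate
            slow_out.replace(final_out)
            return
        if fast_proc.poll() is not None:   # fast device finished first
            slow_proc.terminate()
            slow_proc.wait()
            slow_out.unlink(missing_ok=True)   # discard the slow, unfinished copy
            fast_out.replace(final_out)
            return
        time.sleep(0.5)                    # cheap polling loop
```

Worst case the fast device redoes one image's worth of work, but the batch never finishes later than it would have without the race, since the slow result is still accepted if it lands first.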
Another potential issue here is that different processing units might have different amounts of memory available, meaning that the same conversion settings (e.g. block size) might not work on every device.
When I need to convert a large set of images and it's going to take multiple hours, it would be nice if I could use multiple devices to get the job done. I don't have a second GPU, though some people do and would probably appreciate being able to use them this way. I do have a pretty good 8-core CPU, though, and according to some tests I did, it runs waifu2x-converter-cpp at about half the speed of my GPU. If I could use my CPU at the same time, I could get the conversion done about 33% faster.
I guess you could split the image set into multiple parts, put them into different folders, and just spin up a new converter process for each of them, but that's a pretty hacky way of doing it.
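For what it's worth, here is roughly what that hacky folder split could look like, weighting the split by the speed difference I measured (the folder names and the 2:1 GPU:CPU ratio below are just placeholders):

```python
# Split an image folder into one subfolder per device, weighted by device
# speed, so a separate converter process can be pointed at each subfolder.
# The 2:1 weighting reflects "CPU runs at about half the speed of my GPU".
import shutil
from itertools import accumulate
from pathlib import Path

def split_by_speed(src, weights):
    """Move images from `src` into one subfolder per entry in `weights`."""
    images = sorted(p for p in src.iterdir()
                    if p.suffix.lower() in {".png", ".jpg"})
    total = sum(weights.values())
    names = list(weights)
    # Cumulative share of the batch each device should receive.
    bounds = [round(len(images) * c / total)
              for c in accumulate(weights[n] for n in names)]
    start = 0
    for name, end in zip(names, bounds):
        dest = src / name
        dest.mkdir(exist_ok=True)
        for img in images[start:end]:
            shutil.move(str(img), str(dest / img.name))
        start = end

# Example: the GPU folder gets two thirds of the images, the CPU folder one third.
split_by_speed(Path("to_convert"), {"gpu": 2.0, "cpu": 1.0})
```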