Message boards : Number crunching : Discussion of the merits and challenges of using GPUs
Author | Message |
---|---|
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"For those interested only in projects related to medical research, the only choice now appears to be Folding@home, which wasn't set up to be compatible with BOINC projects. It's possible, but difficult, to run it on a computer that has BOINC running at the same time. Their forums currently aren't working." I run Folding on the GPU on all my machines, with BOINC work units on the CPU. It is no more difficult than the usual annoyances with Folding. That is, you have to set it up and then delete the "CPU" slot, or it will run by default (and check it again - you usually have to do it twice). And you of course have to reserve a CPU core to support the GPU, as with most setups. But they released a new version of their app recently, which may ease the setup. It won't take long to get the hang of it. And their forums are up, and have been for some time. Maybe you were not trying the SSL version? https://foldingforum.org/index.php If you are interested in other types of GPU projects, note that Asteroids@home currently has disk space problems interfering with uploads. I am about to post a comparison of how awful the efficiency of their GPU version is compared to the CPU version. It will be something like 40 watt-hours per work unit for the GPU (i.e., GTX 1060 or 1070), and about 14 watt-hours for the CPU. They should ban the GPU version to save the planet. (It has been stated by others before, but should be emphasized again.) |
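For anyone who wants to make the same kind of comparison on their own hardware, energy per work unit is just average power draw times run time. A minimal sketch follows; the power and run-time numbers are hypothetical, picked only so the results match the rough 40 Wh and 14 Wh figures quoted above, and are not measurements from Asteroids@home.

```python
def energy_wh(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy used by one work unit, in watt-hours."""
    return avg_power_watts * runtime_seconds / 3600.0


# Hypothetical inputs (assumptions, not measured values):
# a GPU task drawing 120 W for 20 minutes, a CPU task drawing 14 W for 1 hour.
gpu_wh = energy_wh(avg_power_watts=120.0, runtime_seconds=1200.0)
cpu_wh = energy_wh(avg_power_watts=14.0, runtime_seconds=3600.0)
ratio = gpu_wh / cpu_wh

print(f"GPU: {gpu_wh:.1f} Wh, CPU: {cpu_wh:.1f} Wh, GPU/CPU = {ratio:.2f}")
```

With these inputs the GPU uses roughly 2.9 times the energy per work unit, even though it finishes sooner. Power should be measured at the wall (or at least include the CPU core reserved to feed the GPU) for the comparison to be fair.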
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
i don't like crunching on gpus. i do play with some python tensorflow stuff on the side, and watch how it works. for the simple cases, like running a pre-trained convolutional neural network (CNN), it runs for a fraction of a second and one would not notice any difference. but the other way round, say if you are training a complex CNN with lots of data (say images), the gpu can run at full speed (loud fans) and maximum load for hours, consuming more than a hundred watts (the top tier ones probably consume many hundreds of watts). electricity isn't cheap after all, so doing such computation can be expensive in electricity bills. gpus are used where their use is relevant and appropriate, e.g. that CNN stuff. a lot of those CNN models are rather huge, and the training / update process is so data intensive that it would generate terabytes of network data if training were distributed across the network, even for a rather modest / small CNN model. so for those it is more appropriate to just run it on the GPU rather than spill terabytes of data onto conventional inter-networks in minutes, flooding and choking the whole network. |
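The "terabytes of network data" point can be checked with back-of-the-envelope arithmetic. The sketch below assumes a naive scheme in which every worker uploads a full set of gradients and downloads a full set of weights at every training step; the function name and all the numbers are illustrative assumptions, not figures from any real project.

```python
def gradient_traffic_bytes(num_parameters: int, bytes_per_value: int,
                           training_steps: int, num_workers: int) -> int:
    """Total bytes moved if each worker sends gradients and receives
    updated weights once per training step (naive scheme, no compression)."""
    per_exchange = num_parameters * bytes_per_value
    return 2 * per_exchange * training_steps * num_workers


# Hypothetical small CNN: 25 million 32-bit parameters,
# trained for 10,000 steps across 10 volunteer machines.
total = gradient_traffic_bytes(num_parameters=25_000_000, bytes_per_value=4,
                               training_steps=10_000, num_workers=10)
print(f"{total / 1e12:.0f} TB of traffic")
```

Under these assumptions the traffic comes to 20 TB, which is why this kind of training is normally kept on one machine or inside a datacenter network.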
Send message Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
[snip]
I'm not sure if I was or not. However, that link allows me to read the forums, but I still can't log in to post anything there. |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
"The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software." That's because the project is based on AutoDock, and AutoDock has a GPU version. Also, the Quarantine@Home project (it's not a BOINC project) is using the GPU. |
Send message Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
"The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software." I've read that AutoDock development has gone in two different directions, producing one version that can use a GPU and another version with the changes needed for COVID-19 work. IF they can find someone who can merge the two sets of changes, THEN Open Pandemics should have a GPU version they can use. A Google search did not find Quarantine@Home. Can you give me a link to that project? Is it able to share a GPU with Folding@Home? |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"A Google search did not find Quarantine@Home. Can you give me a link to that project? Is it able to share a GPU with Folding@Home?" https://quarantine.infino.me/ But the GPU version is only for Linux. The Windows version is CPU-only at the moment. |
Falconet Send message Joined: 9 Mar 09 Posts: 354 Credit: 1,647,009 RAC: 287 |
"The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software." They are working on the GPU version: https://twitter.com/ForliLab/status/1261194223811887109 Edit: One of the people on the OPN research team is a CUDA/OpenCL developer. |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
Interesting article about C++/SYCL/OpenCL |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
SYCL 2020 provisional specification: "SYCL is a standard C++ based heterogeneous parallel programming framework for accelerating High Performance Computing (HPC), machine learning, embedded computing, and compute-intensive desktop applications on a wide range of processor architectures, including CPUs, GPUs, FPGAs, and AI processors. SYCL 2020 is based on C++17 and includes new programming abstractions, such as unified shared memory, reductions, group algorithms, and sub-groups to enable high-performance applications across diverse hardware architectures." |
Send message Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
What we really need for GPU use is a compiler that can automatically identify groups of program sections that CAN safely run at the same time, that is, groups in which no section writes to any memory location used by any other section of the group. This would make it possible to just recompile the program with that compiler, with no programmer effort to modify the source code first. Such groups are often, but not always, running the same operations on multiple sets of data. A few problems with this: First, GPU clock speeds are typically about a quarter of the clock speeds of CPUs produced at about the same time. This means that, on average, four threads on the GPU must be running at the same time just to make the GPU do the work as fast as a CPU-only program. Second, at least for NVIDIA-based GPUs, the GPU cores come in groups (warps for NVIDIA). Within each group, if one core is doing an operation, all of the others must either be doing that same operation (probably on different data) or be doing nothing. That means that if there is an if-then-else in the GPU part of the program, the then part and the else part can only be doing different operations simultaneously if they are in different GPU core groups. I have not checked if this is also true for other brands of GPUs, but I suspect that it is. Third, BOINC projects normally offer GPU versions of their programs only if those versions will produce the outputs in no more than a tenth of the time required for the CPU versions. Combined with the clock-speed difference, this means using an average of at least 40 GPU cores at a time, which is impossible for GPUs that have fewer than 40 GPU cores. The last time the Rosetta@home project tried to produce a GPU version, it gave outputs slightly faster than the CPU version for some users, and slightly slower for others. I've seen nothing since then about whether it has been tried again with the more recent versions of their program.
BOINC has a section to allow GPU work written in CUDA, and a section to allow GPU work written in OpenCL. Adding the capability to run GPU work written in any other computer language requires either a compiler that first transforms the source code to CUDA or OpenCL and then compiles that, or major modifications to BOINC to add yet another section to support GPU work written in that computer language. Such major modifications to BOINC have, in the past, taken a few years each. Unless you can hold your breath for a few years at a time, don't hold your breath waiting for such a major modification. |
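The figure of 40 GPU cores in the post above comes from multiplying two of its numbers: the roughly 4x clock-speed gap and the roughly 10x speedup that projects look for. A minimal sketch of that arithmetic, assuming (very optimistically) that each GPU thread does as much work per clock cycle as a CPU thread; the function names are made up for illustration.

```python
import math


def min_busy_gpu_threads(cpu_to_gpu_clock_ratio: float,
                         required_speedup: float) -> int:
    """Threads that must be busy on average for the GPU version to beat
    the CPU version by required_speedup, under the idealized assumption
    of equal work per clock cycle per thread."""
    return math.ceil(cpu_to_gpu_clock_ratio * required_speedup)


def gpu_can_qualify(gpu_cores: int, needed_threads: int) -> bool:
    """A GPU with fewer cores than needed_threads cannot reach the target."""
    return gpu_cores >= needed_threads


needed = min_busy_gpu_threads(cpu_to_gpu_clock_ratio=4.0, required_speedup=10.0)
print(needed)                        # threads needed on average
print(gpu_can_qualify(32, needed))   # a hypothetical 32-core GPU
print(gpu_can_qualify(1024, needed)) # a hypothetical 1024-core GPU
```

Real kernels rarely keep every core busy, so in practice the number of cores needed is higher than this lower bound.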
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
"BOINC has a section to allow GPU work written in CUDA, and a section to allow GPU work written in OpenCL. Adding the capability to run GPU work written in any other computer language requires either a compiler that first transforms the source code to CUDA or OpenCL and then compiles that, or major modifications to BOINC to add yet another section to support GPU work written in that computer language. Such major modifications to BOINC have, in the past, taken a few years each. Unless you can hold your breath for a few years at a time, don't hold your breath waiting for such a major modification." Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++). Meanwhile, I hold my breath :-P |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
"Indeed, the idea of SYCL is to write app in C++ (does not need any change to boinc infrastructure) and runs it in heterogeneous hw (using Cuda/OpenCl like a sort of "dialect" of C++)." And SYCL is often faster than CUDA!! SYCL and CUDA |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
oneAPI with support for SYCL 2020 |
mikey Send message Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
"What we really need for GPU use is a compiler that can automatically identify groups of sections of the program that are running operations that CAN safely run without any of the sections within the group writing to any memory location used by any other section of the group. This would make it possible to just recompile the program with that compiler, with no programmer effort to modify the source code first. This is often, but not always, running the same operations on multiple sets of data." So are you saying they need to break it down into small chunks of work that can then run independently of each other and report back to the core program, which can combine them into the next batch of small chunks of data until the task is complete? Much like committees at workplaces do things? If that could happen, several BOINC projects might be able to benefit from it. |
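The safety condition in the quoted post (no section writes to a memory location that another section in the group uses) can be stated as a small check, often called Bernstein's conditions. This is a minimal sketch, assuming the set of locations each section reads and writes is already known; a real compiler would have to infer those sets, which is the hard part. The `Section` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Section:
    """A program section with the memory locations it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def conflicts(a: Section, b: Section) -> bool:
    """True if either section writes a location the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or \
           bool(b.writes & (a.reads | a.writes))


def can_run_together(sections) -> bool:
    """A group is safe only if no pair of sections in it conflicts."""
    return not any(conflicts(a, b) for a, b in combinations(sections, 2))


# Two sections doing the same operation on separate data, plus one
# that depends on the first section's output.
s1 = Section("scale0", reads=frozenset({"x[0]"}), writes=frozenset({"y[0]"}))
s2 = Section("scale1", reads=frozenset({"x[1]"}), writes=frozenset({"y[1]"}))
s3 = Section("sum", reads=frozenset({"y[0]", "y[1]"}), writes=frozenset({"total"}))

print(can_run_together([s1, s2]))      # independent pair
print(can_run_together([s1, s2, s3]))  # s3 reads what s1 and s2 write
```

Here `s1` and `s2` can run side by side, while `s3` has to wait for both, which is the "report back and combine" step described in the reply.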
Send message Joined: 16 Jun 08 Posts: 1250 Credit: 14,421,737 RAC: 0 |
[snip] "So are you saying they need to break it down into small chunks of work that can then run independently of each other and report back to the core program that can combine them into the next batch of small chunks of data until the task is complete? Much like committees at workplaces do things? If that could happen several BOINC projects may be able to benefit from that." Not fully independently. The warps in NVIDIA GPUs require an even smaller breakdown within each workunit, where the cores within each warp must USUALLY be doing the same operations on separate sets of data, or expect a major slowdown due to limits on how many GPU cores can be active at once. What you described is more like how the main principle of BOINC works, whether for CPUs or for GPUs. |
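The slowdown described above can be illustrated with a toy cost model of a single warp executing an if-then-else. This is a simplified sketch, not how any real scheduler is implemented: it assumes the warp runs the then-branch and the else-branch one after the other whenever its threads disagree, with the threads on the other path sitting idle. The function name and step costs are made up for illustration; the warp size of 32 is the NVIDIA value.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_steps(takes_then, then_cost: int, else_cost: int) -> int:
    """Steps one warp spends on an if-then-else in this toy model.

    takes_then holds one boolean per thread: True if that thread's
    condition selects the then-branch. A branch costs time if at least
    one thread needs it; the other threads idle while it runs.
    """
    steps = 0
    if any(takes_then):
        steps += then_cost
    if not all(takes_then):
        steps += else_cost
    return steps


uniform = [True] * WARP_SIZE                        # every thread agrees
diverged = [i % 2 == 0 for i in range(WARP_SIZE)]   # threads alternate

print(warp_steps(uniform, then_cost=10, else_cost=10))
print(warp_steps(diverged, then_cost=10, else_cost=10))
```

In this model the diverged warp takes twice as long as the uniform one for the same amount of useful work, which is why data is usually arranged so that threads in the same warp follow the same path.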
mikey Send message Joined: 5 Jan 06 Posts: 1898 Credit: 12,723,752 RAC: 682 |
[snip] Ok thanks. |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
"The previous attempt at a GPU version gave one that ran at about the SAME speed as the CPU version - a little slower on some computers, and a little faster on others. This was not considered fast enough to make further development worthwhile." I'm wrong. The previous attempt was over 7 years ago. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6475&postid=76916#76916 |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
"Indeed, the idea of SYCL is to write app in C++ (does not need any change to boinc infrastructure) and runs it in heterogeneous hw (using Cuda/OpenCl like a sort of "dialect" of C++)." As I said, NVIDIA released the CUDA C++ standard library as open source. It works not only with NVIDIA CUDA-enabled configurations but also with CPUs. |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
"Indeed, the idea of SYCL is to write app in C++ (does not need any change to boinc infrastructure) and runs it in heterogeneous hw (using Cuda/OpenCl like a sort of "dialect" of C++)." As I said in Ralph's forum: Intel, with Heidelberg University, is working on porting oneAPI/DPC++ to AMD GPUs thanks to hipSYCL. Codeplay is working on porting oneAPI/DPC++ to NVIDIA GPUs thanks to SYCL. |
Send message Joined: 1 Dec 05 Posts: 2124 Credit: 12,426,657 RAC: 2,579 |
'Cause it's a simple rebrand of OpenCL 1.2. They abandoned OpenCL 2.x to its fate. And, today, OpenCL 3.0 is finalised, with an initial SDK and C++ kernels. |
©2025 University of Washington
https://www.bakerlab.org