Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,139,251 RAC: 16,277 |
On my systems, yes but it's a little deceptive. I have the 5800X at home which is my only PC now. I'm there half the week and do my daily stuff with Boinc in the gaps in the background. When I'm away it runs Boinc 100% which is what you're looking at. When I'm away (eg right now) I have the i5-9600K at the place I stay. It's 100% Boinc when I'm at home or work, but when I'm using it at night it usually takes 12h30 to 13h to run a 12hr task. I'm fine with that. And I've set up another PC at work with a different user name as part of my team - an old i7-4770 that does what it does and gets turned off each night. Losing 20, 30mins per task is fine by me. Boinc is, by definition, an occasional background job making use of downtime, not hogging uptime. Only losing seconds per task is mental. Losing 20/30/50mins per task shows they're working machines - just as it should be. Other GPU tasks with other projects may be different, but I don't run them. The practical difference is if a GPU application needs a full CPU core/thread for each running Task, and it has to share that core/thread with another Task being processed on the CPU, not only will the CPU processing times suffer, but the GPU output can tank massively. Only in the self-contained context of that individual task. If another task is also completing work over those same cores/threads in the same time, you have to add their processing together, not view them separately as, say, two badly running tasks. Thousands of tasks only sounds bad, because they're 4mins v 40mins each. I'd want to know the system-wide totals over 24hrs, with contention and without, to know if there were any losses at all. Are you trying to tell me it's 90% losses? Because I don't believe that at all. Is it rather single-digit %? I'd guess that's much nearer the mark. 
If someone worked it out and it was 1% losses or less, because every second of the day is doing something, I wouldn't automatically disbelieve it before going through the workings. In short, I think you're making a mountain out of a molehill. I wouldn't take an entire core out of Boinc processing to give CPU support to a GPU task, because that's 12.5% of an 8C/T or 6.25% of a 16C/T machine and I wouldn't guess the losses from contention would be as high as that. I'd let them all run - overcommit the CPU in your terms - and let the PC fight it out, knowing it'll do as much as it possibly can without me making assumptions about what it can or can't do that I'll never know in advance. |
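To put the figures from this post side by side, here is a minimal sketch of the arithmetic. It only compares the numbers quoted above (a 12-hour task taking 12h30 to 13h, and one thread out of 8 or 16); the function names are made up for illustration, and it says nothing about real system-wide throughput, which would need measuring.

```python
def overhead_pct(nominal_hours, actual_hours):
    """Extra wall-clock time as a percentage of the nominal task length."""
    return (actual_hours - nominal_hours) / nominal_hours * 100.0


def reserved_thread_pct(total_threads, reserved_threads=1):
    """Share of the machine given up by keeping threads out of BOINC."""
    return reserved_threads / total_threads * 100.0


# Figures quoted in the post above.
print(f"12h task taking 12h30: {overhead_pct(12, 12.5):.1f}% longer")
print(f"12h task taking 13h:   {overhead_pct(12, 13):.1f}% longer")
print(f"1 thread of 8:  {reserved_thread_pct(8):.2f}% of the machine")
print(f"1 thread of 16: {reserved_thread_pct(16):.2f}% of the machine")
```

On these numbers the slowdown on the shared i5-9600K and the cost of reserving a thread are of the same order, so the comparison really does come down to measured 24-hour totals.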
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,523,781 RAC: 8,309 |
The screensaver of "ROSETTAVS_SAVE_ALL_OUT" wus crashes every time on my Win11 machines... |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 133 |
It resolves database path incorrectly. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
I have to tell you, I'm absolutely amazed that you think Boinc scheduling being wrong by 50% one way or 2-300% the other way for the bulk of the time a task is processing - and 100% of the time it's sitting waiting in the cache - is no kind of problem, And i am absolutely amazed & astounded you would think something that at no stage have i ever said or suggested. Nowhere have i said it is not a problem. What i have said is that it is not as big a problem as you make it out to be. What i have said is that it is not the root cause for the High Priority issues. It contributes to it, but it is not the cause. How on earth do you turn "it is not as big a problem as you make it out to be" into "is no kind of a problem?" Seriously? How on earth can you think that??? It is a problem for Scheduling. But as i keep on repeating because you don't appear to be listening, it's not the cause of the High Priority issue. It's a contributing factor, but not the cause. The cause is the huge discrepancy between CPU time and Run time. but losing the odd few seconds or minutes during processing is a big issue. (Talking about my PCs here). Again, seriously??? Did you actually read what i posted there? I'll repost it. Your two systems How the hell does "but still not large" become "a big issue." Seriously- how??? Please do not try putting words in my mouth or attribute to me things that i have not said in any way, shape or form. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
First thing to say is I didn't appreciate Folding runs at a different (normal compared to low I assume) priority to Rosetta or other projects. I assumed they were all low priority. I hereby give up- as you've said above, how efficient your system is (ie how many Tasks are done each day), is of no importance. All that matters is not missing the deadlines. It's obvious you don't understand what i'm saying, no matter how many ways i try to present it. And what i do say, as other posts you have quoted show, you misinterpret what is posted (i for the life of me cannot understand how "it is not as big a problem as you make it out to be" could be interpreted to mean it "is no kind of a problem" or "but still not large" becomes "a big issue."). But for anyone else that's been reading these posts- BOINC makes use of unused computing time. Running other heavy CPU usage programmes isn't an issue, and if you set BOINC to recognise that there are other heavy usage processes running it won't have an adverse impact on your BOINC processing. If you limit the number of cores/threads available to BOINC, you will maximise your BOINC processing. You will get the maximum possible amount of work done each day that your system is capable of, you won't have issues with deadlines (unless of course you have inappropriate cache settings), or Panic Mode or any of those types of issues. So whether you have 2 cores/threads or 256, if you're running other CPU intensive Tasks then set your "Use at most 100 % of the CPUs" to an appropriate value. If you've got 2 cores/threads, set it to 50%, 256 cores/threads set it to 0.5% (or 1% if it won't accept 0.5), 7% if you have 16 cores/threads. It's not hard to work out. Then your Tasks will run for as long as they need to- ie Run time will match (or be damn close to) CPU time, not 1.5, 2 or 4 or more times longer than they need to. If you don't do much CPU heavy stuff with your system, then there's no need to reserve some cores/threads. 
If you're doing considerable non-BOINC work, and how efficient your system is at doing BOINC work is of no importance at all (ie how many Tasks you actually process each day), along with the occasional missed deadline, then don't bother with reserving any cores/threads for non-BOINC work. Grant Darwin NT |
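The percentages in the post above (50% for 2 threads, about 0.5% for 256, about 7% for 16) are roughly the share of the machine that a single thread represents, 100/n. A minimal sketch of that arithmetic follows. It assumes that the "Use at most N % of the CPUs" preference is the share BOINC itself may use, in which case leaving one thread free means entering the complement of that figure. The function names are hypothetical.

```python
def one_thread_share_pct(total_threads):
    """Share of the whole machine that one thread represents."""
    return 100.0 / total_threads


def use_at_most_pct(total_threads, reserved_threads=1):
    """Value for 'Use at most N % of the CPUs' that leaves
    reserved_threads free, assuming N is the share BOINC may use."""
    if not 0 <= reserved_threads < total_threads:
        raise ValueError("reserved_threads must be in [0, total_threads)")
    return 100.0 * (total_threads - reserved_threads) / total_threads


for threads in (2, 16, 256):
    print(
        f"{threads:>3} threads: one thread is "
        f"{one_thread_share_pct(threads):.2f}% of the machine; "
        f"leaving one free means using at most "
        f"{use_at_most_pct(threads):.2f}%"
    )
```

If the other workload needs more than one thread, pass a larger `reserved_threads`; the preference may round the value, so it is worth checking how many Tasks actually start.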
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
And now i've done all that, i'll just wait for the next Rosetta server crash to occur. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
Oh, and in case no one had noticed, we now have a batch of Beta work that is running for 8 hours, and takes roughly 1GB of RAM per Task, the RosettaVS_ Tasks. So those with large multicore/thread systems & low amounts of system RAM may have some issues if they get a full load of them. Grant Darwin NT |
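A rough way to check whether a machine is at risk is to multiply the per-Task memory by the number of Tasks that could run at once. The sketch below assumes the roughly 1 GB per Task reported here (posts further down the thread report up to 2 GB); the 16 GB system and the 4 GB kept back for the OS are made-up illustration values, and the function names are hypothetical.

```python
import math


def ram_needed_gb(concurrent_tasks, gb_per_task):
    """Memory a full load of identical Tasks would use."""
    return concurrent_tasks * gb_per_task


def max_concurrent_tasks(system_ram_gb, gb_per_task, reserve_gb=0.0):
    """How many Tasks fit in RAM after keeping reserve_gb for everything else."""
    usable = max(0.0, system_ram_gb - reserve_gb)
    return math.floor(usable / gb_per_task)


# Hypothetical machine: 32 threads, 16 GB RAM, 4 GB kept for the OS.
print(ram_needed_gb(32, 1.0), "GB needed for 32 Tasks at 1 GB each")
print(max_concurrent_tasks(16, 1.0, reserve_gb=4), "Tasks fit at 1 GB each")
print(max_concurrent_tasks(16, 2.0, reserve_gb=4), "Tasks fit at 2 GB each")
```

When the number that fits is lower than the thread count, the memory preferences or a lower CPU percentage would be the places to look.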
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
The root cause for the panic mode is highly misconfigured client, too large cache is just a small part of it. Given panic-mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode occurring is the entire solution. Eliminating the reason for the panic mode is the entire solution, everything else is a workaround, which might fail as soon as something changes (new WU type, new project, whatever) or even before. This isn't a problem, because Adrian (in this case) said both projects are important to him. Then he should configure BOINC properly so it can coexist with Folding without any issues, currently it seems he doesn't really care if BOINC works properly or not. . |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
Oh, and in case no one had noticed, we now have a batch of Beta work that is running for 8 hours, and takes roughly 1GB of RAM per Task, the RosettaVS_ Tasks. Getting a few of those Tasks using 2GB of RAM each. Grant Darwin NT |
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 187 Credit: 6,368,269 RAC: 5,669 |
in case no one had noticed, we now have a batch of Beta work that is running for 8 hours, and takes roughly 1GB of RAM per Task, the RosettaVS_ Tasks. Mine look like this. (This is one of them.) Is it one of the ones to which you refer? RosettaVS_ Tasks If not, how are the ones to which you refer identified?
Application: Rosetta Beta 6.05
Name: 7a_hal_l_hal_7aa_391_d694_ce_0001_SAVE_ALL_OUT_2977935_67
State: Running
Received: Fri 26 Apr 2024 02:37:53 AM EDT
Report deadline: Mon 29 Apr 2024 02:37:53 AM EDT
Estimated computation size: 80,000 GFLOPs
CPU time: 05:15:37
CPU time since checkpoint: 00:17:21
Elapsed time: 05:19:11
Estimated time remaining: 02:44:47
Fraction done: 65.667%
Virtual memory size: 468.18 MB
Working set size: 364.18 MB
Directory: slots/11
Process ID: 2777585
Progress rate: 12.240% per hour
Executable: rosetta_beta_6.05_x86_64-pc-linux-gnu |
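The CPU time and Elapsed time fields in these task properties give a quick check of the point made earlier in the thread about Run time matching CPU time. The following is a minimal sketch that turns the two `hh:mm:ss` values quoted above into a ratio; the helper names are hypothetical and this is not how BOINC itself reports efficiency.

```python
def parse_hms(text):
    """Convert an 'hh:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time, elapsed_time):
    """CPU time divided by elapsed (run) time, both as 'hh:mm:ss'."""
    return parse_hms(cpu_time) / parse_hms(elapsed_time)


# Values from the task properties quoted above.
ratio = cpu_efficiency("05:15:37", "05:19:11")
print(f"CPU time is {ratio:.1%} of elapsed time")
```

A ratio close to 1 means the Task had a thread to itself for nearly its whole run; a ratio well below 1 points to contention with other work.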
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
Mine look like this. (This is one of them.) Is it one of the ones to which you refer? RosettaVS_ Tasks If not, how are the ones to which you refer identified? Exactly the way i posted- they start with RosettaVS_. The one you posted starts with 7a_hal_l_hal_ Application Grant Darwin NT |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 133 |
Mine look like this. (This is really one of them.)
|
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 187 Credit: 6,368,269 RAC: 5,669 |
OK. I now have three of the RosettaVS_ Tasks and they are as you say. Since I have 128 GBytes of RAM, I do not expect problems.
Application: Rosetta Beta 6.05
Name: RosettaVS_SAVE_ALL_OUT_NOJRAN_KCa2_homology_fulldb_IGNORE_THE_REST_vF8nFW_8_1999_2977959_2
Estimated computation size: 80,000 GFLOPs
Virtual memory size: 1.19 GB
Working set size: 1.03 GB
Progress rate: 10.440% per hour
Executable: rosetta_beta_6.05_x86_64-pc-linux-gnu
Mine look like this. (This is one of them.) Is it one of the ones to which you refer? RosettaVS_ Tasks If not, how are the ones to which you refer identified? |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,523,781 RAC: 8,309 |
The validation server is down... |
mrchips Send message Joined: 11 Nov 09 Posts: 10 Credit: 14,603,578 RAC: 16,696 |
issues State: All (3339) · In progress (163) · Validation pending (154) · Validation inconclusive (0) · Valid (2933) |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
The validation server is down... Not again... At least the rest are still up (for now). Yep, boinc-process is down again. It wouldn't be a big ask to run a Cron job on a system remote from the servers to check if they're there & running or not, and send an email and text to someone to let them know if they've gone MIA... Looking at the hardware list, it is getting on (and the OS is 8 years old!). Even a single socket mid-range CPU of the lower end EPYC systems could replace all of the existing systems, with not only significantly more performance, but all while using way, way, way less power. Price-wise they're a bargain for what they can do, but they're still not exactly cheap in absolute terms. Grant Darwin NT |
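The remote check suggested above could look something like the sketch below: a small script, run from cron on a machine outside the project, that requests a page and emails someone when no successful answer comes back. Everything specific in it is an assumption for illustration: the URL, the addresses and the local mail relay are placeholders, and a plain HTTP check only shows that the web server answers, not that a particular daemon such as the validator is running. Sending a text message is not covered.

```python
import smtplib
import urllib.error
import urllib.request
from email.message import EmailMessage

# All of these values are hypothetical placeholders.
STATUS_URL = "https://example.org/server_status"
ALERT_FROM = "monitor@example.org"
ALERT_TO = "admin@example.org"
SMTP_HOST = "localhost"


def is_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


def build_alert(url):
    """Build the email that reports a failed check."""
    message = EmailMessage()
    message["Subject"] = f"Server check failed: {url}"
    message["From"] = ALERT_FROM
    message["To"] = ALERT_TO
    message.set_content(f"No successful response from {url}.")
    return message


def main():
    """Check the server once and send an alert if it does not answer."""
    if not is_reachable(STATUS_URL):
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(build_alert(STATUS_URL))
```

To use it, a call to `main()` would go at the end of the script, and cron would run the script every few minutes. The `opener` parameter exists only so the check can be exercised without touching the network.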
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,523,781 RAC: 8,309 |
Yep, boinc-process is down again. Insert, during the BOINC project server creation/configuration, a MANDATORY e-mail address to use for emergencies (daemon crash, problems with queues, etc.). But I think it needs to be done by the BOINC developers... Looking at the hardware list, it is getting on (and the OS is 8 years old!). I also noticed that the OS and hardware are old. But another volunteer told me that maybe the server status page is out of date and the hardware and OS have actually been updated. I don't think so. P.S. Now, over 200k WUs pending validation!! |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,523,781 RAC: 8,309 |
P.S. Now, over 200k wus pending validation!! Now 270k. And no news from the admins. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
Server is still dead. Grant Darwin NT |
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 187 Credit: 6,368,269 RAC: 5,669 |
Server is still dead. It seems mostly up for me.
top - 20:51:09 up 2 days, 12:17, 2 users, load average: 13.33, 13.65, 13.72
Tasks: 474 total, 14 running, 460 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 0.2 sy, 80.3 ni, 18.4 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
MiB Mem : 128074.1 total, 33544.1 free, 6219.7 used, 88310.2 buff/cache
MiB Swap: 15992.0 total, 15992.0 free, 0.0 used. 120200.2 avail Mem
PID PPID USER PR NI S RES %MEM %CPU P TIME+ COMMAND
469545 2039 boinc 39 19 R 1.4g 1.2 98.8 15 287:51.62 ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-li+
504299 2039 boinc 39 19 R 444456 0.3 98.8 5 26:25.33 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-g+
482867 2039 boinc 39 19 R 213072 0.2 98.6 13 208:50.81 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.33_x86_64-pc+
504592 2039 boinc 39 19 R 212384 0.2 99.1 6 24:10.34 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.33_x86_64-pc+
2039 1 boinc 30 10 S 73336 0.1 0.1 6 44900:08 /usr/bin/boinc |
©2024 University of Washington
https://www.bakerlab.org