Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 8,525 |
Same here - I just found 5 tasks that are all at 99.999% after 3-4 days each. They are aaai, aaad, and abai tasks. I've tried suspending them and then letting them run again, but that doesn't help, so I'm going to abort them now. Anyone have any idea why this happens? It happens on some machines much more than others; this one, a dual Sandy Bridge Xeon, is my worst offender: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3632346 |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 8,525 |
Actually, it looks like the problem might be disk access. I've just had a look at Task Manager on that machine, which is showing that the SSD (120GB Kingston A400) is at 100%. It's only using 6.2GB of 16GB RAM, so I'd be surprised if it's smashing the page file. Stopping BOINC drops disk access to ~0%, and stopping other BOINC projects helped briefly, but drive usage is back at 100%. After I aborted a batch of failed VBox tasks, a load of new tasks started up. I presume that start-up requires a lot of disk activity and they're all fighting for it at the same time. EDIT: The disk was full. Windows finally popped a notice up to tell me. I've ordered a new SSD to put BOINC on. The problem is the huge size of these VBox tasks. If one VBox could run multiple threads/tasks then that might save a lot of disk space, assuming they're working from the same dataset. |
gbayler Send message Joined: 10 Apr 20 Posts: 14 Credit: 3,069,484 RAC: 0 |
@dcdc: Thank you for your answer! In my case, there are ~14 GB free on the disk. That's too little to get additional tasks; I can see entries like this in the syslog: Dec 30 14:57:40 i5-be-quiet boinc[2340611]: 30-Dec-2021 14:57:40 [Rosetta@home] Sending scheduler request: To fetch work. Dec 30 14:57:40 i5-be-quiet boinc[2340611]: 30-Dec-2021 14:57:40 [Rosetta@home] Requesting new tasks for CPU Dec 30 14:57:42 i5-be-quiet boinc[2340611]: 30-Dec-2021 14:57:42 [Rosetta@home] Scheduler request completed: got 0 new tasks Dec 30 14:57:42 i5-be-quiet boinc[2340611]: 30-Dec-2021 14:57:42 [Rosetta@home] No tasks sent Dec 30 14:57:42 i5-be-quiet boinc[2340611]: 30-Dec-2021 14:57:42 [Rosetta@home] rosetta python projects needs 5292.79MB more disk space. You currently have 13780.69 MB available and it needs 19073.49 MB. I'm not sure whether this interferes with the running tasks. In addition to the 3 problematic tasks, there are 2 other tasks (also VBox tasks) on this machine that seem to run normally. I'm using Ubuntu 21.10 on an i5-8400, if that makes a difference. The system has now created another task for the workunit that wasn't finished in time. I'm curious whether the next computer processing this WU will experience the same problems! |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
The jobs that run forever and use very little CPU power ("0 CPU") are only on Linux, as far as I have found. They have been around since half the age of the universe, not that anyone at Rosetta is around to care. As I mention somewhere, they are easy to spot using BoincTasks. I just abort them. But they do not seem to be a problem on Windows. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=103883#103883 https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=103823#103823 https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=103689#103689 https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=103659#103659 https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=103493#103493 |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,721,087 RAC: 9,415 |
Actually, it looks like the problem might be disk access. I've just had a look at Task Manager on that machine, which is showing that the SSD (120GB Kingston A400) is at 100%. It's only using 6.2GB of 16GB RAM, so I'd be surprised if it's smashing the page file. Stopping BOINC drops disk access to ~0%, and stopping other BOINC projects helped briefly but drive usage is back at 100%. I run LHC and this on a 24 core machine. When this project started using VBox as well, I had to move BOINC to the rotary drive. I can't afford an SSD that big. |
Charles Tomaras Send message Joined: 18 Aug 09 Posts: 11 Credit: 25,233,682 RAC: 30,426 |
I haven't gotten any work units in at least a week now. I've tried resetting the project. I've now got other stuff running instead of Rosetta. I see no news that it's been down. Anything else I can do to figure out why I'm not receiving work units? |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 8,525 |
Is anyone getting any work? I'm not picking up any python tasks at the moment. I see I'm not the only one! I've been getting work most of the week until now, but the server status shows there should be work available. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,123 |
I haven't gotten any work units in at least a week now. I've tried resetting the project. I've now got other stuff running instead of Rosetta. I see no news that it's been down. Anything else I can do to figure out why I'm not receiving work units? Check if you have virtualization enabled, and check if BOINC was installed with vbox. Some of this information appears near the start of the BOINC log file, if it was started recently enough. Also, check the server status at: https://boinc.bakerlab.org/rosetta/server_status.php The number of available tasks is currently rather low, and it's possible that all of these require different hardware than you have. |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 8,525 |
Mine was because those machines all produced errors so I had to go to details and hit "Allow" for each one. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
The jobs that run forever and use very little CPU power ("0 CPU") are only on Linux that I have found. I just had my first 0 CPU job on Win10, so I aborted it. Too bad. I was going to convert another Ubuntu machine to Windows, but I don't think so. I will wait until they fix it. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Is anyone getting any work? I'm not picking up any python tasks at the moment. I got hit with the same thing. A load of BS if you ask me. Any errors are on the RAH team. I run other VM projects with no problems. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,721,087 RAC: 9,415 |
The jobs that run forever and use very little CPU power ("0 CPU") are only on Linux that I have found. I run them in about 4 to 7 hours on my Ryzen 9 3900XT. But one of them got up to 2 days and slowed right down past 99% complete. I aborted it. I've given up running them on slower machines, I just use the Ryzen and the i5. The others keep going over the deadline. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
@dcdc: Thank you for your answer! I have been getting the `disk space` messages for some time, even with 200GB "free available to boinc". I have spent a lot of time messing about with the thing to try and fix it, with no effect. That "19073.49MB" figure in the message is always the same on either system I run, whatever the other `want more` / `have got` sizes (MB) are. I think it has to be something written into the app itself. I also have difficulty getting more tasks if the `disk space` message has recently appeared in the event log. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
The jobs that run forever and use very little CPU power ("0 CPU") are only on Linux that I have found. Ryzen 7 3700x (auto overclock) takes 4.8 hrs max to chew through Python. |
Pepino65 Send message Joined: 30 Jan 12 Posts: 1 Credit: 728,690 RAC: 0 |
The estimated time of 8 hours was completely unrealistic. I aborted 3 of 6 WUs. Checkpoints are missing, and I am not happy about using VirtualBox. The units below are running a second time; the first time, after 4 hours and a subsequent restart, the crunching started again from the beginning. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,721,087 RAC: 9,415 |
How come my i5 is running them continuously, but when my Ryzen asks for them it says it got no new tasks? There seems to be a continuous supply in server status. |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 8,525 |
Have you checked on the Ryzen computer's page to see whether the "allow" button is showing? It should say "skip". |
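
For anyone chasing the `disk space` messages discussed above, here is a rough Python sketch that pulls the three numbers out of the message quoted by gbayler and checks that they add up. The regular expression only matches the wording as it appears in this thread, and the names `parse_disk_message` and `free_mb` are made up for this example; other BOINC versions may word the message differently. Note also that the free space the operating system reports is not necessarily what BOINC considers available, since BOINC applies its own disk limits from the computing preferences.

```python
import re
import shutil

# Matches the message wording quoted in this thread (an assumption:
# other client versions may phrase it differently).
DISK_MSG = re.compile(
    r"needs\s+(?P<short>[\d.]+)\s*MB more disk space\.\s+"
    r"You currently have\s+(?P<have>[\d.]+)\s*MB available "
    r"and it needs\s+(?P<need>[\d.]+)\s*MB"
)


def parse_disk_message(line):
    """Return shortfall, available and required sizes in MB, or None."""
    match = DISK_MSG.search(line)
    if match is None:
        return None
    return {key: float(value) for key, value in match.groupdict().items()}


def free_mb(path="."):
    """Free space on the filesystem holding `path`, as seen by the OS."""
    return shutil.disk_usage(path).free / (1024 * 1024)


# The final log entry from gbayler's post, without the syslog prefix.
line = (
    "[Rosetta@home] rosetta python projects needs 5292.79MB more disk space. "
    "You currently have 13780.69 MB available and it needs 19073.49 MB."
)

info = parse_disk_message(line)
print("required MB: ", info["need"])
print("available MB:", info["have"])
print("shortfall MB:", info["short"])
# The shortfall should be (required - available), up to rounding in the log.
print("consistent:  ", abs((info["need"] - info["have"]) - info["short"]) < 0.02)
print("OS free MB:  ", round(free_mb("."), 2))
```

Running this over several event-log lines would show whether the required figure really stays fixed at 19073.49 MB while the other two numbers vary, which is what .clair. describes.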
©2024 University of Washington
https://www.bakerlab.org