Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
I've got one weird Python task that's been running now for 26hrs, but it is using the CPU 25.5hrs and has checkpointed regularly - most recently 8 minutes ago. I've got no idea why it won't end itself. Does the watchdog no longer work? CPU time 1d 02:32:21 I'm going to abort it now and see what it reports It should show here aagb-HPR_pp-NMPHE-GPN_pp-BPRO_pp_6_2605012_6_1 |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 133 |
Does the .out file in C:\ProgramData\BOINC\slots\[slot number here]\shared change? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
CPU time 1d 02:32:21 Apologies, it's this task, not the one shown above aagb-PHE_pp-mPIP-GGLY-mB3LEU_3_2686388_6_0 Run time 1 days 2 hours 37 min 11 sec Can anyone spot the error in the task? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
Does the .out file in C:\ProgramData\BOINC\slots\[slot number here]\shared change? Sorry, I didn't see this, but I don't know which .out file I should look at, what slot it was running in, or whether or how it might've changed. Task aborted now - I assume the info has gone now? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
CPU time 1d 02:32:21 No error I can spot before this line, then several: Hypervisor System Log: However, these can be due to the abort. It may be a task that ran much longer than expected, without anything going wrong. If so, just letting it run enough longer would have let it finish. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
Does the .out file in C:\ProgramData\BOINC\slots\[slot number here]\shared change? To find the slot number, click on the task in the tasks column, then on Properties. The info is gone shortly after the output files are uploaded and the task is reported as finished. The probable change to look for is any change to the date and size of the .out file. If there is more than one .out file in the slot directory, look for changes in the date or size of all of them. |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 133 |
You can copy the .out file twice, waiting several minutes between copies, and then compare the two copies with WinMerge. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Looks like Rosetta 4.2 just got a batch of miniprotein in, grab them while they iz hot. Front page job queue went up by millions. |
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 187 Credit: 6,375,290 RAC: 5,699 |
Looks like Rosetta 4.2 just got a batch of miniprotein in, grab them while they iz hot I just got 25 4.2 work units and five are currently running. Mine are regular work units, not Rosetta mini work units. But the tasks look like this: Tue 24 May 2022 09:35:07 PM EDT | Rosetta@home | Starting task miniprotein_relax_v2_1_SAVE_ALL_OUT_IGNORE_THE_REST_5yb7eb8g_2914917_13_0 |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
It may be a task that ran much longer than expected, without anything going wrong. If so, just letting it run enough longer would have let it finish. I always leave them running unless the CPU is not actually being used. In that one, "CPU time 1d 02:32:21" I assume refers to real calculations, and "Elapsed time 1d 01:32:50" refers to actual time taken. I'm not familiar with wherever that came from, I use Boinctasks. So I think that one was calculating on a whole CPU core. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
I just got 25 4.2 work units and five are currently running. Same here, I have about 80. Some are rb (I think I got those by chance just before the onslaught), some are miniprotein, all labelled Rosetta 4.2 as the application though. So a small protein but not a small work unit? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
It may be a task that ran much longer than expected, without anything going wrong. If so, just letting it run enough longer would have let it finish. I always leave them running unless the CPU is not actually being used. In that one, "CPU time 1d 02:32:21" I assume refers to real calculations, and "Elapsed time 1d 01:32:50" refers to actual time taken. I'm not familiar with wherever that came from, I use Boinctasks. So I think that one was calculating on a whole CPU core. CPU time is probably time used according to the small operating system inside the vbox64 emulation, which is usually close but not identical to the elapsed time, or actual time used. That task would be calculating on a whole or physical core if nothing else was trying to use the other virtual core for that physical core. Multiple small proteins at once could give a long workunit. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Every single one I've had failed has had bugger all CPU time compared to wall time. I usually notice 27 seconds of work has been done in 5 hours and cancel it. Everything else has run to completion. I wonder if there's an automated way to detect suss CPU time ratios? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Ah, this is the problem. The Python book has only one use: |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
My quick analysis of desktop items in the photo: I see with Python tasks they really are comparing oranges with almonds . . . . |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
What surprises me is they couldn't afford a monitor with adjustable height. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
I also like the way the width of the monitors is of no use. Bring back 4:3! 16:9 is for TVs! |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
With some of the long work unit names rosetta has it gives a better chance to fit them on the screen , Save_aall_the_squishy_bIt5-and -puT_the_rest0uT_for_the_traj5.rAbid_raBit names All this digital tecknology creating a paperless society . not More and more bits of dead tree pulverized and squashed flat and skribled on to remind us WTF all that stuff on screen is about . |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
I once had a colleague with 30 post-it notes all around her monitor with all the passwords she used. Paper is a renewable resource (and isn't it trapping that "evil" carbon?). At my work they said to stop using so much paper. People were allegedly printing at 14p a page in colour. The management produced colour photocopiers that could do it for 6p a page. I pointed out we were actually using Brother printers with fake ink at 1p a page. The paper cost more than the ink even for a full colour page. Then I found out the "survey" on cost was done by the company (Xerox) renting us the photocopiers, using the cost of HP printers with genuine rip off ink. Then the arguments started. |
Felicia Send message Joined: 8 May 22 Posts: 7 Credit: 117,534 RAC: 0 |
I also like the way the width of the monitors is of no use. Bring back 4:3! 16:9 is for TVs! I love 16:9 on my 25 inch, it's better than 16:10 for running 2 programs side by side (or three when troubleshooting logs, webclient and server side) . That said, I've got a weird scheduling issue with my client. I have jobs that need to report before x but those jobs are not always the ones that get initiated when another job finishes. This leads to jobs reporting past their due date and I'm not sure whether that invalidates them. Screenshot (sorted by report before date): https://imgur.com/a/wbHnfzf There's 2 jobs that need to report before 28-5 6 am, and 2 that need to report before 11:30 am but there are 4 jobs running that need to report before 28-5 3:30pm and later. |
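To make the situation in the post above easier to see, here is a minimal Python sketch that lists waiting tasks whose report deadline is earlier than that of a task already running. It is only an illustration: the `Task` record, the task names, and the year 2022 are assumptions, the deadlines are the ones quoted in the post, and nothing here reflects how the BOINC client actually chooses which task to start.

```python
from datetime import datetime
from typing import List, NamedTuple


class Task(NamedTuple):
    """Hypothetical record for one task; not a BOINC data structure."""
    name: str
    deadline: datetime
    running: bool


def overtaken(tasks: List[Task]) -> List[Task]:
    """Return waiting tasks due earlier than the latest-due running task."""
    running_deadlines = [t.deadline for t in tasks if t.running]
    if not running_deadlines:
        return []
    latest_running = max(running_deadlines)
    waiting = [t for t in tasks if not t.running and t.deadline < latest_running]
    # Earliest deadline first, so the most urgent task is listed on top.
    return sorted(waiting, key=lambda t: t.deadline)


# Deadlines from the post; the year and the names are assumed.
tasks = [
    Task("waiting_1", datetime(2022, 5, 28, 6, 0), running=False),
    Task("waiting_2", datetime(2022, 5, 28, 6, 0), running=False),
    Task("waiting_3", datetime(2022, 5, 28, 11, 30), running=False),
    Task("waiting_4", datetime(2022, 5, 28, 11, 30), running=False),
    Task("running_1", datetime(2022, 5, 28, 15, 30), running=True),
    Task("running_2", datetime(2022, 5, 28, 15, 30), running=True),
]

overtaken_names = [t.name for t in overtaken(tasks)]
print(overtaken_names)
```

A list like this only shows which tasks are at risk; whether a late report is still accepted is decided by the project server, which the sketch does not model.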
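On the question raised earlier in the thread about detecting suspicious CPU time ratios automatically, here is a minimal Python sketch of one possible check: flag any task that has been running for a while but has used very little CPU time relative to wall time. The `TaskTimes` record, the task names, and both thresholds are assumptions chosen for illustration; the sketch does not read anything from the BOINC client, so the times would have to be entered or obtained some other way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskTimes:
    """Hypothetical record of one task's timings, in seconds."""
    name: str
    cpu_seconds: float
    elapsed_seconds: float


def cpu_ratio(task: TaskTimes) -> float:
    """CPU time divided by wall time; 1.0 when no wall time has passed yet."""
    if task.elapsed_seconds <= 0:
        return 1.0
    return task.cpu_seconds / task.elapsed_seconds


def find_stalled(tasks: List[TaskTimes],
                 min_elapsed: float = 1800.0,
                 min_ratio: float = 0.05) -> List[str]:
    """Names of tasks that ran at least min_elapsed seconds with a low CPU ratio."""
    return [t.name for t in tasks
            if t.elapsed_seconds >= min_elapsed and cpu_ratio(t) < min_ratio]


tasks = [
    # "27 seconds of work has been done in 5 hours"
    TaskTimes("stalled_example", cpu_seconds=27, elapsed_seconds=5 * 3600),
    # CPU time 1d 02:32:21 against elapsed time 1d 01:32:50
    TaskTimes("long_running_example", cpu_seconds=95541, elapsed_seconds=91970),
    # Too young to judge: only two minutes of wall time so far
    TaskTimes("just_started_example", cpu_seconds=1, elapsed_seconds=120),
]

stalled = find_stalled(tasks)
print(stalled)
```

The minimum elapsed time is there so that a task still starting up is not flagged; both thresholds are guesses and would need tuning against real tasks.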
©2024 University of Washington
https://www.bakerlab.org