Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104036 - Posted: 5 Jan 2022, 1:47:18 UTC - in response to Message 104033.  
Last modified: 5 Jan 2022, 1:52:15 UTC

Your Ryzen is showing all 0's, so there is no way to see what's going on from this side.
Maybe include some links to the failed tasks, or run some more and post the links to those tasks before they disappear.
Sorry, you're probably looking at my own account, the Ryzen is currently operating through grcpool.com to earn gridcoins. This is the host ID for it currently: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=6167614 - and under application details, https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=6167614 which shows for Python "Number of tasks completed 89" and "Consecutive valid tasks 0", and all the 4.2s worked ok, although I can't see the tasks. I can't run more as the server seems to have banned me from doing Python.

What tasks are you trying to run? 4.2 or Python?
Both. The Pythons failed. They looked ok here but did not validate.

Are you doing any sort of OC on your cores?
No.

One other option to try, kind of a catch all hail mary idea, run out all your RAH work and then reset the project.
Tried that, the project won't give me any. It says "no tasks sent".
ID: 104036
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104037 - Posted: 5 Jan 2022, 1:50:18 UTC - in response to Message 104034.  

I looked at his setup and didn't see anything obviously wrong with it.
How long are they running? It should be around 2 to 4 hours or so.
Something like that. They show as completed ok here, but don't validate.

As for validation, I would suspect an antivirus might be interfering with them.
Just excluding the project does not always work, since "real time protection" still inspects the packets, and maybe changes them.
I would temporarily disable the AV and see what happens.
Same AV (AVG free) as on all 7 PCs, only this one failed. The one (the i5) that has run Python ok also has AVG with no exceptions set for Rosetta.
ID: 104037
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 104040 - Posted: 5 Jan 2022, 13:25:14 UTC - in response to Message 104037.  

Same AV (AVG free) as on all 7 PCs, only this one failed. The one (the i5) that has run Python ok also has AVG with no exceptions set for Rosetta.

Then you might have some corrupted files. I would set No New Work, abort all the ones you have in progress, and detach from the project.
Then, re-attach. It will download new files.
ID: 104040
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104042 - Posted: 5 Jan 2022, 17:41:23 UTC - in response to Message 104040.  

Same AV (AVG free) as on all 7 PCs, only this one failed. The one (the i5) that has run Python ok also has AVG with no exceptions set for Rosetta.

Then you might have some corrupted files. I would set No New Work, abort all the ones you have in progress, and detach from the project.
Then, re-attach. It will download new files.

Already tried. I give up. If I leave gridcoin, I can get to the allow button. It seems to pay bugger all so I probably will.
ID: 104042
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104044 - Posted: 5 Jan 2022, 22:44:56 UTC - in response to Message 104036.  

Your Ryzen is showing all 0's, so there is no way to see what's going on from this side.
Maybe include some links to the failed tasks, or run some more and post the links to those tasks before they disappear.
Sorry, you're probably looking at my own account, the Ryzen is currently operating through grcpool.com to earn gridcoins. This is the host ID for it currently: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=6167614 - and under application details, https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=6167614 which shows for Python "Number of tasks completed 89" and "Consecutive valid tasks 0", and all the 4.2s worked ok, although I can't see the tasks. I can't run more as the server seems to have banned me from doing Python.

What tasks are you trying to run? 4.2 or Python?
Both. The Pythons failed. They looked ok here but did not validate.

Are you doing any sort of OC on your cores?
No.

One other option to try, kind of a catch all hail mary idea, run out all your RAH work and then reset the project.
Tried that, the project won't give me any. It says "no tasks sent".



I checked the links you sent, but I see no tasks.
Which version of BOINC and Vbox are you running?
You can always try what Jim has said in the past and downgrade to 5.x in Vbox and its add on package and see what happens. That will only affect python. Why 4.2 doesn't work is odd. That's based on the root program they started with years ago. Usually no problems.

Perhaps you're getting data corruption, as he says, through GRCpool. I have never operated through that, so no idea if that could affect things or not.

You might as a last resort go to the BOINC forum and ask there.

It's quite late in the EU, so I don't have time to investigate.
RAH will take you off Python if you send out too many errors. I had that happen and was able to reconnect.
But first try working out where the problem is at with your 4.2 tasks when you get them.
And maybe try running a clean version (no outside websites or whatever, just straight comms to RAH) and see what happens.
ID: 104044
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104045 - Posted: 6 Jan 2022, 17:27:23 UTC - in response to Message 104044.  

I checked the links you sent, but I see no tasks.
I think they're not shown after a certain time, I was looking at https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=6167614 which shows 89 pythons completed, but 0 consecutive valid tasks.

Which version of BOINC and Vbox are you running?
Latest Boinc, latest Vbox 6.1, although the computer which works ok is on 5.2.

You can always try what Jim has said in the past and downgrade to 5.x in Vbox and its add on package and see what happens. That will only affect python. Why 4.2 doesn't work is odd. That's based on the root program they started with years ago. Usually no problems.
I downgraded Vbox but am unable to reset the ban as I don't have access to the grcpool Boinc account.

Perhaps you're getting data corruption, as he says, through GRCpool. I have never operated through that, so no idea if that could affect things or not.
It didn't stop my other computer running it, it's still doing them now.

And maybe try running a clean version (no outside websites or whatever, just straight comms to RAH) and see what happens.
I could do, but I've decided to stick to Gridcoin as I've worked out it pays for the hardware in 1.5 years of crunching, so not to be sneezed at.
ID: 104045
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104046 - Posted: 6 Jan 2022, 17:39:10 UTC - in response to Message 104045.  
Last modified: 6 Jan 2022, 17:39:28 UTC

Just got an email from the grcpool admin about the previous ID problem which I've fixed anyway, so I've asked him if he could press the allow button for me now I'm on the earlier Vbox version.
ID: 104046
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104047 - Posted: 6 Jan 2022, 18:53:19 UTC - in response to Message 104045.  

I could do, but I've decided to stick to Gridcoin as I've worked out it pays for the hardware in 1.5 years of crunching, so not to be sneezed at.


I've had a glance at it, but never really dug into it at all.
After 1.5 years. Slow but pretty good investment.

I killed a CPU after a year of hard flat rate OC. So I don't do that anymore, that was an expensive lesson that even Gridcoin could not pay in 1.5 years. 2 places with a bunch of tech time. New MOBO and new CPU (that hurt). Since I want to run a bunch of projects I just dropped $150 (approx.) for 2 sticks of 16GB of memory, since python is such a memory hog and RAH does not allow us to control the number of cores used. 2 sticks of 4 have been with me since they were some of the largest memory on the market many moons ago. I know expense. Now if GC could pay for my electric, then that might be something to look at.
ID: 104047
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104048 - Posted: 6 Jan 2022, 19:06:07 UTC - in response to Message 104047.  
Last modified: 6 Jan 2022, 19:37:23 UTC

I don't OC because of crashes and data corruption. Never killed a CPU though, I thought they lasted forever. But then I never experimented too harshly with overvoltage, which sounds a very nasty thing to do to a chip. I don't actually see the point, Intel/AMD test these chips thoroughly to see what speed they can go at reliably. I'm sure they know what they're doing.

There are ways of getting cheap electricity, some of which are legal :-) Discounts for direct debit, paperless billing, dual fuel, read your own meter, etc. And choosing a supplier that's 30% cheaper. Or using night time rates. Or installing solar panels and taking absurd government subsidies.

Just connected two dual xeons (I needed a proprietary cable to make the stupid things boot up), then fixed the same duplicated ID cloning problem I had before. They total 48 cores, and they're taking pythons :-)
ID: 104048
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104049 - Posted: 6 Jan 2022, 21:00:41 UTC - in response to Message 104048.  

It's just a pity you can't use LHC with gridcoin. The reason being the stupid creditnew system screws up with multicore tasks, and is very easy to cheat, so people were managing 10x the coins they were due and taking money from the rest of us. LHC refuse to fix it and say it's Boinc's fault, which I agree with.
ID: 104049
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104050 - Posted: 6 Jan 2022, 23:35:44 UTC - in response to Message 104049.  
Last modified: 6 Jan 2022, 23:36:28 UTC

If it's not my Vbox version, the only other difference is AMD vs Intel. Do AMDs run the Rosetta Python ok? They have different virtualization technology. And by ok I mean check if they are validated on the server, since they appeared alright on my end.
ID: 104050
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104051 - Posted: 6 Jan 2022, 23:53:01 UTC - in response to Message 104050.  

Shouldn't be.
Nobody else has complained about Intel.
I'm an AMD user, so really can't be of any help.

Again, via our view all the tasks on your computer have disappeared.

If I can find some time tomorrow [Friday] (EU time) I will try to dig into validate errors.
Everything from my system is chugging along just fine.

You know there is one other thing you can check...look at the task itself and see if it was sent to another computer and what that computer got out of it. Valid or invalid.
If you both got invalid, then there is something wrong with the data.
If you're #1 and invalid, and then you look at it again and #2 is valid, then there is something wrong with your data.
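The wingman check described above can be sketched as a small decision function. This is only an illustration of the reasoning in this post, not anything the project runs: the `Outcome` enum, the `likely_cause` name and the returned strings are all made up for the example.

```python
from enum import Enum


class Outcome(Enum):
    """Validation state of one copy of a task (hypothetical labels)."""
    VALID = "valid"
    INVALID = "invalid"
    PENDING = "pending"


def likely_cause(mine: Outcome, wingman: Outcome) -> str:
    """Guess where the fault lies from two results of the same task."""
    if mine is Outcome.INVALID and wingman is Outcome.INVALID:
        # Both hosts failed: the work unit itself is the likely suspect.
        return "work unit data"
    if mine is Outcome.INVALID and wingman is Outcome.VALID:
        # Only this host failed: look at the local setup.
        return "local host"
    if mine is Outcome.VALID:
        return "no problem on this host"
    # Anything still pending cannot be judged yet.
    return "inconclusive"


print(likely_cause(Outcome.INVALID, Outcome.VALID))
```

One comparison is weak evidence either way, so it is worth checking several tasks before blaming the host or the data.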

You are completing the tasks, but get validation inconclusive?
Can you copy the readout from Stderr output on the task page if it is anything other than something like this:
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pdblite_boinc_998_10_tfirst--fuse--predictor_v11_boinc_fix--fuse--tslp_design_v1_boinc_fix_tyr.xml @tau_site_altern_row2_V_gggraft_bcov_flags -in:file:silent tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.zip @tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.flags -nstruct 100 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2406449
Using database: database_357d5d93529_n_methylminirosetta_database
======================================================
DONE :: 100 starting structures 11805.6 cpu seconds
This process generated 100 decoys from 100 attempts
======================================================
BOINC :: WS_max 6.34278e+08
12:22:14 (16996): called boinc_finish(0)

</stderr_txt>
]]>

This was a valid task....I haven't had any invalids in so long....
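For anyone comparing several of these readouts, a rough sketch like the one below pulls out the lines that matter from a Rosetta 4.20 style stderr. The function name `summarize_stderr` is made up for the example, and the patterns are based only on the sample output above, so other applications (the VirtualBox Python tasks, for instance) will probably need different patterns. Note that an exit code of 0 only shows the task finished cleanly on the client; validation happens on the server.

```python
import re


def summarize_stderr(text: str) -> dict:
    """Extract exit code, decoy counts and CPU time from a stderr dump."""
    finish = re.search(r"called boinc_finish\((-?\d+)\)", text)
    decoys = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    cpu = re.search(r"([\d.]+) cpu seconds", text)
    return {
        "exit_code": int(finish.group(1)) if finish else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
    }


# Lines copied from the sample readout in this post.
sample = """DONE :: 100 starting structures 11805.6 cpu seconds
This process generated 100 decoys from 100 attempts
12:22:14 (16996): called boinc_finish(0)"""

print(summarize_stderr(sample))
```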
ID: 104051
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104052 - Posted: 7 Jan 2022, 0:36:49 UTC - in response to Message 104051.  

Shouldn't be.
Nobody else has complained about Intel.
I'm an AMD user, so really can't be of any help.
Have you misread my post? I'm having problems with AMD, and not Intel. If your AMDs are working fine, this shows AMD is ok. Rosetta recommend the latest Vbox. So if it isn't either, I can't think why my Ryzen has a problem. They all completed successfully here, but failed to validate on the server.

Again, via our view all the tasks on your computer have disappeared.
Don't know why that happened. You can see the totals of consecutive validations, but not individual tasks, see https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=6167614 under rosetta python projects - "Number of tasks completed 89", "Consecutive valid tasks 0"

If I can find some time tomorrow [Friday] (EU time) I will try to dig into validate errors.
Everything from my system is chugging along just fine.

You know there is one other thing you can check...look at the task itself and see if it was sent to another computer and what that computer got out of it. Valid or invalid.
If you both got invalid, then there is something wrong with the data.
If you're #1 and invalid, and then you look at it again and #2 is valid, then there is something wrong with your data.
I would assume if there was something wrong with the data, I was very unlucky. Assuming the grcpool admin resets my computer, I should be lucky next time.

You are completing the tasks, but get validation inconclusive?
Can you copy the readout from Stderr output on the task page if it is anything other than something like this:
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pdblite_boinc_998_10_tfirst--fuse--predictor_v11_boinc_fix--fuse--tslp_design_v1_boinc_fix_tyr.xml @tau_site_altern_row2_V_gggraft_bcov_flags -in:file:silent tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.zip @tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.flags -nstruct 100 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2406449
Using database: database_357d5d93529_n_methylminirosetta_database
======================================================
DONE :: 100 starting structures 11805.6 cpu seconds
This process generated 100 decoys from 100 attempts
======================================================
BOINC :: WS_max 6.34278e+08
12:22:14 (16996): called boinc_finish(0)

</stderr_txt>
]]>

This was a valid task....I haven't had any invalids in so long....
Can't get to such things on my machine, due to grcpool owning the account.
ID: 104052
.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 104053 - Posted: 7 Jan 2022, 1:56:48 UTC - in response to Message 104052.  

I'm having problems with AMD, and not Intel

That is interesting, because I was working with the Intel vs AMD idea.
Except I have more problems with my Intel Xeon cruncher than my AMD Opteron,
pop goes another theory as to why this stuff happens.
I have tried 5xx and 6xx Vbox and it seemed to make no difference to my problems.
ID: 104053
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104054 - Posted: 7 Jan 2022, 2:16:41 UTC - in response to Message 104053.  
Last modified: 7 Jan 2022, 2:18:51 UTC

I'm having problems with AMD, and not Intel

That is interesting, because I was working with the Intel vs AMD idea.
Except I have more problems with my Intel Xeon cruncher than my AMD Opteron,
pop goes another theory as to why this stuff happens.
I have tried 5xx and 6xx Vbox and it seemed to make no difference to my problems.
So far I've only proved an Intel i5 works, and an AMD Ryzen 9 doesn't. I have 4 old Intel Xeons (X5650, 3 years older than yours) running overnight on python, I'll find out tomorrow if they work and post here.
ID: 104054
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104057 - Posted: 7 Jan 2022, 9:18:59 UTC - in response to Message 104054.  

I'm having problems with AMD, and not Intel

That is interesting, because I was working with the Intel vs AMD idea.
Except I have more problems with my Intel Xeon cruncher than my AMD Opteron,
pop goes another theory as to why this stuff happens.
I have tried 5xx and 6xx Vbox and it seemed to make no difference to my problems.
So far I've only proved an Intel i5 works, and an AMD Ryzen 9 doesn't. I have 4 old Intel Xeons (X5650, 3 years older than yours) running overnight on python, I'll find out tomorrow if they work and post here.



Why would Intel process the data any differently than AMD?
Data is data, a program is a program.
Or is Intel garbling the data?


And Peter, I am responding late at night and reading fast, so I might misread some details of your post. Sorry.
ID: 104057
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104060 - Posted: 7 Jan 2022, 16:07:50 UTC - in response to Message 104057.  

I'm having problems with AMD, and not Intel

That is interesting, because I was working with the Intel vs AMD idea.
Except I have more problems with my Intel Xeon cruncher than my AMD Opteron,
pop goes another theory as to why this stuff happens.
I have tried 5xx and 6xx Vbox and it seemed to make no difference to my problems.
So far I've only proved an Intel i5 works, and an AMD Ryzen 9 doesn't. I have 4 old Intel Xeons (X5650, 3 years older than yours) running overnight on python, I'll find out tomorrow if they work and post here.



Why would Intel process the data any differently than AMD?
Data is data, a program is a program.
Or is Intel garbling the data?
Anything using virtualbox, like Rosetta's Python, or anything from LHC, requires hardware virtualisation, which is done differently with Intel and AMD. I can't find any info on what is different other than it's not very significant, but there may be something that causes a bug in one and not the other. But my Intels all work as far as I know (haven't had a validation from the xeons yet) and my AMD doesn't, which is the opposite of what you get, so perhaps it's nothing to do with AMD/Intel. I do notice however that if I have virtualbox on all the AMD's cores, the Windows interface slows to a crawl, and I've not seen that with an Intel, so something is different.
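As a side note on the two flavours of hardware virtualisation: on a Linux host, the CPU flags in `/proc/cpuinfo` show which one is exposed (`vmx` for Intel VT-x, `svm` for AMD-V). The sketch below only reads those flags from a text blob; the function name `virtualization_flavor` is made up for the example, it does not apply to a Windows host, and it says nothing about whether the feature is enabled in the BIOS or working properly under VirtualBox.

```python
def virtualization_flavor(cpuinfo_text: str) -> str:
    """Report which hardware virtualisation flag appears in cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Only the plain "flags : ..." lines carry the CPU feature list.
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT-x"
    if "svm" in flags:
        return "AMD-V"
    return "none reported"


# Shortened, made-up flag lines for illustration.
print(virtualization_flavor("flags\t\t: fpu vme sse2 svm"))
print(virtualization_flavor("flags\t\t: fpu vme sse2 vmx"))
```

On a real Linux machine the input would come from reading `/proc/cpuinfo`, for example with `open("/proc/cpuinfo").read()`.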

And Peter, I am responding late at night and reading fast, so I might misread some details of your post. Sorry.
I have trouble sleeping, so my hours are weird, I'm possibly half dozed off sometimes too.
ID: 104060
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104062 - Posted: 7 Jan 2022, 16:30:27 UTC - in response to Message 104060.  

I have trouble sleeping, so my hours are weird, I'm possibly half dozed off sometimes too.

Not me, 1am is my limit. Then I am off to bed and need the 8 hour recharge. 7.5 is the minimum.
But I have a very physically demanding job.
ID: 104062
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,721,087
RAC: 9,415
Message 104063 - Posted: 7 Jan 2022, 16:51:48 UTC - in response to Message 104062.  

I have trouble sleeping, so my hours are weird, I'm possibly half dozed off sometimes too.

Not me, 1am is my limit. Then I am off to bed and need the 8 hour recharge. 7.5 is the minimum.
But I have a very physically demanding job.
Lucky you, I have chronic fatigue :-(
ID: 104063
.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 104066 - Posted: 7 Jan 2022, 19:22:01 UTC - in response to Message 104057.  

Why would Intel process the data any differently than AMD?
Data is data, a program is a program.
Or is Intel garbling the data?

This isn't any data, this is `Python` data,
and it will do wot funky stuff it wants.
[that is a skit on the M&S adverts of uk tv]
ID: 104066



©2024 University of Washington
https://www.bakerlab.org