Message boards : Number crunching : rosetta python projects (vbox64)
Author | Message |
---|---|
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
ATLAS has been up since 2014 and they have been doing VM for how long? My Italian friend, have you not read what has been posted here just a little bit ago? We will be lucky if SID gets through to the team. If so, it is now just Monday approaching 4am (-9 hrs from here), so the earliest someone might see his email is 5-6 hours from now. And then to delegate it to the right person will take at least the rest of the day, if it happens at all. If you read my post and SID's post, we both stated that the project has very little monitoring from the lab. DEK was a tech person who watched things here in the forum and moved on years ago. We then had MOD. SENSE, who was an interface to the team, but he has moved on. These were probably graduate students who completed their research and left the project. Since MOD. SENSE moved on we don't have anyone that monitors the forums for the project. The only time something is done about a bug is when the results come back with tons of errors; then someone fixes it without saying anything and the project goes on. As I have suggested before, the best thing to do is just stop the project for now and wait until the bug is fixed, or go research how to write a command to isolate Python and keep doing 4.20 work. Those are the only options. So for now, sit back and wait and watch this area for more details. But there is nothing we can do for now. This project's lab is a Monday-through-Friday operation. No one reads emails on the weekend. I understand your frustration, but again, there is nothing we as users/volunteers, as they call us, can do to solve your problem. We have told you everything we can find on the web and some have told you more advanced things. Since nothing works, you're out of luck for a few days or the rest of the week, or until the scheduler at the University gives up on your machine and sends you only the 4.20 work instead. There is nothing you can change in your preferences either. You get what you get. That's it. 
Sorry my friend, but that is just the way it goes here. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
"My Italian friend, have you not read what has been posted here just a little bit ago?" Yes, I read it :-P "So for now, sit back and wait and watch this area for more details. But there is nothing we can do for now. This project's lab is a Monday-through-Friday operation. No one reads emails on the weekend." In the meantime I'm crunching Tn-Grid. "I understand your frustration, but again, there is nothing we as users/volunteers, as they call us, can do to solve your problem. We have told you everything we can find on the web and some have told you more advanced things. Since nothing works, you're out of luck for a few days or the rest of the week, or until the scheduler at the University gives up on your machine and sends you only the 4.20 work instead. There is nothing you can change in your preferences either. You get what you get. That's it." I know, I know. You and SID are great! (And I'm crunching this project with a notebook that doesn't download the VM, only 4.20 work.) |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,473,844 RAC: 11,503 |
Does anyone know/have any idea what is causing the MD5 checksum error? Some of my machines are running VBox tasks without issue, and others are racking up hundreds of gigabytes of failed downloads. The Bakerlab servers must have pushed out petabytes of failed tasks over the weekend by now! Is it that a bad MD5 hash was downloaded early on, and is being repeatedly used? Or is there a file that contains a bad version of the MD5 algorithm that needs deleting and re-downloading? If it were just a server-side issue then surely all PCs would either download the tasks or would all fail? |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,473,844 RAC: 11,503 |
Ah, so this is in my log on a machine that fails to download the vdi file:
20/09/2021 18:23:27 | | [http_xfer] [ID#7] HTTP: wrote 16384 bytes
20/09/2021 18:23:27 | | [http_xfer] [ID#7] HTTP: wrote 16384 bytes
20/09/2021 18:23:27 | | [http_xfer] [ID#7] HTTP: wrote 15050 bytes
20/09/2021 18:23:28 | Rosetta@home | Finished download of AIMNet_vm_v2.vdi
20/09/2021 18:23:28 | | [statefile] set dirty: pers_file_xfer_set poll
20/09/2021 18:24:09 | | [slot] removed file projects/boinc.bakerlab.org_rosetta/AIMNet_vm_v2.vdi.gz
20/09/2021 18:24:09 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi
20/09/2021 18:24:09 | Rosetta@home | [error] expected d41d8cd98f00b204e9800998ecf8427e, got 61fef19456bb58ec941845ef08d8c5ef
20/09/2021 18:24:09 | Rosetta@home | [error] Checksum or signature error for AIMNet_vm_v2.vdi
20/09/2021 18:24:09 | | [statefile] Writing state file
20/09/2021 18:24:09 | | [statefile] Done writing state file
And the MD5 hash d41d8cd98f00b204e9800998ecf8427e is the hash of an empty file. So is the file deleted before it is hashed? Maybe by my antivirus (Avast)? Will try that next. It is suspicious that it says it has removed the file and then that the MD5 fails... Any idea why BOINC would remove that file? |
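The claim about the empty-file hash can be checked locally. Below is a minimal Python sketch, assuming only the standard `hashlib` module; the helper name `md5_of_file` is made up for illustration and is not part of BOINC. It confirms that hashing zero bytes gives the value shown on the "expected" side of the log, and shows one way to hash a large file such as a .vdi in chunks.

```python
import hashlib

# Value quoted on the "expected" side of the log in this thread.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper: reads the file in 1 MiB chunks so a
    multi-gigabyte image does not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# MD5 over zero bytes of input equals the "expected" value in the log.
print(hashlib.md5(b"").hexdigest() == EMPTY_MD5)
```

Running `md5_of_file` on a local copy of the download would give a value to compare against the "got" field. Note that in the quoted log the empty-input value sits on the "expected" side, which reads as if the reference hash, rather than the downloaded file, was the empty one.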
Admin Project administrator Joined: 1 Jul 05 Posts: 4805 Credit: 0 RAC: 0 |
The MD5 hash values were indeed incorrect, possibly due to a filesystem issue on our end when the jobs were created. I've fixed the MD5 values in our database, so these errors should no longer be an issue for newly issued jobs. |
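For anyone sifting through a long event log for the same symptom, here is a rough Python sketch that flags mismatch lines where the reference hash is the empty-input MD5. The regular expression is modelled only on the "expected ..., got ..." line quoted earlier in this thread, not on any documented BOINC log format, and `classify_mismatch` is a made-up helper name.

```python
import re

EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"

# Assumption: pattern copied from the single error line quoted in this thread.
MISMATCH = re.compile(r"expected ([0-9a-f]{32}), got ([0-9a-f]{32})")


def classify_mismatch(log_line):
    """Return a short diagnosis for an MD5 mismatch line, or None if no match."""
    match = MISMATCH.search(log_line)
    if match is None:
        return None
    expected, got = match.groups()
    if expected == EMPTY_MD5:
        return "expected value is the empty-input MD5; the recorded hash is suspect"
    if got == EMPTY_MD5:
        return "downloaded file hashed as empty; the local file is suspect"
    return "hashes differ; cause not determined"


line = ("20/09/2021 18:24:09 | Rosetta@home | [error] expected "
        "d41d8cd98f00b204e9800998ecf8427e, got 61fef19456bb58ec941845ef08d8c5ef")
print(classify_mismatch(line))
```

The first branch matches the situation Admin describes, where the hash values recorded for the jobs were wrong; the other branches are only guesses about how a local problem might look.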
Sid Celery Joined: 11 Feb 08 Posts: 2114 Credit: 41,100,175 RAC: 22,181 |
"If you read my post and SID's post, we both stated that the project has very little monitoring from the lab." Just to clarify, DEK is very much still around at the project, just not in the forums. When we started on all the Covid tasks 18 months ago he posted quite a lot in the forums for a few weeks, but aiui the Project team is very small and being here killed their productivity just when stacks of results were coming back to them, so he had to back off completely. The way I remember it, when an issue came up that needed someone's attention, Mod.Sense wrote a brief summary of the problem and where the solution may lie, and it seemed to me like he must've emailed a link to that message so the Admins didn't have to wade through pages of whining to find what the issue was. And then Mod.Sense disappeared and that link got lost. And here we now are. On this VBox problem, I have so little idea what people are talking about that I had to send a link to the whole thread rather than a specific summary message, so I haven't been able to be as clear as I'd like. But I now see a post from Admin just above, so if they've fixed that "MD5 hash value" that should make a big difference. Please feed back on how new downloads are going and whether tasks now run OK or not. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
"But I now see a post from Admin just above, so if they've fixed that 'MD5 hash value' that should make a big difference." It works!! No more "download error". |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
My first correct VM WU!! |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"If you read my post and SID's post, we both stated that the project has very little monitoring from the lab." DEK is still here? This problem should have gone to him, but ADMIN seems to be more in tune with the specifics. But they should pay attention to the forums every now and then, or at least monitor the results and, if a huge amount comes back with errors, look and see what the problem is. Did they test Python on RALPH or is it just a rumor that it was dumped here with no testing? |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,473,844 RAC: 11,503 |
Why do you think it should have gone to DEK? And yes - some vbox tasks were tested on Ralph, but I don't believe any were from this batch. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
"Did they test Python on RALPH or is it just a rumor that it was dumped here with no testing?" Not tested on Ralph. And on Ralph the version of the app is 0.21, while here it is 1.03, so I don't know if it is the same app.... |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"Did they test Python on RALPH or is it just a rumor that it was dumped here with no testing?" That's unusual. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
"And on Ralph the version of the app is 0.21, while here it is 1.03, so I don't know if it is the same app...." Here and here you can see. It's strange to me too, but maybe the code is the same and the difference is only the numbering. Or maybe they are different. The usual lack of communication about the project. |
Admin Project administrator Joined: 1 Jul 05 Posts: 4805 Credit: 0 RAC: 0 |
The specific VM image was not tested on Ralph, but there is little change between them. The jobs were tested on Ralph. Unfortunately, the Ralph test would not have caught the MD5 checksum error. Coincidentally, UW IT moved some of our hardware while the big batch was being submitted, and this may have caused the error. - DEK |
Sid Celery Joined: 11 Feb 08 Posts: 2114 Credit: 41,100,175 RAC: 22,181 |
"DEK is still here? This problem should have gone to him, but ADMIN seems to be more in tune with the specifics." Iirc he used to post here as dekim, but just as likely he posts as Admin sometimes, because, why not? I have no idea about testing or Ralph as I'm not using either. Edit: And there you are. Edit2: My main PC has been crashing and blue-screening for ages, so I don't even have any idea if normal tasks are going through ok atm, but I updated my BIOS and other low-level stuff last night and I may finally be stable enough to find out what the hell's been going on. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"The specific VM image was not tested on Ralph, but there is little change between them. The jobs were tested on Ralph. Unfortunately, the Ralph test would not have caught the MD5 checksum error. Coincidentally, UW IT moved some of our hardware while the big batch was being submitted, and this may have caused the error. - DEK" Hi DEK, nice to see you back here on the forums! |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"DEK is still here? This problem should have gone to him, but ADMIN seems to be more in tune with the specifics." Have you looked for chipset updates and run a Windows cleaner lately to clear the registry and the drive? Have you done CHKDSK at all? Just a few more things to check. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1991 Credit: 9,520,400 RAC: 12,860 |
"The specific VM image was not tested on Ralph, but there is little change between them. The jobs were tested on Ralph. Unfortunately, the Ralph test would not have caught the MD5 checksum error. Coincidentally, UW IT moved some of our hardware while the big batch was being submitted, and this may have caused the error. - DEK" Thanks for the information!!!! P.S. Maybe some info about WHAT we are crunching with this new app.... |
Sid Celery Joined: 11 Feb 08 Posts: 2114 Credit: 41,100,175 RAC: 22,181 |
"Edit2: My main PC has been crashing and blue-screening for ages, so I don't even have any idea if normal tasks are going through ok atm, but I updated my BIOS and other low-level stuff last night and I may finally be stable enough to find out what the hell's been going on." It's the relatively new Ryzen 5800X I got last December. I updated everything in March with new chipset drivers, which haven't had a new version issued since, but there have been lots of new BIOS versions with stability and performance fixes, so it seems like I'm not alone with issues. Updated VGA & Audio drivers too while I was there and it's looking good in the first 24hrs. Running a touch faster, a lot cooler and no crashes, which I was experiencing daily if not more often. I'll give it until the weekend as I'll be away again from tomorrow until Sunday - fingers crossed. |
©2024 University of Washington
https://www.bakerlab.org