Message boards : Number crunching : why is this machine failing so much?
Author | Message |
---|---|
zombie67 [MM] Send message Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
https://boinc.bakerlab.org/rosetta/results.php?hostid=336493 Most, but not all, of the results error out. Why? Reno, NV Team: SETI.USA |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
https://boinc.bakerlab.org/rosetta/results.php?hostid=336493 The 131 error means a file was too big; more detail on this says an output file was bigger than max_nbytes: http://boinc-wiki.ath.cx/index.php?title=Error_Code So I would guess something is wrong ;-), lol Team mauisun.org |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Have you tested the hardware with Memcheck86+ and an HD diagnostic from the manufacturer of the HD? (The whole collection of current manufacturers' HD diagnostics is on the Ultimate Boot CD.) And have you tried temporarily setting up a clean install of the OS on another HD, installing BOINC, and letting it load a fresh copy of Rosetta, to see if the problem disappears with a clean install? |
Send message Joined: 26 Sep 06 Posts: 7 Credit: 536,631 RAC: 0 |
Mine's the same. The last several units from this machine have been rejected as "Compute Error" and "Client Error". After some days of computer time and power, that's not welcome. SETI just plugs on with never an error. I'm disconnecting this machine from Rosetta. |
zombie67 [MM] Send message Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
It is an old PII Inspiron 7000 laptop. I have reinstalled the OS several times, the most recent reinstall was Friday evening. It does not error out for Docking or SETI. So it is something unique to Rosetta. I will try running the test referenced anyway, probably tomorrow when I get a chance. Where does one get Memcheck86+? A quick google didn't turn anything up. This is a linux box, will it run on linux? Reno, NV Team: SETI.USA |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hi Zombie, Keith, others: The overall error rates for all recent workunits are similar to what they've been in the past... so this definitely looks like something specific to your clients. It's useful to know that the same faults aren't occurring for Docking or SETI, and that it might be a large file size -- however, nothing in these workunits should result in a large file size, to my knowledge. Could I possibly bother you to attach your client to RALPH? It's our test server, and we get back more detailed info from those results. Thanks! "It is an old PII Inspiron 7000 laptop. I have reinstalled the OS several times, the most recent reinstall was Friday evening. It does not error out for Docking or SETI. So it is something unique to Rosetta." |
James Thompson Send message Joined: 13 Oct 05 Posts: 46 Credit: 186,109 RAC: 0 |
"It is an old PII Inspiron 7000 laptop. I have reinstalled the OS several times, the most recent reinstall was Friday evening. It does not error out for Docking or SETI. So it is something unique to Rosetta." Hi Zombie, I don't know much about memcheck86, but I have used memtest86 many times in the past. I would run it from a Linux LiveCD (such as Knoppix, http://www.knoppix.org), as that won't rely on you installing any programs on your laptop. Here's a link to an article that describes the process in detail: http://software.newsforge.com/software/06/06/27/206209.shtml?tid=91&tid=132 |
casio7131 Send message Joined: 10 Oct 05 Posts: 35 Credit: 149,748 RAC: 0 |
you get memtest86+ from here: http://www.memtest.org/ |
Tiago Send message Joined: 11 Jul 06 Posts: 55 Credit: 2,538,721 RAC: 0 |
I'm getting the same problem on some computers. I think it is related either to the BOINC version or to the WU. |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
"Where does one get Memcheck86+? [...]This is a linux box, will it run on linux?" Argh. It would help if I used the right name for the program (Memtest86+). I've always used the ISO from the linked web page for bootable CDs or floppies, so it doesn't care what OS you have installed on your HD. It's also included on The Ultimate Boot CD 3.4 (although there's probably a newer version out by now). |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
"Where does one get Memcheck86+? [...]This is a linux box, will it run on linux?" Hiren's BootCD also contains many memory test programs. There is also the Microsoft test program, http://oca.microsoft.com/en/windiag.asp, which is also accessible on every Vista DVD during the initial DVD boot. Team mauisun.org |
zombie67 [MM] Send message Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
I have detached this machine from Rosetta, and attached it to RALPH. I am running the memory test now. Looks like it will take some time to complete. Reno, NV Team: SETI.USA |
zombie67 [MM] Send message Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
How many cycles should I let the test run? FYI, there are no jobs on RALPH to run. Reno, NV Team: SETI.USA |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Once should be enough to catch the memory errors; but I leave it running overnight to verify that there aren't any intermittent errors. With the number of errors your WUs are having in less than 3 hours, 1 pass should be enough. |
zombie67 [MM] Send message Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
Okay, I stopped it sometime during the 4th pass. It accumulated 15 failures: 1x for test 3, 13x for test 4, and 1x for test 7. The rest of the tests passed without failure. The location of the failures is "000ba73624 - 186.1mb". So...what now? Is this what is causing my WUs to fail so often? And why is it not also happening for SETI? I can try swapping out the DIMMs to find the problem. Hopefully it is not the 64mb of on-board memory. It had 2x 64mb DIMMs, which I replaced with 2x 128mb DIMMs. If one of those is bad, it would drop me down to 64mb on-board + 128mb + 64mb = 256mb. Is that enough to run rosetta (or SETI)? Reno, NV Team: SETI.USA |
Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"How many cycles should I let the test run?" Unless you suspect intermittent problems, once is enough. If you do suspect intermittent memory problems, run it for as many days as you think is necessary to eliminate that possibility. For example, if the symptom happens twice a week, then you would need to run memtest for something like 5 days to be sure :( edit-added: I agree with Benny: once would be OK in your case, overnight to be sure. By the way, going back to how to run memtest (asked earlier in the thread): all the well-known Linux distros let you include memtest86 as a boot option. Try specifying it as a package during the Linux install, or adding it as a package later. The package manager should make all the necessary changes for you, so that you simply see it on the boot menu every time you boot. That is my favoured way of running it: no hunting for CDs or floppies, and the package manager figures out where to download it from. Adding it to a bootable USB stick should be possible for anyone who can already get Linux booting from USB. Hint: treat memtest86 as another operating system. You'd use Linux to create the stick, but you don't need Linux (or any OS at all) to run it. River~~ |
Send message Joined: 3 Nov 05 Posts: 1834 Credit: 124,260,318 RAC: 9 |
"It had 2x 64mb DIMMs, which I replaced with 2x 128mb DIMMs. If one of those is bad, it would drop me down to 64mb on-board + 128mb + 64mb = 256mb. Is that enough to run rosetta (or SETI)?" You can run Rosetta on 256MB. One of the other threads here suggests that there's a bug in how the memory requirements are handled, which may leave you with down periods, but I think if you run with a decent cache of jobs (>1 day) this shouldn't affect you. |
EW-3 Send message Joined: 1 Sep 06 Posts: 27 Credit: 2,561,427 RAC: 0 |
I have also started to get more failing WUs. Is there a running log kept of failures to identify a pattern in the making? |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Remove the RAM and retest one stick at a time. Stop on the first failure: that module is bad on that particular motherboard. Then test any remaining sticks. SETI, etc. must have a smaller memory footprint and not touch the memory area that is bad. We ran into problems like this when people started working with memory-hog software after using their system problem-free for a year. |
zombie67 [MM] Send message Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
Thanks for all the help, everyone. The bad 128mb DIMM is in the RMA loop now. I will be running with 256mb until it returns. I tested the 64mb DIMM just to be sure all the memory is good. I have also reattached it to rosetta, to confirm the issue is resolved. I run 4 hour jobs, so I should have several to look at in the morning. THANKS AGAIN! Reno, NV Team: SETI.USA |
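
The failure location zombie67 reported can be used to guess which module to pull first. Below is a minimal sketch in Python, assuming the modules are mapped into physical memory contiguously and in the order on-board, first DIMM, second DIMM. The thread does not state this, and a real chipset may order or interleave them differently. The names `locate_module` and `LAYOUT`, and the labels "DIMM A" and "DIMM B", are hypothetical and only for illustration.

```python
# Hypothetical helper: map a memory-test failure address to a module.
# ASSUMPTION: modules occupy contiguous physical ranges in the order listed.
# Real hardware may map them differently, so treat the result as a hint only.

MIB = 1024 * 1024


def locate_module(address, modules):
    """Return (name, start_mib, end_mib) for the module containing address.

    modules is a list of (name, size_in_mib) tuples in assumed physical order.
    """
    start = 0
    for name, size_mib in modules:
        end = start + size_mib * MIB
        if start <= address < end:
            return name, start // MIB, end // MIB
        start = end
    raise ValueError("address is beyond the installed memory")


# Layout from the thread: 64 MB on-board plus two 128 MB DIMMs.
# The ordering and the DIMM labels are assumptions.
LAYOUT = [("on-board", 64), ("DIMM A", 128), ("DIMM B", 128)]

# Failure address as reported by the memory test in the thread.
FAIL_ADDRESS = 0x0BA73624

if __name__ == "__main__":
    name, lo, hi = locate_module(FAIL_ADDRESS, LAYOUT)
    print(f"{FAIL_ADDRESS:#010x} falls in {name} ({lo}-{hi} MB)")
```

Under that assumed layout the reported address lands in the first 128 MB DIMM, but this is only a starting guess. Retesting one stick at a time, as BennyRop describes, is still the reliable check.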
©2025 University of Washington
https://www.bakerlab.org