Many client errors

Questions and Answers : Unix/Linux : Many client errors

Pushkin
Joined: 10 Mar 07
Posts: 14
Credit: 7,068,050
RAC: 0
Message 74881 - Posted: 11 Jan 2013, 5:36:57 UTC

Hi guys,
a few days ago I had a look at my Rosetta account and found that it has been generating a great many client errors. I have not had a single successful workunit since December 2012. The output in the task details looks like this:

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<stderr_txt>
[2013- 1-10 10:50:47:] :: BOINC:: Initializing ... ok.
[2013- 1-10 10:50:47:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
Loaded options.... ok 
Processed options.... ok 
Initializing random generators... ok 
Initialization complete. 
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_rb_01_09_35680_67579__t000__0_C2_robetta.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup. 
Starting watchdog...
Watchdog active.
======================================================
DONE ::     1 starting structures  5422.32 cpu seconds
This process generated      1 decoys from       1 attempts
======================================================
BOINC :: WS_max 5.25381e-287

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

I don't see any kind of error there, but the result of the task is always Client Error. Unfortunately, I cannot tell when Rosetta started behaving like this, since all recent workunits show this error.
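
For anyone triaging a similar dump, here is a minimal sketch of pulling the completion markers out of the stderr text quoted above. It only confirms what the log itself says (a "DONE" line, a decoy count, and the "called boinc_finish" line); it does not explain why the client then marks the task as an error. The function name `summarize_stderr` and the dictionary keys are made up for illustration and are not part of BOINC or Rosetta.

```python
import re

def summarize_stderr(text):
    """Extract completion markers from a Rosetta stderr dump.

    Hypothetical helper: the patterns are written against the log
    lines quoted in this thread, not against any documented format.
    """
    done = re.search(
        r"DONE\s*::\s*(\d+)\s+starting structures\s+([\d.]+)\s+cpu seconds",
        text,
    )
    decoys = re.search(
        r"generated\s+(\d+)\s+decoys from\s+(\d+)\s+attempts",
        text,
    )
    return {
        "cpu_seconds": float(done.group(2)) if done else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        # True only means the log contains the line, nothing more.
        "boinc_finish_called": "called boinc_finish" in text,
    }

# Lines copied from the task output above.
sample = """\
DONE ::     1 starting structures  5422.32 cpu seconds
This process generated      1 decoys from       1 attempts
BOINC :: Watchdog shutting down...
called boinc_finish
"""

summary = summarize_stderr(sample)
print(summary)
```

If all four values are filled in and the task is still reported as Client Error, the application side apparently ran to completion and the problem lies elsewhere.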

I am running Debian Wheezy 64-bit on a PC with an Intel Core i5-3570 and BOINC client 7.0.27:
root@pushkin:/home/pushkin# lshw -short
H/W path           Device      Class          Description
=========================================================
                               system         HP Compaq Elite 8300 MT (QV994AV)
/0                             bus            3397
/0/0                           memory         64KiB BIOS
/0/4                           processor      Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
/0/4/5                         memory         256KiB L1 cache
/0/4/6                         memory         1MiB L2 cache
/0/4/7                         memory         6MiB L3 cache
/0/2b                          memory         16GiB System Memory
/0/2b/0                        memory         4GiB DIMM DDR3 Synchronous 1600 MHz (0,6 ns)
/0/2b/1                        memory         4GiB DIMM DDR3 Synchronous 1600 MHz (0,6 ns)
/0/2b/2                        memory         4GiB DIMM DDR3 Synchronous 1600 MHz (0,6 ns)
/0/2b/3                        memory         4GiB DIMM DDR3 Synchronous 1600 MHz (0,6 ns)
/0/100                         bridge         Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller
/0/100/1                       bridge         Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
/0/100/1/0                     display        NVIDIA Corporation
/0/100/1/0.1                   multimedia     NVIDIA Corporation
/0/100/1a                      bus            7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2
/0/100/1b                      multimedia     7 Series/C210 Series Chipset Family High Definition Audio Controller
/0/100/1d                      bus            7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1
/0/100/1e                      bridge         82801 PCI Bridge
/0/100/1f                      bridge         Q77 Express Chipset LPC Controller
/0/100/1f.2        scsi0       storage        7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode]
/0/100/1f.2/0      /dev/sda    disk           500GB Hitachi HDS72105
/0/100/1f.2/0/1    /dev/sda1   volume         100MiB Windows NTFS volume
/0/100/1f.2/0/2    /dev/sda2   volume         97GiB Windows NTFS volume
/0/100/1f.2/0/3    /dev/sda3   volume         368GiB Extended partition
/0/100/1f.2/0/3/5  /dev/sda5   volume         65GiB Linux filesystem partition
/0/100/1f.2/0/3/6  /dev/sda6   volume         65GiB Linux filesystem partition
/0/100/1f.2/0/3/7  /dev/sda7   volume         19GiB Linux filesystem partition
/0/100/1f.2/0/3/8  /dev/sda8   volume         41GiB Linux swap / Solaris partition
/0/100/1f.2/0/3/9  /dev/sda9   volume         175GiB Linux filesystem partition
/0/100/1f.2/1      /dev/cdrom  disk           CDDVDW SH-216BB
/0/100/1f.3                    bus            7 Series/C210 Series Chipset Family SMBus Controller
/0/100/14                      bus            7 Series/C210 Series Chipset Family USB xHCI Host Controller
/0/100/16                      communication  7 Series/C210 Series Chipset Family MEI Controller #1
/0/100/16.3                    communication  7 Series/C210 Series Chipset Family KT Controller
/0/100/19          eth0        network        82579LM Gigabit Network Connection
/0/1               scsi7       storage
/0/1/0.0.0         /dev/sdb    disk           500GB SCSI Disk
/0/1/0.0.0/1       /dev/sdb1   volume         19GiB Windows NTFS volume
/0/1/0.0.0/2       /dev/sdb2   volume         445GiB W95 FAT32 (LBA) partition
/1                             power          To Be Filled By O.E.M.


Can you please point me to where I might find a solution?

Thank you,
Pushkin
ID: 74881 · Rating: 0
Pushkin
Joined: 10 Mar 07
Posts: 14
Credit: 7,068,050
RAC: 0
Message 75006 - Posted: 28 Jan 2013, 13:12:50 UTC - in response to Message 74881.  

Today I let Rosetta calculate another workunit, and again it ended with a Client Error (see WU558909674). Does anyone have any idea what to do about this issue?
ID: 75006 · Rating: 0
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 75014 - Posted: 29 Jan 2013, 19:03:16 UTC

Sorry for the lack of response. I'm afraid this section of the forums is pretty underutilized and frequently overlooked.

Looking at your lshw output and the symptoms you describe, two things stand out.

/0/4 processor Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz

and

/0/100/1/0 display NVIDIA Corporation
/0/100/1/0.1 multimedia NVIDIA Corporation

Many users have reported the same symptoms, but the pattern is hard to establish. DK is actively working on it. There is much speculation as to the cause, but I'm not sure we've nailed it down completely yet.

At this point, one possibility is that it is related to the NVIDIA drivers; some users have reported success after downgrading them. I believe it happens only with Ivy Bridge processors, although I could be wrong about that. I have been unable to reproduce the problem with my Nehalem i7 and GTX460.
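
If you want to check other hosts for the same hardware combination, a rough sketch like the following may help. It scans `lshw -short` output for the "3rd Gen Core processor" chipset string and an NVIDIA entry in the display class, both taken from the listing posted above. The function name `has_suspect_combo` is hypothetical, the string match is an assumption about how lshw labels this platform, and a match only means the hardware resembles the reported cases, not that the driver is actually at fault.

```python
def has_suspect_combo(lshw_short):
    """Return True if the lshw -short text shows both a 3rd Gen Core
    platform and an NVIDIA display device.

    Hypothetical helper: it relies on the exact description strings
    seen in this thread and may miss differently labelled hardware.
    """
    third_gen = False
    nvidia_display = False
    for line in lshw_short.splitlines():
        if "3rd Gen Core processor" in line:
            third_gen = True
        fields = line.split()
        # "display" is the lshw class column; NVIDIA is in the description.
        if "display" in fields and "NVIDIA" in line:
            nvidia_display = True
    return third_gen and nvidia_display

# Lines copied from the lshw output earlier in the thread.
sample = """\
/0/4                           processor      Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
/0/100                         bridge         Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller
/0/100/1/0                     display        NVIDIA Corporation
/0/100/1/0.1                   multimedia     NVIDIA Corporation
"""

match = has_suspect_combo(sample)
print(match)
```

On a live machine the text could come from running `lshw -short` and passing its output to the function.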

Please see this thread: https://boinc.bakerlab.org/forum_thread.php?id=6177 for the latest. It is also mentioned in other threads under the "number crunching" section. Again, sorry for not noticing your post sooner.

ID: 75014 · Rating: 0
Pushkin
Joined: 10 Mar 07
Posts: 14
Credit: 7,068,050
RAC: 0
Message 75018 - Posted: 30 Jan 2013, 6:20:15 UTC - in response to Message 75014.  

Hi,
thank you for your answer. At least I know that I am not alone with this problem. I'll start following the thread you linked, and we'll see how things go.

Thanks,
Pushkin
ID: 75018 · Rating: 0


©2024 University of Washington
https://www.bakerlab.org