
MDK 9.2 server freezing. Hardware problem?


gregor

A machine I'm using as a server freezes every week or so. When it does, it is unreachable over ssh, does not answer ping, and does not respond locally either.

 

This is with kernel2.4-marcelo:

http://qa.mandrakesoft.com/twiki/bin/view/...2_4_marcelo_2_4

(I'm using this kernel because when I used this machine as an LTSP server, it froze every time I logged into KDE from an LTSP client. With kernel2.4-marcelo this doesn't happen.)

 

BTW I have set up software RAID1.

 

To test whether something is wrong with the RAM, I used this script from:

http://www.bitwizard.nl/sig11/

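#!/bin/sh
#set -x
# Find the first unused log number so earlier logs are not overwritten.
t=1
while [ -f log.$t ]
do
    t=`expr $t + 1`
done

# Rebuild the kernel forever, saving the output of each run as log.N.
while true
do
    make clean
    # -k: keep going after an error so one failure does not stop the build
    make -k bzImage > log.cur 2>&1
    mv log.cur log.$t
    t=`expr $t + 1`
done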

 

Does the following indicate a hardware problem, or is something else going on?

[root@mandrake linux]# diff log.1 log.2
1d0
< scripts/split-include include/linux/autoconf.h include/config
1069,1070d1067
< gcc  -o gen_crc32table gen_crc32table.c
< ./gen_crc32table > crc32table.h

Files from log.2 to log.9 are the same.

[root@mandrake linux]# diff log.9 log.10
1231c1231
< Setup is 4778 bytes.
---
> Setup is 4779 bytes.

Files from log.10 to log.15 are the same.
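Comparing the logs pair by pair by hand gets tedious as the number of runs grows. A minimal sketch of how this could be automated, assuming the logs sit in one directory and are named log.1, log.2, ... as the build script writes them; the function name report_log_changes is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: walk log.1, log.2, ... in a directory and print
# every consecutive pair whose contents differ.
report_log_changes() {
    dir=${1:-.}
    # Nothing to compare if the first log is missing.
    [ -f "$dir/log.1" ] || return 0
    n=1
    while [ -f "$dir/log.$((n + 1))" ]
    do
        next=$((n + 1))
        # cmp -s prints nothing and exits non-zero when the files differ.
        if ! cmp -s "$dir/log.$n" "$dir/log.$next"
        then
            echo "log.$n and log.$next differ"
        fi
        n=$next
    done
}
```

Run from the kernel source directory, `report_log_changes .` would print one line per pair that differs, and `diff` can then be used on just those pairs.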


I am not sure what the script does, but the only way of checking RAM is from outside of an OS. Check this link out:

 

http://www.memtest86.com/

 

This is a link for memtest86, about the best memory test program available. Take the program, write it to a floppy, boot from it and let it run for about 5 hours. That will give you a very good idea of whether you have any problems with the RAM.

 

As for the rest of the hardware, the only real way to test the different parts is by switching new parts in and out of the computer and seeing what is leading to the instability.

 

What hardware are you using? Another thing that might give you a clue about what is going on inside the hardware is to run gkrellm with lm_sensors installed. This will let you see the temperature of your processor and motherboard, the fan speeds, etc. However, your motherboard must support the sensors.
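Because the freeze only shows up about once a week, it may help to write the readings to disk so that the last values before a freeze survive the reboot. A rough sketch, assuming lm_sensors is installed and configured so that its `sensors` command prints readings; the function name, the SENSORS_CMD override, the log path and the interval are all illustrative choices, not part of lm_sensors:

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped block of sensor readings
# to a log file. The command that produces the readings can be overridden
# through SENSORS_CMD (an assumption of this sketch); it defaults to the
# "sensors" command from lm_sensors.
log_sensors_once() {
    logfile=$1
    {
        date
        ${SENSORS_CMD:-sensors} 2>&1
        echo "----"
    } >> "$logfile"
    # Flush buffers to disk so the entry is not lost in a hard freeze.
    sync
}

# Example loop, commented out so the file can be sourced safely:
# while true
# do
#     log_sensors_once /var/log/sensors.log
#     sleep 60
# done
```

After a freeze, the tail of the log would show whether temperatures or fan speeds were drifting beforehand.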

 

Good luck and more info would help.


"This is a link for memtest86, about the best memory test program available."

 

"As for the rest of the hardware, the only real way to test the different parts is by switching new parts in and out of the computer and seeing what is leading to the instability."

I have used memtest86 before. Hardware would be easier to test if the computer froze more often.

 

From the linked page I included in my message:

Why isn't "memtest86" the first to try if I suspect memory problems?

 

Feel free to do so. Some of this is black magic. However, when "memtest86" tells you that your RAM is ok, you might be tempted to believe it. It's telling you that it couldn't find any problems. It's not telling you that your RAM is flawless.

 

In my experience, RAM related problems are sometimes not found using a memory tester. The patterns are all nice and regular. Some problematic RAM simply works well under that kind of stress, but fails under the more erratic stress patterns caused by "gcc" or "zip".

 

The script that I included compiles the kernel and writes the output to a log file. The output should be the same every time. Why is it not?
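The sig11 page is mainly about gcc dying with a fatal signal on flaky hardware, while the diffs posted here show only ordinary build output lines. To tell the two cases apart, the logs can be searched for typical signs of a compiler crash. A small sketch, assuming the same log.N naming; the function name find_crashed_builds is invented here and the pattern list is a guess at common messages, not an exhaustive one:

```shell
#!/bin/sh
# Hypothetical helper: print the build logs that contain typical signs
# of a compiler crash. The patterns are assumptions and not exhaustive.
find_crashed_builds() {
    dir=${1:-.}
    for f in "$dir"/log.*
    do
        # Skip the literal pattern when no log files exist.
        [ -f "$f" ] || continue
        if grep -q -i -E 'internal compiler error|signal 11|segmentation fault' "$f"
        then
            echo "$f"
        fi
    done
}
```

If `find_crashed_builds .` prints nothing, the runs differed without the compiler crashing in a way these patterns catch, and the differing lines themselves are the thing to look at.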
