Experimental problem


Ixthusdan


Recently I had someone on another board question my logic on a computer problem (imagine somebody questioning my logic! :lol:) and it gave me an idea. I have a hypothetical situation. Since it is hypothetical, there is no real solution. But what would be the troubleshooting path? I think it would be a good discussion because, as most of you know, computer fixin' is not a fine science! So here it is:

 

There is a Linux box with an ATI 7000 card. Sometimes it will just freeze and lock up. The keyboard is unresponsive and you cannot do anything at the machine. But one can ssh into it and reboot the machine to fix the problem. For this reason, it seems that if the X server crashes, the whole thing is taken out, just like Windows. What might be wrong? Is Linux like Windows? :lol:

 

Rather than say what I said, I want to hear what you all think. Don't ask for any more details, because these are all that were offered! Speculate city!


Some quick thoughts:

 

First: Per excellent IX advice from a past message: "Preparation is the Key." Make a list of the various steps and options you are going to take to try to identify the problem and solve it. Plan your problem-solving action steps.

 

Second: Keep a detailed log of each troubleshooting step you try. Your notes may prove handy later on.

 

Action Steps:

 

1. Read Tom Berger's FAQ, "What do I do when my system stops responding?", and also the article on the Magic SysRq Key.

 

2. Check logs -> /var/log/messages, Xorg.0.log, etc. Any hints?

 

3. Run memory checks.

 

4. Have all upgrades/bug fixes/security fixes etc. been applied?

 

5. Test the hypothesis that the problem is X-related by demonstrating that the problem does not occur when operating solely in command-line mode. Boot to runlevel 3 and operate the machine for a day or so in this mode. Make sure you use the network: use links or lynx for Internet, ssh in and out, rsync, etc. If there are no freeze problems, then the working hypothesis that it's X-related is looking good.

 

6. If the machine froze in CLI-only mode, open the box and check all data and power connections. Reseat boards, etc.

 

7. Is it an X problem or a DM problem? Try some different DMs to see if the problem is specific to the DM you were using.

 

8. If the machine is stable in CLI-only mode and the problem does not seem related to the DM, try a different video driver. Try VESA as a last resort.

 

9. If a different video driver works without the freeze recurring, either go with that or try playing around with the X config options on the (presumed) ATI driver the owner was using originally.
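
As a rough illustration of step 2, here is a minimal Python sketch that scans log files for lines that often accompany a hang. The log paths (assuming Xorg.0.log lives under /var/log), the keyword list, and the function names are assumptions made for illustration, not details from the original problem, so adjust them for the machine at hand.

```python
import re
from pathlib import Path

# Assumed locations; the list above only names /var/log/messages and Xorg.0.log.
LOG_PATHS = ["/var/log/messages", "/var/log/Xorg.0.log"]

# Hypothetical keyword list. "(EE)" is how the X server marks errors in its log;
# the rest are common kernel complaints worth a second look.
SUSPECT_PATTERNS = [r"\(EE\)", r"\bOops\b", r"segfault", r"lockup", r"machine check"]


def find_suspect_lines(text, patterns=SUSPECT_PATTERNS):
    """Return (line_number, line) pairs for lines matching any pattern."""
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


def scan_logs(paths=LOG_PATHS):
    """Print suspect lines from each readable log file, skipping the rest."""
    for path in map(Path, paths):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            # Missing file or no permission: nothing to report for this path.
            continue
        for number, line in find_suspect_lines(text):
            print(f"{path}:{number}: {line}")


if __name__ == "__main__":
    scan_logs()
```

Run it over ssh after a freeze, before rebooting. Anything it flags is a hint about where to look next, not a diagnosis.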


Oddly enough, I've had this happen to me.

 

The problem was that I had papers stacked on my box.

A couple of those papers had slipped down the back and into the main power supply fan.

The fan had then seized up, and I could get a good 7-8 hours of use before it would freeze up, unless I was playing games or running seti, but I could always ssh into it.

 

New case/power supply ... and the problem went away.


You suspect the graphics card, so why not just go with that?

If you can ssh in, then just kill X ...

 

If you can, then it's either a HW or SW fault, and probably in the graphics card, so I'd swap the graphics card and check.

 

If it's not the graphics card, it might be a power problem. I just changed the PSU in my GF's machine, which was not delivering constant power, and this fixed it. Prior to that, I had LAME dying every time she tried to rip an MP3, and the problem was the CPU drawing too much power when using the MMX part.


Actually, across hardware problems overall, the power supply is the culprit more and more often than, say, 5 years ago. Even new machines by the big manufacturers are having this problem. (Dell)


My Dad has six Dells in his office, and two PSUs have failed recently, and one more is likely to go. These aren't exactly old machines, about two years old I would say.

 

It was odd as well, because one machine was reporting battery problems, so we thought the system board battery was going. As soon as the PSU died, we replaced the PSU, and there were no more errors asking us to replace the system board battery!!


I had a similar problem, although with an Nvidia card: an occasional hangup here and there, most of the time with the possibility to ssh in. It was actually a memory problem: one of the DIMMs was faulty. Replaced it, and no hangups since. There are, of course, problems here and there (I'm a pretty heavy tweaker with a strong inclination to break things sometimes), but hangups: none.

So running memtest86 for a day or two should actually be one of the items in the troubleshooting list.


Actually, across hardware problems overall, the power supply is the culprit more and more often than, say, 5 years ago. Even new machines by the big manufacturers are having this problem. (Dell)

The problems are simply increasing as power requirements increase and as a PSU gets more connections: AT -> ATX -> P4


Following on from Gowaters excellent comment, this is why I always use the largest-rated affordable power supply possible. It is not a waste of resources to have a supply greatly in excess of what is routinely consumed by your computer, because it means that every component in the supply is under-used and therefore not stressed. This stress is mostly from heat. Not only will your power supply components run cooler, but the voltage regulation components are more stable and better able to handle sudden load changes, whether short term or long term, i.e. smoother regulation.

NOTE (before the nitpickers dive in): A bigger power supply does not mean less overall heat is produced (this is determined by the load that the computer draws). What it does mean is that the components in large power supplies are designed to handle higher loads and temperatures than the components in smaller power supplies. In large power supplies the components are cruising, whereas the components in smaller ones are running close to their limit.

 

Have you ever noticed how many of the power supplies in old computers are still running today? This is because electronic engineers in the past allowed for up to a 50% excess margin, and often 75%, on their design usage. (My computer could get by with a 300 watt PS, but I have a top quality 450 watt unit in my home-built machine.)

Dell's problem showcases that they are cutting costs by cutting corners and using the cheapest barely adequate power supplies. That is not good engineering, no matter how much they advertise their quality!

 

Jboy. Good post. Follows a good technician's thinking.

Great topic theme, Ixthusdan.

Cheers. John.


It's a shame, because they tend to fit a PSU sized only for the hardware that ships in the machine.

 

Then you start adding more and more stuff yourself, and then suddenly you can't turn the damn thing on! Need larger PSU to power what you've just added.

 

So, an example for you. You have 300w PSU. Each item in the machine takes a draw on the system, so, eg:

 

CPU - can be around 80-90w

Memory - about 20w

Video Card - can be around 80-90w

CDROM/HDD - 30w each

Sound Card - 20w to 30w perhaps

 

So, if we assume one HDD and one CDROM drive, the total on the above system comes to 280/290w. I used 90w on both items CPU and Video Card.

 

So, you add another hard disk, maybe a DVD Writer drive, and you've put yourself over what your PSU can handle. The problem is, when you turn the machine on, every single component fires up at the same time, so you get a hit on all wattage at the same time. If this exceeds the rating of your PSU, you can't turn the machine on.
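
To make the arithmetic above concrete, here is a small Python sketch of the same budget. The per-component figures are the rough estimates listed above (using 90 W for the CPU and video card and 20 W for the sound card); the 30 W for the DVD writer is an assumption, matching the figure given for the CDROM, and the function names are made up for illustration.

```python
# Rough per-component draw in watts, taken from the list above.
BASE_SYSTEM = {
    "CPU": 90,
    "Memory": 20,
    "Video card": 90,
    "HDD": 30,
    "CDROM": 30,
    "Sound card": 20,
}

# The upgrade described above; 30 W for the DVD writer is an assumption.
UPGRADED_SYSTEM = {**BASE_SYSTEM, "Second HDD": 30, "DVD writer": 30}


def total_draw(components):
    """Sum the estimated draw of every component, in watts."""
    return sum(components.values())


def headroom(psu_watts, components):
    """Watts left over on the PSU; a negative value means over budget."""
    return psu_watts - total_draw(components)


if __name__ == "__main__":
    for name, system in [("base", BASE_SYSTEM), ("upgraded", UPGRADED_SYSTEM)]:
        print(f"{name}: draw {total_draw(system)} W, headroom {headroom(300, system)} W")
```

With these numbers the original system leaves 20 W spare on a 300 W supply, while the upgraded one is 40 W over. Real draw varies a lot by component, so treat this as a back-of-envelope check only.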

 

If you run AMD 64 chips, you should not be running a PSU less than 430w.


AussieJohn has a great point about power supplies. They really should be over-specified when building a PC. I also believe that the quality of power supplies has dwindled. Ironically, from a cost perspective, it's not all that much more to increase the size of the power supply. Yet margins are more important to the commercial producers than quality, and that is the unfortunate fact.


This thread is an eye-opener and makes me think that I've been lucky. I've always built my own boxes, always had 2 HDs and 2 optical drives, always put in extra fans, and ran einstein@home and seti@home (which raised the CPU temperature 5 degrees F, so that's drawing more power), and never had a glitch. However, I never have had a real high end video card either. When I bought the cases, I paid attention to the PSU rating but never really had a clear idea of how much wattage I actually needed. I just always assumed I'd be ok with 350 - 400 watts. I sure will pay more attention to this in the future.

 

I guess the mechanism here is that with the gui running, a spike in power requirements overloads the psu, cpu instructions get garbled, and gui crashes, but the kernel itself and network and other services somehow (by pure random luck?) do not crash. So power requirements then drop but with the kernel itself still functional, the ssh into the box works ok to shut the machine down. Is that what's going on?

 

So for this machine in question that Ix gave us, can anyone think of a way to more directly test the PSU theory other than by process of elimination (memtests, other diagnostics, taxing the gui with power-hungry CPU and disk intensive tasks, swapping out the psu for a higher rated one, etc)?

Edited by jboy

You'd prob have to rip out the PSU and try it separately, with some sort of electric tester.

 

I should speak to my mate, he's good at this electronic sort of stuff. I'm sure he's got one of these tester gadgets.


Have you ever noticed how many of the power supplies in old computers are still running today? This is because electronic engineers in the past allowed for up to a 50% excess margin, and often 75%, on their design usage. (My computer could get by with a 300 watt PS, but I have a top quality 450 watt unit in my home-built machine.)

Dell's problem showcases that they are cutting costs by cutting corners and using the cheapest barely adequate power supplies. That is not good engineering, no matter how much they advertise their quality!

This is also indicative of the expected lifetime of a machine.

Back in the 80's you damned well expected that machine to last... and people bought machines to last.

 

Today we live in a throwaway society, and with the rate of progress on PCs ever increasing, we largely buy disposable items. Back in the 80's you could build your own PC for 20% less than a pre-made one because they were put together with care; now it costs 50% more to make your own.

Back then an entry-level PC cost $1000++; now it's $300+.

 

 

If you pay peanuts, then you will get monkeys doing your computing...

 

A big difference is that, in contrast to the solid-state components in a PC, the PSU has moving fans and coils. Once a chip is in production its cost is the price of sand plus energy, whereas making coils and fans involves labor.

 

Hence, as John says, the PSU has become the easiest part on which to cut corners and costs.

