dma timeouts on ATA harddisk

arctic · November 14, 2005

Hi there. This time I need your help. :D

One of my laptops seems to have some problems with the DMA options. I have searched the net a lot (I stopped counting the hours) and haven't found a solution yet. Maybe one of you knows the real cause of this problem and a solution to it.

Case scenario: The laptop has a Hitachi ATA 40GB harddisk. It works quite well, but at irregular intervals the system temporarily freezes (sometimes four seconds, sometimes ten or more seconds) and very rarely it locks up completely. At first I thought that it might be related to a bug in a distro, but Mandriva 2006, Ubuntu 5.10 and Debian all reacted the same way. So I checked my /var/log folder, and in the "messages" file I stumbled over this:

Nov 12 23:36:30 localhost kernel: [4295186.931000] hda: dma_timer_expiry: dma status == 0x20
Nov 12 23:36:30 localhost kernel: [4295186.931000] hda: DMA timeout retry
Nov 12 23:36:30 localhost kernel: [4295186.931000] hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Nov 12 23:36:30 localhost kernel: [4295186.931000] 
Nov 12 23:36:30 localhost kernel: [4295186.931000] ide: failed opcode was: unknown
Nov 12 23:36:30 localhost kernel: [4295187.217000] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 12 23:36:30 localhost kernel: [4295187.217000] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 12 23:36:30 localhost kernel: [4295187.217000] ide: failed opcode was: unknown
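
For reference, the status= and error= values in those messages are bit fields, and the words in braces are just the names of the bits that are set. Below is a minimal sketch that decodes them in plain sh. The function names are made up for illustration. The names for bits that do not appear in the log (Busy, DeviceFault, MediaChanged and so on) are assumptions based on the standard ATA register layout, not taken from this log, and the kernel may print the names in a different order. BadCRC is the bit a drive sets when the CRC check on an Ultra DMA transfer fails.

```shell
#!/bin/sh
# Decode ATA status/error register values like the ones in the log above.

decode_bits() {
    # $1 = register value (e.g. 0x58); remaining args = names for bits 7 down to 0
    value=$(( $1 ))
    shift
    mask=128
    out=""
    for name in "$@"; do
        if [ $(( value & mask )) -ne 0 ]; then
            out="$out$name "
        fi
        mask=$(( mask / 2 ))
    done
    # Strip the trailing space before printing
    printf '%s\n' "${out% }"
}

decode_status() {
    decode_bits "$1" Busy DriveReady DeviceFault SeekComplete \
        DataRequest CorrectedError Index Error
}

decode_error() {
    # Bit 7 is reported as BadCRC in the log; names for bits 5 and 3 are assumptions
    decode_bits "$1" BadCRC UncorrectableError MediaChanged SectorIdNotFound \
        MediaChangeRequested DriveStatusError TrackZeroNotFound AddrMarkNotFound
}

decode_status 0x58
decode_status 0x51
decode_error 0x84
```

The three calls at the end use the values from the log, so the output can be compared with what the kernel printed.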

Now, I am no genius, but I read that this might have something to do with the hdparm settings (which I have never had to deal with), so I decided to give it a shot (I still need to read the man page).

Nonetheless I ran "hdparm -i /dev/hda" and got:

 Model=HITACHI_DK23FB-40, FwRev=00M1A0A1, SerialNo=4CY679
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=78140160
IORDY=yes, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
Drive conforms to: ATA/ATAPI-5 T13 1321D revision 3:
* signifies the current active mode

Could it be that reducing the UDMA level would fix those timeouts? Or disabling DMA completely? How can I reduce the UDMA level?

I know... questions upon questions, but those freezes are annoying. And if some of our geeks could come up with an answer I would really appreciate it.

Oh, and I checked the drive and it is 1. new (not even one month old) and 2. passed all sanity checks.

jboy · November 14, 2005

The hdparm output shows: AdvancedPM=yes

Could this be a BIOS Power Management setting that maybe you could disable? Is the BIOS set to perhaps power down the hard drive after a certain inactive period to save battery power?

Just a wild guess. Might be worth checking around in the BIOS settings.

EDIT: The BIOS settings might also have options to disable DMA access or change access mode.

Edited November 14, 2005 by jboy

arctic · November 14, 2005

Powering down seems unlikely to be the cause, as the freezes happen especially when there is workload (e.g. using OOo). I will check my BIOS later, as I am currently trying to rebuild the base hdd cluster structure from within, in case there are some damaged clusters on the drive (yes, and I backed up my data first). If the problem turns out to be a dying drive (unlikely but still possible), I will return that laptop.

ianw1974 · November 14, 2005

You can boot Linux with the ide=nodma option, but your system will probably run really slowly afterwards.
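
Short of turning DMA off at boot, the transfer mode can also be changed on a running system with hdparm. The sketch below only prints the commands rather than running them, because changing modes on a disk that is in use is risky and needs root. It assumes the drive is /dev/hda as in the log, and it relies on the hdparm convention that -X takes 64 plus the mode number for Ultra DMA. The helper names are made up for illustration.

```shell
#!/bin/sh
# Print (do not run) hdparm commands for lowering the transfer mode.
DEV=/dev/hda   # device name taken from the log in the first post

udma_xfer_code() {
    # hdparm -X expects 64 + n for Ultra DMA mode n
    echo $(( 64 + $1 ))
}

show_commands() {
    # $1 = target Ultra DMA mode number
    echo "hdparm -d $DEV"                                # show the current using_dma setting
    echo "hdparm -d1 -X$(udma_xfer_code "$1") $DEV"      # keep DMA on, select udma<n>
    echo "hdparm -d0 $DEV"                               # switch DMA off entirely
}

show_commands 2
```

Going from udma5 to udma2 corresponds to -X66. Settings made this way do not survive a reboot, so it is a cheap way to test whether a lower mode makes the timeouts go away.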

Someone had a problem recently where booting was taking ages, and using that switch did fix it, but I can't remember what else they did to get around the problem.

The post was in the last week or two though, I remember that bit :P

ianw1974 · November 14, 2005

This was the post:

https://mandrivausers.org/index.php?showtopic=29142&

I think they were looking into an option in the BIOS relating to UDMA.

arctic · November 14, 2005

With the ide=nodma option at boot it still froze, btw. (oops)

Currently I am running the following command (I got it from another guru in another forum):

dd if=/dev/zero of=/dev/hda

I hope the box will survive that... :unsure:

ianw1974 · November 14, 2005

Have a look at the Legacy USB Support in your BIOS, and disable this. I just found another post regarding freezing systems. Might be relevant.

arctic · November 14, 2005

EEEKS! The system seems to be reacting abnormally. I get:

*cannot execute "/sbin/getty"
*Id "2" respawning too fast: disabled for 5 minutes
*cannot execute "/sbin/getty"
*Id "3" respawning too fast: disabled for 5 minutes
*cannot execute "/sbin/getty"
*Id "4" respawning too fast: disabled for 5 minutes
*cannot execute "/sbin/getty"
*Id "5" respawning too fast: disabled for 5 minutes
*cannot execute "/sbin/getty"

and so on and so on, until it reaches 6, then it starts again at Id "2", again and again and again. Google didn't help me a lot... what does that mean? I think I ran into BIG trouble with that f**** laptop.

Umm... PS: Currently the Laptop runs with Ubuntu.

jboy · November 14, 2005

During startup, the init process runs the commands in /etc/inittab. One set of these commands starts a login process on each of the virtual terminals you can log in to (accessible thru CTRL-ALT-F1, etc), and these just sit there waiting for a user to initiate a login. If you look at /etc/inittab, you'll see something like the below. It's saying that for run levels 2 thru 5, start a login process on each of 6 virtual terminals, and when the process dies (e.g., the user logs out), respawn the login process. What is happening is that for some reason /sbin/getty cannot be executed on your system, and init keeps trying to start it.

# Run gettys in standard runlevels
1:2345:respawn:/sbin/getty tty1
2:2345:respawn:/sbin/getty tty2
3:2345:respawn:/sbin/getty tty3
4:2345:respawn:/sbin/getty tty4
5:2345:respawn:/sbin/getty tty5
6:2345:respawn:/sbin/getty tty6

Why can't /sbin/getty be executed? I dunno, maybe something got borked during that dd command execution. First of all, see if /sbin/getty exists. If not, maybe the package that installs the getty binary needs to be re-installed. Or, does /sbin/mingetty exist - getty and mingetty are different implementations of the same thing. Perhaps you could create a symbolic link to mingetty and name it getty, or edit /etc/inittab to use mingetty.
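
The check described above could be sketched roughly like this. The function name find_getty is made up for illustration, and the two paths are just the ones mentioned here; your distro may keep them elsewhere.

```shell
#!/bin/sh
# Report the first getty implementation that exists and is executable.

find_getty() {
    # Arguments: candidate paths, tried in order
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if getty_path=$(find_getty /sbin/getty /sbin/mingetty); then
    echo "usable getty: $getty_path"
else
    echo "no executable getty found; the package providing it may need reinstalling"
fi
```

If only mingetty turns up, that is the case where a symlink or an edit to /etc/inittab would apply.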

However, could it be that getty is the only thing that's borked? Seems doubtful. Are there other system problems?

Despite all these respawnings, your system should still be usable, but of course you won't have any virtual terminals. Can you open a console window from within gnome? Opening a console window might not be affected by this, I'm not sure.

Edited November 14, 2005 by jboy

arctic · November 14, 2005

Holy crap... It is a bug in Ubuntu! I just discovered that. And then all partitions and the data on them were gone. I started the CD again in rescue mode and ran /bin/dd if=/dev/zero of=/dev/hda, and it completely locked up. No response. Now I fired up the Mandriva CDs and reinstalled 2006 after formatting the whole drive. I hope the hdd will be stable now. If not, I will add more info.

I am glad I backed up my data... hehehe. :D

pmpatrick · November 14, 2005

I'm not sure what you're trying to do but:

# dd if=/dev/zero of=/dev/hda

should zero fill hda, i.e. it should completely wipe all data from hda and write zeros to every sector of the drive. The fact that the dd command is locking up now could mean that it is encountering a lot of bad sectors, which is usually indicative of the hard drive starting to go.
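
To see what a zero fill does without touching a real disk, here is a small sketch that runs the same kind of dd command against a scratch file. The 64 KiB size and the variable names are arbitrary choices for illustration.

```shell
#!/bin/sh
# Zero-fill a scratch file instead of a real disk, then verify the result.
img=$(mktemp)

# Fill the file with 64 KiB of 'x' bytes standing in for old data
dd if=/dev/zero bs=512 count=128 2>/dev/null | tr '\000' 'x' > "$img"

# Same idea as the command above, with the scratch file as the target.
# conv=notrunc overwrites in place, the way writing to a block device would.
dd if=/dev/zero of="$img" bs=512 count=128 conv=notrunc 2>/dev/null

# Count the bytes that are not zero; after the fill there should be none
size=$(( $(wc -c < "$img") ))
nonzero=$(( $(tr -d '\000' < "$img" | wc -c) ))
echo "size=$size nonzero=$nonzero"

rm -f "$img"
```

On a real device there is no count= and dd simply runs until it hits the end of the disk, so everything on it is gone afterwards.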

If you can determine the manufacturer of the hard drive, I would recommend paying a visit to their website and downloading the manufacturer's diagnostic utilities; all the major hard drive makers now have them on their websites. They usually come in the form of a bootable iso image or a bootable floppy. Just boot up with the diagnostic utility and put the drive through its paces. Most of them also have a zero fill utility built in, which is a lot more robust than using "dd" as it can handle bad sectors much better.

arctic · November 14, 2005

Okay, I did a complete check of the drive again with the Hitachi tools and the drive is 100% okay.

What I was trying to do was to zero fill the drive in order to lock out bad clusters and later restore the data with the /bin/dd etc. command. But it seems something went completely wrong in Ubuntu. I launched it again using my DSL CD and it worked this time, before firing up Mandriva. As no bad clusters were found, I hope that a complete sweep and rebuild will stabilize the drive now. We will see...

arctic · November 17, 2005

Okay, today the drive died completely while I was writing in OOo. Work is lost. :( (No recovery of data with live CDs possible.) Thus, the lappy will go back to the vendor tomorrow.

Man, I am lucky... :juggle:
