ffrr Posted September 13, 2005
I have had this happen a few times. Today when I logged in (the computer runs 24 hours a day) I found that MySQL had stopped running. I soon discovered this was because the whole partition holding /var had become read-only. I had to reboot. The reboot process found errors on that partition and fixed them, so it is up and running again. I've had this happen to a data partition as well. In fact, I find the filesystem (I'm using ext3) under Mandriva quite flaky. Is it because I leave it on all the time? I do that because I run MythTV on it.
Anyway, why does it do this (become mounted read-only)? Is it because of the errors, and if so, why is it getting these errors? Should there be a log somewhere about what happened?
This is the second hard disk I have tried (both were brand new 80 GB SATA drives). I also have a large 250 GB SATA drive that I store the MythTV data on. It played up when I had XFS on it, but seems more stable now that I have ext3 on it. Is Linux perhaps not very stable on SATA drives?
[moved from Software by spinynorman]
AussieJohn Posted September 13, 2005
It sounds like your memory (the machine's, that is) is getting flaky. I think you need to run memtest for a day or so to be sure. When faulty memory starts to corrupt files, it seems to produce read-only conditions, but I do not know whether this is the planned design for such situations or not. Check the memory. Cheers. John.
ffrr Posted September 13, 2005
Thanks John. This has been very worrying. Could you recommend a memory test I should use, please?
xxbeanxx Posted September 13, 2005
Run urpmi memtest86+ and it should show up in the lilo menu on the next reboot.
ffrr Posted September 13, 2005
Brilliant. I had just googled and found memtest86, but this saves me booting from a CD-ROM. Thanks.
ffrr Posted September 13, 2005
Yep, sure enough, there are 2 stuck bits in the RAM, both down low, at about 31 MB and 81 MB. I have the badram= line that memtest generated, but no idea how to use it (and it seems to repeat itself, so I'd like to sanity check it first). Is it supposed to be added to the kernel command line in lilo? Anyway, still looking for documentation on BadRAM...
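For anyone wanting to sanity check a badram= line by hand, here is a minimal sketch of the arithmetic. It assumes the address,mask pair format used by the BadRAM kernel patch, where an address counts as bad when it matches the fault address on every bit that is set in the mask, and it assumes 4 KiB pages. The two fault addresses are hypothetical stand-ins near 31 MB and 81 MB, not the values memtest actually reported in this thread.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86


def badram_pair(fault_addr, page_size=PAGE_SIZE):
    """Return an 'address,mask' pair covering the one page that holds fault_addr."""
    # Mask keeps every bit above the page offset, so exactly one page matches.
    mask = 0xFFFFFFFF & ~(page_size - 1)
    base = fault_addr & mask
    return f"0x{base:08x},0x{mask:08x}"


def badram_option(fault_addrs):
    """Join one pair per fault into a single badram= option string."""
    return "badram=" + ",".join(badram_pair(a) for a in fault_addrs)


# Hypothetical fault addresses, chosen only to sit near 31 MB and 81 MB.
FAULTS = [31 * 1024 * 1024 + 0x1234, 81 * 1024 * 1024 + 0x0ABC]

if __name__ == "__main__":
    print(badram_option(FAULTS))
```

With lilo, a string like this would normally go inside the append="..." setting for the kernel entry in lilo.conf, followed by rerunning lilo. It only has an effect on a kernel that actually includes BadRAM support.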
ffrr Posted September 14, 2005
Further developments. I borrowed a spare memory module from work to see if it was my RAM or my motherboard causing the memory errors. Well, the surprise was that the one from work has an error in an entirely different place, at about 124 MB. So I put mine back in and slowed down the memory timings in the BIOS (it's an Abit NF7-S board and fairly configurable). One of the 2 errors went away. I have put a badram entry on the kernel command line to stop the remaining error causing problems. I may go back and restore the proper memory timings, and mask out the second error as well. For now, we'll see what happens. I think I'll get a few memory sticks at work and see how many have errors. Maybe the cheap RAM being sold around town is not so good!
ffrr Posted September 14, 2005
Here's what happened (including my remount). Please help with any ideas about what is happening...

EXT3-fs error (device sda10): ext3_free_blocks: Freeing blocks not in datazone - block = 53796864, count = 1
Aborting journal on device sda10.
EXT3-fs error (device sda10) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sda10) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda10) in ext3_truncate: Journal has aborted
EXT3-fs error (device sda10) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda10) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device sda10) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda10) in ext3_delete_inode: Journal has aborted
__journal_remove_journal_head: freeing b_committed_data
ext3_abort called.
EXT3-fs error (device sda10): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
kjournald starting. Commit interval 5 seconds
EXT3-fs warning (device sda10): ext3_clear_journal_err: Filesystem error recorded from previous mount: IOfailure
EXT3-fs warning (device sda10): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on sda10, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
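On the earlier question of whether there is a log of what happened: messages like the ones above come from the kernel log. A rough sketch of how such text could be summarized is below. It assumes the log has already been captured as plain text (for example, saved dmesg output), and the function name and sample are illustrative only. Note that if /var itself has gone read-only, the on-disk syslog may not contain the failure.

```python
import re
from collections import Counter

# Matches lines such as "EXT3-fs error (device sda10) in ext3_truncate: ..."
ERROR_RE = re.compile(r"EXT3-fs error \(device (\w+)\)")
READONLY_MARK = "Remounting filesystem read-only"


def summarize_ext3_errors(log_text):
    """Count EXT3-fs error lines per device and report whether a read-only remount was logged."""
    counts = Counter()
    remounted = False
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
        if READONLY_MARK in line:
            remounted = True
    return dict(counts), remounted


# A short excerpt of the log posted above.
SAMPLE = """\
EXT3-fs error (device sda10): ext3_free_blocks: Freeing blocks not in datazone - block = 53796864, count = 1
Aborting journal on device sda10.
EXT3-fs error (device sda10) in ext3_truncate: Journal has aborted
EXT3-fs error (device sda10): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
"""

if __name__ == "__main__":
    print(summarize_ext3_errors(SAMPLE))
```

The first error line is the interesting one: the journal abort and the read-only remount that follow are ext3 reacting to that inconsistency, not separate faults.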
spinynorman Posted September 14, 2005
ffrr - there was no need to start a new thread, so I've merged it into this one. :)
ffrr Posted September 15, 2005
Ok, sorry. Right then, to explain the above: although I have excluded the bad RAM, I still ran into a problem later that night. However, on further thought, maybe this part of the disk was damaged while I still had some bad RAM. Does this indicate I should fsck all my partitions for a clean start now? If so, what's the best way: boot off the install DVD, go to the repair prompt, and check all the partitions before they are mounted? fsck -f forces a check on supposedly clean filesystems, doesn't it? Given the number of keypresses I suffered while cleaning that one last night, I think fsck -f -a might be needed...
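The idea of checking every partition before it is mounted can be sketched as below. This is only a dry-run planner under stated assumptions: it takes text in the format of /proc/mounts plus a list of device names (all hypothetical here) and returns the e2fsck commands it would run, skipping anything currently mounted. For e2fsck, -f forces a check even when the filesystem is marked clean, and -p repairs without prompting (-a is an older alias for -p).

```python
def mounted_devices(mounts_text):
    """Return the set of device names in /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields:
            devices.add(fields[0])
    return devices


def e2fsck_plan(devices, mounts_text):
    """Build a forced, non-interactive e2fsck command for each unmounted device.

    Returns (commands, skipped), where skipped lists devices left alone
    because they are mounted. Nothing is executed here.
    """
    mounted = mounted_devices(mounts_text)
    commands, skipped = [], []
    for dev in devices:
        if dev in mounted:
            skipped.append(dev)
        else:
            commands.append(["e2fsck", "-f", "-p", dev])
    return commands, skipped


# Hypothetical mount table and device list, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda5 / ext3 rw 0 0
/dev/sda10 /var ext3 ro 0 0
"""
SAMPLE_DEVICES = ["/dev/sda5", "/dev/sda10", "/dev/sdb1"]

if __name__ == "__main__":
    commands, skipped = e2fsck_plan(SAMPLE_DEVICES, SAMPLE_MOUNTS)
    for cmd in commands:
        print(" ".join(cmd))
    print("skipped (mounted):", skipped)
```

Note that -p only fixes problems it considers safe and stops when something needs a human decision. In that case the usual choice is to rerun interactively, or with -y to answer yes to everything.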
ffrr Posted October 3, 2005
Hopefully a final follow-up on this. Even with new memory that tests OK, I was getting filesystem corruption, and I think I have finally tracked it down. My motherboard is an NF7-S from Abit, and there is apparently a 'known problem' with the SiI 3112 SATA RAID chip on it, especially when working with early Seagate drives. A new BIOS supplied a new parameter called 'Ext P2P' which can be set to various times, most on the order of 20 or 30 µs, but one setting, recommended if disk problems still occur, is a whopping 1 ms. I have set it to this and it seems to be OK now, touch wood. Strangely, I have noticed no performance hit, and someone else said the difference was only 5% or so. It'll be nice to have a stable system for a while... but I'm not happy with Abit.