MandrakeUser.Org - Your Mandrake-Linux Knowledge Base!

 
 


Emergency Recovery III

* Scenario I: System Doesn't Boot
* Scenario II: System Stops During Boot

Related Resources:

Mdk-Ref 13
LIGS, 4.11
man chroot
man lilo.conf
info grub
man gpart
man sfdisk
man mkinitrd
man e2fsck


Revision / Modified: Jan. 28, 2002
Author: Tom Berger

 

* Scenario I: System Doesn't Boot

Usually this error is due to a simple boot loader misconfiguration. Your main priority is to get the system booting again so that you can correct your boot loader configuration.

If you have a current, working boot disk for your system, you are lucky ;-). If not, I'd suggest you create one right away. You can do that very easily via the Mandrake Control Center (Boot - Boot Disk).
If you prefer the command line:

mkbootdisk $(uname -r)

will do the same.
Boot with it to make sure it works.

If the boot loader fails and you have no boot floppy at hand, you have to start one of the external systems, preferably the Mandrake Linux rescue system described on the previous page of this article, and repair the boot loader configuration from outside (or at least create a working boot floppy).

When you change the configuration of the LILO boot loader by editing '/etc/lilo.conf', you have to run 'lilo' afterwards. But it has to be the 'lilo' on the hard drive, because you want to update the boot sector on that device. How to do that?
Simple. Change to the '/mnt' directory, where the 'root' directory of your installed system is mounted. Now change the 'root' directory with

chroot .

What does this do? When you are on the rescue system, your 'root' directory is that on the CD, with the system on the disk mounted to '/mnt'. With 'chroot' you basically switch your root directory to that on the disk. If you issue a command now, the disk version of this command will be executed, not the CD version. Execute

/sbin/lilo

and a new boot sector with the current configuration will be written to the master boot record. For GRUB, you'd likely execute something like

grub-install /dev/hda

although the device name might be different depending on your hardware setup. To switch your root directory back to the CD, type:

exit
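
The LILO case can also be done in one step, because 'chroot' accepts a command to run inside the new root and returns when that command ends, so no 'exit' is needed. The following is only a sketch under assumptions: the function name 'reinstall_lilo' and the 'DRY_RUN' switch are made up for illustration, and the root directory of the installed system is assumed to be mounted at the given path ('/mnt' on the rescue system).

```shell
# Hypothetical helper: run the installed system's lilo from the rescue system.
# $1 = mount point of the installed system's root directory (default /mnt).
# With DRY_RUN=1 the command is only printed, not executed.
reinstall_lilo() {
    target=${1:-/mnt}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "chroot $target /sbin/lilo"
    else
        chroot "$target" /sbin/lilo
    fi
}
```

Calling it with 'DRY_RUN=1 reinstall_lilo /mnt' first shows what would be executed before anything is written to the boot sector.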

Partition Table Destroyed Or Corrupted

If you can't fix your booting problems and 'cfdisk' as well as 'fdisk' tell you that there just isn't any partition table to read on your hard disk, chances are that table has been corrupted.

If the botched up table is not on the hard disk which contains your Linux installation, install the gpart partition table rescue utility and run it on the disk with the defunct boot record:

gpart /dev/device

where device is a whole disk device (e.g. hda or sda). This is just a scan to find out if 'gpart' can find any partitions at all (it usually does). Notice that this test can take quite a while and use up a considerable amount of system resources. If the findings of gpart look reasonable to you, tell it to write them to the boot record:

gpart -W /dev/device

Do not turn the computer off or kill the program until it has finished writing the table. 'gpart' may sometimes look as if it were hanging, but it isn't. Just wait. When it has finished, reboot.
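
Since writing a guessed table overwrites whatever is left of the old one, it may be worth saving the first sector of the disk beforehand. A minimal sketch, assuming the classic PC layout in which the boot record with the partition table occupies the first 512 bytes of the disk; the function name 'backup_mbr' and the file names are hypothetical:

```shell
# Hypothetical helper: copy the first 512-byte sector of a disk to a file.
# $1 = whole-disk device (e.g. /dev/hda), $2 = backup file to create.
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}
```

Called as 'backup_mbr /dev/hda hda-mbr.bin', it leaves a copy that can be put back with 'dd if=hda-mbr.bin of=/dev/hda bs=512 count=1' should the new table turn out worse than the old one. Keep the copy on a floppy or another disk, not on the damaged one.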

If the partition table of the system disk has become unreadable, start from the Mandrake Linux rescue system. It contains an (undocumented) utility called 'rescuept' which ... well, I guess you can tell that by its name ;-). The first step is just like with 'gpart':

rescuept /dev/device

This will print 'rescuept's findings to the console. If these findings look reasonable, pipe them to another disk utility, 'sfdisk', which will write them to the boot sector of the hard disk:

rescuept /dev/device | sfdisk /dev/device

You want to make absolutely sure here that you use the same device name in both parts of the pipe ... When finished, knock on wood and reboot.
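
One way to rule out mismatched device names is to type the name only once. A sketch, assuming 'rescuept' and 'sfdisk' are available on the rescue system as described above; the function name 'rescue_table' and the 'DRY_RUN' switch are hypothetical:

```shell
# Hypothetical helper: feed rescuept's findings for a disk to sfdisk
# on the very same disk. $1 = whole-disk device (e.g. /dev/hda).
# With DRY_RUN=1 the pipeline is only printed, not executed.
rescue_table() {
    dev=${1:-}
    if [ -z "$dev" ]; then
        echo "usage: rescue_table /dev/device" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rescuept $dev | sfdisk $dev"
    else
        rescuept "$dev" | sfdisk "$dev"
    fi
}
```

Because both halves of the pipe take their device from the same variable, a typo can no longer send one disk's findings to another disk.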

Super Block Damaged

Now that is really a rare emergency. Of all the scenarios listed in this article, this is probably the only one which hasn't happened to me so far in my six years with Linux ;-).

The super block is located at the beginning of each ext2 partition. It contains important data about the file system like size, free space etc. (it's similar to the File Allocation Table on FAT partitions). A partition with a damaged super block can't be mounted. Fortunately ext2 keeps several super block backup copies.

  1. Boot your preferred emergency system.

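  2. The backup copies are located at the beginning of block groups. With a block size of 1 KB a block group spans 8192 blocks, so the first backup copy is in block No. 8193. (With 2 KB blocks it is block No. 16384, with 4 KB blocks block No. 32768.)

  3. To restore the super block from this copy, enter the command

    e2fsck -b 8193 /dev/device

    Note that '-b' expects a block number, not a byte offset. If that copy is damaged, too, try one from a later block group, e.g. block No. 24577 (with 1 KB blocks) etc.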

  4. Reboot.
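
The block numbers used above follow from the file system geometry, so they can be computed rather than guessed. A sketch, assuming the ext2 defaults: each block group holds 8 x block size blocks, and the first data block is 1 for 1 KB blocks and 0 for larger ones. The function name 'backup_superblock' is hypothetical. File systems created with sparse super blocks keep copies only in groups 1, 3, 5, 7, 9, 25, 27 and so on (powers of 3, 5 and 7), and a file system made with non-default options will not match these numbers.

```shell
# Hypothetical helper: block number of the super block copy in a block group.
# $1 = block size in bytes (1024, 2048 or 4096), $2 = block group number.
# Assumes the default of 8 * block_size blocks per group.
backup_superblock() {
    block_size=$1
    group=$2
    blocks_per_group=$((block_size * 8))
    if [ "$block_size" -eq 1024 ]; then
        first_data_block=1
    else
        first_data_block=0
    fi
    echo $((group * blocks_per_group + first_data_block))
}
```

For example, 'backup_superblock 1024 1' gives 8193 and 'backup_superblock 4096 1' gives 32768; the result is what goes after 'e2fsck -b'.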


* Scenario II: System Stops During Boot

There are several critical steps where booting can fail.

Kernel Doesn't Load Properly

If this happens after a kernel upgrade, either a wrong boot loader configuration or misplaced symlinks in '/boot' are to blame. Boot another kernel or a rescue system and perform the steps outlined in the Kernel Upgrade article.

Boot Hangs On Rebuilding RPM database Or Finding Module Dependencies

If the system hangs during 'Rebuilding RPM database' or 'Finding module dependencies', just press <CTRL>-<c>. This will skip the step and the boot will continue.
If the hang was at 'Rebuilding RPM database', run 'rpm --rebuilddb' as 'root' once the system is up.
If your machine hangs at 'Finding module dependencies', you have most likely been through a kernel upgrade from source but haven't done it properly. Check if the files in '/boot' and the '/lib/modules' directory match the current kernel-version (i.e. have the current version number attached). Read the article on Upgrading The Kernel From Source for more.
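
This check can be sketched as a small function. It assumes the naming used by Mandrake kernel packages, '/boot/vmlinuz-VERSION' and '/lib/modules/VERSION'; a self-compiled kernel may have been copied to '/boot' under a different name. The function name 'check_kernel_files' and the optional root prefix (useful when the system is mounted at '/mnt' from a rescue system) are hypothetical.

```shell
# Hypothetical helper: report whether kernel image and module directory
# exist for a given kernel version.
# $1 = kernel version (e.g. the output of uname -r), $2 = optional root prefix.
check_kernel_files() {
    version=$1
    prefix=${2:-}
    status=0
    if [ ! -f "$prefix/boot/vmlinuz-$version" ]; then
        echo "missing: $prefix/boot/vmlinuz-$version"
        status=1
    fi
    if [ ! -d "$prefix/lib/modules/$version" ]; then
        echo "missing: $prefix/lib/modules/$version"
        status=1
    fi
    return $status
}
```

On the running system, 'check_kernel_files "$(uname -r)"' prints nothing and returns success if both are in place.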

Boot Hangs On RAMDISK: Compressed image found at block 0

The system tries to load a RAM disk for a different kernel. Your boot loader configuration file points to a wrong or non-existent RAM disk (option 'initrd='). Boot another entry from your boot loader and create a RAM disk for your new kernel with 'mkinitrd', or use the 'Boot Config' module from the Mandrake Control Center, which automatically generates 'initrd' images and the corresponding entries in the configuration file of your boot loader.
If you don't have another working entry to boot, use an external rescue system. See scenario I.
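
To see which RAM disk images the boot loader refers to and whether they exist, something like the following may help. It is a sketch for LILO only, assuming the usual unquoted 'initrd=/path' lines in 'lilo.conf'; the function name 'check_initrd_entries' is hypothetical.

```shell
# Hypothetical helper: list the initrd= entries of a lilo.conf and
# flag those that point to a non-existent file.
# $1 = path to lilo.conf (default /etc/lilo.conf)
check_initrd_entries() {
    conf=${1:-/etc/lilo.conf}
    sed -n 's/^[[:space:]]*initrd[[:space:]]*=[[:space:]]*//p' "$conf" |
    while read -r image; do
        if [ -f "$image" ]; then
            echo "ok: $image"
        else
            echo "missing: $image"
        fi
    done
}
```

A missing image can then be created with 'mkinitrd /boot/initrd-VERSION.img VERSION', where VERSION stands for the version of the kernel that entry boots, followed by a run of '/sbin/lilo'.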

Boot Hangs On Kernel panic: VFS: Unable to mount root fs on xx:yy

The kernel tries to mount the 'root' partition but either doesn't find the necessary drivers or doesn't find the root partition.
If drivers necessary to access the root file system are built as kernel modules, these modules must be loaded via an 'init RAM disk' ('initrd'), referenced in your boot loader configuration file. Notice that access to non-ext2 filesystems like Reiserfs, XFS or JFS also requires modules and thus a RAM disk. See previous entry.

If the kernel can't find the root partition, check your boot loader configuration, especially the 'root' option.

File System Check Fails

If the system encounters a medium which hasn't been properly unmounted, it will run a routine file system check (fsck) or, if you use a journaling file system (default in Mandrake Linux 8), a journal recovery during the next mount of that medium.

If the file system does not feature a journal, 'fsck' will check it for consistency and delete or move empty or inconsistent data. You will find that data later in the 'lost+found' directory of the fsck'ed partition.

'fsck' will fix most errors by itself. If it comes to deleting data, however, 'fsck' will quit and you will be dropped to a root shell. Run 'fsck' again by hand on the device where the automatic 'fsck' failed:

e2fsck /dev/device

This will start 'fsck' in interactive mode and you will be prompted for each action 'fsck' wants to make. If you are not a file system guru, you might be better off to let 'fsck' do what it thinks is best:

e2fsck -p /dev/device

The '-p' option tells 'e2fsck' to do all repairs that can be made safely without asking. If it stops because a problem needs a decision, use '-y' instead of '-p', which assumes the answer 'yes' to all questions.
When the check and repair is over, press <CTRL>-<d> to leave the emergency console. The system will reboot.
The first thing you should do once the system has rebooted is back up all important data to an external medium. Then have a look at the 'lost+found' directories on your system. These might contain files whose names start with '#'. Such files have been moved there to restore the consistency of the file system, which means that some of them may be important system configuration files.
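
Entries in 'lost+found' are named after their inode number, so the name says nothing about the content. A sketch for getting an overview, assuming GNU 'find'; it uses the 'file' utility to guess the content type when that is installed and falls back to printing the bare name otherwise. The function name 'list_recovered' is hypothetical.

```shell
# Hypothetical helper: list recovered entries in a lost+found directory
# and let 'file' guess what each one contains (if 'file' is available).
# $1 = lost+found directory (default /lost+found)
list_recovered() {
    dir=${1:-/lost+found}
    find "$dir" -mindepth 1 -maxdepth 1 -name '#*' |
    while read -r entry; do
        if command -v file >/dev/null 2>&1; then
            file "$entry"
        else
            echo "$entry"
        fi
    done
}
```

Run as 'root', for instance 'list_recovered /home/lost+found' for a separate '/home' partition; anything reported as text can then be inspected with a pager and compared against the configuration files you are missing.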




 
Legal: All texts on this site are covered by the GNU Free Documentation License. Standard disclaimers of warranty apply. Copyright LSTB (Tom Berger) and Mandrakesoft 1999-2002.