January 2007 Archives

Resize a live root FS - a HOWTO

It is possible, though tricky, to resize a Linux root partition while it's still mounted. What's more, it can be done remotely, without having to be at the console. You'll need around 2GB of free RAM - enough to hold a working copy of the root filesystem - but here is how:
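Before you begin, it's worth a quick check that you actually have the memory to spare, and a look at how much of the root filesystem is in use (ordinary commands, nothing specific to this procedure):
    # free -m
    # df -h /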
  1. Stop all services other than the network and SSH, and stop SELinux interfering:
    # telinit 2
    # for SERVICE in \
    `chkconfig --list | grep 2:on | awk '{print $1}' | grep -v -e sshd -e network -e rawdevices`; \
    do service $SERVICE stop; done
    # service nfs stop
    # service rpcidmapd stop
    # setenforce 0
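    If you want to be sure nothing unexpected survived the cull, something like netstat will show what's still listening:
    # netstat -tlnp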

  2. Unmount all filesystems:
    # umount -a
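    umount -a will complain that it cannot unmount the root filesystem itself - that's expected. You can check what's still mounted with:
    # cat /proc/mounts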

  3. Create a temporary filesystem:
    # mkdir /tmp/tmproot
    # mount none /tmp/tmproot -t tmpfs
    # mkdir /tmp/tmproot/{proc,sys,usr,var,oldroot}
    # cp -ax /{bin,etc,mnt,sbin,lib} /tmp/tmproot/
    # cp -ax /usr/{bin,sbin,lib} /tmp/tmproot/usr/
    # cp -ax /var/{account,empty,lib,local,lock,nis,opt,preserve,run,spool,tmp,yp} /tmp/tmproot/var/
    # cp -a /dev /tmp/tmproot/dev
    Note that this used up about 1.6GB of ramdisk on my Red Hat Enterprise Linux (AS) 4 server.

    Also note that on 64-bit systems you will also need to copy /lib64 and /usr/lib64 as well, otherwise you will see errors like "lib64/ld-linux-x86-64.so.2: bad ELF interpreter: No such file or directory".
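    Not part of the procedure as such, but a sensible sanity check before pivoting is to confirm that the copied binaries and libraries actually run from the new root:
    # chroot /tmp/tmproot /bin/true && echo "tmproot looks usable"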

  4. Switch the filesystem root to the temporary filesystem:
    # pivot_root /tmp/tmproot/ /tmp/tmproot/oldroot
    # mount none /proc -t proc
    # mount none /sys -t sysfs
    (this will fail on 2.4 kernels, which have no sysfs - that's harmless)
    # mount none /dev/pts -t devpts
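    At this point it's worth confirming the pivot actually happened - the old root should now show up under /oldroot:
    # grep oldroot /proc/mounts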

  5. Restart the SSH daemon to close the old pty devices:
    # service sshd restart
    You should now try to make a new connection. If that succeeds, close your old one to release the old pty device. If it fails, get the SSH daemon properly restarted before proceeding.

  6. Close everything that's still using the old filesystem:
    # umount /oldroot/proc
    # umount /oldroot/dev/pts
    # umount /oldroot/selinux
    # umount /oldroot/sys
    # umount /oldroot/var/lib/nfs/rpc_pipefs
    Now try to find other things that are still holding on to the old filesystem, particularly /dev:
    # fuser -vm /oldroot/dev
    Common processes that will need killing:
    # killall udevd
    # killall gconfd-2
    # killall mingetty
    # killall minilogd
    Finally, you will need to re-execute init:
    # telinit u
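    Before moving on, check that nothing is left holding the old root; anything fuser still lists here will block the unmount in the next step:
    # fuser -vm /oldroot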

  7. Unmount the old filesystem:
    # umount -l /oldroot/dev
    # umount /oldroot
    Note that we use the umount -l ("lazy") option, available only with kernels 2.4.11 and later, because /oldroot is actually mounted using an entry in /oldroot/dev, so it would be difficult if not impossible to unmount either of them otherwise.

  8. Now resize the root filesystem:
    # e2fsck -C 0 -f /dev/VolGroup00/LogVol00
    # resize2fs -p -f /dev/VolGroup00/LogVol00 8G
    # lvresize /dev/VolGroup00/LogVol00 -L 8G
    # resize2fs -p -f /dev/VolGroup00/LogVol00
    # e2fsck -C 0 -f /dev/VolGroup00/LogVol00
    In this example the root filesystem lives on /dev/VolGroup00/LogVol00 and is being shrunk to 8GB. Note the order when shrinking: the filesystem is reduced first, then the logical volume - doing it the other way round would destroy data. You don't strictly have to run resize2fs twice; I do it in case my idea of 8GB differs from what lvresize thinks, so that the filesystem ends up exactly filling the volume.
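    If you're shrinking and aren't sure how low you can safely go, dumpe2fs (part of e2fsprogs) will tell you how many blocks are actually in use:
    # dumpe2fs -h /dev/VolGroup00/LogVol00 | egrep -i 'block count|free blocks|block size'
    Used space is (block count minus free blocks) times the block size; leave yourself comfortable headroom above that.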

  9. We're done, so start putting everything back:
    # mount /dev/VolGroup00/LogVol00 /oldroot
    # pivot_root /oldroot /oldroot/tmp/tmproot
    # umount /tmp/tmproot/proc
    # mount none /proc -t proc
    # cp -ax /tmp/tmproot/dev/* /dev/
    # mount /dev/pts
    # mount /sys
    # killall mingetty
    # telinit u
    # service sshd restart
    Now make a new SSH connection, and if it works, close the old one. Note that sshd may still be running from the temporary filesystem at this point because of the way the service scripts work. Check this with fuser; if it is, kill the oldest sshd process, run service sshd start, then log in again and disconnect all other connections.
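    The fuser check mentioned above might look like this - any process still running from the temporary filesystem will be listed:
    # fuser -vm /tmp/tmproot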

    Final steps to unmount the temporary filesystem:
    # umount -l /tmp/tmproot/dev/pts
    # umount -l /tmp/tmproot
    # rmdir /tmp/tmproot
    Now to re-mount our original filesystems and start services back up:
    # mount -a
    # umount /sys
    # mount /sys
    # for SERVICE in \
    `chkconfig --list | grep 2:on | awk '{print $1}' | grep -v -e sshd -e network -e rawdevices`; \
    do service $SERVICE start; done
    # telinit 3
    Replace 3 with your preferred runlevel. You may also want to turn SELinux enforcement back on with setenforce 1.
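    As a final sanity check, confirm that you're back at the runlevel you expect and that the root filesystem really is its new size:
    # runlevel
    # df -h /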

The above has only been tested on RHEL AS 4, but something like it should work on most Linux variants that have pivot_root, tmpfs, and umount -l, so long as you can replace the chkconfig and service parts with whatever is appropriate for your distribution.
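For example, on a Debian-style sysvinit system of the same vintage you could approximate the service-stopping loop with something like the following - untested, so treat it as a sketch rather than a recipe:
    # for S in `ls /etc/rc2.d/S* | grep -v -e ssh -e network`; do $S stop; done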



Update: Lucas Chan says, for CentOS 4.4, "I was not able to login after restarting sshd in step 5 until I did this: mount none /dev/pts -t devpts".



Update: Simetrical suggests that 64-bit systems also need to copy /lib64 and /usr/lib64, and that after pivot_root 2.6 kernels will also need mount none /sys -t sysfs and mount none /dev/pts -t devpts. (The above steps have been modified accordingly).

Don't release often

Until I released QSF 1.2.5 the other day, I'd forgotten one of the reasons I don't subscribe to the motto "release early, release often" - it's a pain in the arse. SourceForge really don't make it easy to release projects with multiple files, and they've also managed to hose the Compile Farm again so I can't produce anything other than Fedora Core 6 i386 binaries.

The other reason I don't follow The Great Prophet ESR is that I dislike wasting my time. Far too often I have looked for a package to do a particular job and ended up installing seven different half-finished pieces of complete garbage, none of which work, or have installed something that works one week and then, on upgrading to fix its many bugs, fails the next. Since the general philosophy of "be excellent to each other" implies not doing to others what you wouldn't like done to yourself, I would rather at least attempt to run some tests on my code before dumping it on the Internet, so as not to waste users' time with buggy releases. Whether those tests work or not is another matter, but at least I give it a shot.

Then there's the co-operative development aspect. None of the projects I have ever worked on have particularly attracted other developers. I've had the odd bit of feedback, even the occasional patch, and have a couple of people managing Debian ports for some of my projects, for which I'm grateful - but I'm the only person making major changes to the codebase. I'm happy with this - it suits my temperament - but since "release often" is geared towards getting feedback and development, without developers or many users (and project mailing lists infested with tumbleweeds), it's a bit pointless in my case.

QSF 1.2.5 released

QSF version 1.2.5 has been released. This version fixes a bug in the new list backend which caused tokens to be randomly deleted, a few at a time, on each database update. The casualties can include the special token that keeps track of token ageing, in which case the database grows uncontrollably.

Although version 1.2.5 fixes this bug, it cannot restore the lost data, so unless you rebuild your databases by retraining them from scratch, they will continue to be inaccurate until newly accumulated data starts to outweigh the old.

I had wondered why certain users' databases were getting so large, but just assumed that it was due to the massive volume of email those users' accounts were processing.

These graphs show training and classification times using QSF 1.2.5 with various different backends:

[Training-time graph: xwd-graph-train-758941.png]

[Classification-time graph: xwd-graph-class-750014.png]

The "training" graph shows how much CPU time it takes to build a new database from a set of emails, displaying CPU time versus the number of emails in the training set. The "classification" graph shows the CPU time it takes to decide whether a certain number of messages are spam or not.

As you can see, the list backend seems to be the quickest, so it's a shame it had this great big bug in it. There are further optimisations still to do - in particular, deleting multiple tokens at once (such as during a pruning cycle) is very inefficient - but they will have to wait, as it isn't critical and I'm more than busy enough.