ACUCOBOL SEGV on RHEL4

This is one of those problems for which an Internet search turns up zero results, so I thought I'd make sure that in 2 years, when it happens to one other person in the world, they can find out how to fix it and save themselves a couple of weeks of head-scratching.

A virtual server I maintain runs Red Hat Enterprise Linux 4 (as a paravirtualised Xen guest), and is used to develop a COBOL application central to business operations. For various reasons, the version of COBOL is ACUCOBOL 6.2.

We'd acquired several new servers to be used as Xen hosts, all much newer and more powerful than the original host, so I decided to move the development VM over to one of them.

50GB+ of data transfer later, I fired up the domU in its new home, and it booted fine. Ran the application's main menu to test it, and - nothing. All I got was

Memory access violation
COBOL error at 000000 in [[program]]

On investigation with strace I found that the runtime interpreter was segfaulting immediately after loading the program, before even trying to run any COBOL code at all.

Running in HVM mode instead of PV, with a regular kernel in the domU instead of a Xen kernel, caused the problem to go away - but at the cost of terrible performance and only 1 CPU available. However, running an SMP or PAE kernel in HVM mode made the error come back.

It turns out that the problem was caused by the non-executable pages feature available on the newer Xen hosts' CPUs, which is enabled automatically in PAE and Xen kernels (and presumably the SMP kernel for RHEL4).

To fix it, even in PV mode with a Xen kernel, just add this to the domU's kernel command line in GRUB and reboot:

noexec=off

This disables the feature, enabling ACUCOBOL to run.
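
After the reboot it is worth confirming that the option really reached the kernel. This is a minimal sketch, assuming the running kernel's command line is readable at /proc/cmdline. The function name has_noexec_off is made up for illustration, and it takes an optional file path so it can be tried against a sample file:

```shell
# Hypothetical helper: succeed if "noexec=off" appears as a whole word
# on the kernel command line (default source: /proc/cmdline).
has_noexec_off() {
    cmdline_file="${1:-/proc/cmdline}"
    # One option per line, then look for an exact match.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'noexec=off'
}

if has_noexec_off; then
    echo "noexec=off is active"
else
    echo "noexec=off is not on the kernel command line"
fi
```

If the second message appears inside the domU, the option was probably added to the wrong boot entry or the guest has not been restarted yet.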

Samba auditing

Today I tried doing something I'd put off for ages because I thought it was going to be really tricky: enabling auditing on a Samba share so there is a log of who is creating, deleting, and editing each file (to track down mysteriously disappearing files).

One quick search and I found this: "Samba: Logging User Activity".

It turns out to be a case of adding this to the share definition in smb.conf:

vfs objects = full_audit
full_audit:priority = INFO
full_audit:facility = LOCAL1
full_audit:failure = none
full_audit:success = mkdir rename unlink rmdir pwrite
full_audit:prefix = %u|%U|%I|%m|%S

And that's all. Works fine in RHEL5's Samba (3.0.33). Change the syslog settings to whatever makes sense and update /etc/syslog.conf accordingly, and you have an audit trail.
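
Once the log is filling up, hunting for a disappearing file means picking the unlink and rmdir entries out of it. The following is only a sketch: it assumes each syslog line ends with the configured prefix followed by the operation, the result and the path, separated by "|", and the function name list_deletions is invented for this example. Check a real log line before relying on the field positions.

```shell
# Hypothetical helper: list deletions recorded by full_audit.
# Assumes each syslog line ends with
#   smbd_audit: user|USER|ip|machine|share|operation|ok|path
# i.e. the prefix %u|%U|%I|%m|%S followed by the operation details.
list_deletions() {
    awk -F'|' '
        /smbd_audit/ {
            sub(/^.*smbd_audit[^:]*: /, "")   # drop the syslog header
            if ($6 == "unlink" || $6 == "rmdir")
                print $1 " (" $3 ") removed " $8 " from " $5
        }
    ' "$@"
}
```

It reads the files named as arguments, or standard input if there are none, so it can be pointed at whichever file syslog.conf sends LOCAL1 messages to, for example list_deletions /var/log/samba-audit.log (that path is an assumption, not something Samba sets up).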

PV 1.2.0

Version 1.2.0 of PV is released today. I've been putting it off for months and months, because there are still many things left to do, but it's been over 2 years since the last release so that was just silly. It's also not fair to the people who have contributed to be made to wait so long for their changes to be put into the main release.

The change log is:

  • integrated improved SI prefixes and --average-rate (Henry Gebhardt)
  • return nonzero if exiting due to SIGTERM (Martin Baum)
  • patch from Phil Rutschman to restore terminal properly on exit
  • fix i18n especially for --help (Sebastian Kayser)
  • refactored pv_display
  • we now have a coherent, documented, exit status
  • modified pipe test and new cksum test from Sebastian Kayser
  • default CFLAGS to just "-O" for non-GCC (Kjetil Torgrim Homme)
  • LFS compile fix for OS X 10.4 (Alexandre de Verteuil)
  • remove DESTDIR / suffix (Sam Nelson, Daniel Pape)
  • fixed potential NULL deref in transfer (Elias Pipping / LLVM/Clang)

The main user-visible changes are the new -a / --average-rate option from Henry Gebhardt which gives a much more sensible display for "bursty" traffic, and a consistent, documented exit status so it's easier for scripts to tell when there has been an error.

Term::VT102 0.91

It's been a while, but now there's a new version of Term::VT102. A few people have contacted me about the module over the past few weeks, and then Jörg Walter sent a patch to fix Unicode handling, which resurrected my interest in clearing a few of the TODOs from the list.

So, I cleaned it up a bit and extended the example scripts enough that I could effectively use Term::VT102 as a terminal emulator, and ran things like top and mutt within it to see how it handled. As a result I've fixed a few bugs in escape sequence handling and line wrapping as well as adding TAB stop support and callbacks for title changes and other private message strings.

There is also now an example script to show scrollback buffer processing for things like converting script logs or screen history into a flat file you can read with less without all the cursor positioning stuff getting in the way.

PV 1.1.4

I've finally got around to releasing version 1.1.4 of PV. Elias Pipping and Patrick Collison have been sending patches to improve compilation on Mac OS X, and there are a couple of minor cleanups: left-over IPC resources are cleaned up on termination thanks to Laszlo Ersek, and if you supply a non-numeric argument to an option that needs a number, you now get an error thanks to Boris Lohner.

RHEL 5 intermittent segfaults

For the past couple of months, on 12 servers, I have been seeing intermittent segmentation faults with the ssh, scp, and ntpstat commands. Those servers that weren't brand new had not exhibited that behaviour with RHEL 4 in the past; it only began when Red Hat Enterprise Linux 5 was installed.

2 additional servers running RHEL 5 were not showing the same fault, but they weren't of the same type - all affected servers were IBM xSeries or System x with multiple processors and various model numbers, and all had ServeRAID cards.

I couldn't find any mention of such a fault anywhere except for on a CentOS bug tracker, bug ID 0002241.

After a few tests, it turned out that ntpstat would fail to run about 10 times in every 50000, or 0.02% of the time. Each failure, according to strace, was not actually with the program itself but with the attempt to run it - the execve call, which causes the program to be executed, was failing with an EINVAL error code, indicating some sort of problem to do with the ELF interpreter.
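
A failure rate that low only shows up if the command is run many thousands of times in a loop. The sketch below is one way to do that and is not necessarily how the figures above were gathered. The function name count_abnormal is made up, and it treats any exit status above 125 (could not execute, not found, or killed by a signal such as SIGSEGV) as an abnormal run:

```shell
# Hypothetical helper: run a command N times and report how many runs
# ended abnormally (exit status above 125).
count_abnormal() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -gt 125 ]; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    # Print the tally and the success rate to two decimal places.
    awk -v f="$failed" -v n="$runs" \
        'BEGIN { printf "%d/%d abnormal (%.2f%% success)\n", f, n, 100 * (n - f) / n }'
}
```

Running count_abnormal 50000 ntpstat would repeat the test described above. Ordinary nonzero exits, such as ntpstat reporting that the clock is not synchronised, are deliberately not counted as failures.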

The only thing I could think of that would modify that sort of thing, and which would be nullified by the "replace RPMs with new ones, then replace new RPMs with old ones again" fix that the reporter described in the CentOS bug report, was prelink.

So, I turned it off, by editing /etc/sysconfig/prelink and by running prelink -au. Immediately after doing that, ntpstat worked 100% of the time instead of 99.98%.

I'm presuming that something to do with prelink's address space randomisation was breaking stuff on the servers I'm using, but I am not in a position to test that or to try to find a proper fix, so for now it remains disabled.

In summary then, if you're having weird random segmentation faults and you're sure it's not a fault with your RAM (having tried Memtest86 and Lucifer to check), then run:
prelink -au
sed -i 's/^PRELINKING=yes/PRELINKING=no/' /etc/sysconfig/prelink
...and see if your problems disappear.



Update: I now have the results of testing with different parameters to prelink:

Options to prelink   Test results   Success
-au                  50000/50000    100.00%
-amR                 49988/50000    99.98%
-aR                  49986/50000    99.97%
-am                  50000/50000    100.00%

Each test run had prelink -au run after it followed by another test to make sure success went back to 100%.

Basically, the -R option to prelink seems to be the one that's cocking everything up.



Update: Kernel 2.6.18-53.1.4.el5 appears to fix this problem.

PV 1.1.0

Version 1.1.0 of PV has been released. This release incorporates some fixes for Mac OS X, a couple of packaging cleanups, a dramatic improvement in the resource usage of the --rate-limit (-L) option, and two new features.

The first new feature to be added, --line-mode (-l) was a Debian wishlist request. This causes PV to count lines instead of bytes. While it's not something I have ever particularly wanted myself, it does sound like it might come in handy occasionally (and, more importantly, it didn't require much to be added to the code to make it work).

The second was one that I have occasionally found myself wanting, particularly during long network data transfers. The --remote (-R) option allows the settings of an already-running PV to be altered. This can be used to change the rate limit while a transfer is in progress, for example, or set PV's idea of the total size of all data to something different.

QSF 1.2.7

Version 1.2.7 of QSF has been released. Like the recent PV release, this was prompted by inclusion in the Fedora Project and the resultant need to change the license to Artistic 2.0.

QSF's development is, again like PV, moving from SourceForge to Google Code.

PV 1.0.1

Version 1.0.1 of PV has been released. This is a code cleanup release, prompted by the discovery that PV has been included in the Fedora Project - version 0.9.9 is available now in FC7 and as an "extra" package in FC6.

It can be interesting to go back to old code and see how the style has changed over time. With a fresh perspective, a few oddities were more obvious, so the occasional untidy section was rewritten and a few more comment blocks were added. The organisation of the functions was changed a bit so that the "command-line program" part is now distinct from the "PV functionality" part, which means if I decide in future to create a library to add progress indicators to other command-line programs it will be significantly easier. Not that it's likely, but it seemed to make things neater.

Packages in use

Here is a command line to find which RPM packages are in use by the system at this very moment. This can be useful if you are deciding which packages to remove from a system that has a lot of unnecessary software installed, but you're also running nonstandard software such as the Sun JRE, so you can't be sure that RPM's dependency tracking is enough.

To do this, we look at all libraries currently mapped in place by all running processes, as well as the file each process is executing, and then look at which RPMs those files belong to.
awk '{print $NF}' /proc/*/maps \
| sort - <(for A in /proc/*/exe; do readlink "$A"; done) \
| uniq \
| grep / \
| while read -r FILE; do rpm -q --queryformat='%{NAME}\n' -f "$FILE" 2>/dev/null; done \
| grep -v 'is not owned' \
| sort \
| uniq

You can omit the --queryformat='%{NAME}\n' part if you want the RPM version numbers to be included, in case you have multiple versions of some packages installed.

This has only been tested on Red Hat Enterprise Linux 5. If you don't have readlink, you could try ls -l $A | awk '{print $NF}' instead.

Note that this will only catch dependencies that are memory mapped, such as system libraries. It won't catch files that are only read occasionally or which aren't memory mapped.