Process accounting enables you to keep detailed
accounting information for the system resources used, their allocation among
users, and system monitoring.
(Enabling Process
Accounting on Linux HOWTO)
You will need to install the 'psacct' package from the
first Mandrake Linux CD, which will in turn install a system service of the same name. Documentation (and
an amusing bit of Unix lore) can then be found in 'info accounting'.
Both 'Webmin' and 'Linuxconf' offer graphical modules to configure and monitor
process accounting.
Notice that the 'psacct' daemon script in Mandrake Linux
8.1 has a small bug.
Estimating resource usage, especially the memory
consumption of processes, is far more complicated than it looks at
first glance.
How many resources a process needs depends on many factors, most of them
variable. One is the amount of physical RAM in the system. If there is plenty of
free RAM available, more caching will be performed and thus more memory consumed.
However, this doesn't really count as resource usage, since this cached memory
is available in case some other process needs it. Linux uses much of the otherwise
idle memory for buffering and caching. This dramatically speeds up disk access,
so you don't 'lose' anything when memory is assigned to this task. Only unused
memory is wasted ;-).
Graphical applications depend on certain collections
of code libraries for displaying their interface (buttons, menus etc.). Common
collections are GTK+ (GIMP, GNOME), Qt (KDE), Tcl/Tk and others. If you use
applications built on many different collections, all these libraries have to be
loaded into system memory, thus using more resources. An application written
with Qt for KDE will use more system resources in GNOME than in KDE, where the
Qt libraries are already loaded. If you
start a second Qt application, however, it will use about as many resources
as in KDE, since the first application has already loaded the needed interface
libraries. If you close both applications, not all resources will be freed
again; instead, the loaded libraries will stay cached (if the amount of free
RAM permits it).
The numbers that process monitors present for each process
tend to mislead and are easily misinterpreted. On the previous page you've
already read about threads. But even if a process doesn't spawn threads,
it is in most cases almost impossible to determine how much system memory
an application really consumes.
Let me give a simple example:
- top: Mem: 900040K av, 411864K used, 488176K free,
1048K shrd, 58592K buff, 166816K cached
- Starting the 'bluefish' HTML editor.
- top entry for bluefish: 4716 (size) 4716 (real) 3580
(shared).
- top entry for system: Mem: 900040K av, 414268K used, 485772K free, 1432K shrd,
58620K buff, 167592K cached
According to the process table entry, 'bluefish' uses
4716 KB of real memory and 3580 KB of shared memory. The system, however, only
reports an increase of 2404 KB in general memory usage, of which
384 KB are additional shared memory, 28 KB additional buffers and 776 KB additional cache.
Regardless of how you add or subtract these numbers, you will never get the
numbers from the process entry and the system numbers in sync.
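The differences between the two 'Mem:' lines can be rechecked with a bit of shell arithmetic. This is only a minimal sketch: the helper name 'mem_delta' is made up for this illustration, and the values are copied from the example above.

```shell
#!/bin/sh
# mem_delta BEFORE AFTER: print AFTER minus BEFORE (both in KB).
# 'mem_delta' is a hypothetical helper, not part of top or procps.
mem_delta() {
    echo $(( $2 - $1 ))
}

# Values copied from the two top 'Mem:' lines in the example above.
echo "used:   $(mem_delta 411864 414268) KB"
echo "shared: $(mem_delta 1048 1432) KB"
echo "buffer: $(mem_delta 58592 58620) KB"
echo "cached: $(mem_delta 166816 167592) KB"
```

None of the four differences comes anywhere near the 4716 KB that the process entry claims for 'bluefish'.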
I tend to trust the system numbers more, but even they can
be misinterpreted quite easily:
42 processes: 41 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.9% user, 1.3% system, 0.0% nice, 97.6% idle
Mem: 257676K av, 252940K used, 4736K free, 202852K shrd, 7464K buff
Swap: 130748K av, 256K used, 130492K free, 197620K cached
What usually confuses people here is the cache and buffer
management of Linux. Let's have a look at it:
Mem: 257676K av, 252940K used, 4736K free, 202852K shrd, 7464K buff
Swap: 130748K av, 256K used, 130492K free, 197620K cached
'Mem' shows some 252 MB RAM installed. In fact it's 256
MB. What happened to the remaining 4 MB? Simple: they are used for 'shadowing'
the system's BIOS and contain the GNU/Linux kernel.
The next fields seem to indicate that the system is on the brink of collapse:
almost all of the memory seems to be used up. Is that so? No, it isn't. Look
at the last entries of those lines. These tell you that about 200 MB of system
memory are used for caching and buffering. This memory is available for every
application which needs it.
The 'free' command line tool gives a more comprehensible overview:
$ free
             total       used       free [...]
Mem:        257676     253624       4052 [...]
-/+ buffers/cache:      50360     207316
Swap:       130748        256     130492
Have a look at the third line:
-/+ buffers/cache: 50360 207316
It shows the actual amount of memory in use by
applications (about 50 MB) and the amount free for them (about 202 MB). The rest is used for caching and buffering,
i.e. to make your system faster.
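The same kind of figures can be derived by hand from the 'Mem:' and 'Swap:' lines of the top output shown earlier: subtract buffers and cache from 'used', and add them to 'free'. The following is a sketch of that arithmetic, with variable names chosen for this illustration. The results differ slightly from the 'free' output above because the two snapshots were taken at different moments.

```shell
#!/bin/sh
# Figures (in KB) copied from the top output shown earlier.
used=252940; free=4736; buffers=7464; cached=197620

# Memory actually held by applications: 'used' without buffers and cache.
app_used=$(( used - buffers - cached ))
# Memory available to applications: 'free' plus buffers and cache.
app_free=$(( free + buffers + cached ))

echo "used by applications:      $app_used KB"
echo "available to applications: $app_free KB"
```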
To estimate the resource consumption of a process, it's
easiest to run 'free' before, during and after running the process. The figures
will vary for the reasons already mentioned, so you should do this
several times to get an average.
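Taking the samples and averaging them can be scripted. The sketch below assumes the old 'free' output format shown in this text (newer versions of 'free' no longer print the '-/+ buffers/cache' line). Both helper names are made up for this example.

```shell
#!/bin/sh
# average: read one number per line on stdin, print the integer mean.
# (Hypothetical helper for this sketch.)
average() {
    awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n }'
}

# app_used_kb: read old-style 'free' output on stdin and print the 'used'
# figure from its '-/+ buffers/cache' line.
# (Hypothetical helper; assumes the output format shown in this text.)
app_used_kb() {
    awk '$1 == "-/+" { print $3 }'
}

# Example: five samples, one second apart, while the program is running.
# for i in 1 2 3 4 5; do free | app_used_kb; sleep 1; done | average
```

Run the sampling loop once before starting the program and once while it is running; the difference between the two averages is a rough estimate of what the program costs.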
PAM (Pluggable Authentication Modules, man 8 pam)
offers a mechanism to set limits on resource usage. Setting limits via shell
mechanisms ('ulimit' in bash, 'limit' in csh etc.) is deprecated.
You configure these limits as 'root' in '/etc/security/limits.conf'. Among
other things you can set the maximum number of processes per user or group,
process priorities, maximum CPU time and more.
Parameters and examples are supplied in the configuration file.
More information can be found in the
appropriate chapter of the Linux-PAM System Administrators' Guide (L-PAMSAG).
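Entries in 'limits.conf' have four fields: domain (a user, a group written as '@group', or '*' for everybody), type ('soft' or 'hard'), item and value. The sketch below shows a few sample entries as comments and a small helper that lists the entries actually active in such a file. The user and group names are invented, and 'active_limits' is a hypothetical helper, not a PAM tool.

```shell
#!/bin/sh
# Sample entries in the format '<domain> <type> <item> <value>'.
# The group '@students' and the user 'alice' are made-up names.
#
#   @students   hard   nproc   20     # at most 20 processes per member
#   alice       soft   cpu     10     # soft limit: 10 minutes of CPU time
#   *           hard   core    0      # no core files for anyone

# active_limits FILE: print the entries of FILE that are not comments.
# (Hypothetical helper; it only reads the file and changes nothing.)
active_limits() {
    awk 'NF >= 4 && $1 !~ /^#/ { print $1, $2, $3, $4 }' "$1"
}

# Usage: active_limits /etc/security/limits.conf
```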
If a process behaves badly, the kernel sends it a certain
signal (man 7 signal) to terminate it.
The most infamous signal is signal
11 (alias SIGSEGV, alias 'segmentation fault'). It means the process has tried
to access memory which hasn't been allocated to it.
Reasons for a segfault might be a programming error, a wrong library version
or a hardware failure. The default action of a process on receiving signal
11 is to terminate and write the contents of its memory to
a 'core' file (in Unix slang, the process 'dumps core'). This core file
can be useful to a programmer for debugging. Notice that Mandrake Linux by
default disables the creation of core files.
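When a program started from a shell is terminated by a signal, the shell reports an exit status of 128 plus the signal number, so a segfault shows up as 139. The sketch below decodes such a status; 'signal_from_status' is a name invented for this example, and the child shell switches off core files for itself so that the demonstration leaves nothing behind.

```shell
#!/bin/sh
# signal_from_status STATUS: if STATUS (a shell exit status, $?) says the
# command was killed by a signal, print the signal number; otherwise print
# nothing. Shells report death by signal N as exit status 128 + N.
# (Hypothetical helper for this sketch.)
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    fi
}

# Demonstration: a child shell disables core files for itself and then
# sends signal 11 (SIGSEGV) to itself.
status=0
sh -c 'ulimit -c 0; kill -s SEGV $$' 2>/dev/null || status=$?
echo "exit status $status means signal $(signal_from_status "$status")"
```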
Things get a bit hairy, however, when a process misbehaves so
badly that it ignores signals and starts to hoard all the available processor
time on the system, thus slowing down ('starving') all the other processes.
Such a process stays in state 'R' (running) and sits at the top of the CPU column
of the process table. State 'D' ('uninterruptible sleep') marks a different kind of
trouble: the process is stuck waiting, usually for I/O, and does not react to signals
while it waits. If you see a process entry that hogs the CPU, or one that keeps
a 'D' in its STAT column for a long time, it's definitely time for you to do something.
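To pick out the processes that are currently in a given state, you can filter the output of 'ps'. A minimal sketch, assuming the 'ps -eo pid,stat,comm' column layout of the Linux 'ps'; 'in_state' is a made-up helper name.

```shell
#!/bin/sh
# in_state LETTER: read 'ps -eo pid,stat,comm' output on stdin and print
# the lines whose STAT field starts with LETTER (header line excluded).
# (Hypothetical helper for this sketch.)
in_state() {
    awk -v s="$1" 'NR > 1 && substr($2, 1, 1) == s'
}

# Usage: ps -eo pid,stat,comm | in_state D
```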
Almost all signals can be handled by processes, which
means that they can also be ignored by them. There is, however, one signal which
can't be ignored by any process, and that's signal 9 (SIGKILL), the appropriately
named 'kill signal'. If you send that signal to a process, you will quite
literally kill it: the kernel takes away all its resources and removes it from the
process table. You can then safely go on with your daily work. All unsaved
data of that process will be lost, though.
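The difference between a signal that can be ignored and signal 9 is easy to try out. The following sketch starts a harmless background subshell that ignores SIGTERM (signal 15), shows that it survives that signal, and then removes it with SIGKILL. The variable names are chosen for this example only, and the one-second pauses are an assumption about how long the subshell needs to set itself up.

```shell
#!/bin/sh
# Start a background subshell that ignores SIGTERM (signal 15).
( trap '' TERM; while :; do sleep 1; done ) >/dev/null 2>&1 &
pid=$!
sleep 1                      # give the subshell time to install its trap

kill -s TERM "$pid"          # polite request: ignored by the subshell
sleep 1
if kill -0 "$pid" 2>/dev/null; then
    term_result="survived"
else
    term_result="died"
fi

kill -s KILL "$pid"          # signal 9: cannot be caught or ignored
kill_status=0
wait "$pid" 2>/dev/null || kill_status=$?   # 128 + 9 = 137 after SIGKILL
echo "SIGTERM: $term_result, exit status after SIGKILL: $kill_status"
```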
Most system monitors let you kill
a process via the right mouse click context menu (in console 'top', press
the <k> key). There is one problem though: processes run with
the permissions of the user who started them. Users can only kill processes
they have started themselves; only 'root' can kill all processes. Your system monitoring
program will and should usually run under your user account, which means you
can't kill processes from other users or 'root' via it.
If a process started by root or another user misbehaves,
you have to switch to the 'root' account on a console using the su
command. You can then either kill the process by its process ID (PID) or by
its name.
If you don't know the PID of the process yet, you can find out with
# ps aux --sort %cpu
which lists the most CPU hungry process last, the PID
being the leftmost number. You can then go on to actually kill the process
with the 'kill' command:
# kill -s 9 PID
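Picking the PID out of the 'ps' listing can also be left to a small filter. This is only a sketch: 'top_cpu_pid' is a name invented here, and it relies on the 'ps aux' layout in which the PID is the second column. Have a look at the listing yourself before you kill anything.

```shell
#!/bin/sh
# top_cpu_pid: read 'ps aux --sort %cpu' output on stdin and print the PID
# (second column) of the last line, i.e. the most CPU-hungry process.
# (Hypothetical helper for this sketch.)
top_cpu_pid() {
    awk 'NR > 1 { pid = $2 } END { if (pid != "") print pid }'
}

# Usage (as root, after checking the listing by eye):
#   pid=$(ps aux --sort %cpu | top_cpu_pid)
#   kill -s 9 "$pid"
```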
You can also kill a process by name. Notice that this
can be dangerous, since you might accidentally kill processes you didn't
intend to kill (or none at all). The command is
# killall -s 9 name
If you ever happen to come across a non-Linux Unix system,
make sure to read the man page for 'killall' on that system. Some
Unixes take this command literally, which can get very ugly ...
A rather harmless type of bad process is the so-called
'zombie' (STAT: Z, 'defunct'). If you've read the previous page, you know
that processes can have 'children', i.e. spawn new processes. When a child
terminates, its entry stays in the process table until the parent collects
its exit status. If the parent never gets around to doing that, for example
because it is buggy or hung, the child turns into one of the 'living dead': it
still appears in the process table, but it doesn't use any resources apart from
that entry, and there is no way of 'reaching' it, not even with signal 9. Unlike
their namesakes, process zombies don't do any harm. They vanish from the process
table as soon as the parent collects them or terminates itself, and at the latest
upon the next reboot.
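To check whether there are any zombies on your system, and which parent process ID the process table records for them, you can filter the output of 'ps' once more. A minimal sketch, assuming the 'ps -eo pid,ppid,stat,comm' column layout of the Linux 'ps'; 'zombies' is a made-up helper name.

```shell
#!/bin/sh
# zombies: read 'ps -eo pid,ppid,stat,comm' output on stdin and print
# "PID PPID COMMAND" for every process whose STAT field starts with Z.
# (Hypothetical helper for this sketch.)
zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage: ps -eo pid,ppid,stat,comm | zombies
```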