31.7.16

Linux Processes: Part 2

Now it's time to talk about signals, which are part of the POSIX specification. This is very important, because we cannot go any further without signal theory.

2. Signals

Signals are mechanisms for communicating with and manipulating processes in Linux. The topic of signals is a large one; here we discuss some of the most important signals and techniques that are used for controlling processes.

A signal is a special message sent to a process. Signals are asynchronous; when a process receives a signal, it processes the signal immediately, without finishing the current function or even the current line of code. There are several dozen different signals, each with a different meaning. Each signal type is specified by its signal number, but in programs, you usually refer to a signal by its name. In Linux, these are defined in /usr/include/bits/signum.h. (You shouldn’t include this header file directly in your programs; instead, use <signal.h>.)

When a process receives a signal, it may do one of several things, depending on the signal’s disposition. For each signal, there is a default disposition, which determines what happens to the process if the program does not specify some other behavior. For most signal types, a program may specify some other behavior—either to ignore the signal or to call a special signal-handler function to respond to the signal. If a signal handler is used, the currently executing program is paused, the signal handler is executed, and, when the signal handler returns, the program resumes.

The Linux system sends signals to processes in response to specific conditions. For instance, SIGBUS (bus error), SIGSEGV (segmentation violation), and SIGFPE (floating point exception) may be sent to a process that attempts to perform an illegal operation. The default disposition for these signals is to terminate the process and produce a core file.

A process may also send a signal to another process. One common use of this mechanism is to end another process by sending it a SIGTERM or SIGKILL signal. What’s the difference between SIGTERM and SIGKILL? The SIGTERM signal asks a process to terminate; the process may ignore the request by masking or ignoring the signal. The SIGKILL signal always kills the process immediately because the process may not mask or ignore SIGKILL.
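A minimal sketch of that difference, assuming a single-threaded program. The helper name term_then_kill is hypothetical, not a library function. It forks a child that ignores SIGTERM, sends the child SIGTERM and then SIGKILL, and reports which signal actually ended it. The pipe is there only so that the parent does not send SIGTERM before the child has started ignoring it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of the signal that ended
   the child, or -1 on error. */
int term_then_kill (void) {
    int ready[2];
    char byte;
    int status;
    pid_t child_pid;

    if (pipe (ready) != 0)
        return -1;

    child_pid = fork ();
    if (child_pid == -1) {
        close (ready[0]);
        close (ready[1]);
        return -1;
    }

    if (child_pid == 0) {
        /* Child: ignore SIGTERM, tell the parent, then sleep forever. */
        signal (SIGTERM, SIG_IGN);
        close (ready[0]);
        if (write (ready[1], "x", 1) != 1)
            _exit (1);
        for (;;)
            pause ();
    }

    /* Parent: wait until the child is ignoring SIGTERM. */
    close (ready[1]);
    if (read (ready[0], &byte, 1) != 1) {
        close (ready[0]);
        kill (child_pid, SIGKILL);
        waitpid (child_pid, &status, 0);
        return -1;
    }
    close (ready[0]);

    kill (child_pid, SIGTERM); /* Ignored by the child. */
    kill (child_pid, SIGKILL); /* Cannot be ignored. */

    if (waitpid (child_pid, &status, 0) != child_pid)
        return -1;
    return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```

Under these assumptions the function should return SIGKILL, because the earlier SIGTERM is discarded by the child.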

Another common use is to send a command to a running program. Two "user-defined" signals are reserved for this purpose: SIGUSR1 and SIGUSR2. The SIGHUP signal is sometimes used for this purpose as well, commonly to wake up an idling program or cause a program to reread its configuration files.

The sigaction function can be used to set a signal disposition. The first parameter is the signal number. The next two parameters are pointers to sigaction structures; the first of these contains the desired disposition for that signal number, while the second receives the previous disposition. The most important field in the first or second sigaction structure is sa_handler. It can take one of three values:

  • SIG_DFL, which specifies the default disposition for the signal.
  • SIG_IGN, which specifies that the signal should be ignored.
  • A pointer to a signal-handler function. The function should take one parameter, the signal number, and return void.

Because signals are asynchronous, the main program may be in a very fragile state when a signal is processed and thus while a signal handler function executes. Therefore, you should avoid performing any I/O operations or calling most library and system functions from signal handlers.

A signal handler should perform the minimum work necessary to respond to the signal, and then return control to the main program (or terminate the program). In most cases, this consists simply of recording the fact that a signal occurred. The main program then checks periodically whether a signal has occurred and reacts accordingly.

It is possible for a signal handler to be interrupted by the delivery of another signal. While this may sound like a rare occurrence, if it does occur, it will be very difficult to diagnose and debug the problem. Therefore, you should be very careful about what your program does in a signal handler.

Even assigning a value to a global variable can be dangerous because the assignment may actually be carried out in two or more machine instructions, and a second signal may occur between them, leaving the variable in a corrupted state. If you use a global variable to flag a signal from a signal-handler function, it should be of the special type sig_atomic_t. Linux guarantees that assignments to variables of this type are performed in a single instruction and therefore cannot be interrupted midway. In Linux, sig_atomic_t is an ordinary int; in fact, assignments to integer types the size of int or smaller, or to pointers, are atomic. If you want to write a program that’s portable to any standard UNIX system, though, use sig_atomic_t for these global variables.

The program skeleton below, for instance, uses a signal-handler function to count the number of times that the program receives SIGUSR1, one of the signals reserved for application use.

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Written by the handler and read by main; volatile keeps the
   compiler from caching the value in a register. */
static volatile sig_atomic_t sigusr1_count = 0;

void handler (int signal_number) {
    (void) signal_number; /* Unused. */
    ++sigusr1_count;
}

int main (int argc, char** argv) {
    struct sigaction sa;
    memset (&sa, 0, sizeof (sa));

    /* Call handler whenever SIGUSR1 is delivered. */
    sa.sa_handler = &handler;
    sigaction (SIGUSR1, &sa, NULL);

    /* Do some cool stuff here. */
    /* ... */

    printf ("SIGUSR1 was raised [%i] times\n", (int) sigusr1_count);
    return 0;
}

2.1 Process termination

Normally, a process terminates in one of two ways. Either the executing program calls the exit function, or the program’s main function returns. Each process has an exit code: a number that the process returns to its parent. The exit code is the argument passed to the exit function, or the value returned from main.

A process may also terminate abnormally, in response to a signal. For instance, the SIGBUS, SIGSEGV, and SIGFPE signals mentioned previously cause the process to terminate. Other signals are used to terminate a process explicitly. The SIGINT signal is sent to a process when the user attempts to end it by typing Ctrl+C in its terminal. The SIGTERM signal is sent by the kill command. The default disposition for both of these is to terminate the process. By calling the abort function, a process sends itself the SIGABRT signal, which terminates the process and produces a core file. The most powerful termination signal is SIGKILL, which ends a process immediately and cannot be blocked or handled by a program.

Any of these signals can be sent using the kill command by specifying an extra command-line flag; for instance, to end a troublesome process by sending it a SIGKILL, invoke the following, where pid is its process ID:

 % kill -KILL pid

Or, more generically, use the signal's number instead of its name:

 % kill -9 pid

To send a signal from a program, use the kill function. The first parameter is the target process ID. The second parameter is the signal number; use SIGTERM to simulate the default behavior of the kill command. For instance, where child_pid contains the process ID of the child process, you can use the kill function to terminate a child process from the parent by calling it like this:

 kill (child_pid, SIGTERM);

Always include the <sys/types.h> and <signal.h> headers if you use the kill function.
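Put together, a parent ending its child might look like the sketch below. The function name terminate_child is hypothetical, and the sketch assumes that SIGTERM still has its default disposition in the calling process. It also reaps the child with waitpid, a function from the wait family covered in section 2.2, so that the result can be inspected.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that just sleeps, terminate it
   with SIGTERM, and return the signal that ended it (-1 on error). */
int terminate_child (void) {
    int status;
    pid_t child_pid = fork ();

    if (child_pid == -1)
        return -1;

    if (child_pid == 0) {
        /* Child: sleep until a signal arrives. */
        for (;;)
            pause ();
    }

    /* Parent: kill returns 0 on success and -1 on failure. */
    if (kill (child_pid, SIGTERM) != 0)
        return -1;

    if (waitpid (child_pid, &status, 0) != child_pid)
        return -1;
    return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```

Checking the return value of kill matters in practice: it fails, for example, when the target process no longer exists.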

By convention, the exit code is used to indicate whether the program executed correctly. An exit code of zero indicates correct execution, while a nonzero exit code indicates that an error occurred. In the latter case, the particular value returned may give some indication of the nature of the error. It’s a good idea to stick with this convention in your programs because other components of the GNU/Linux system assume this behavior. For instance, shells assume this convention when you connect multiple programs with the && (logical and) and || (logical or) operators. Therefore, you should explicitly return zero from your main function, unless an error occurs.

Note that even though the parameter type of the exit function is int and the main function returns an int, Linux does not preserve the full 32 bits of the return code: only the low 8 bits reach the parent. In fact, you should use exit codes only between zero and 127. Exit codes above 128 have a special meaning: when a process is terminated by a signal, the shell reports its exit code as 128 plus the signal number.
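The truncation is easy to observe. In this sketch the hypothetical function observed_exit_code forks a child that exits with the requested value and returns what the parent actually sees; only the low 8 bits of the value survive. The parent collects the code with waitpid and WEXITSTATUS, which are described in section 2.2.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the exit code the parent observes when
   a child exits with REQUESTED, or -1 on error. */
int observed_exit_code (int requested) {
    int status;
    pid_t child_pid = fork ();

    if (child_pid == -1)
        return -1;

    if (child_pid == 0)
        _exit (requested); /* Child: exit at once, skipping stdio cleanup. */

    if (waitpid (child_pid, &status, 0) != child_pid)
        return -1;
    return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

For example, a child that exits with 300 is seen by its parent as having exited with 44 (300 modulo 256), and one that exits with 256 appears to have succeeded with code 0.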

2.2 Waiting for process termination

If you typed in and ran the fork and exec example in part 1, paragraph 1.2, where we created a forked process, you may have noticed that the output from the ls program often appears after the "main program" has already completed. That’s because the child process, in which ls is run, is scheduled independently of the parent process. Because Linux is a multitasking operating system, both processes appear to execute simultaneously, and you can’t predict whether the ls program will have a chance to run before or after the parent process runs.

In some situations, though, it is desirable for the parent process to wait until one or more child processes have completed. This can be done with the wait family of system calls. These functions allow you to wait for a process to finish executing, and enable the parent process to retrieve information about its child’s termination. There are several system calls in the wait family; you can choose to get a little or a lot of information about the process that exited, and you can choose whether you care about which child process terminated.

The simplest such function is called simply wait. It blocks the calling process until one of its child processes exits (or an error occurs). It returns a status code via an integer pointer argument, from which you can extract information about how the child process exited. For instance, the WEXITSTATUS macro extracts the child process’s exit code.

You can use the WIFEXITED macro to determine from a child process’s exit status whether that process exited normally (via the exit function or returning from main) or died from an unhandled signal. In the latter case, use the WTERMSIG macro to extract from its exit status the signal number by which it died.

Here is the main function from the fork and exec example again. This time, the parent process calls wait to wait until the child process, in which the ls command executes, is finished:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h> /* wait, WIFEXITED, WEXITSTATUS */
#include <unistd.h>

/* Spawn a child process running a new program. PROGRAM is the name
of the program to run; the path will be searched for this program.
ARG_LIST is a NULL-terminated list of character strings to be
passed as the program’s argument list. Returns the process ID of
the spawned process. */
int spawn (char* program, char** arg_list) {
    pid_t child_pid;
    
    /* Duplicate this process. */
    child_pid = fork ();

    if (child_pid != 0) {
       /* This is the parent process. */
       return child_pid;
    } else {
       /* Now execute PROGRAM, searching for it in the path. */
       execvp (program, arg_list);
       /* The execvp function returns only if an error occurs. */
       fprintf (stderr, "an error occurred in execvp\n");
       abort ();
    }
}

int main (int argc, char** argv) {
    int child_status;

    /* The argument list to pass to the "ls" command. */
    char* arg_list[] = {
        "ls", /* argv[0], the name of the program. */
        "-l",
        "/",
        NULL /* The argument list must end with a NULL. */
    };
    
    /* Spawn a child process running the "ls" command. Ignore the returned child process ID. */
    spawn ("ls", arg_list);

    /* Wait for the child process to complete. */
    wait (&child_status);

    if (WIFEXITED (child_status)) {
        printf ("the child process exited normally, with exit code [%i]\n", WEXITSTATUS (child_status));
    } else {
        printf ("the child process exited abnormally\n");
    }

    return 0;
}
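The program above reports only that the child exited abnormally. A sketch of going one step further with WIFSIGNALED and WTERMSIG follows; the helper names decode_status and run_child are hypothetical, and encoding a signal death as a negative number is just a convention chosen for this example.

```c
#include <limits.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the exit code (0 or more) if the child
   exited normally, or the negated signal number if a signal ended it. */
int decode_status (int status) {
    if (WIFEXITED (status))
        return WEXITSTATUS (status);
    if (WIFSIGNALED (status))
        return -WTERMSIG (status);
    return INT_MIN; /* Neither case applies. */
}

/* Hypothetical helper: fork a child that raises SIGNAL_NUMBER (if it
   is nonzero) and otherwise exits with EXIT_CODE; return the decoded
   status, or INT_MIN on error. */
int run_child (int signal_number, int exit_code) {
    int status;
    pid_t child_pid = fork ();

    if (child_pid == -1)
        return INT_MIN;

    if (child_pid == 0) {
        if (signal_number != 0)
            raise (signal_number);
        _exit (exit_code);
    }

    if (waitpid (child_pid, &status, 0) != child_pid)
        return INT_MIN;
    return decode_status (status);
}
```

With this convention, run_child (0, 7) yields 7, while run_child (SIGKILL, 0) yields -SIGKILL because the child never reaches its _exit call.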

Several similar system calls are available in Linux, which are more flexible or provide more information about the exiting child process. The waitpid function can be used to wait for a specific child process to exit instead of any child process. The wait3 function returns resource usage statistics (such as CPU time) about the exiting child process, and the wait4 function allows you to specify additional options about which processes to wait for.
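As a rough sketch of how these calls differ from plain wait, the hypothetical function reap_in_chosen_order below starts two children and collects the second one first with waitpid. It then collects the first one with wait4, a BSD-derived call available on Linux that also fills in a struct rusage with resource usage figures. The helper names and the exit codes 1 and 2 are arbitrary choices for the illustration.

```c
#define _DEFAULT_SOURCE 1 /* glibc declares wait4 under this macro. */
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once with EXIT_CODE. */
pid_t spawn_exiting_child (int exit_code) {
    pid_t child_pid = fork ();
    if (child_pid == 0)
        _exit (exit_code);
    return child_pid;
}

/* Hypothetical demo: returns 0 on success, -1 on error. A real
   program would also reap the surviving child if one fork fails. */
int reap_in_chosen_order (int* first_code, int* second_code,
                          struct rusage* first_usage) {
    int status;
    pid_t first = spawn_exiting_child (1);
    pid_t second = spawn_exiting_child (2);

    if (first == -1 || second == -1)
        return -1;

    /* Wait for the second child specifically, even if the first one
       has already exited. */
    if (waitpid (second, &status, 0) != second || !WIFEXITED (status))
        return -1;
    *second_code = WEXITSTATUS (status);

    /* Collect the first child together with its resource usage. */
    if (wait4 (first, &status, 0, first_usage) != first
        || !WIFEXITED (status))
        return -1;
    *first_code = WEXITSTATUS (status);
    return 0;
}
```

After a successful call, first_usage->ru_utime and first_usage->ru_stime hold the user and system CPU time consumed by the first child.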

In the third part of this article, which is not the last one, we will talk about process states and zombie processes.
