31.7.16

Linux Processes: Part 1

A running instance of a program is called a PROCESS. If you have two terminals showing on your screen, then you are probably running the same terminal program twice — you have two independent terminal processes. Each terminal is probably running a shell; each running shell is another process. When you invoke a command from a shell, the corresponding program is executed in a new process; the shell process resumes when that process completes.

Advanced programmers often use multiple cooperating processes in a single application to enable the application to do more than one thing at once, to increase application robustness, and to make use of already-existing programs.

Most of the process manipulation functions described in this post are similar to those on other UNIX systems. Most are declared in the header file <unistd.h>; check the man page for each function to be sure.

1. Looking at Processes

Even as you sit down at your computer, there are processes running. Every executing program uses one or more processes. Let’s start by taking a look at the processes already on your computer.

1.1 What are process IDs?

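Each process in a Linux system is identified by its unique process ID, sometimes referred to as the pid. Process IDs are positive integers, held in C programs in the signed type pid_t, that Linux assigns sequentially as new processes are created. Numbering wraps around when it reaches the limit published in /proc/sys/kernel/pid_max, traditionally 32768, so process IDs are not confined to 16 bits on systems where that limit has been raised.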
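The upper bound on process IDs is a kernel setting rather than a fixed property of the type. As a minimal sketch, assuming a Linux system with /proc mounted, the helper below reads that setting from /proc/sys/kernel/pid_max. The name read_pid_max is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper, not part of any library: returns the value at
   which process ID allocation wraps around, as published in
   /proc/sys/kernel/pid_max, or -1 if the file cannot be read. */
long read_pid_max (void) {
    long pid_max = -1;
    FILE* f = fopen ("/proc/sys/kernel/pid_max", "r");

    if (f == NULL)
        return -1;
    if (fscanf (f, "%ld", &pid_max) != 1)
        pid_max = -1;
    fclose (f);
    return pid_max;
}
```

Calling read_pid_max () from main and printing the result with %ld shows the limit on the system you are running.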

Every process also has a parent process (except the special init process, described in section “Zombie Processes”). Thus, you can think of the processes on a Linux system as arranged in a tree, with the init process at its root. The parent process ID, or ppid, is simply the process identity of the process’s parent.

When referring to process IDs in a C or C++ program, always use the pid_t typedef, which is defined in <sys/types.h>. A program can obtain the process ID of the process it’s running in with the getpid() system call, and it can obtain the process ID of its parent process with the getppid() system call. For instance, the program in Listing 1 prints its process ID and its parent’s process ID.

#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv) {
    /* pid_t is cast to int so that it matches the %i conversion. */
    printf( "The process id is: '%i'\n", (int) getpid() );
    printf( "The parent process id is: '%i'\n", (int) getppid() );
    return 0;
}

Observe that if you invoke this program several times, a different process ID is reported because each invocation is in a new process. However, if you invoke it every time from the same shell, the parent process ID (that is, the process ID of the shell process) is the same.

1.2 Viewing active processes

The ps command displays the processes that are running on your system. The GNU/Linux version of ps has lots of options because it tries to be compatible with versions of ps on several other UNIX variants. These options control which processes are listed and what information about each is shown.

By default, invoking ps displays the processes controlled by the terminal or terminal window in which ps is invoked. For example:

  PID TTY       TIME CMD
21693 pts/8 00:00:00 bash
21694 pts/8 00:00:00 ps

This invocation of ps shows two processes. The first, bash, is the shell running on this terminal. The second is the running instance of the ps program itself. The first column, labeled PID, displays the process ID of each one.

For a more detailed look at what’s running on your GNU/Linux system, invoke this:

 % ps -e -o pid,ppid,command 

The -e option instructs ps to display all processes running on the system. The -o pid,ppid,command option tells ps what information to show about each process: in this case, the process ID, the parent process ID, and the command running in this process.

ps Output Formats
With the -o option to the ps command, you specify the information about processes that you want in the output as a comma-separated list. For example, ps -o pid,user,start_time,command displays the process ID, the name of the user owning the process, the wall clock time at which the process started, and the command running in the process. See the man page for ps for the full list of field codes. You can use the -f (full listing), -l (long listing), or -j (jobs listing) options instead to get three different preset listing formats.

Here are the first few lines and last few lines of output from this command on my system. You may see different output, depending on what’s running on your system.

   PID PPID  COMMAND
     1    0  init [5]
 ....................
 21724 21693 terminator
 21727 21725 bash
 21728 21727 ps -e -o pid,ppid,command

Note that the parent process ID of the ps command, 21727, is the process ID of bash, the shell from which I invoked ps. The parent process ID of bash is in turn 21725, the process ID of the terminator program (a terminal emulator) in which the shell is running.

So far we have only looked at processes that already exist and at how to view their current state. But what about creating new ones?

1.3 Creating processes using fork and exec

The UNIX family of operating systems provides a function, fork(), that makes a child process that is an exact copy of its parent process. Linux also provides a set of functions, the exec() family, that causes a particular process to cease being an instance of one program and to instead become an instance of another program. To spawn a new process, you first use fork to make a copy of the current process. Then you use exec to transform one of these processes into an instance of the program you want to spawn.

When a program calls fork, a duplicate process, called the child process, is created. The parent process continues executing the program from the point that fork was called. The child process, too, executes the same program from the same place.

So how do the two processes differ? First, the child process is a new process and therefore has a new process ID, distinct from its parent’s process ID. One way for a program to distinguish whether it’s in the parent process or the child process is to call getpid. However, the fork function provides different return values to the parent and child processes: one process "goes in" to the fork call, and two processes "come out" with different return values. The return value in the parent process is the process ID of the child. The return value in the child process is zero. Because no process ever has a process ID of zero, this makes it easy for the program to determine whether it is now running as the parent or the child process.

The next listing is an example of using fork to duplicate a program’s process. Note that the first block of the if statement is executed only in the parent process, while the else clause is executed in the child process.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main (int argc, char** argv) {
    pid_t child_pid;

    printf ("The main program process ID is %d\n", (int) getpid ());
    child_pid = fork ();

    if (child_pid != 0) {
        printf ("This is the parent process, with id: [%i]\n", (int) getpid ());
        printf ("The child's process ID is: [%i]\n", (int) child_pid);
    } else {
        printf ("This is the child process, with id: [%i]\n", (int) getpid ());
    }

    return 0;
}
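The listing above assumes that fork succeeds. When no child can be created, fork returns -1 in the parent, so a more careful program checks for all three cases. Here is a minimal sketch of that check. The names run_in_child and example_task are invented for this example, and the parent uses waitpid (declared in <sys/wait.h>, and not covered in this post) to collect the child's exit status.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs TASK in a child process and returns the
   child's exit status, or -1 if fork or waitpid fails or the child
   did not exit normally. */
int run_in_child (int (*task) (void)) {
    int status;
    pid_t child_pid;

    /* Flush buffered output so the child does not inherit a copy of it. */
    fflush (stdout);
    child_pid = fork ();

    if (child_pid == -1) {
        /* fork failed; no child process was created. */
        perror ("fork");
        return -1;
    }
    if (child_pid == 0) {
        /* Child process: run the task and exit with its result. */
        _exit (task ());
    }
    /* Parent process: child_pid holds the child's process ID. */
    if (waitpid (child_pid, &status, 0) == -1) {
        perror ("waitpid");
        return -1;
    }
    return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}

/* Example task, executed only in the child process. */
int example_task (void) {
    return 7;
}
```

Here run_in_child (example_task) returns 7 in the parent, the value that the child passed to _exit.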

The exec functions replace the program running in a process with another program. When a program calls an exec function, that process immediately ceases executing that program and begins executing a new program from the beginning, assuming that the exec call doesn’t encounter an error.

Within the exec family, there are functions that vary slightly in their capabilities and how they are called.

  • Functions that contain the letter 'p' in their names (execvp and execlp) accept a program name and search for a program by that name in the directories listed in the PATH environment variable; functions that don’t contain the 'p' must be given the path of the program to be executed.
  • Functions that contain the letter 'v' in their names (execv, execvp, and execve) accept the argument list for the new program as a NULL-terminated array of pointers to strings.
  • Functions that contain the letter 'l' (execl, execlp, and execle) accept the argument list using the C language’s varargs mechanism.
  • Functions that contain the letter 'e' in their names (execve and execle) accept an additional argument, an array of environment variables. The argument should be a NULL-terminated array of pointers to character strings. Each character string should be of the form "VARIABLE=value".

Because exec replaces the calling program with another one, it never returns unless an error occurs.
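To see the 'l' and 'e' variants in action, here is a rough sketch that uses execle to run /bin/sh in a child process with an environment containing a single variable. The helper name run_sh_with_env and the GREETING variable are made up for this example, and the sketch assumes that /bin/sh exists. The parent uses waitpid, from <sys/wait.h>, to collect the child's exit status.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs SCRIPT with "/bin/sh -c" in a child
   process whose environment holds only GREETING=hello. Returns the
   shell's exit status, or -1 on error. */
int run_sh_with_env (const char* script) {
    /* The complete environment of the new program. */
    char* envp[] = { "GREETING=hello", NULL };
    int status;
    pid_t child_pid = fork ();

    if (child_pid == -1) {
        perror ("fork");
        return -1;
    }
    if (child_pid == 0) {
        /* 'l': the arguments are listed one by one and end with NULL.
           'e': the environment array follows the terminating NULL.
           No 'p': the program is named by its path. */
        execle ("/bin/sh", "sh", "-c", script, (char*) NULL, envp);
        /* Reached only if execle failed. */
        perror ("execle");
        _exit (127);
    }
    if (waitpid (child_pid, &status, 0) == -1) {
        perror ("waitpid");
        return -1;
    }
    return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

For example, run_sh_with_env ("echo $GREETING") has the shell print the variable it was handed, and the function returns the shell's exit status.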

The argument list passed to the program is analogous to the command-line arguments that you specify to a program when you run it from the shell. They are available through the argc and argv parameters to main. Remember, when a program is invoked from the shell, the shell sets the first element of the argument list (argv[0]) to the name of the program, the second element of the argument list (argv[1]) to the first command-line argument, and so on. When you use an exec function in your programs, you too should pass the name of the program as the first element of the argument list.
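To make that correspondence concrete, the small sketch below walks a NULL-terminated argument list and prints each element as the argv entry it will become in the new program. The function name print_arg_list is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: prints each element of a NULL-terminated
   argument list as the argv entry it will become, and returns the
   number of elements, which the new program receives as argc. */
int print_arg_list (char** arg_list) {
    int argc = 0;

    while (arg_list[argc] != NULL) {
        printf ("argv[%d] = \"%s\"\n", argc, arg_list[argc]);
        ++argc;
    }
    return argc;
}
```

Given the list { "ls", "-l", "/", NULL }, it reports three elements, with "ls" in the argv[0] slot.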

A common pattern to run a subprogram within a program is first to fork the process and then exec the subprogram. This allows the calling program to continue execution in the parent process while the calling program is replaced by the subprogram in the child process.

The next example invokes the ls command directly, passing it the command-line arguments -l and /, rather than invoking it through a shell.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Spawn a child process running a new program. PROGRAM is the name
of the program to run; the path will be searched for this program.
ARG_LIST is a NULL-terminated list of character strings to be
passed as the program’s argument list. Returns the process ID of
the spawned process. */
int spawn (char* program, char** arg_list) {
    pid_t child_pid;
    
    /* Duplicate this process. */
    child_pid = fork ();

    if (child_pid != 0) {
       /* This is the parent process. */
       return child_pid;
    } else {
       /* Now execute PROGRAM, searching for it in the path. */
       execvp (program, arg_list);
       /* The execvp function returns only if an error occurs. */
       fprintf (stderr, "an error occurred in execvp\n");
       abort ();
    }
}

int main (int argc, char** argv) {
    /* The argument list to pass to the "ls" command. */
    char* arg_list[] = {
        "ls", /* argv[0], the name of the program. */
        "-l",
        "/",
        NULL /* The argument list must end with a NULL. */
    };
    
    /* Spawn a child process running the "ls" command. Ignore the returned child process ID. */
    spawn ("ls", arg_list);
    printf ("done with main program\n");
    return 0;
}

1.4 Killing a process

You can kill a running process with the kill command. Simply specify on the command line the process ID of the process to be killed.

The kill command works by sending the process a SIGTERM, or termination, signal. This causes the process to terminate, unless the executing program explicitly handles or masks the SIGTERM signal. Signals are described in part two of this article.
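The same thing can be done from a C program with the kill function, declared in <signal.h>, which sends a signal to the process with a given process ID. Here is a minimal sketch rather than a full treatment of signals. The helper name terminate_child is made up for this example, and waitpid, from <sys/wait.h>, is used to find out how the child ended.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that only waits, sends it
   SIGTERM, and returns the number of the signal that ended it, or -1
   on error or if the child was not ended by a signal. */
int terminate_child (void) {
    int status;
    pid_t child_pid = fork ();

    if (child_pid == -1) {
        perror ("fork");
        return -1;
    }
    if (child_pid == 0) {
        /* Child: make sure SIGTERM has its default action, termination,
           then sleep until a signal arrives. */
        signal (SIGTERM, SIG_DFL);
        for (;;)
            pause ();
    }
    /* Parent: the equivalent of typing "kill" followed by the child's
       process ID in a shell. */
    if (kill (child_pid, SIGTERM) == -1) {
        perror ("kill");
        return -1;
    }
    if (waitpid (child_pid, &status, 0) == -1) {
        perror ("waitpid");
        return -1;
    }
    return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```

Because the child does not handle SIGTERM, terminate_child () returns SIGTERM, confirming that the signal is what ended the child.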
