March 2019

The table of processes (top) utility is a task manager that's been around in some form since the 1980s. My introduction to it in that era came from a system administrator who described it as "The hi-tech 'ps' alternative". The utility has been rewritten over the years as various *nix systems have come and gone. Today, most Linux distros pull top from the procps-ng package. This is the version I'll focus on in this project.

My goal is to make this resource useful for seasoned developers. I'll highlight important OS interactions since these are applicable to many current (and future!) utilities. This resource cannot substitute for reading the code, but it can save you a lot of time on your first attempt.

The main topics include:

Data Sources Signals Data Structures Execution Optimization Namespace (Globals, Functions, Callback macros)

Snapshot

[Source] [Code Walkthrough]

Version: procps-ng 3.3.15+ (NEXT - commit: dbe12b54 [Jan 2019])

Lines of code: 6522

Functions: 115

Callback macros: 54



Full Disclosure: I am not a procps contributor -- all credit goes to James C. Warner

Defer to the official Documentation or the development community for any conflicting information

Part 0 - Introductory Presentation

I've put together a ~10 minute presentation that cuts across the utility by example. We trace the load averages from kernel, through /proc, into top, and to the screen.

Part 1 - Data Sources

top pulls data (almost) entirely from the /proc file system. Indirectly, /proc pulls from many data structures and global variables around the kernel. In my opinion, understanding the OS interactions of 'top' is possibly the most important idea that extends to other projects.

I've itemized the rougly 75 data sources used in top. At a minimum, I've included the /proc data source and associated 3.10 kernel declarations. Most summary kernel statistics are declared as globals within respective subsystems and exported (symbol made visible) to other subsystems.

For completeness, this strace of top definitively identifies all interactions with the OS on my system for the initial update frame. I ran top and pressed 'q' after about 1 second. Picking apart an strace is always very enlightening and worthly of an article by itself. I'll save that for another day.

Summary Information

Data /proc Kernel CPU idle time /proc/stat Scheduler (dec, def) CPU hw-int time /proc/stat Scheduler (dec, def) CPU sw-int time /proc/stat Scheduler (dec, def) CPU kernel time /proc/stat Scheduler (dec, def) CPU nice time /proc/stat Scheduler (dec, def) CPU steal time /proc/stat Scheduler (dec, def) CPU user time /proc/stat Scheduler (dec, def) Load averages /proc/loadavg Scheduler (dec/def) Mem buffers /proc/meminfo File System (derived) Mem free /proc/meminfo Memory Manager (dec/def) Mem total /proc/meminfo Memory Manager (dec/def) Mem used N/A (derived) N/A Swap free /proc/meminfo Memory Manager (dec) Swap total /proc/meminfo Memory Manager (dec) Swap used N/A (derived) N/A Task Counts N/A (derived) N/A Uptime /proc/uptime System Timer (derived) User count N/A (utmp file) N/A

Task Information

Most per-task data are available through the stat , statm , and status entries under /proc/[pid#]. You can use the retrieval function mappings to trace back most process fields. The kernel display functions for the big three are:

For process data within the kernel, most roads lead back to the task_struct.

Part 2 - Signals

top rewires default signals to several custom handlers for two purposes:

The window may have been resized either directly or while paused. We need a custom signal handler to set the global flag to force a full window and task field recalculation on the next update. Signals and handlers affected: sig_resize() - SIGWINCH, SIGCONT Since top overrides terminal settings at start up, the original settings need to be restored if execution is paused or terminated. Otherwise, the terminal may be in an unusable state after returning to the shell. The map of signals to handlers is: sig_endpgm() - SIGALRM, SIGHUP, SIGINT, SIGPIPE, SIGQUIT, SIGTERM, SIGUSR1, SIGUSR2

- SIGALRM, SIGHUP, SIGINT, SIGPIPE, SIGQUIT, SIGTERM, SIGUSR1, SIGUSR2 sig_paused() - SIGSTP, SIGTTIN, SIGTTOU

- SIGSTP, SIGTTIN, SIGTTOU sig_abexit() - All others

Here's an annotated look at the code for all four signal handlers:

static void sig_endpgm (int dont_care_sig) { sigset_t ss; sigfillset(&ss); /* Blocking all signals */ sigprocmask(SIG_BLOCK, &ss, NULL); Frames_signal = BREAK_sig; /* Flag to force full terminal recompute */ bye_bye(NULL); /* Full graceful exit (See Part 6) */ (void)dont_care_sig; /* Silence compiler warnings */ }

static void sig_abexit (int sig) { sigset_t ss; sigfillset(&ss); /* Blocking all signals */ sigprocmask(SIG_BLOCK, &ss, NULL); at_eoj(); /* Restore TTY and show debug log */ fprintf(stderr, N_fmt(EXIT_signals_fmt) , sig, signal_number_to_name(sig), Myname); signal(sig, SIG_DFL); /* Redirect to default */ raise(sig); /* Execute default */ _exit(sig | 0x80); /* Error exit */ }

static void sig_resize (int dont_care_sig) { tcdrain(STDOUT_FILENO); /* Clear STDOUT buffer*/ Frames_signal = BREAK_sig; /* Flag to force full terminal recompute */ (void)dont_care_sig; /* Silence compiler warnings */ }

static void sig_paused (int dont_care_sig) { /* Restore Original TTY */ if (-1 == tcsetattr(STDIN_FILENO, TCSAFLUSH, &Tty_original)) error_exit(fmtmk(N_fmt(FAIL_tty_set_fmt), strerror(errno))); if (keypad_local) putp(keypad_local); putp(tg2(0, Screen_rows)); /* Move to bottom left */ putp(Cap_curs_norm); /* Normalize cursor format */ #ifndef RMAN_IGNORED putp(Cap_smam); /* Enable right margin */ #endif fflush(stdout); /* Push STDOUT buffer */ raise(SIGSTOP); /* Block here until SIGCONT is received */ /* Restore top's expected TTY configuration */ if (-1 == tcsetattr(STDIN_FILENO, TCSAFLUSH, &Tty_raw)) error_exit(fmtmk(N_fmt(FAIL_tty_set_fmt), strerror(errno))); #ifndef RMAN_IGNORED putp(Cap_rmam); /* Disable right margin */ #endif if (keypad_xmit) putp(keypad_xmit); putp(Cursor_state); /* Restore cursor settings */ Frames_signal = BREAK_sig; /* Flag to force full terminal recompute */ (void)dont_care_sig; /* Silence compiler warnings */ }

Part 3 - Data Structures

top relies on several structures to organize and cache data needed during computation. I'll separate the data structure discussion in to two groups: Those defined exclusively within top, and those included from external sources (i.e. C/POSIX standard structures). I'll focus more on the top-specific structures since they are unique, but the external standard structures are generally more useful to understand.

top-specific structures

These are the data structures defined and used exclusively within top

Fields [FLD_t]

A field structure represents a single column of information in the task list. All possible fields are defined as a global array and accessed by the task window.

Three of five members deal directly with data display:

width - The character count of the field. -1 is variable, usually text. 0 is computed at runtime.

- The character count of the field. -1 is variable, usually text. 0 is computed at runtime. scale - Represents the magnitude of the number (Kb, Mb, Gb, etc)

- Represents the magnitude of the number (Kb, Mb, Gb, etc) align - Visual alignment, left or right

The other two members have specialized usage:

sort - The sort function used to order by this fields. (See part 6)

- The sort function used to order by this fields. (See part 6) lflag - The /proc source (usually stat, status, statm, or specialized)

Task Window [WIN_t]

The task windows are the most complex custom structure in top, performing many roles from holding configurations, referencing accessing many globals, tracking screen dimensions, and caching process data. top defines four Windows as globals.

Important members to know:

**ppt - The proc_t array used by the window

- The proc_t array used by the window pflgsall[] - Tracks fields being displayed in the window

- Tracks fields being displayed in the window procflgs[] - Holds behavior flags for the active fields

- Holds behavior flags for the active fields rc - Window configuration specifics (see RCW_t below)

Window view management variables:

begnext - The offset to apply to task list on next update

- The offset to apply to task list on next update begpflg - The first visible point in pflgsall[]

- The first visible point in pflgsall[] begtask - The first task visible in ppt[]

- The first task visible in ppt[] endpflg - The visible ending point in pflgsall[]

- The visible ending point in pflgsall[] usrselflg - Flag for include or exclude (e)uid matches

- Flag for include or exclude (e)uid matches usrseltyp - The type of user to match (uid or euid)

- The type of user to match (uid or euid) usrseluid - The chosen user id to match

- The chosen user id to match varcolsz - The cap on variable width columns

- The cap on variable width columns winlines - The number of lines for the window

Others members:

cap* - Terminal capabilities for this window

- Terminal capabilities for this window grpname - A string representing window number:name

- A string representing window number:name *next - Pointer to the next window

- Pointer to the next window *prev - Pointer to the last window

Window Configuration [RCW_t]

Each window has an associated configuration structure (rc) built from both user settings and an external file. This structure is critical for tracking 'live' fields in the current window.

The three most important members to know are:

fieldscur[] - The ordered array of active fields in the window

- The ordered array of active fields in the window sortinx - The field enum within Fieldstab[] to sort on

- The field enum within Fieldstab[] to sort on winflags - The window modes including view/sort details

Other members include:

graph_cpus - The alternate cpu display values for 't'

- The alternate cpu display values for 't' graph_mems - The alternate memory display values for 'm'

- The alternate memory display values for 'm' headclr - Header color

- Header color maxtasks - The user-provided maximum tasks to

- The user-provided maximum tasks to msgsclr - Message color

- Message color summclr - Summary color

- Summary color taskclr - Task color

- Task color winname[] - The user-provided window name

General Configuration [RCF_t]

This structure supports the external configuration file and data not used directly by a window.

The main members include:

id - The rc file version

- The rc file version win_index - The index to the global Winstk for the current window

- The index to the global Winstk for the current window win - Local references to all window config data

The remaining members track settings:

delay_time - Desired delay time between updates, 'd'

- Desired delay time between updates, 'd' fixed_widest - Extra space for fields, 'X'

- Extra space for fields, 'X' mode_altscr - Alternate display mode, 'A'

- Alternate display mode, 'A' mode_irixps - CPU display mode, 'I'

- CPU display mode, 'I' summ_mscale - Summary memory scaling 'I'

- Summary memory scaling 'I' task_mscale - Process memory scaling 'e'

- Process memory scaling 'e' zero_suppress - Suppress zeroes for scaled values '0'

History [HST_t]

top tracks a useful varaibles between frames, notably CPU ticks and fault counts. The members are:

lnk - Integer index to the next HST in the chain

- Integer index to the next HST in the chain maj - Major fault count

- Major fault count min - Minor fault count

- Minor fault count pid - The associated PID

- The associated PID tics - The tick count for this entry

CPU History [CT_t, CPU_t]

We need to keep track of some CPU data, especially the changes between frames. These two structures work together manage updates:

The CPU_t structure, one per CPU

cur - The current frame's tick data (CT_t)

- The current frame's tick data (CT_t) id - The CPU number

- The CPU number node - The numa node of the CPU

- The numa node of the CPU sav - The previous frame's tick data (CT_t)

The CT_t (CPU tick tracker), two per CPU (this update and the last update). These data are provided by /proc/stat and correspond to tick counts in the following modes:

i - Idle time

- Idle time n - Nice-invoked process time

- Nice-invoked process time s - Kernel time

- Kernel time u - Userland time

- Userland time w - I/O blocked time

- I/O blocked time x - HW interrupt time

- HW interrupt time y - SW interrupt time

- SW interrupt time z - Virtual steal time (hypervisor block time)

Other structures included in top

These structures are used in top but defined elsewhere. These are useful to know since they're used throughout the OS/Kernel.

pid_t - Required by POSIX, thus typedef'd in the kernel as int

- Required by POSIX, thus typedef'd in the kernel as int proc_t - Defined within procps to hold all data about a process

- Defined within procps to hold all data about a process PROCTAB - Defined by procps to track persistent /proc information

- Defined by procps to track persistent /proc information sigset_t - Required by POSIX, and defined in the kernel as an architecture-specific type array

- Required by POSIX, and defined in the kernel as an architecture-specific type array termios - The POSIX definition of the terminal, specifically the I/O and control modes.

Part 4 - Execution

Execution appears simple at a high level. top begins with basic initialization, including reading configurations from files and command line flags, then setting up the terminal. The main loop continuously pulls input from /proc, formatting it, and sending results to the terminal. The program will only exit if the user pressed 'q', the user configured a fixed number of updates, or the operating system sends a disruptive signal. Let's dig into initialization, the main loop, and the exit procedure.

Let's break down each of these steps.

Initialization

int main (int dont_care_argc, char **argv) { (void)dont_care_argc; before(*argv); wins_stage_1(); configs_reads(); parse_args(&argv[1]); whack_terminal(); wins_stage_2(); ...to main loop...

(void)dont_care_argc;

Suppresses two compiler warnings: Unused argument (by using it) and statement with no effect (by void cast).

before(*argv);

Performs important preparatory work such as verifying /proc mount, counting CPUs, and rewiring signal handlers.

wins_stage_1();

Basic window preparation. Sets up the capabilities table, sets up a logical window ring, and sets the current window to the first.

configs_reads();

Processes external configuration files (rc files)

parse_args(&argv[1]);

Handles command line arguments

whack_terminal();

Clobbers the terminal in preparation for displaying the top utility. Saves old settings, adjusts flags, and sets buffering

wins_stage_2();

Finishes window configuration by defining terminal capabilities (global and window) and initially defining fields

The Big Loop

top will churn through the main loop until either the user or the operating system forces an exit. I'll focus on the three operations pictured above and highlighted in the lines below.

...in main(), after initialization... for (;;) { struct timespec ts; frame_make(); /* Builds and displays the next update */ if (0 < Loops) --Loops; /* Controls the user-specified */ if (!Loops) bye_bye(NULL); /* automatic exit counter (-n) */ ts.tv_sec = Rc.delay_time; /* Set up delay time between frames */ ts.tv_nsec = (Rc.delay_time - (int)Rc.delay_time) * 1000000000; if (Batch) /* Batch mode operation (-b) */ pselect(0, NULL, NULL, NULL, &ts, NULL); else { if (ioa(&ts)) /* Wait - pselect() within ioa() */ do_key(iokey(1)) /* Handles keyboard input */; } }

Let's dig in to the highlighted operations: Building a single output frame, waiting for the next update, and processing input.

Building a Frame

An output frame requires preperatory computations. In the interest of optimization, the actual computation sequence depends on if we need a full update (such as recomputing screen dimensions), or only a limited update if most elements are unchanged.

At a minimum, preparing a frame will always:

Read system information via sysinfo_refresh() . This includes the CPU counts from /proc/stat and memory information from /proc/meminfo. These data changed on each frame and are always referenced.

. This includes the CPU counts from /proc/stat and memory information from /proc/meminfo. These data changed on each frame and are always referenced. Refresh process information via procs_refresh() . Updates relevant pointers in windows.

Sometimes we need full updates such as when this is the first frame displayed after running top or when the user resizes the terminal window. Full updates include the previous operations, plus:

Recalculating the field tables for each window in zap_fieldstab() .

. Rechecking the CPU counts via cpus_refresh() .

Displaying data on the terminal

We draw the screen one line at a time, from top to bottom. Functionally, there are three distinct sections:

Summary - The summary area occupies the first several lines and the procedure is summary_show()

- The summary area occupies the first several lines and the procedure is Message - A single line after the summary, which is simply cleared using a termcap. Actually messages are added interactively.

- A single line after the summary, which is simply cleared using a termcap. Actually messages are added interactively. Task Window(s) - The list of tasks and associated fields. There may be up to four windows, but the default is one. The procedure is window_show()

Let's trace the retrieval, construction, formatting, and output of a single line of data through top. I'll try to distill the bottom line and show a single example that can extend to all screen elements. The anti-climactic punchline is:

At the lowest level, we read() from a /proc source, format data to a buffer, and putp() the buffer to the terminal with help from curses.

If you blink, you make miss this simple process so let's look at how this happens with load average in the summary display.

The load average appears at the top of the screen, so we expect the action to be near the beginning of the procedure. Here are the first 15 lines of summary_show() :

static void summary_show (void) { #define isROOM(f,n) (CHKw(w, f) && Msg_row + (n) < Screen_rows - 1) #define anyFLG 0xffffff WIN_t *w = Curwin; char tmp[MEDBUFSIZ]; int i; /* Display Uptime and Loadavg */ if (isROOM(View_LOADAV, 1)) { if (!Rc.mode_altscr) show_special(0, fmtmk(LOADAV_line, Myname, sprint_uptime(0))); else show_special(0, fmtmk(CHKw(w, Show_TASKON)? LOADAV_line_alt : LOADAV_line , w->grpname, sprint_uptime(0))); Msg_row += 1; } ...remaining summary_show() procedure...

The critical line includes the show_special() function that uses two layers of argument evaluations. This line encapsulates the entire process of reading, formatting, and writing. The call stack (including macros) for the I/O process looks like this:

Reading - Load average is pulled from /proc/loadavg by sprint_uptime() ultimately using the read() syscall. sprint_uptime() - top.c loadavg() - sysinfo.c FILE_TO_BUF() - sysinfo.c (macro) read() - Standard library syscall wrapper

ultimately using the syscall. Formatting - The raw result is formatted within fmtmk() to include special terminal characters and returned in a buffer. fmtmk() - top.c vnsprintf() - Standard library

to include special terminal characters and returned in a buffer. Writing - show_special() pushes the formatted result to the terminal using a macro and ultimately putp() show_special() - top.c PUFF() - top.h (macro) putp() - curses

pushes the formatted result to the terminal using a macro and ultimately

This is the call stack (plus function-like macros) for reading and writing the load average

Waiting (Blocking)

After we've built a single frame, the next task in the main loop is to pause operation until it's time for a new frame update OR the user presses a key. The pselect() syscall blocks until either a time delay passes or user input occurs. Let's look at the prototype, and then the usage:

int pselect(int nfds, /* Descriptor count to check */ fd_set *restrict readfds, /* Reading check descriptor set */ fd_set *restrict writefds, /* Writing check descriptor set */ fd_set *restrict errorfds, /* Err condition descriptor set */ timespec *restrict timeout, /* Delay timeout */ sigset_t *restrict sigmask); /* Signal mask to apply */

Under normal (non-batch) operation, the pselect() syscall is invoked within the ioa() function as shown here:

static inline int ioa (struct timespec *ts) { fd_set fs; int rc; FD_ZERO(&fs); /* Clears target file descriptors */ FD_SET(STDIN_FILENO, &fs); /* Adds STDIN to the target set */ rc = pselect(STDIN_FILENO + 1, &fs, NULL, NULL, ts, &Sigwinch_set); if (rc < 0) rc = 0; return rc; } // end: ioa

The first argument to pselect() is the number of file descriptors to monitor, since STDIN_FILENO is always 0 (by POSIX), this number amounts to 1. We pass STDIN as the descriptor to read, and we also pass the delay time as the computed frame delay from the main loop.

We ignore SIGWINCH since that case is handled automatically by the signal handler and processed during the next scheduled frame update

Input Handling

Any keyboard input unblocks pselect() and do_key() processes the keystroke. This handler is really the first of several possible layers of switched handlers which are organized to group common functions.

static void do_key (int ch) { static struct { void (*func)(int ch); /* Handler to execute */ char keys[SMLBUFSIZ]; /* Keys to associate to the handler */ } key_tab[] = { { keys_global, { '?', 'B', 'd', ... }}, /* Actual key handler map */ { keys_summary,{ '1', '2', '3', ... }}, { keys_task, { '#', '<', '>', ... }}, { keys_window, { '+', '-', '=', ... }}, { keys_xtra, { 'M', 'N', 'P', 'T', '\0' }} }; int i; switch (ch) { case 'q': /* Jump to exit routine on 'q' */ bye_bye(NULL); default: /* Check all keys/handlers */ for (i = 0; i < MAXTBL(key_tab); ++i) /* Loop through handlers */ if (strchr(key_tab[i].keys, ch)) { /* Check keys for a match */ key_tab[i].func(ch); /* Execute next key handler */ Frames_signal = BREAK_kbd; /* Force update on next frame */ goto all_done; /* Key matched, jump out of loop */ } }; ...content cut... }

The next layer of keyboard handlers switch on the handful of matching keys, which eventually execute the proper procedure. Handlers typically call a function, update a global variable, or alter one of the data structures.

Exiting

All graceful exits call the bye_bye() procedure, which is annotated below. The most important idea is that we restore the original shell terminal before exiting.

static void bye_bye (const char *str) { at_eoj(); /* End of Job -- restore shell TTY */ #ifdef ATEOJ_RPTSTD ...debug reports cut.. #endif numa_uninit(); /* (libproc) */ if (str) { /* Error handling */ fputs(str, stderr); /* Error reporting */ exit(EXIT_FAILURE); /* Error exit */ } if (Batch) fputs("

", stdout); /* Batch mode output */ exit(EXIT_SUCCESS); /* Standard exit */ }

Part 5 - Optimizations

Reading and writing is easy. Reading and writing only what you need, when you need it, with few instructions and little memory is not. top works hard to keep information flowing under adverse conditions (think a million processes, filtered, and sorted across four windows ever second).

top applies many common sense strategies, for instance:

Caching - Data structures allow us to retain data and computations between updates. For instance, top computes and retains window/field dimenions potentially indefinitely.

- Data structures allow us to retain data and computations between updates. For instance, top computes and retains window/field dimenions potentially indefinitely. Separate/Limit Operations - Identify data that must be read every frame and grab it all at once, such as the process table. Don't read any data that won't be displayed, such as tasks data beyond the window size.

- Identify data that must be read every frame and grab it all at once, such as the process table. Don't read any data that won't be displayed, such as tasks data beyond the window size. Pointers, not data - Confine updates to pointers and avoid directly manipulating data. Task windows do this with the process tables and field metadata

- Confine updates to pointers and avoid directly manipulating data. Task windows do this with the process tables and field metadata Small buffers - Large buffers lead to more latency. top reduces the default STDOUT buffer to 2kb at initialization

I'm not going to dig in to those well-known optimizations. Instead, let's focus on top's "Pseudo screen" and companion macros. The idea is related to back-buffering in a graphics pipeline, but this implementation seems unique to top and contributes to smooth operation.

top uses a global buffer called Pseudo_screen to hold the logical contents of the current frame that will be sent to the terminal. The benefits may not be obvious, but consider this:

We can analyze and scribble across a buffer at any time. This means we can make smart decisions about how to handle these data.

Terminals are write-only. Data sent is now on display and making changes means sending more data. It's not cheap (or possible in some cases) to know the state of the terminal.

In short, we can optimize data sent to the terminal via a Pseudo_screen. However, we retain the ability to write directly to the terminal, such as for direct user interactions.

Most interaction with the terminal and Pseudo_screen is done through three macros, depending on the use case:

POOF - Writes directly to the terminal and advances pseudo screen state

- Writes directly to the terminal and advances pseudo screen state PUFF - Writes text data to the pseudo screen which may be optimized before writing to the terminal

- Writes text data to the pseudo screen which may be optimized before writing to the terminal PUTT - Writes text or capability data directly to the terminal, ignoring pseudo screen. Used for interactives (ex. message row)

Let's look at each:

#define POOF(str,cap) do { \ putp(str); putp(cap); \ /* Direct to tty */ Pseudo_screen[Pseudo_row * ROWMAXSIZ] = '\0'; \ /* Null-term row */ if (Pseudo_row + 1 < Screen_rows) ++Pseudo_row; \ /* Advance position */ } while (0)

#define PUFF(fmt,arg...) do { \ char _str[ROWMAXSIZ]; \ const int _len = snprintf(_str, sizeof(_str), fmt, ## arg); \ if (Batch) { \ char *_eol = _str + (_len < 0 ? 0 : (size_t)_len >= sizeof(_str) ? sizeof(_str)-1 : (size_t)_len); \ while (_eol > _str && _eol[-1] == ' ') _eol--; *_eol = '\0'; putp(_str); } \ else if (Pseudo_row >= 0 && Pseudo_row < Screen_rows) { \ /* Visible? */ char *_ptr = &Pseudo_screen[Pseudo_row++ * ROWMAXSIZ]; \ if (!strcmp(_ptr, _str)) putp("

"); \ /* No work to do? */ else { \ strcpy(_ptr, _str); \ /* Write to frame */ putp(_ptr); } } \ /* Write to tty */ } while (0)

#define PUTT(fmt,arg...) do { \ char _str[ROWMAXSIZ]; \ snprintf(_str, sizeof(_str), fmt, ## arg); \ /* Format to buffer */ putp(_str); \ /* Write buffer to tty */ } while (0)

Part 6 - Namespace

Getting comfortable with the namespace is an early hurdle when diving in to any new utility. To reduce that chore, I'll break down the various symbols you'll see throughout top.[ch]. This section is really a reference rather than a detailed explanation.

Global variables

Roughly the first 150 lines of code declare and define globals used in many of the functions we'll see below. The most important globals to know are:

Batch - Flag for batch mode when redirecting output to a file. I'm assuming this is off

- Flag for batch mode when redirecting output to a file. I'm assuming this is off Cap_* - Terminal capabilities for controlling screen and cursor

- Terminal capabilities for controlling screen and cursor Curwin - Pointer to the window currently being processed

- Pointer to the window currently being processed Fieldstab[] - The global array describing the possible task list fields

- The global array describing the possible task list fields Frames_signal - The global refresh state flag used to alter frame updates (set in signal handlers)

- The global refresh state flag used to alter frame updates (set in signal handlers) Loops - Limits the number of top updates to this number, assumed -1 for forever

- Limits the number of top updates to this number, assumed -1 for forever Msg_row - The row number of the message row (row count of summary section + 1)

- The row number of the message row (row count of summary section + 1) Pseudo_* - A buffer and descriptors for optimized screen control

- A buffer and descriptors for optimized screen control Stdout_buf - The actual buffer space for the terminal. Not directly used, but know that it's smaller than default

- The actual buffer space for the terminal. Not directly used, but know that it's smaller than default Tty_original - The terminal configuration prior to top execution

- The terminal configuration prior to top execution Ttychanged - Flag indicating that top has altered terminal flags

- Flag indicating that top has altered terminal flags Winstk[] - All of the possible windows to display (default number is 4)

Functions

Here are all the functions defined by top and grouped by purpose with groups ordered by importance. More comprehensive groups are at the top and the boilerplate work is at the bottom:

Screen/Frame

frame_hlp() - Computes the sizes of multiple task windows on screen

- Computes the sizes of multiple task windows on screen frame_make() - Procedure to collate and construct a single frame update

- Procedure to collate and construct a single frame update help_view() - Display help procedure

- Display help procedure summary_hlp() - Prints various CPU% usages in the summary

- Prints various CPU% usages in the summary summary_show() - Procedure to collate and construct the current summary area

- Procedure to collate and construct the current summary area task_show() - Constructs a single line of the task list

- Constructs a single line of the task list window_hlp() - Verifies that the window being processed is visible

- Verifies that the window being processed is visible window_show() - Displays the entire contents of a single window/task list

Initialization

before() - top initialization procedure (signals, memory, etc)

- top initialization procedure (signals, memory, etc) config_cvt() - Configuration file converter

- Configuration file converter config_insp() - Validates configuration file inspection entries

- Validates configuration file inspection entries config_osel() - Validates configuration file other filters

- Validates configuration file other filters configs_file() - Top-level configuration file validation procedure

- Top-level configuration file validation procedure configs_path() - Verifies a configuration file path

- Verifies a configuration file path configs_reads() - Reads in up to three configuration files

- Reads in up to three configuration files parse_args() - Processes user-provided command-line flags

- Processes user-provided command-line flags whack_terminal() - Reconfigurations terminal output for top

- Reconfigurations terminal output for top wins_stage_1() - Early-stage window configuration settings

- Early-stage window configuration settings wins_stage_2() - Late-stage window configuration settings (after config file processings)

Exit / Interrupt:

at_eoj() - Restores the original terminal state prior to top

- Restores the original terminal state prior to top bye_bye() - Application exit procedure. Restores TTY and returns exit status

- Application exit procedure. Restores TTY and returns exit status error_exit() - Formats exit error message

- Formats exit error message sig_abexit() - Signal handler for abnormal exit

- Signal handler for abnormal exit sig_endpgm() - Signal handler for standard exit

- Signal handler for standard exit sig_paused() - Signal handler for pausing execution

- Signal handler for pausing execution sig_resize() - Signal handler force terminal resize

- Signal handler force terminal resize xalloc_our_handler() - Custom error handler for xalloc (part of libproc)

Library Interfaces

cpus_refresh() - Refresh CPU timing information

- Refresh CPU timing information hstbsrch() - History binary search

- History binary search hstget() - Gets a history record

- Gets a history record hstput() - Adds a history record

- Adds a history record procs_hlp() - Calculates proc deltas between frames

- Calculates proc deltas between frames procs_refresh() - Walks the kernel proccess table and associates to windows

- Walks the kernel proccess table and associates to windows sysinfo_refresh() - Updates memory and CPU info

Fields Management

adj_geometry() - Recalculates screen dimension and associates

- Recalculates screen dimension and associates build_headers() - Recalculate column headers

- Recalculate column headers calibrate_fields() - Recalculate column/field headers/flags

- Recalculate column/field headers/flags display_fields() - Show the viewable fields including active markers

- Show the viewable fields including active markers fields_utility() - Perform the field management procedure ('[fF]')

- Perform the field management procedure ('[fF]') widths_resize() - Revisit columns with auto width calculations

- Revisit columns with auto width calculations zap_fieldstab() - Main procedure to recompute task list fields

Color and Display

capsmk() - Generates a color capability string

- Generates a color capability string show_msg() - Displays a message on the interactive message row

- Displays a message on the interactive message row show_pmt() - Displays a prompt for user input

- Displays a prompt for user input show_scroll() - Displays scroll coordinates

- Displays scroll coordinates show_special() - Displays a text buffer that includes special capabilities

Input

do_key() - Primary keyboard input handler for main()

- Primary keyboard input handler for main() find_ofs() - Locates a sub-string within a buffer

- Locates a sub-string within a buffer find_string() - Collects user-specfied input for string search

- Collects user-specfied input for string search keys_global() - Secondary keyboard input handler for global keys

- Secondary keyboard input handler for global keys keys_summary() - Secondary keyboard input handler for summary area keys

- Secondary keyboard input handler for summary area keys keys_window() - Secondary keyboard input handler for task list keys

- Secondary keyboard input handler for task list keys keys_xtra() - Secondary keyboard input handler for miscellaneous keys

- Secondary keyboard input handler for miscellaneous keys other_filters() - Collects a user-specified filter ('[oO]')

- Collects a user-specified filter ('[oO]') write_rcfile() - Procedure to export the current top configuration ('W')

Multi-byte character support

utf8_cols() - Computes the count of displayed characters

- Computes the count of displayed characters utf8_delta() - Returns difference between actual and visible character count

- Returns difference between actual and visible character count utf8_embody() - Finds the byte-level separation point for a visible line

- Finds the byte-level separation point for a visible line utf8_justify() - Multi-byte justification function

- Multi-byte justification function utf8_proper_col() - Finds the resulting column number given an input and type

Formatting

justify_pad() - Add padding after a colum

- Add padding after a colum make_chr() - Justifies a single character

- Justifies a single character make_num() - Justifies an integer

- Justifies an integer make_str() - Justify a string

- Justify a string make_str_utf8() - Justify a string with multi-byte characters

- Justify a string with multi-byte characters scale_mem() - Changes memory size basis and re-format

- Changes memory size basis and re-format scale_num() - Changes a number basis and re-format

- Changes a number basis and re-format scale_pcnt() - Justifies a percent value

- Justifies a percent value scale_tics() - Justifies a time-tick value

Windows

wins_clrhlp() - Save/restore terminal color settings

- Save/restore terminal color settings wins_colors() - Changes temrinal colors

- Changes temrinal colors win_names() - Sets the window names from configuration

- Sets the window names from configuration wins_reflag() - Changes window flags

- Changes window flags wins_reset() - Resets window to basic display (no monitoring)

- Resets window to basic display (no monitoring) wins_select() - Changes the active window

- Changes the active window wins_usrselect() - Matches a task to

- Matches a task to wins_warn() - Warns the user of an invalid command

Forest View

forest_adds() - Adds a node and adjusts the tree accordingly

- Adds a node and adjusts the tree accordingly forest_based() - Sorts the tree by start time

- Sorts the tree by start time forest_create() - Procedure to construct the tree

- Procedure to construct the tree forest_display() - Adjusts the task cmdline for tree view

Inspection

insp_cnt_nl() - Counts elements in the inspection buffer

- Counts elements in the inspection buffer insp_do_demo() - Executes the inspection and fills the buffer

- Executes the inspection and fills the buffer insp_do_file() - Accesses the inspect entry for file types

- Accesses the inspect entry for file types insp_do_pipe() - Accesses the inspect entry for pipe types

- Accesses the inspect entry for pipe types insp_find_ofs() - Searches inspection entry for a string offset

- Searches inspection entry for a string offset insp_find_str() - Implements string search for an inspection entry

- Implements string search for an inspection entry insp_mkrow_raw() - Builds an output row for an inspection result (highlights unprintables)

- Builds an output row for an inspection result (highlights unprintables) insp_mkrow_utf8() - Builds an output row for an inspection result

- Builds an output row for an inspection result insp_show_pgs() - Displays a single page of inspection resutls

- Displays a single page of inspection resutls insp_view_choice() - Displays inspection results and handles user controls

- Displays inspection results and handles user controls inspection_utility() - The main inspection procedure

Filtering

osel_add() - Processes a new 'other' filter (task list filter)

- Processes a new 'other' filter (task list filter) osel_clear() - Removes an 'other' filter

- Removes an 'other' filter osel_matched() - Checks if the filter matches any task in the window

Miscellaneous:

alloc_c() - Custom calloc() function with safe error handler

- Custom calloc() function with safe error handler alloc_r() - Custom realloc() function with safe error handler

- Custom realloc() function with safe error handler alloc_s() - Custom string allocator

- Custom string allocator fmtmk() - Generic format helper function invoking vsnprintf()

- Generic format helper function invoking vsnprintf() get_float() - Prompt user to input a float

- Prompt user to input a float get_int() - Prompt user to input an integer

- Prompt user to input an integer hex_make() - Convert an input value to hex

- Convert an input value to hex ioa() - Blocks execution loop for either keyboard input or elapsed time

- Blocks execution loop for either keyboard input or elapsed time ioch() - Reads user input directly from STDIN

- Reads user input directly from STDIN iokey() - Processes user input and escaped values

- Processes user input and escaped values ioline() - Gets user input from an interactive line

- Gets user input from an interactive line mkfloat() - Converts an input number value to a float representation

- Converts an input number value to a float representation readfile() - Reads a file of unknown size

- Reads a file of unknown size scat() - Cheap and fast strcat()

- Cheap and fast strcat() tg2() - Moves the cursor to specified x,y position

- Moves the cursor to specified x,y position user_certify() - Verifies that the input string is a user name or ID

Sorting Callbacks

top uses ~56 callback function-like macros to sort on the possible task list fields. I won't bother to list them here since they only appear once in the code, and never in expanded form. However, I will unroll the macro that defines the functions and demonstrate with one field. Then I'll show another macro that integrates sort functions to the global fieldstab[]. Let's consider sorting on %CPU.

Defining sort callbacks

Consider this line from early in the top.c source:

SCB_NUM1(CPU, pcpu)

Bounce this statement off these pre-processor macros from top.h:

#define SCB_NUM1(f,n) \ static int SCB_NAME(f) (const proc_t **P, const proc_t **Q) { \ if ( (*P)->n < (*Q)->n ) return SORT_lt; \ if ( (*P)->n > (*Q)->n ) return SORT_gt; \ return SORT_eq; } #define SCB_NAME(f) sort_EU_ ## f #define SORT_lt ( Frame_srtflg > 0 ? 1 : -1 ) #define SORT_gt ( Frame_srtflg > 0 ? -1 : 1 ) #define SORT_eq 0

The pre-processor transforms the original single line statement:

static int sort_EU_CPU (const proc_t **P, const proc_t **Q) { if ( (*P)->pcpu < (*Q)->pcpu ) return ( Frame_srtflg > 0 ? 1 : -1 ); if ( (*P)->pcpu > (*Q)->pcpu ) return ( Frame_srtflg > 0 ? -1 : 1 ); return 0; }

Since these functions are ultimately passed to qsort() , we expect the typical return values of -1, 0, or 1. Frame_srtflg is our global flag that specifies ascending or descending order.

Connecting callbacks to fields

Now we associate the callback to the fields via the global Fieldstab[]. Consider this definition line from top.c:

.width .scale .align .sort .lflg ------ ------ -------- -------- -------- */ ... { 5, -1, A_right, SF(CPU), L_stat }, ...

This is the FLD_t entry for CPU. Note the sort member function macro. Run this through these macros:

#define SF(f) (QFP_t)SCB_NAME(f) #define SCB_NAME(f) sort_EU_ ## f

The preprocessor revises the entry to:

.width .scale .align .sort .lflg ------ ------ -------- -------- -------- */ ... { 5, -1, A_right, (QFP_t)sort_EU_CPU, L_stat }, ...

Finally, the QFP_t type is a function signature typedef'd to int in top.h. Compare the prototype to the generated callback signature:

/* The QFP_t type */ typedef int (*QFP_t)(const void *, const void *); /* Our new matching callback */ static int sort_EU_CPU (const proc_t **P, const proc_t **Q)

When we're ready to order the task list, we pass this function to qsort() :

qsort(q->ppt, Frame_maxtask, sizeof(proc_t *), Fieldstab[q->rc.sortindx].sort);

Terminal Capabilities

On my CentOS system, I've mapped my terminal info (xterm) from infocmp to the POSIX-defined terminal capabilities as used within top. If you're feeling adventurous, compare these xterm sequences to an actual VT100/VT102 terminal (here)

top symbol POSIX Capname xterm Purpose Cap_clr_eol el \E[K Clear remaining line Cap_clr_eos ed \E[J Clear remaining display Cap_nl_clreos (composite)

\E[J Clear and new line Cap_clr_scr clear \E[H\E[2J Clear screen and home cursor Cap_curs_norm cnorm \E[?12l\E[?25h Normal cursor Cap_curs_huge cvvis \E[?12;25h Prominent cursor Cap_curs_hide civis \E[?25l Hide cursor Cap_home home \E[H Home cursor (upper left) Cap_norm sgr0 \E(B\E[m Turf off attributes Cap_reverse rev \E[7m Reverse video cursor Caps_off op \E[39;49m Set default color Caps_endline (composite) \E[39;49m\E[K Default color & new line Cap_rmam rmam \E[?7l No auto margins Cap_smam smam \E[?7h Auto margins on

Note that \E in xterm sequences represent the 'escape' character (^[), commonly shown in octal as \33

FAQ

What the most important topics not covered?

There are a few things worth further research and discussion:

The interaction with the curses library

The readproc support subsystem of procps

Some features, such as custom configurations, filters, and inspect entries

What insights are there in the strace demonstration in part 1?

Quite a few things, especially if you're interested in the process at load time. Following an strace is beyond the scope of top, but nevertheless, I'll throw out a few things to follow.

The whole process ran for only the initial update and I pressed 'q' to exit, yet over 4439 syscalls were executed, producing that many lines of output. The first syscall was top launching with execvp , presumably just after the shell clones in to this new process. Beyond that:

The first ~200 lines show the CRT linked libraries loading

main() begins executing around line 342 and we see the signal handlers rewiring

The next 4000 lines show top opening /proc entries to build the initial frame.

The most fun appears in the last ~15 lines:

write(1, "\33[H\33[2J\33(B\33[mtop"..., 2048) = 2048 /* Summary write */ write(1, "K

\33(B\33[m 4 root"..., 1041) = 1041 /* Task window write */ pselect6(1, [0], ...) /* Block */ pselect6(1, [0], ...) read(0, "q", 127) = 1 /* I press 'q' */ ioctl(0, TCFLSH, TCIFLUSH) = 0 /* Flush input */ ioctl(0, SNDCTL_TMR_CONTINUE ...}) = 0 write(1, "\33[?1l\33>"..., 30) = 30 /* Move cursor to end and reset */ dup2(3, 2) = 2 close(3) = 0 munmap(0x7fd56592d000, 2144488) = 0 close(1) = 0 close(2) = 0 exit_group(0) = ? /* Exit success */ +++ exited with 0 +++

Note that writing the summary section pushes the whole 2k buffer, despite using only 5 lines. The message line and task window is more reasonable.

Beyond that, advanced users may be interested in pouring over the /proc access. For example, is it really necessary to open and read the full contents of /etc/passwd a dozen times in a single update?