Scientific Programming and Computer Architecture





Scientific and Engineering Computation

William Gropp and Ewing Lusk, editors; Janusz Kowalik, founding editor

A complete list of books published in the Scientific and Engineering Computation series appears at the back of this book.





Scientific Programming and Computer Architecture

Divakar Viswanath

The MIT Press

Cambridge, Massachusetts

London, England













To all my teachers, with thanks.









The website https://github.com/divakarvi/bk-spca has all the programs discussed in this book.

Preface

It is a common experience that minor changes to C/C++ programs can make a big difference to their speed. Although programmers who opt for C/C++ do so at least partly, and much of the time mainly, because programs in these languages can be fast, writing fast programs in these languages is not straightforward. Well-optimized programs in C/C++ can be 10 or more times faster than programs that are not well optimized.

At the heart of this book is the following question: what makes computer programs fast or slow? Programming languages provide a level of abstraction that makes computers look simpler than they are. As soon as we ask this question about program speed, we have to get behind the abstractions and understand how a computer really works and how programming constructs map to different parts of the computer’s architecture. Although there is much that can be understood, the modern computer is such a complicated device that this basic question cannot be answered perfectly.

Writing fast programs is the major theme of this book, but it is not the only theme. The other theme is modularity of programs. Structuring programs so that their structure explains what they do is a valuable principle. Computer programs are organized into a series of functions to serve the purpose of that principle. Yet when computer programs become large, merely dividing a program into functions becomes highly inadequate. It becomes necessary to organize the computer program into distinct sources and the sources into a source tree. The entire source tree can be made available as a library. We pay heed to program structure throughout this book.

Most books on computer programming are written at the same level of abstraction as the programming language they utilize or explain. Such an approach is not feasible if we want to understand program speed: we have to understand the different parts of a computer. It is inevitable that choices have to be made regarding the type of computer system that is studied.

The big choices in this book are to opt for the x86 line of computers backed by Intel and AMD corporations, and for the Linux operating system. Nearly 100% of the computers in use today as servers, desktops, or laptops are x86 based, and the x86 line has been dominant for more than 30 years. So a great deal is not lost. The choice of the operating system does not have such a great impact on program speed. We pick Linux because it is the preferred platform for scientific computing and because it is open source. Because it is open source, we can peer into its inner workings when necessary to understand program speed.

A computer program is mathematical logic in action. The picture of the computer that emerges from this book shows how layered, and therefore complex, that logic can be. There is of course the design of the program that we ourselves write. But that is only a small part of the overall design. There are other computer programs, the biggest of which is the operating system kernel (the Linux kernel for our purposes), which handle the program we write and make it run on the computer. These systems programs make the computer a more tractable device, hiding the complexity of many of its parts, such as the hard disk or the network interface. The hardware, which includes the processor architecture and memory system, is itself designed using complex logic similar in kind to the computer programs we ourselves write but vastly different in degree of complexity.

The conventional approach to high-performance computing revolves around general principles, such as Amdahl’s law, weak and strong scaling, data and functional parallelism, SIMD/MIMD programming, load balancing, and blocking. The approach taken in this book is diametrically opposite. Our focus really is on understanding the computer as a whole, especially as viewed through programs. We dig into linkers, compilers, operating systems, and computer architecture to understand how the different parts of the computer interact with programs. The general principles are useful in getting a sense of the average program. However, in writing any particular program, the general principles are never so straightforward to apply. Understanding the parts of a computer can be far more nettlesome and also fascinating. Once that is done, the general principles, insofar as they are useful in actual programming, become self-evident.

The main principles of design that will concern us and that have an impact on program speed have not changed for decades. The approach we adopt is to begin with specific programs that are generally quite simple in what they do. We move up to general principles gradually using these specific programs as a vehicle. There are two advantages to this approach. First, it makes the discussion vivid, more organized, and far less reliant on rules of thumb that may appear arbitrary. Second, it lends useful context to the principles that inform the writing of well-optimized computer programs.

The context is essential. There are no general principles of the range, precision, and completeness of Schrödinger’s equation in this area. Such principles as there are do not go far beyond common sense. Take, for example, the idea of load balancing. It says that if a task is to be divided between equally capable workers, we are better off dividing the task equally, rather than burdening a single worker with an excessive share of the work. There can be absolutely no doubt that this principle has been known for millennia. Without the appropriate context, the principle is quite sterile. Indeed, to call it a principle seems an exaggeration.

Overview of chapters

Chapter 1↓ begins with a rapid review of C/C++. The reader is assumed to have an undergraduate knowledge of C/C++ programming. Our review emphasizes a few parts of these languages that students typically don’t learn in introductory classes. The C programming language is the most fundamental of all languages, to the extent that one can no longer speak of a computer without C. The C language is close to the machine and provides only a basic, although highly valuable, layer of abstraction. The C++ language is a colossal extension of C that includes many mechanisms for representing abstract concepts to bring the program closer to the problem domain. The idiom of C++ we use is close to C. However, we discuss more C++ than we actually need, to dispel myths about C++ being slow.

Libraries and makefiles are the basis of modular programming in C/C++. In chapter 2↓, we explain how libraries and linkers work. Although we do not recommend Fortran, it is not uncommon to have to use old Fortran 77 programs in scientific programming. This chapter explains how to call Fortran programs from C/C++. Knowledge of the GNU make utility, explained in this chapter, is far more valuable than complex C++ syntax. There is no modular programming in C/C++ without make. Yet it is a neglected topic.

A significant part of the answer to what makes programs fast or slow is contained in chapter 3↓ . Compilers convert C/C++ programs into a stream of machine instructions, native to the processor. A small subset of x86 assembly language is introduced and used to understand how structuring loops in different ways leads to different instruction streams.

Although part of chapter 3↓ is about optimizing for the processor pipeline, which has to be programmed in assembly language, in general we do not recommend programming in assembly. The small part of x86 assembly language introduced in that chapter is mainly used to understand how loop-nests map to machine instructions. There can be no rational account of program speed without a discussion of machine instructions. The discussion of assembly language is like a ladder that helps us understand how to structure loops and loop-nests, one of the most important tasks in optimizing a program. As in Wittgenstein’s famous metaphor, the ladder can be thrown away once the understanding is gained.

Even skilled programmers are often unaware of registers. In chapter 3↓, we not only talk about registers but also introduce techniques for optimizing for the instruction pipeline. This type of optimization can improve program speeds considerably---by as much as a factor of 3 to 15 in the example discussed in chapter 3↓. However, the reader may wonder whether these techniques may become obsolete quickly because of rapid changes in hardware. The answer is an emphatic no. Although hardware changes rapidly, the design principles do not change so fast. In addition, the value of optimizing for the instruction pipeline has increased rapidly over time. Such optimization implied a speedup by a factor of 3 in 2010, whereas the speedup is much greater and nearly a factor of 15 in 2015. Yet the programming methodology is unchanged.

Although optimizing for the instruction pipeline is a very difficult skill, it can be a valuable one. It can speed up C/C++ programs by a factor of 10, or even a factor of 100 if the C/C++ coding is naive to begin with. As an argument for the value of instruction pipeline optimizations, we mention that although discussion of new algorithms overwhelmingly dominates the scientific literature, algorithms that speed up any nontrivial computation by a factor of 10 or even a factor of 2 are rare.

Chapters 4↓ and 5↓ have wider applicability than chapter 3↓ . The sort of loop optimizations discussed in chapter 3↓ apply mainly when data is regular, as in image processing or the solution of differential equations. However, memory optimizations, discussed in chapter 4↓ , are useful in all kinds of programming. For simplicity, our examples use regular data, but the same principles apply even when dynamic data structures such as linked lists or graphs are used.

Much of program optimization is optimization of memory access, and many of the principles are the same for single- and multithreaded programs. Memory usually refers to Dynamic Random Access Memory (DRAM), which can be 10s, or even 100s, of gigabytes in extent. Faster and smaller caches are maintained to speed access to DRAM by nearly a factor of 100. In addition, memory is organized into pages to give an independent view of memory to disparate processes. Memory accesses may be optimized for both caching and the paging system.

Perhaps an even more important point that arises in chapter 4↓ , as a natural consequence of the discussion of instruction level parallelism in chapter 3↓ , is the role of parallelism in the memory system. Programs that allow the processor to parallelize multiple memory accesses will be much faster. Thus, for example, there can be a huge difference in speed between a program that uses linked lists, which disrupt parallelism, and another program that accomplishes the same task by accessing an array in sequence.

The clock speed of processors stopped improving around 2005. Much of the growth in processing power since then is from putting more and more processor cores inside the same computer. All these processor cores share the same memory. Programming with threads, the topic of chapter 5↓ , derives its importance from this well-established trend. Many threaded programs can be written quite easily. Yet there are always subtleties under the surface whenever different threads or processes share memory.

Many of the subtleties of threaded programming are related to the memory system. Threaded programming is impossible without coherent caches, and any programmer who writes threaded programs in C/C++ without understanding as much will be befuddled sooner rather than later. Even the simplest threaded programs for adding a sequence of numbers, using OpenMP or Pthreads, rely on the memory model in ways that are often not appreciated.

In addition to the memory system, threaded programs interact with the operating system. The simple act of creating threads can involve an overhead that swamps the benefits of parallelizing. When is it advantageous to invoke multiple threads? Why is it not a good idea to change the number of threads between OpenMP parallel regions? Why should threaded programs avoid abusing the stack? These questions are answered in a rational manner once the role of the operating system is understood.

Much of a C/C++ programmer’s time is spent dealing with segmentation faults. The precise manner in which segmentation faults and other memory errors arise from inside the operating system is explained in chapter 5↓ . Chapter 5↓ concludes what may be seen as the core of this book. Topics covered up to this point are relevant to many kinds of programming, well beyond the scientific world.

The Top 500 list (see http://www.top500.org/), which uses a linear algebra problem to benchmark and then rank the fastest computers in the world, has provided powerful impetus to scientific computing. For more than two decades, the most powerful computers in the world have been clusters where many computers are tightly connected using a high-performance network. In such clusters, concurrent processes communicate over the network by sending and receiving messages. Message passing is the topic of chapter 6↓.

An example of the outcome of our choice to focus on computer architecture instead of general principles may be found in this chapter. The general principle is to overlap processor and network activity. Few can contest the utility of such a principle or argue that there is anything in it beyond ordinary common sense. However, our discussion of the matrix transpose in chapter 6↓ shows that it requires deep knowledge of network architecture to in fact overlap processor and network activity, although the general principle is quite obvious. It is precisely that type of knowledge that gets lost when we do not look past general abstractions and examine details of systems software and computer architecture.

When a large group of computers is tightly coupled, message passing is the preferred paradigm. Because there is no truly credible challenge to message passing in the context of supercomputing, the largest physics computations are likely to continue to rely on message passing. However, a single computer today can tackle complex 3D problems with more than a billion grid points. The vast majority of scientific projects can be effectively implemented on a single node. When a single computer is so powerful, message passing between multiple computers, with the considerable additional difficulty it brings, becomes less attractive.

Market forces that are propelling the Internet are powerful, indeed amazingly powerful, and should not be ignored by the scientific programmer. The pertinence of the Internet is more obvious in newer areas such as genomics and data science than in classical physics. Chapter 6↓ includes a section about the Internet. As in other parts of the book, here too we look behind the abstractions to understand the plumbing of the Internet.

The final two chapters, chapters 7↓ and 8↓ , are about coprocessors. In 2007, the NVIDIA corporation showed how its graphics processors can be used to speed up a range of scientific problems. The excitement that resulted was undoubtedly justified. The graphics coprocessors hint at the possibilities for architectural design that may be available beyond the x86 line. Intel, which could not afford to ignore this competition, has introduced its own coprocessor. It must be said that excitement about coprocessors has not been matched by utilization. When the coprocessor competes with the processor to execute the same tasks, there is a major disruption of modularity at the hardware level. The resulting heterogeneity makes it difficult to break tasks into subtasks in a uniform manner. Heterogeneity overly complicates programming models, which are already quite complicated.

The brief appendix may be the right place to begin reading this book. The appendix begins with a table of machines used to time programs. The rest of the book makes frequent references to that table.

Although interpreted languages such as Python or Matlab are easier to use, the resulting programs will be much slower. How much slower does not seem to be widely understood, and the appendix dispels a few myths. This author has heard estimates of the slow-down factor ranging from a few percent to a factor of 2 to a better informed guess of a factor of 10. In fact the interpreted languages can be several hundred times slower for even fairly simple programs that run on a single core. As the complexity of the program and the hardware platform increases, the slow-down penalty can get much worse. Even for moderately complex programs, the slow-down can be by a factor of 10⁴, as we note in the appendix. If the effort for mastering C/C++ is much greater, so is the reward.

The entire source code corresponding to this book, which runs to more than 15,000 lines of C/C++, is available at https://github.com/divakarvi/bk-spca. The appendix briefly introduces two tools---GIT and the cscope utility---essential for downloading and working with the code. Both GIT and cscope are of great value in programming in general. Even in the era of Internet search, cscope, which has been around since the 1970s, is an excellent option for browsing and searching source trees.

The examples in this book rely on Intel’s icc/icpc compiler. However, except for chapter 7↓, the widely used, easily available, and open source gcc/g++ compiler may be substituted with little trouble. The few nuances that arise are described in the appendix.

Acknowledgments

Most of all, I thank my undergraduate teachers at the Indian Institute of Technology, Bombay. From them I began to learn much of what is found in this book.

I taught a graduate class based on this book on four occasions at the University of Michigan. The class typically covered about a third or more of the material in the book, with greater emphasis on the beginning sections in each chapter. I thank all the hundred or so students who took that class. Thanks in particular to Zhongming Qu, who penetrated the material quite deeply. I am especially grateful to Zhongming for helping me understand Makefiles much better.

I was privileged to have access to superbly maintained systems at the University of Michigan. I thank Bennet Fauber, Brock Palen, Andy Caird, Neil Tweedy, Charles Antonelli, Reed Hoyer, and Rusty Dekema for their help. I am especially grateful to Seth Meyer for showing me how to build and load the Linux operating system. All who work with computers are aware of the peculiar difficulty of getting started. In the memorable words of Kernighan and Ritchie, it is a big hurdle, and once crossed “everything else is comparatively easy.” In addition to showing me how to get started with Linux, Seth also freely shared his deep knowledge of the Internet with me.

The last three chapters of this book were written using systems deployed at the Texas Advanced Computing Center (TACC), with access obtained through XSEDE. I am thankful to XSEDE as well as TACC for the wonderful support they offered. At TACC, I am especially grateful to Chris Hempel for being so accommodating. The technical help desk at TACC answered numerous questions with unfailing promptness and helped me in many ways. At the risk of omission, I thank Doug James, Lars Koesterke, Hang Liu, Si Liu, Robert McLay, Cyrus Proctor, and Frank Willmore.

I thank Paul Besl, Tim Prince, and Mike Tucker of Intel Corporation for being gracious and helpful when difficulties arose with the Westmere microarchitecture. I also thank Intel’s Russian team for resolving these technical difficulties and Tim Prince for his expert comments.

I thank my colleagues Danny Forger and Hans Johnston for offering advice and much needed support.

I wrote this book using LyX, relying on Inkscape to produce figures. The html version of the book was produced using eLyXer. I thank all the people responsible for these wonderful open source tools. Technical information accessed via the Internet was invaluable, and I have acknowledged this help wherever possible.

Finally, I am grateful to Dr. John Sarno for curing my chronic back pain through his books and to Dr. Howard Schubiner for a helpful consultation.

A computer program is a sequence of instructions to a machine. In this chapter and the next, we emphasize that it is a sequence of instructions that builds on other computer programs and that in turn can be built on. Codes that exist in isolation are often limited to quite trivial tasks and can hardly be considered computer programs.

This chapter is a review of C/C++. The C programming language is the most fundamental of all programming languages. Computing machines come in great variety and are put together using many parts. The computer’s parts, with the processor at the center and memory, hard disk, network interfaces, graphics devices, and other peripherals connected to it, are very different from each other. It would be an almost impossible task for any single programmer to deliver instructions to such a complicated machine. The C programming language is a major part of the setup to give the programmer a uniform view of computing machines. No modern computing device can exist and be useful without C.

Much of the time when programs are written, the programmer is not at all aware of the many parts of the computer. Indeed, the programmer may not even be aware that there is a processor. It is more natural to think in terms of the abstractions of the programming language. This is in general a good thing because the purpose of programming languages is to set up abstractions that hide the complicated parts of the computer. In addition, programs written in this way can be moved from computer to computer easily.

In this book, our goal is to understand what makes programs fast or slow. As soon as we set ourselves this goal, we have no choice but to peer behind the abstractions of programming and understand how those abstractions are realized through the many parts that constitute a computer. The C programming language is the most natural vehicle in moving toward this goal. In fact, it is the only vehicle that is really appropriate.

The C programming language is close to the machine. There are high-level languages, a notable and outstanding example being Python, which bring programming much closer to the problem domain. Concepts and ideas intrinsic to the problem domain are expressed far more easily in these programming languages. Programs that would take days to write in C can be written in hours, or even minutes, in Python.

Although these high-level languages are much easier on the programmer, programs in these languages run slowly. As shown in the appendix, these high-level programs can be more than 100 times slower, or even worse. In fact, even C programs written without a knowledge of computer architecture can be several times slower than C programs written with that knowledge. The C++ language is a compromise. It strives to combine the speed of C with the abstraction facilities of higher level languages. It can be quite useful, although it lacks the simplicity and elegance of C. The C++ idiom we use is close to C. Some of the facilities of C++, mainly the facility of defining classes, are adopted to make C a little easier and a little more presentable.

The review of C/C++ in this chapter attempts to bring out certain features that people often do not learn from a single course or two. Beginning programmers often tend to think of a C/C++ program as a single .c or .cpp source file. Modular organization of sources is far superior. Modular organization is essential for writing programs whose structure reflects and explains how they work as well as the underlying concepts.

Section 1.1↓ sets up an example to exhibit the modular organization of sources in C/C++. The example is a technique called the Aitken iteration, which can transform certain sequences to hasten their convergence. In sections 1.2↓ and 1.3↓ , we review some features of C and C++ using this example. The concluding section 1.4↓ introduces a little Fortran. For reasons explained at length later, we do not recommend programming in Fortran. However, a lot of scientific programs are written in Fortran, mainly Fortran 77. A scientific programmer needs to know just enough Fortran to be able to use these old Fortran codes.

1.1 An example: The Aitken transformation

The Aitken transformation maps a sequence of numbers to another sequence of numbers. It serves as the vehicle to introduce aspects of C, C++, and Fortran in this chapter. It is also interesting in its own right.

The Aitken transformation is given by the following formula:

(1.1) t_{n−1} = s_{n−1} − (s_n − s_{n−1})² ⁄ (s_{n+1} − 2s_n + s_{n−1}).

It transforms a sequence s_0, s_1, …, s_N into a new sequence t_0, t_1, …, t_{N−2}, which has two fewer terms. The idea behind the Aitken transformation is as follows. If the s_n sequence is of the form s_n = S + aλⁿ, all terms in the t_n sequence are equal to S. It is useful for speeding up the convergence of a number of sequences, even those that do not directly fit the S + aλⁿ pattern. Section 1.1.1↓ illustrates the dramatic power of the Aitken iteration on a couple of examples---the Leibniz series and the logarithmic series. To be sure, these examples are chosen carefully.

This section begins to make the point that it is generally advantageous to split a program into multiple sources. We could use a single source file to code the Aitken iteration and apply it to the Leibniz series as well as the logarithmic series. Such a program would work just as well to begin with. A few days later, we may want to apply the Aitken iteration to another example. If we also throw that example into the same source file, the source file will become a little more unwieldy. A few months later, we may want to use the Aitken iteration as part of a large project. If we insist on using a single source file, there are two equally unpleasant alternatives: copy the whole Aitken program into the large project or copy the large project into the Aitken program. There is a heavy price to pay for avoiding modular organization of programs. Section 1.1.2↓ gives a preliminary discussion of the modular organization of program sources using the Aitken iteration as an example. Later sections build on this preliminary discussion.

1.1.1 Leibniz series and the logarithmic series

The Leibniz series is given by

π = 4 − 4⁄3 + 4⁄5 − 4⁄7 + 4⁄9 − 4⁄11 + ⋯

This series, whose terms alternate in sign and diminish in magnitude monotonically, will be a recurring example. So we will begin by looking at it carefully. Let S_n be the sum of the first n terms. As shown in figure 1.1↓, the partial sums S_n are alternately above and below π. Further, the convergence is slow. In fact, |π − S_n| > 2 ⁄ (2n + 1), which implies that the first million terms of the Leibniz series can give only slightly more than six digits of π after the decimal point.

If we look at figure 1.1↑, we may notice that although the convergence to the limit is slow, the partial sums seem to follow a certain trend as they approach π. The partial sums are alternately above and below, and it seems as if we can fit a smooth curve through the iterates. The Aitken iteration guesses this trend quite well and speeds up the convergence of the Leibniz series.

Table 1.1↓ shows Aitken’s transformation (1.1)↑ applied repeatedly to the first 13 partial sums of the Leibniz series. After each application, we have a sequence with two fewer numbers, and at the end of the sixth application of the Aitken transformation, we have just one number that equals π to 10 digits of accuracy. Because none of the 13 partial sums gives even the first digit after the decimal point, it seems astonishing that an answer with 10 digits of accuracy can be produced from those numbers.

Computing the digits of π is a mathematical sport of unending interest. Even Isaac Newton had a weakness for it. The Aitken iteration, although impressive, is far from being the best method for computing π .

The logarithmic series

log(1 + x) = x − x²⁄2 + x³⁄3 − x⁴⁄4 + ⋯

diverges for |x| > 1. However, as shown in table 1.2↓, the Aitken transformations of the first 13 partial sums recover the value of log(1 + x) for x = 1.25. If x were larger, just the first 13 partial sums would not be enough to produce 10 digits of accuracy after the decimal point.

x        Partial Sum      Extrapolate
0.00     0.0000000000     0.0000000000
0.20     0.1823215568     0.1823215568
0.40     0.3364723763     0.3364722366
0.60     0.4700395318     0.4700036292
0.80     0.5895867562     0.5877866649
1.00     0.7301337551     0.6931471806
1.25     1.5615505069     0.8109302162

Here we have our first programming problem. There is a simple iteration (1.1↑) to begin with. The problem is to code it in C/C++ and apply it to the Leibniz series and the logarithmic series.

1.1.2 Modular organization of sources

Before we delve into the syntax of C/C++, let us look at how to structure a program for the Aitken iteration. In particular, we look at how to split the program between source files.

It is not uncommon to introduce computer programming using programs that reside in a single source file. Keeping an entire program in a single source file is a bad idea, and one that tends to grow: this author has heard of Fortran programs longer than 100,000 lines in a single file. Even the simple Aitken example shows why a single monolithic source is a bad idea.

The Aitken iteration, as we have discussed it so far, consists of the iteration (1.1↑) and its application to the Leibniz and Mercator series. One way to write this program is to code a function for the Aitken iteration, another function to set up and apply it to the Leibniz series, and likewise yet another function to apply it to the logarithmic series.

Breaking up programs into functions is the first step. However, throwing all three functions into the same source file would not be a good idea. There is a simple conceptual reason for this. The Aitken iteration is a general technique, and its applications to the Leibniz and logarithmic series are two specific examples. Coding all the functions in the same source file limits the usefulness of the Aitken iteration.

In later sections, we will implement the Aitken iteration and the two examples in three separate source files. The first source file aitken.c codes the Aitken iteration in a form that would apply to any sequence. The specific applications are in the source files leibniz.c and logseries.c .

If the program is separated into three sources in this manner, the question of how they may interface with each other and work together arises. The first part of the interfacing is a header file that we will call aitken.h . This header file gives a basic summary of what is found inside aitken.c but does not have the implementation of any function. The source files leibniz.c and logseries.c include this interface within their code. Thus, the sources will look as follows:

aitken.h aitken.c leibniz.c logseries.c

with the header file aitken.h included in all the three .c source files.

This manner of breaking up a program into multiple source files is deeply integrated into C/C++. A C/C++ compiler converts a source file into a sequence of machine instructions independently of all other sources. For this reason, a source file is sometimes called a compilation unit. In our Aitken example, three object files aitken.o , leibniz.o , and logseries.o , holding machine instructions corresponding to the respective source files, result from compilation.

The source leibniz.c will include calls to functions defined in aitken.c . The header aitken.h , which is included in leibniz.c , includes just enough information to partially set up this function call when the machine instructions of leibniz.o are being compiled. The crucial information that is missing is the address and definition of the function in aitken.c that is invoked in leibniz.c . That information is supplied only when the object files aitken.o and leibniz.o are combined into an executable, which we will call leibniz.exe, by the linker. It is the linker’s job to eliminate unresolved references in the object files and put together an executable without any unresolved references. The executable is the program that ultimately runs on the computer.

We will make the discussion of compilation and linking far more concrete in later sections. The overall picture is as follows: the program is separated into multiple source files, the source files are compiled separately into object files holding machine instructions, and the linker merges the object files and eliminates unresolved references to produce an executable. The overall picture for the Aitken program is shown in figure 1.2↑.

Exercise: Substitute sₙ = S + aλⁿ into the Aitken formula (1.1)↑ and verify that each new term tₙ is equal to S.

Exercise: In (1.1)↑, S = sₙ₋₁ if sₙ₋₁ = sₙ₋₂, and there is a divide by zero if sₙ₊₁ − 2sₙ + sₙ₋₁ = 0. Interpret both conditions.

Exercise: Prove that 1⁄(1 + x²) = 1 − x² + x⁴ − ⋯ + (−1)ⁿ⁻¹x²ⁿ⁻² + Rₙ, where Rₙ = (−1)ⁿx²ⁿ⁄(1 + x²). Integrate from 0 to 1, while noting that ∫₀¹ x²ⁿ⁄(1 + x²) dx < 1⁄(2n + 1), to deduce the Leibniz series.

Exercise: If Sₙ is the partial sum of the first n terms of the Leibniz series for π (not π⁄4), prove that |π − Sₙ| > 2⁄(2n + 1), showing that the Leibniz series converges slowly.

Exercise: To understand the Aitken iteration, it is helpful to look at singularities. Prove that the function arctan z has singularities at z = ±i in the complex plane. Determine the type of the singularities.

The C programming language is concise. In this review, we go over a few features of C, emphasizing points that introductory classes often miss. Thus, we discuss header files, arrays, and pointers, with emphasis on the distinction between lvalues and rvalues, as well as the distinction between declarations and definitions. We go over the compilation and linking process while adding more detail to the picture in figure 1.2↑ . Although C is concise, it demands precision in thinking. The emphasis in this review is toward greater precision. Table 1.3↓ shows two of the best books to learn C and C++.

C programming     B.W. Kernighan and D.M. Ritchie, The C Programming Language, 2nd ed.
C++ programming   B. Stroustrup, The C++ Programming Language, 3rd ed.

1.2.1 Header files

A header file in C is an interface to one or more source files, with each source file a collection of function definitions. As an example of a simple header file, we look at aitken.h. This header file is an interface to the functions defined in aitken.c.

1 #ifndef aitkenJuly09DVsfe
2 #define aitkenJuly09DVsfe
3 void aitken(double *seq1, double *seq2, int len);
4 double aitkenExtrapolate(double *seq1, double* seq2,
5                          int len);
6 #endif

The two symbolic names aitken and aitkenExtrapolate are introduced in this header file. These are the names of two functions that are defined in aitken.c . For the two functions, the declarations in the header file specify the type of the arguments as well as the type of the value that is returned. Thus, this header file is saying that aitken() is a function that takes three arguments---the first two of type pointer to double and the last of type int ---and returns nothing ( void ). The function aitkenExtrapolate() is stated to take three arguments of the same types, but it returns a double .

Both of these are examples of function declarations. The declarations specify the arguments the functions take in as well as the types of what they return. When a declaration is made, the function is stated to exist somewhere. However, a sequence of statements defining the functions is lacking.

The source files aitken.c , leibniz.c , and logseries.c include aitken.h within their text using the line #include "aitken.h" . The #include statement has the effect of splicing all of aitken.h into those source files. The source aitken.c includes the header aitken.h and goes on to define the functions declared in the header. The arguments and return type of a function at the point of definition must exactly match the declaration in the header. The sources leibniz.c and logseries.c include the header aitken.h with a different intent. Their intent is to obtain license to make calls to the functions aitken() and aitkenExtrapolate() .

The header file acts as an interface, and it is natural, as well as typical, to include comments in it about the arguments to functions declared in it and about what those functions do. In this book, there are few comments in the code listings. The comments are instead found in the text.

The function aitken() (line 3) applies one step of the Aitken transformation (1.1)↑ to the array seq1[0..len-1] and leaves the result in the array seq2[0..len-3]. The function aitkenExtrapolate() (line 4) applies the Aitken transformation repeatedly to the array seq1[0..len-1] while using seq2[] as scratch space, until only one or two numbers are left. One of the remaining numbers is returned as the extrapolated value.

At the beginning of aitken.h and at the end, three lines (1, 2, and 6) begin with the special character # . These lines are directives to the macro preprocessor, one of the initial phases of compilation. Sources are manipulated textually by the macro preprocessor.

The source file leibniz.c includes aitken.h using a #include directive. When the compiler is applied to leibniz.c , the initial preprocessing stage handles the #include directive. When processing that directive, the file aitken.h is opened by the preprocessor. The first directive in that file ( #ifndef on line 1) asks the preprocessor to check whether a name called aitkenJuly09DVsfe is already defined. If the name aitkenJuly09DVsfe is not defined, the second line of aitken.h (the #define directive) asks the preprocessor to define such a name, which it does. The preprocessor splices all the text between the second and last lines of aitken.h (lines 3 to 5) into the source file that issued the #include "aitken.h" directive. The splicing into leibniz.c is done at the point where the #include directive was issued.

However, if such a name is already defined (because the header file was included by an earlier directive), it skips over everything until it sees a #endif directive (line 6); in this case, it would be skipping the entire aitken.h file.

Using #ifndef , #define , and #endif directives ensures that if a directive to include the same header file is issued twice by the same source file, the second directive has no effect. That may not sound useful at first. However, in practice, a source file may include a header file that includes some other header file and so on. So two directives to include different header files may end up including the same header file twice. Such a thing is prevented by the combination of preprocessor directives we just discussed.

The macro variable aitkenJuly09DVsfe is chosen to be complicated to avoid accidentally using the same name in two different header files. Such accidental reuse will mean that the two header files can never be simultaneously included in the same source. Such errors at the macro preprocessing stage can be a little bothersome.
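To make the guard pattern concrete, here is a hypothetical second header protected in the same way. The file name logseries.h comes from our example, but the guard name and the declaration below are invented for illustration.

```c
/* logseries.h -- hypothetical interface; the guard name is deliberately
   unusual so that no other header accidentally reuses it */
#ifndef logseriesOct23XYZab
#define logseriesOct23XYZab
void logseries(double *seq, int len, double x);
#endif
```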

1.2.2 Arrays and pointers

Arrays and pointers are the heart of the C language. An array is a sequence of locations in memory. A pointer-type variable is a variable that holds the address of a memory location. The word pointer can be used for either such a variable or an expression that evaluates to an address. The two concepts may appear different, but the C language blurs the distinction between them. In C, arrays and pointers are almost interchangeable.

There is a very good reason for blurring the distinction between arrays and pointers. Suppose we want to pass a long sequence, occupying a great deal of memory, as an argument to a function. It would be wasteful to allocate new memory and copy the sequence entry by entry at every function call. In C, arrays are passed as pointers.

The key idea in almost identifying arrays with pointers is as follows. A sequence of data items in memory can be specified using three pieces of information: the address of the first item, the size in bytes of each item, and the number of entries in the sequence. In C, a pointer holding an address is specified as a pointer of a certain type. The size of each item is inferred from type information. For example, in an array of double s, the size of each item is 8 bytes and in an array of int s the size of each item is 4 bytes (on GNU/Linux). Thus, from merely knowing a pointer, we can infer the first two pieces of information: the address of the first item in memory and the size of each item in bytes. The last piece of information, namely, the length of the array, is often tagged along separately. Thus, arrays and pointers may be identified, and arrays may be passed to functions efficiently as pointers with the length of the array tagged along as an extra parameter.
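The convention described above can be sketched in a few lines; the function sum() below is made up for illustration. The pointer carries the address of the first item, the pointer’s type fixes the item size, and the length travels as a separate argument.

```c
/* sum() cannot recover the array length from the pointer alone,
   so the caller passes len explicitly. */
double sum(double *seq, int len){
    int i;
    double s;
    s = 0.0;
    for(i = 0; i < len; i++)
        s += seq[i];     /* seq[i] is *(seq+i): address plus i items */
    return s;
}
```

A caller that has defined double a[4] = {1, 2, 3, 4}; would invoke sum(a, 4).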

We will presently examine how this idea plays out in practice.

Arrays, pointers, lvalues, and rvalues

The C language takes a certain view of computer memory, and arrays and pointers are both best understood in terms of computer memory. Figure 1.3↑ is a schematic view of a portion of the computer memory (in hardware, this would be DRAM in most circumstances). Names that we introduce into our code---whether they correspond to variables of basic types such as char or int or double , or to arrays, or to pointers---will all ultimately correspond to locations in computer memory. To introduce the name of a variable of basic type, we may use a definition such as int x; to introduce the name of an array, we may use a definition such as int list[3]; to introduce the name of a pointer, we may use a definition such as int *a.

In C semantics, an expression may evaluate to a value that is the name of a memory location. Such a value is called an lvalue. The “l” refers to the fact that such values may occur on the left-hand side of an assignment. In contrast, an expression may evaluate to a value that may be used to fill a memory location but is not necessarily the name of any location in memory. Such a value is called an rvalue. The “r” here refers to the possible occurrence of such a value on the right-hand side of an assignment statement. The concept of rvalues and lvalues is useful for understanding arrays and pointers.

The distinction between lvalues and rvalues arises fundamentally because of the assignment statement. In an assignment, what occurs on the left is the name of a memory location or an lvalue. What occurs on the right is a value that is used to fill a memory location or an rvalue. The distinction is important in the context of pointers and arrays because, although a pointer is naturally thought of as an address and an address is nothing but a number naming a memory location, pointers themselves may be stored in pointer-type variables.

If the variable x is introduced using the definition int x , it is an lvalue because it is the name of a memory location as shown in figure 1.3↑. If the assignment statement x=37 executes, the location is filled with 37 as shown in the figure. The variable x can be both an lvalue and an rvalue. In the statement x=x+7 , the occurrence on the left is an lvalue and the occurrence on the right is an rvalue.

Addresses are shown in figure 1.3↑ in a slightly nonstandard way. At the top of the figure, the address of the memory location named by x is shown as A(0..3). On a typical computer system today, A may be understood as a 64-bit (8-byte) address. An int occupies four bytes in GNU/Linux, and the addresses of the four bytes named by the variable x are A + 0, A + 1, A + 2, A + 3. This is shown in the figure as A(0..3).

The C language allows us to take the address of a variable (in general, the address of any lvalue). To get the address of the variable x of type int , we may use the syntax &x . Although the int location is 4 bytes with addresses A + 0, A + 1, A + 2, A + 3 , the value of &x is A , which is the address of the first byte in the memory location named by x . Here &x is an rvalue (its value being A ) but not an lvalue because it is not the name of any location in memory. In the same expression &x , x is an lvalue.

We may define a pointer using the syntax int *a . As shown in figure 1.3↑, a is the name of 8 bytes of memory and not just 4, as in the case of x . That is because pointer-type variables are meant to hold addresses, and as we have already said, an address is 8 bytes on most computers today.

If we now say a=&x , the memory location named a gets filled with the address of x , which is A in figure 1.3↑. If we were to say a=&x+1 , the location a would get filled with A + 4 and not A + 1 . That is because both &x and a are of type pointer to int , and the C compiler knows that an int is 4 bytes and not just 1 byte. In the assignment a=&x , a is an lvalue and &x is an rvalue. Less obviously, x is an lvalue in the same expression. The distinction between names of memory locations (lvalues) and values that may be used to fill memory locations (rvalues) is valuable to keep in mind.

The operator & allows us to extract the address of an lvalue. Conversely, the operator * converts an address to the name of a memory location (an lvalue). So if we say a=&x and then say *a=7 , the effect is as follows. First, the location named a is filled with the address of x . Next, when we say *a=7 , *a is the name of the location whose address is the value held in a . The lvalue in this assignment is *a , not a . Of course, the location named by *a is the same as the location named by x . So the effect of *a=7 is to change the value of x to 7 .
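The sequence of assignments just described can be packaged as a small self-contained sketch; the function name deref_demo() is invented for illustration.

```c
/* After a=&x, the lvalue *a names the same location as x,
   so assigning through *a changes x. */
int deref_demo(void){
    int x;
    int *a;
    x = 0;
    a = &x;    /* a is filled with the address of x */
    *a = 7;    /* writes 7 into the location named by x */
    return x;  /* x is now 7 */
}
```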

It is worth noting that the word pointer may refer to an lvalue or an rvalue. A variable, or any lvalue, that holds addresses may be called a pointer. In addition, the address itself (an rvalue) may be called a pointer. The picture behind either usage is of an address pointing to a location.

If e is an expression that evaluates to a pointer to a double (rvalue), the numerical value of e+1 is 8 more than the value of e . That is because a double is 8 bytes. If e is a pointer to a more complex type such as a struct , the C compiler calculates the size of the struct in bytes and increments e by that amount when evaluating e+1 . Thus, e+1 advances the pointer by one item and e+27 advances the pointer by 27 items, taking into account the data type that e points to. If p is a pointer-type variable, the assignment p=p+17 advances the pointer by 17 items. Likewise, the assignment p=p-17 moves the pointer backward by 17 items.
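The scaling of pointer arithmetic by item size can be observed directly. In the sketch below (the names are invented), casting to char * exposes the raw byte difference between e and e+1.

```c
#include <stddef.h>

/* e+1 advances the pointer by one double, i.e., by sizeof(double)
   bytes, not by 1 byte. */
ptrdiff_t step_in_bytes(void){
    double arr[2];
    double *e = arr;
    return (char *)(e + 1) - (char *)e;
}
```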

An array may be introduced using a definition such as int list[3] . After this definition, list is an rvalue but not an lvalue (it is not the name of any memory location). The (r)value of list is the address of the first of three consecutive memory locations of type int set aside by the definition. In figure 1.3↑, it is A + 12 . If we use syntax such as list[1]=8 , list[1] is an lvalue as shown in the figure. It is nothing but an abbreviation of *(list+1) . Similarly, list[0] and list[2] are also valid lvalues as shown in the figure. In fact, list[100] is also a syntactically valid lvalue. The only problem is that an assignment such as list[100]=2 will most likely lead to a runtime error because only three int locations have been legally claimed. Such runtime errors are triggered by the operating system in a manner that is explained in a later chapter.

We began by noting that a sequence may be specified using three pieces of information: a pointer to the first location, the size of each item, and the number of items. The definition int list[3] supplies all three pieces of information. The address of the first location is the value of list . The size of each item is 4 bytes (on GNU/Linux) because an int is four bytes. Finally, the number of items in the array is 3 as shown in figure 1.3↑. The first two pieces of information are contained in the type and value of list . If list is passed as an argument to a function, the length of the array, which is the third piece of information, must be supplied separately.

Machine or assembly languages too access data items in a sequence using an address, the size of each item, and the index of the item, much as in C.

A C source with arrays and pointers

So far our discussion of arrays and pointers has been with reference to figure 1.3↑ . We will now write a simple C program illustrating the discussion.

Before looking at C source, a few comments about indentation are in order. For program source not meant for display in a book, we use eight-space indentation. Tab stops are separated by eight characters, and terminal screens are conventionally 80 characters wide. When the program code is lined up according to tab stops, the code is much easier to browse. The nesting level of loops becomes readily evident. The nesting level is an indication of the level of complexity of the code. Therefore, it is useful to be able to recognize the nesting level immediately. Too many levels of nesting often imply that the code is poorly structured.

Let us look at the following code, which uses five-space indentation. The suggestion of eight-space indentation assumes 80 -character-wide lines. Here the lines are about 50 characters, and the indentation has been scaled down proportionally.

1 #include <stdio.h>
2 int main()
3 {
4      int x;
5      int list[3];
6      int *a;
7      a = &x;
8      list[1] = 2;
9      *a = 35 + list[1];
10     printf("%p %p %d\n", &x, list, x);
11 }

We are allowed to say list[1] = 2; (line 8), but list = &x; would have been illegal. The reason is that list has a value but is not the name of any location. On one run, this code had the following output (line 10).

0x7fff3cb8e414 0x7fff3cb8e400 37

The value of x is printed as 37 , as we may have expected. The address of x and the value of list are printed in hexadecimal as indicated by the 0x in front. Thus, each address is 48 bits long. Figure 1.3↑ may give the impression that each address is the address of a location in physical memory. In fact, the addresses that are printed out are virtual addresses, a concept we will discuss in later chapters.

The printf() function used on line 10 is part of the standard C library. Its declaration will be in the standard header file stdio.h , which is included on line 1. The C compiler knows where to look for this header file.

1.2.3 The Aitken iteration using arrays and pointers

As already noted, arrays and pointers are almost equivalent in C. The principal advantage of thinking of arrays in this way arises in passing arrays as arguments to functions. Here we use the Aitken example to illustrate how arrays may be passed as pointers.

The file aitken.c begins with two directives:

#include <assert.h>
#include "aitken.h"

the second of which includes the header file aitken.h . Including aitken.h allows the compiler to ensure that the definitions in aitken.c are consistent with the declarations in the header file. The first line includes the standard header assert.h . The job of finding that header file is left to the compiler. Including that header file allows us to use the assert statement to check a condition in the body of the code (see line 4 below).

The function aitken() operates on arrays that are passed to it as pointers.

1 void aitken(double* seq1, double* seq2, int len){
2      int i;
3      double a, b, c;
4      assert(len > 2);
5      for(i=0; i < len-2; i++){
6           a = seq1[i];
7           b = seq1[i+1];
8           c = seq1[i+2];
9           seq2[i] = a - (b-a)*(b-a)/(a-2*b+c);
10     }
11 }

In a function that calls aitken() from some other source file, we may have declared two arrays using double s[100], t[100] . Those two arrays will correspond to two segments of memory, each equal to 100 doubles (see figure 1.4↓). As we have noted, s[0]...s[99] and t[0]...t[99] are names of double locations, which make up those segments of memory. In contrast, s and t have values of type double * but are not names of any locations in memory. If a call is made as

aitken(s, t, 100);

it has the following effect. The function parameters seq1 and seq2 are names of locations in memory that can hold double * . The values of s and t are copied to the locations in memory whose names are seq1 and seq2 , respectively (see figure 1.4↓). The value of seq1+17 is the same as the value of s+17 . Thus, seq1[17] , which is exactly the same as *(seq1+17) , is another name for the memory location s[17] . Thus, we see that by indexing into seq1 and seq2 as seq1[0]...seq1[99] and seq2[0]...seq2[99] , we may refer to any entry in the arrays s and t defined by the caller (see figure 1.4↓).

There is a little catch here, however. What happens if we say seq1[100] or seq1[200] ? We would be generating a name for a location in memory that was not legally claimed by the caller. That is likely to result in a run-time error. By just using the pointers seq1 and seq2 , there is no way we can tell how long the array is. Therefore, the length of the array is the third parameter, which is named len , in the function definition. The caller has to explicitly give the length of the array, as it does here by passing 100 as the third argument.

Line 9 of the listing corresponds directly to the Aitken transformation formula (1.1)↑.

The assert(len>2) statement on line 4 works as follows. If the code is compiled with the option -DNDEBUG , it is as if that line were not there and no extra overhead is incurred. If that option is not used during compilation, the condition len>2 is checked during run-time. If it is violated, the program will abort and print a message indicating the name of the file and the line number of the assertion that turned out to be false. The assert statements are valuable aids to debugging and indirectly useful as comments.

The definition of aitkenExtrapolate() , which is also in aitken.c , is shown below to give a complete account of the source aitken.c .

1 double aitkenExtrapolate(double *seq1, double* seq2,
2                          int len){
3      int n, i, j;
4      n = len/2;
5      if(len%2==0)
6           n--;
7      for(i=0; i < n; i++){
8           aitken(seq1, seq2, len-2*i);
9           for(j=0; j < len-2*(i+1); j++)
10                seq1[j] = seq2[j];
11     }
12     return (len%2==0)?seq1[1]:seq1[0];
13 }

We assume familiarity with the level of C that occurs in this function, although reading programs such as this can be harder than writing them. Writing good for-loops is the heart of C programming, and there are two nested for-loops in this program. Line 12 uses a conditional expression that may be less familiar. The conditional expression (a<b)?c:d tests the condition a<b . If the condition is true, it evaluates to c but to d otherwise.
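As a minimal illustration of the conditional expression, consider the following sketch; the function smaller() is made up here.

```c
/* (a < b) ? a : b evaluates to a when a < b holds, and to b otherwise. */
int smaller(int a, int b){
    return (a < b) ? a : b;
}
```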

1.2.4 Declarations and definitions

Names introduced into a C program are for the most part names of either functions or variables. The names can be introduced as either declarations or definitions.

Suppose a variable name is introduced using int x . When the compiler encounters that statement, it sets aside a location in memory for an int and makes x the name of that memory location. This statement is a variable definition, not merely a declaration, because it sets aside memory for x .

A declaration gives type information about a variable that is expected to be defined elsewhere. An example of a variable declaration is a statement such as extern int x . When the compiler encounters such a statement, it notes that x is a variable of type int that is expected to be defined in some other source file. It does not set aside any location in memory. If it later encounters a statement such as x=x+2 , it does not complain. However, it cannot generate complete machine instructions to carry out that statement because it has no idea where x is defined and what address it corresponds to. That information has to be supplied by the linker later.
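The distinction can be seen even within one file. In the sketch below (the variable y and the function bump() are invented for illustration), the extern line only declares, while the second line defines and thereby sets aside memory.

```c
extern int y;      /* declaration: type information only, no memory set aside */
int y = 5;         /* definition: memory is set aside and initialized */

/* bump() compiles against the declaration; the definition supplies
   the actual memory location for y. */
int bump(void){
    y = y + 2;
    return y;
}
```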

Both the lines in the header file aitken.h are function declarations not definitions. When the compiler encounters a declaration such as

void aitken(double *seq1, double *seq2, int len);

it notes that aitken() is the name of a function that takes three arguments, the first two of which are of type double * and the last of which is an int , and returns nothing ( void ). We can omit seq1 , seq2 , and len from the declaration. Such an omission would make the declaration difficult to read and understand for us, but it makes no difference as far as the compiler is concerned. The compiler does nothing more than note the types of the arguments (or parameters).

When the compiler later encounters a function call such as aitken(s, t, n) , it first checks that s and t are of type double * and n is of type int . If the check succeeds, the compiler will generate instructions to set up the arguments and pass control to the function aitken() . If the function aitken() is defined in some other source file, which it may well be, the compiler has no idea where in memory the code for aitken() is located. So it cannot generate complete machine instructions for passing control. That job is the linker’s responsibility.

A function definition such as

void aitken(double *seq1, double *seq2, int len){ ... }

has a totally different effect. When the compiler encounters a function definition, it generates machine instructions for the body of the function (which is omitted here) and figures out where to place these instructions in memory. After that, aitken corresponds to the chunk of memory that contains machine instructions that implement the body of the function. Just as with variables, defining a function amounts to setting aside memory for it during compilation.

1.2.5 Function calls and the compilation process

Here we take a look at the mechanism of function calls and the compilation process. Much of the discussion is centered around the file leibniz.c , which uses the functions defined by aitken.c to extrapolate the partial sums of the Leibniz series and produces data corresponding to table 1.1↑ .

The source file leibniz.c begins with two directives:

#include "aitken.h"
#include <stdio.h>

The first directive includes aitken.h to interface to functions defined in aitken.c . The second directive includes stdio.h to interface to printing functions defined in the stdio (standard input/output) library.

The leibniz() function, which generates partial sums of the Leibniz series, is defined below without comment.

//partial sums of 4(1-1/3+1/5-1/9-...)
void leibniz(double* seq, int len){
    int i;
    for(i=0; i < len; i++)
        if(i==0)
            seq[i] = 4.0;
        else if(i%2==1)
            seq[i] = seq[i-1] - 4.0/(2.0*i+1);
        else
            seq[i] = seq[i-1] + 4.0/(2.0*i+1);
}

The printseq() function prints a sequence using the printf() function defined in the stdio library.

void printseq(double* seq, int len){
    int i;
    printf("\n");
    for(i=0; i < len; i++)
        printf("%-.10f\n", seq[i]);
}

The logic used by the main() function for generating the data shown in table 1.1↑ is similar to that of aitkenExtrapolate() . In a C or C++ program, the function named main() is the first to gain control when a program is run.

 1  int main(){
 2      const int len = 13;
 3      double seq1[len];
 4      double seq2[len];
 5      int n, i, j;
 6      leibniz(seq1, len);
 7      n = len/2;
 8      if(len%2==0)
 9          n--;
10      for(i=0; i < n; i++){
11          printseq(seq1,len-2*i);
12          aitken(seq1, seq2, len-2*i);
13          for(j=0; j < len-2*(i+1); j++)
14              seq1[j] = seq2[j];
15      }
16      if(len%2==0)
17          printseq(seq1, 2);
18      else
19          printseq(seq1, 1);
20  }

The main() function calls other functions on lines 6, 11, 12, 17, and 19. All calls except the one to aitken() on line 12 are to functions defined in the same source file. We have looked at the call to aitken() from the point of view of the callee in section 1.2.3↑ . Let us look at the function call leibniz(seq1, len) on line 6 from the point of view of the caller.

Within main() , seq1 is the name of an array of length 13 and len is an int (because it is declared const , it must be treated as something that is merely an rvalue). When the function call leibniz(seq1, len) is executed, the value of seq1 is copied to the location whose name is seq within leibniz() (see the definition of leibniz() given above). In addition, the content of the location whose name is len inside main() is copied to the location whose name is len inside leibniz() . This is the call-by-value semantics of C.

The function call aitken(seq1, seq2, len) on line 12 invokes a function that is not defined in the leibniz.c compilation unit. To see what difference that makes, let us see what happens when we issue the command

icc -c leibniz.c

The -c option tells the icc C compiler that it should only produce the object file and not the executable. The object file it produces will be called leibniz.o (on Unix systems). To produce the object file, the compiler runs through the code and converts the leibniz() function definition into a sequence of machine instructions. Next it converts printseq() into a sequence of machine instructions before turning to main() . When it hits the function call leibniz(seq1, len) on line 6 of the main() function, it copies the arguments seq1 and len into a place where the leibniz() function can retrieve them. After that it simply generates a machine instruction to make the function call. That machine instruction will pass control to the leibniz() function, which the compiler has already converted into a set of machine instructions.

When the compiler sees the aitken(seq1, seq2, len) call on line 12 of the main() function, the process is initially similar. Thanks to the included header aitken.h , the compiler will have already seen the declaration of aitken() to be void aitken(double *seq1, double *seq2, int len) . So the compiler generates machine instructions to copy seq1 , seq2 , and len into a place where the definition of aitken() can find them. At this point, there is a problem: the compiler cannot generate machine instructions to pass control to aitken() . The compiler has no idea where the machine instructions for aitken() are. The name aitken() remains an unresolved reference in the object file leibniz.o .

Thus, the object file leibniz.o by itself cannot be turned into an executable. To turn it into an executable, we must first compile aitken.c as follows:

icc -c aitken.c

The -c option tells the icc command to compile only. Without that option, the command will try to compile and then link and produce an executable with the default name a.out . With the compile only option, the command will generate an object file called aitken.o with machine instructions for the two functions defined in aitken.c . To generate the executable, we run the following command:

icc -o leibniz.exe leibniz.o aitken.o

The -o option tells the icc command to leave its output in the file leibniz.exe instead of the default a.out . The C linker uses the definition of aitken() in aitken.o to eliminate the unresolved reference to aitken() in leibniz.o . The executable file leibniz.exe it generates is a sequence of machine instructions with no unresolved names. It is ready to be loaded and run.

The extensions .cpp , .c , and .o correspond to C++ source files, C source files, and object files, respectively. In the command icc -c aitken.c , the file extension of aitken.c indicates to the icc command that it is operating on a C source file. So the command will invoke the C compiler. In the linking command, the filename extension .o indicates that leibniz.o and aitken.o are both object files. Therefore, icc invokes the C linker.

We have used the icc compiler/linker from Intel. But the syntax would be almost identical if we used GNU’s gcc or the Portland Group’s pgcc . Commands such as icc , pgcc , and gcc look at the file name extension to determine the type of the file.

We use Intel’s C/C++ compilers in this book because it is easier to link certain libraries that we will discuss later. In addition, the Intel compilers were the default standard on some supercomputing systems used in later chapters. GNU’s gcc/g++ compiler is widely used, open source, and reputed to be of excellent quality. For work unrelated to this book, this author normally uses the GNU compilers.

Conventionally, the executable files in Linux do not use the file name extension .exe . The .exe filename extension for executables is a Windows convention. We have adopted the Windows convention throughout this book as it makes for greater clarity.

Going back to leibniz.exe , we may wonder how the name printf() in leibniz.o gets resolved. It is used in the definition of printseq() , and we have seen that printf() is defined in the stdio library. However, the linking command gave no explicit instructions to fetch that library. It did not need to because the linker fetches and links a number of standard libraries by default, and stdio is one of them.

Exercise: Look up the meaning of the options -c -o -I -L -lm -lmath in the C compiler’s user’s guide.

Exercise: Use a short C program and the sizeof() facility to print the size in bytes of variables of char , int , long int , double , char * , int * , and double * types. Notice that all pointers are the same size (8 bytes on a typical 64-bit system).

Exercise: If p and q are pointers of type char * and double * , p=p+1 moves p up by one byte while q=q+1 moves q up by eight bytes, so that p or q points to the next char or double , respectively. Write a short C program to demonstrate this aspect of pointer arithmetic.

Exercise: In C, a pointer can be an lvalue or an rvalue. Give an example of a pointer that is an rvalue but not an lvalue.

Exercise: The list of numbers 1, 2, …, n can be rearranged in n! different orders. Write a function that takes an argument n of the type int and prints the n! different permutations.

Exercise: Suppose a[] is an array of type double and size n . Write a function that takes a and n as arguments and sorts the entries of the array. The basic sorting algorithms are bubble sort, quick sort, and heap sort.

Exercise: Rewrite aitken() to conform to the declaration void aitken(double *seq, int len); so that it takes a single sequence and transforms it in place.

Exercise: What is special about the following C program?

char *s="char *s=%c%s%c;%cmain(){printf(s,34,s,34,10,10);}%c";
main(){printf(s,34,s,34,10,10);}

The program does not #include the header file stdio.h , which contains the declaration of printf() . Modify the program to #include that header file while serving the same purpose.

Exercise: The following C function copies one array to another array, if the two arrays do not overlap.

void arr_copy(int* in, int* out, int len){
    int i;
    for(i=0; i < len; i++)
        out[i] = in[i];
}

If 1, 2, 3, 4 is rotated left, it becomes 2, 3, 4, 1 . Write a C function for rotating an array left (in-place rotation) that does not use a loop but is allowed to make one call to arr_copy() . Can you write a similar function to rotate right?

Exercise: Suppose that double **x points to an array of pointers of length n and that each entry of the array points to an array of double s, whose length is also n .

Interpret x as a square matrix and write a function void transpose(double **x, int n); that transposes the matrix. Write a function that rotates the matrix columnwise and another function that rotates it rowwise. Write a function that rotates each diagonal forward by one step. More precisely, an entry in position ( i , j ) with 0 ≤ i , j < n must end up in the position ( i + 1 mod n , j + 1 mod n ) .



The C language has a simple philosophy. Its aim is to offer a uniform view of the computer, especially computer memory, to the programmer. C has been so successful that nearly every object that can be called a computer, ranging from supercomputers to routers to embedded and mobile devices, is equipped with a C compiler or a cross-compiler. C is the best vehicle for highly optimized programs.

Because C seeks to be close to the machine, it is a low-level language. There is often a considerable distance between concepts that are native to a problem domain and their expression as C programs. High-level languages provide constructs and syntax that bring the program much closer to ideas and concepts that are native to the problem domain.

The C++ language is something of a compromise to provide the facilities of high-level languages without sacrificing the speed of C. Despite its name, it is not an incremental extension of C. It is a colossal expansion of C syntax. It does not have the seamless nature of truly high-level languages such as Python. Classes in Python use very little syntax and fit cleanly within the highly modular architecture of Python programs. However, languages such as Python are much slower than C++.

Although the C++ language is a compromise, or perhaps because of being a compromise, it has found a great range of uses. On the one hand, C++ has all of C inside it. On the other hand, it provides many mechanisms for capturing concepts and ideas more precisely. Its downside is its complexity. Although clear and careful thinking are essential to all programming, a failure in this respect has particularly acute consequences in C++.

Because our focus is on program speed, the part of C++ we use is quite small. Narrowly defined and flat (as opposed to hierarchical) classes, references, occasional function name overloading, and the ability to define variables in the middle of programs make up an almost complete list of the C++ features we use. The classes we define are no more than C struct s endowed with functions. C features such as enum , struct , typedef , and static can be quite powerful for representing concepts and ideas when used judiciously.

Classes in C++ are a mechanism for representing concepts and endowing them with functionality that makes them easy to use. Classes can be general or narrow. The Vector class studied in this section is an example of a general class. It can be made even more general. The Vector class assumes that each entry of a vector is a double . Using templates, one may define a class that allows each entry to be a double or a float or an int or even some user defined type or class.

The Vector class is the only example in this book of a class that attempts to capture a general concept (in this instance, the concept of vectors in linear algebra). Every other class discussed in this book is narrowly defined. We begin our discussion of C++ with the Vector class for two reasons. It is a good vehicle for reviewing some of the features of C++. Indeed, we introduce more features of C++ than we need, but that too serves a purpose. C++ is sometimes believed to lead to slow programs. This example helps us explain thoroughly how that may come about. In later chapters, we illustrate the overhead of the Vector class and the related Matrix class, when used in inner-loops or for disk input/output.

1.3.1 The Vector class

The C++ language can be used in many different ways. Using general classes, one may make C++ look like easy to use languages such as Python or Matlab without incurring the enormous cost of such interpreted languages. Our interest is in fast programs and, even more so, in understanding what makes programs fast or slow. The C++ style we adopt is quite close to C.

Nevertheless, we begin with a general type of class, namely, the Vector class. This class helps us review a few of the features of C++ and is used to implement the Aitken iteration later. In a later chapter, we criticize the use of this class and show it to be slow.

Header file with class definition

The Vector class is defined in the header file Vector.hh . The C++ class consists of data members and member functions. With respect to computer memory, a class object is a collection of data items. The data items could be of basic types such as double or char , or pointers. The data items may also be other class objects or C structures. The member functions provide various means to manipulate the class object or, equivalently, the package of data items that constitutes the class object.

It is typical for header files to give only part of the definition of the class. Many of the member functions are typically defined in a separate source. Here the entire class definition is in the header file Vector.hh , which makes the header file a bit long. We present the contents of the header file in stages, gradually unveiling features of C++. The skeleton of the header file is listed below.

 1  #ifndef MYVECTOR
 2  #define MYVECTOR
 3  #include <cassert>
 4  #include <cmath>
 5  #include <cstdlib>
 6  #include <cstring>
 7  #include <iostream>
 8  #include <fstream>
 9
10  using namespace std;
11
12  class Vector{
13  private:
14      ...
15  public:
16      ...
17  };
18
19  #endif

Lines 1, 2, and 19 ensure that the header file expands to the source code in between if and only if the macro variable MYVECTOR has not already been defined. Here MYVECTOR is not good nomenclature, as it may be inadvertently reused by some other header, subverting our attempt to ensure conditional inclusion of header files.

It is typical for C/C++ sources to begin by including a number of header files to interface to other source files or libraries. C programs include the header assert.h to use the assert() macro (see section 1.2.3↑ ). C++ programs may include either assert.h or cassert , as on line 3, to emphasize that a C facility is being employed.

The Vector class uses other C facilities as well. It uses the fabs() function from cmath (line 4) in one of its member functions to compute the norm of the vector. C facilities to allocate and release memory are defined in stdlib (line 5).

C++ facilities for input/output from terminals and files are in iostream (line 7) and fstream (line 8), respectively. Member functions that input/output vectors to/from files use these facilities.

To output a variable x to standard output (typically the terminal), we may say

cout<<x<<endl;

Here cout is the name of standard output declared in iostream . The end of line character endl is also declared in iostream . In C, input/output syntax is sensitive to the type of data items being handled. The abstraction features of C++ are used by the iostream library to provide a uniform interface for input/output regardless of the type of the variable. Even class objects may be input/output in this manner if the operators << (for output) and >> (for input) are overloaded suitably.

To output to a file, the syntax looks as follows:

ofstream ofile("tmp.txt");
ofile<<x<<endl;

Here ofile is defined as an ofstream object. At the point of definition, it is tied to the file tmp.txt .

To input x from standard input, we may say

cin>>x;

To input from a file, we may say

ifstream ifile("tmp.txt");
ifile>>x;

This will work regardless of whether x is a double or an int or a long or a char .

The class names ofstream and ifstream as well as class object names cout and cin are defined in the namespace std . In general, we should say std::cout and std::ofstream because these names do not exist outside the namespace. However, the using namespace declaration on line 10 brings in all the names in std into scope. It allows us to say cout instead of std::cout .

It is often not a good idea to bring in the entire std namespace, especially within header files. The C++ standard library is vast. Bringing in the entire std namespace, as we do on line 10, pollutes the namespace considerably. For example, a programmer may define a function called copy() to copy double arrays in a program-specific manner and conflict with the std namespace. There are functions with common names such as copy() and sort() in the standard library.

The listing shows the outline of the definition of the class Vector (lines 12 through 17). There is a private section (line 13) in the class definition and a public section (line 15). The class definition must end with a semicolon as on line 17. Omitting the semicolon at the end of class definitions is a common novice error.

Private section of the Vector class

A class is a collection of data members and member functions. In the Vector class all the data members are in the private section.

class Vector{
private:
    long int size;
    double *data;
    int owner;
public:
    ...
};

The class member size is the length of the vector and data is a pointer to the contents of the vector; so data[i] is in effect a name for the i th entry of the vector. The owner field will be either 0 or 1---its meaning is explained later.

If v is a variable of type Vector , then it is a name for a segment of memory that includes a location of type long int , a location of type double * , and a location of type int . A schematic view of a Vector of length 2 is shown in figure 1.5↑ . If v is the name of an object of class Vector , then v.size , v.data , and v.owner are names for members of v as shown in the figure. One may say that the class object v is a package of the data items v.size , v.data , and v.owner .

If a data member or a member function is in the private section, its access is restricted. Because the data items v.size , v.data , and v.owner are in the private section, only member functions are allowed to access them. Whatever functionality we want must be defined through the member functions. For example, suppose we want to input a vector from a file. We are not allowed to directly set the size of the Vector object during input using syntax such as v.size=100 . Instead, we may define a member function input() , which takes care of reading data and setting v.size appropriately.

Member functions of the Vector class

All the member functions of the Vector class are in the public section.

 1  class Vector{
 2  private:
 3      ...
 4  public:
 5      Vector(){...}
 6      Vector(long int n){...}
 7      Vector(const Vector& v){...}
 8      ~Vector(){...}
 9      void shadow(double *dataptr, long int len)
10      {...}
11      void shadow(const Vector& v){...}
12      void shadow(const Vector& v, long int i,
13          long int len){...}
14      long int getSize() const{...}
15      double * getRawData() const{...}
16      double& operator()(long int i){...}
17      const double& operator()(long int i) const{...}
18      Vector& operator=(const Vector& v){...}
19      void add(const Vector& v){...}
20      void sub(const Vector& v){...}
21      void mul(const Vector& v){...}
22      void div(const Vector& v){...}
23      void scale(const double x){...}
24      void add_constant(const double x){...}
25      double norm() const{...}
26      void output(const char* fname) const{...}
27      void input(const char* fname){...}
28  };

There is a basic difference between data members and member functions. Suppose v is a Vector class object. Then v.size , v.data , and v.owner refer to the data items packaged inside v . In contrast, v.norm() applies the member function norm() (line 25) with the class object v as its target.

The member functions exist within the namespace defined by the class. Each class defines a namespace. The Vector class defines the eponymous namespace Vector:: . If we want to refer to a member function such as norm() outside the scope of the class (the scope here is from lines 1 through 28), the name must be given as Vector::norm() .

For example, we may define the member function norm() in a compilation unit and outside the scope of the class Vector as

double Vector::norm() const{ ... }

We have chosen to define all the member functions in Vector.hh and within the scope of the Vector class.

Before delving into the member functions, we give an example of how Vector objects are used.

#include "Vector.hh"

int main(){
    Vector v(20);
    for(int i=0; i < 20; i++)
        v(i) = i;
    Vector w1, w2;
    w1.shadow(v, 0, 10);
    w2.shadow(v, 10, 10);
    w1.add(w2);
    w1.output("w1.txt");
}

The definition Vector v(20) invokes the class constructor on line 6. This constructor will make v.data point to 20 double s, set v.size to 20 , and set v.owner=1 , making v the owner of its data.

The body of the for-loop has v(i)=i . When we say f(x) in C/C++, f() is a function being applied to x . Here we say v(i) , but v is a class object not a function being applied to i . However, thanks to operator overloading, the compiler interprets v(i) as v.operator()(i) ---in words, the member function operator() is applied to the target v with i as the sole argument. The member function defined on line 16 is called. This member function will look at v.data , access the i th entry, and return a reference to it. Thus, when we say v(i)=i , the i th entry of the Vector object v gets set to i . At the end of the for-loop, the entries of v are 0, ..., 19 .

At the definition Vector w1, w2; , the Vector objects w1 and w2 are created using the empty constructor (line 5). This empty constructor sets w1.size to 0 , w1.data to NULL , and w1.owner to 0 , and likewise for w2 . Neither w1 nor w2 owns any data. To begin with, they are vectors of size 0 .

When we say w1.shadow(v, 0, 10) , the member function shadow() defined on line 12 gets invoked. Its effect is to set w1.data to v.data and w1.size to 10 . However, w1.owner remains 0 because the data is owned by v . The Vector object w1 is a shadow of the first 10 entries of v . Likewise, after w2.shadow(v, 10, 10) , w2.data is set to v.data+10 , and w2 becomes a shadow of entries 10 through 19 of v .

When we say w1.add(w2) , the member function add() (line 19) is invoked with w1 as the target and w2 as its argument. This member function verifies that its target and its argument are vectors of the same size and adds w2 to w1 , entry by entry.

The final line w1.output("w1.txt") outputs w1 to the file w1.txt via the member function output() (line 26). Thus, the numbers 10, 12, ..., 28 will be output to w1.txt .

The program is not done yet, however. When the class objects v , w1 , and w2 go out of scope at the end of main() , the compiler inserts calls to the destructor (line 8). The destructor is called thrice. When destroying v , the destructor notes that v is the owner of its data and releases the memory that data points to. There is nothing to be done to destroy w1 and w2 because they do not own their data. The destructor returns as soon as it notes that they are not owners.

References

We will step through a few C++ concepts using the public member functions of the class Vector as illustrations. The first of these is references.

A reference in C++ is another name for an object that is already in existence. For example, we may say

double &x = y;

and x becomes another name for y , which is assumed to be of type double . If we say x=2 and then print y , its value will be 2 .

The & character is used to take the address of locations. It is also used for references as illustrated. Overloading the & operator in this way may create some confusion initially, but in fact references are really just a shorthand for the use of pointers.

We may use references in passing arguments to a function. For example, if we define a function as

double f(double &x){ ... }

we may call it as f(z) , assuming z is a variable of type double and x will become a reference to z . The call f(2.0) is not legal, however, because 2.0 is a value and not the name of a double location.

References are commonly used to pass class objects as arguments to functions. The Vector class holds just three data items ( size , data , and owner ), but in general a lot of data can be packaged into a single class. When a class object is passed as an argument to a function, do we really want a fresh copy to be made of all the data inside that object? Typically, the answer is no. First, the expense of having to copy all that data may be undesirable. Second, even if there is only a small amount of data, the semantics could be incorrect. For Vector objects, item-by-item copying results in multiple objects incorrectly owning the same data.

A declaration of a member function can specify that some of the arguments are references. The declaration void add(const Vector& v) (line 19) specifies that the argument named v is passed by reference (the const qualifier is discussed later). The crucial symbol here is the & , which precedes v in the declaration. If arguments are passed as references, we do not have to worry about the correct semantics for copying.

Functions may also return a reference as does the member function operator() (line 16) . This function is defined as

double& operator()(long int i){
    assert(i < size);
    return(data[i]);
}

in Vector.hh within the scope of the class Vector . Here size and data are data members of the target. So if v is a Vector object and we say v(6)=17 , this member function is called with v as the target and with argument i equal to 6 . Assuming the vector is of size 7 or greater, the member function returns data[i] , which is the same as v.data[i] , by reference. Whatever is returned is another name for the 6 th entry of the Vector object v . Thus, saying v(6)=17 has the effect of setting that entry to 17 .

Operator and function name overloading

We have already seen how the function call operator is overloaded. Below is another example. In this example, the assignment operator is overloaded.

Vector& operator=(const Vector& v){
    assert(size==v.size);
    memcpy((char *)data, (char *)v.data,
           size*sizeof(double));
    return(*this);
}

The assignment operator uses memcpy() to copy the data in v to its target. If we say w=v , with both v and w being Vector objects, the member function operator=() is invoked with w as the target and v as its argument. As before, size and data refer to the corresponding items of the target.

A new bit of syntax here is the keyword this . Within a member function, this is a pointer to the target. Therefore, *this is the target. The overloaded assignment operator returns a reference to the target.

If u , v , and w are Vector objects, and we say u=v=w , the compiler interprets that as u=(v=w) . In C/C++, the assignment operator is right associative. The result of v=w is a reference to v , which is the argument in the next call to the overloaded assignment operator that assigns to u .

C++ also allows function name overloading. For instance, we can define a function as

void add(double *v, const double *w, const int len){...}

which adds the array w to the array v . We can define another function

void add(const double *u, const double *w, double *v, const int len){...}

which leaves the sum of u and w in v . In C, the two definitions would conflict because they are both trying to bind a definition to the same name add . However, in C++, the compiler allows both the definitions because they can be disambiguated using the number and types of the arguments. The compiler internally generates mangled names to keep the definitions separate in the object file, so as not to confuse the linker.

Function call inlining

The member functions of the class Vector are defined in the header file Vector.hh within the scope of the class definition. If we wanted to define the overloaded function call operator elsewhere, we can give its definition as

double& Vector::operator()(long int i){
    assert(i < size);
    return(data[i]);
}

In fact, it is better to define the overloaded function call operator within the scope of the class as we did. Every source that uses the Vector class will include the header Vector.hh . Thus, every compilation unit will see the complete definition of the function call operator. If we say v(i) = 7.0 , with v being a Vector object, the compiler does not actually generate a function call for v(i) . Instead it scans the definition of the overloaded function call operator and splices in the body of the function at the point of call. The overhead of making a function call is eliminated. This is called function call inlining.

The function call overhead in modern processors is quite small, typically less than 10 cycles, and much of the time it is nothing to worry about. However, if a member function compiles to only a few machine instructions and is called frequently, one must ensure that the member function is inlined.

In C++, one can explicitly ask the compiler to inline functions that are not member functions. For instance, if a function with the declaration void add(Vector u, Vector v) is instead declared inline void add(Vector u, Vector v) and defined within the header file, the compiler will try to inline the call and will almost certainly succeed. However, the compiler may, without warning, refuse or fail to inline functions that are long and complicated.

If several source files are compiled and linked simultaneously with interprocedural optimization turned on ( -ipo option in icc and icpc ), the compiler will again try to inline certain function calls. Normally, we will avoid interprocedural optimization, preferring to build object files for each source file or compilation unit separately.

Inlining is not quite the panacea it is sometimes believed to be. Suppose we need to replace each entry of an array by the cumulative sum of the preceding entries. We can write the loop in C using pointers and pointer arithmetic, or we can use a class such as the Vector class. A well-written C-style loop will lead to faster and more compact assembly code for reasons explained in a later chapter (see section 3.2.6↓ ). Inlining occasionally leads to worse code as we illustrate later in this chapter.

Constructors and destructors

Constructors and destructors are central to the class mechanism in C++. The three constructors of the Vector class are defined below.

//empty constructor
Vector(){
	size = 0;
	data = NULL;
	owner = 0;
}
//only constructor to allocate space for data
Vector(long int n){
	size = n;
	data = (double *)malloc(sizeof(double)*n);
	owner = 1;
}
//*this becomes shadow of Vector v
//(copy constructor)
Vector(const Vector& v){
	size = v.size;
	data = v.data;
	owner = 0;
}

The empty constructor and the copy constructor do not allocate space. The constructor in the middle allocates space using malloc() , which is in cstdlib . These member functions are understood to be constructors because they have the same name as the class. Constructors are not allowed to return anything.

Constructors are called implicitly at the point of definition of class objects. It is essential to understand when constructors are called. Suppose we define a vector object as

Vector v;

the empty constructor is called to initialize the object v . If we define an array of vectors

Vector v[100];

the empty constructor is called for each object in the array. It is illegal to define an array of objects if the class is not equipped with an empty constructor.

If we define a vector object as

Vector v(27);

the constructor in the middle is called. The compiler notes that the object v is being built with the single argument 27 , which is a constant of type int . The constructor that calls malloc() takes long int as an argument. In C/C++, an int is automatically promoted to a long int if necessary.

The constructor uses malloc() to claim space for this Vector object. If we make the call

malloc(1000*1000)

the function returns a pointer to 10^6 bytes of memory. The pointer is of type voi