mixedmath · David Lowry-Duda · Math 100, Brown University

This is a post written for my fall 2013 Math 100 class but largely intended for anyone with knowledge of what a function is and a desire to know what calculus is all about. Calculus is made out to be the pinnacle of the high school math curriculum, and correspondingly is thought to be very hard. But the difficulty is bloated, blown out of proportion. In fact, the ideas behind calculus are approachable and even intuitive if thought about in the right way.

Many people managed to stumble across the page before I’d finished all the graphics. I’m sorry, but they’re all done now! I was having trouble interpreting how WordPress was going to handle my gif files – it turns out that they automagically resize them if you don’t make them of the correct size, which makes them not display. It took me a bit to realize this. I’d like to mention that this actually started as a 90 minute talk I had with my wife over coffee, so perhaps an alternate title would be “Learning calculus in 2 hours over a cup of coffee.”

So read on if you would like to understand what calculus is, or if you’re looking for a refresher of the concepts from a first semester in calculus (like for Math 100 students at Brown), or if you’re looking for a bird’s eye view of AP Calc AB subject material.

1. An intuitive and semicomplete introduction to calculus

We will think of a function as something that takes an input $x$ and gives out another number, which we’ll denote by $f(x)$. We know functions like $f(x) = x^2$, which means that if I give in a number $x$ then the function returns the number $x \cdot x$. So I put in $2$, I get $2 \cdot 2 = 4$, i.e. $f(2) = 4$. Primary and secondary schooling overly conditions students to think of functions in terms of a formula or equation. The important thing to remember is that a function is really just something that gives an output when given an input, and if the same input is given later then the function spits the same output out. As an aside, I should mention that the most common problem I’ve seen in my teaching and tutoring is a fundamental misunderstanding of functions and their graphs.

For a function that takes in and spits out numbers, we can associate a graph. A graph is a two-dimensional representation of our function, where by convention the input is put on the horizontal axis and the output is put on the vertical axis. Each axis is numbered, and in this way we can identify any point in the graph by its coordinates, i.e. its horizontal and vertical position. A graph of a function $f$ includes the point $(x, y)$ if $y = f(x)$.

Thus each point on the graph is really of the form $(x, f(x))$. A large portion of algebra I and II is devoted to being able to draw graphs for a variety of functions. And if you think about it, graphs contain a huge amount of information. Graphing $f(x) = x^2$ involves drawing an upwards-facing parabola, which really represents an infinite number of points. That’s pretty intense, but it’s not what I want to focus on here.

1.1. Generalizing slope – introducing the derivative

You might recall the idea of the ‘slope’ of a line. A line has a constant ratio of how much the $y$ value changes for a specific change in $x$, which we call the slope (people always seem to remember rise over run). In particular, if a line passes through the points $(x_1, y_1)$ and $(x_2, y_2)$, then its slope will be the vertical change $y_2 - y_1$ divided by the horizontal change $x_2 - x_1$, or $\frac{y_2 - y_1}{x_2 - x_1}$.

So if the line is given by an equation $y = f(x)$, then the slope from two inputs $x_1$ and $x_2$ is $\frac{f(x_2) - f(x_1)}{x_2 - x_1}$. As an aside, for those that remember things like the ‘standard equation’ or ‘point-slope form’ but who have never thought or been taught where these come from: the claim that lines are the curves of constant slope is saying that for any choice of $(x_1, y_1)$ and $(x_2, y_2)$ on the line, we expect $\frac{y_2 - y_1}{x_2 - x_1} = m$, a constant, which I denote by $m$ for no particularly good reason other than the fact that some textbook author long ago did such a thing. Since we’re allowing ourselves to choose any $(x_2, y_2)$ on the line, we might drop the subscripts – since they usually mean a constant – and rearrange our equation to give $y - y_1 = m(x - x_1)$, which is what has been so unkindly drilled into students’ heads as the ‘point-slope form.’ This is why lines have a point-slope form, and a reason that it comes up so much is that it comes so naturally from the defining characteristic of a line, i.e. constant slope.

But one cannot speak of the ‘slope’ of a parabola.

Intuitively, we look at our parabola and see that the ‘slope,’ or an estimate of how much the function changes with a change in $x$, seems to be changing depending on what inputs we choose. (This should make sense – if it didn’t change, and had constant slope, then it would be a line). The first major goal of calculus is to come up with an idea of a ‘slope’ for non-linear functions. I should add that we already know a sort of ‘instantaneous rate of change’ of a nonlinear function. When we’re in a car and we’re driving somewhere, we’re usually speeding up or slowing down, and our pace isn’t usually linear. Yet our speedometer still manages to say how fast we’re going, which is an immediate rate of change. So if we had a function $p(t)$ that gave us our position at a time $t$, then the slope would give us our velocity (change in position per change in time) at a moment. So without knowing it, we’re familiar with a generalized slope already. Now in our parabola, we don’t expect a constant slope, so we want to associate a ‘slope’ to each input $x$. In other words, we want to be able to understand how rapidly the function is changing at each $x$, analogous to how the slope $m$ of a line tells us that if we change our input by an amount $h$ then our output value will change by $mh$.

How does calculus do that? The idea is to get closer and closer approximations. Suppose we want to find the ‘slope’ of our parabola at the point $(1, 1)$. Let’s get an approximate answer. The slope of the line coming from inputs $x = 1$ and $x = 2$ is a (poor) approximation. In particular, since we’re working with $f(x) = x^2$, we have that $f(1) = 1$ and $f(2) = 4$, so that the ‘approximate slope’ from $x = 1$ and $x = 2$ is $\frac{4 - 1}{2 - 1} = 3$. But looking at the graph,

we see that it feels like this slope is too large. So let’s get closer. Suppose we use inputs $x = 1$ and $x = 1.5$. We get that the approximate slope is $\frac{2.25 - 1}{1.5 - 1} = 2.5$. If we were to graph it, this would also feel too large. So we can keep choosing smaller and smaller changes, like using $x = 1$ and $x = 1.1$, or $x = 1$ and $x = 1.01$, and so on. This next graphic contains these approximations, with chosen points getting closer and closer to $x = 1$.

Let’s look a little closer at the values we’re getting for our slopes when we use $1$ and $1 + h$ as our inputs for shrinking $h$. We get

$\displaystyle \frac{f(2) - f(1)}{2 - 1} = 3, \quad \frac{f(1.5) - f(1)}{1.5 - 1} = 2.5, \quad \frac{f(1.1) - f(1)}{1.1 - 1} = 2.1, \quad \frac{f(1.01) - f(1)}{1.01 - 1} = 2.01.$

It looks like the approximate slopes are approaching $2$. What if we plot the graph with a line of slope $2$ going through the point $(1, 1)$?
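These shrinking secant slopes are easy to compute directly. Here is a minimal sketch in plain Python (my own code, not from the original post, whose graphics were made in SAGE), using the running example $f(x) = x^2$ and the base point $x = 1$:

```python
# Sketch: secant slopes for f(x) = x^2 between x = 1 and x = 1 + h,
# for shrinking values of h.

def f(x):
    return x * x

def secant_slope(x, h):
    # slope of the line through (x, f(x)) and (x + h, f(x + h))
    return (f(x + h) - f(x)) / h

for h in [1, 0.5, 0.1, 0.01, 0.001]:
    print(h, secant_slope(1, h))
```

Each shrink of $h$ pushes the secant slope closer to $2$, matching the approximations pictured above.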

It looks great! Let’s zoom in a whole lot.

That looks really close! In fact, what I’ve been presenting as the natural-feeling slope, or local rate of change, is really the line tangent to the graph of our function at the point $(1, 1)$. In a calculus class, you’ll spend a bit of time making sense of what it means for the approximate slopes to ‘approach’ $2$. This is called a ‘limit,’ and the details are not important to us right now. The important thing is that this let us get an idea of a ‘slope’ at a point on a parabola. It’s not really a slope, because a parabola isn’t a line. So we’ve given it a different name – we call this ‘the derivative.’ So the derivative of $f(x) = x^2$ at $x = 1$ is $2$, i.e. right around $x = 1$ we expect a rate of change of $2$, so that we expect $f(1 + h) \approx 1 + 2h$. If you think about it, we’re saying that we can approximate $f(x) = x^2$ near the point $(1, 1)$ by the line shown in the graph above: this line passes through $(1, 1)$ and its slope is $2$, what we’re calling the slope of $f$ at $x = 1$.

Let’s generalize. We were able to speak of the derivative at one point, but how about other points? The rest of this post continues below the ‘more’ tag.

What we did was look at a sequence of points getting closer and closer to $x = 1$, and finding the slopes of the corresponding lines. So we were calculating $\frac{f(2) - f(1)}{2 - 1}$, $\frac{f(1.5) - f(1)}{1.5 - 1}$, $\frac{f(1.1) - f(1)}{1.1 - 1}$, and so on. Seen in a slightly different way, we had a decreasing $h$, starting with $h = 1$, and we were looking at slopes between the inputs $1 + h$ and $1$. So we were calculating $\frac{f(1 + h) - f(1)}{h}$. We did this for when $f(x) = x^2$ and when $x = 1$ to find the derivative at $x = 1$. But this idea leads to the same process for any $x$. Let’s reparse this formula.

The quotient $\frac{f(x + h) - f(x)}{h}$ finds the slope between the points $(x, f(x))$ and $(x + h, f(x + h))$, which approximates the slope at the point $(x, f(x))$. For small $h$, this approximation should be even better. So to find the derivative (read: the ‘slope’) at a value of $x$, we insert that value of $x$ and try this for decreasing $h$. And hopefully it will approach a number, like it approached $2$ above. If this does approach a number like above, then we say that $f$ is differentiable at $x$, and we call the resulting slope the derivative, which we denote by $f'(x)$. So above, when $f(x) = x^2$, the derivative at $x = 1$ is $2$, or $f'(1) = 2$.

In general, we have

Definition 1 The derivative of a function $f$ at the point $x$, which is an analogy of slope for nonlinear functions, is given by

$\displaystyle \frac{f(x + h) - f(x)}{h}$

as $h$ gets smaller and smaller (if it exists). If this does not tend to some number as $h$ gets smaller and smaller, then $f$ does not have a derivative at $x$. If $f$ has a derivative at $x$, then $f$ is said to be ‘differentiable’ at $x$.

We haven’t really talked about cases when a function doesn’t have a derivative, but not every function does. Functions with discontinuities or jumps, or that aren’t defined everywhere, etc., don’t have well-defined local slopes everywhere. So sometimes functions have derivatives, sometimes they don’t, and sometimes they have derivatives at some points and not at others.

Another thing we haven’t yet addressed is the notation $f'(x)$. Derivatives also have a function-like notation, and that’s because they are a function. Above, for $f(x) = x^2$, we had that $f'(1) = 2$, i.e. the derivative of $f$ at $1$ is $2$. It turns out that $f(x) = x^2$ has a derivative everywhere (i.e. is called ‘differentiable’), and the derivative is given by the function $f'(x) = 2x$. This is an amazingly compact presentation of information. So the ‘slope’ of $x^2$ at a point $x$ is given by $2x$, for any point $x$. Whoa.
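To see that compactness for yourself, here is a small Python sketch (mine, not part of the original post) that estimates the derivative of $f(x) = x^2$ at several points using the difference quotient from Definition 1 with a small fixed $h$, and compares the result against $2x$:

```python
# Sketch: estimate f'(x) for f(x) = x^2 with a difference quotient,
# then compare against the derivative function 2x.

def f(x):
    return x * x

def approx_derivative(x, h=1e-6):
    # difference quotient (f(x + h) - f(x)) / h from Definition 1
    return (f(x + h) - f(x)) / h

for x in [0.0, 1.0, 2.0, 3.5]:
    print(x, approx_derivative(x), 2 * x)
```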

But the important thing to accept now is that we have a way of talking about the ‘slope,’ or rather the local rate of change, of reasonably-behaved nonlinear functions, and the key is given by Definition 1 above.

1.2. What can we do with a derivative?

A good question now is: so what? Why do we care about derivatives? We mentioned that the derivative gives a good linear approximation to functions (this is why our line is so close to the graph of the parabola above). This linear approximation is very useful and important in its own right. We’ve also mentioned that derivatives give you a way to talk about rates of change, which is also very important in its own right. But I’ll mention three more things now, and a few more later, when we talk about ‘undoing derivatives.’

First and most commonly talked about is optimization. Sometimes you want to make something as big, as small, or as cheap as possible. These problems, when you’re trying to maximize or minimize something, are called optimization problems. Derivatives provide a method of solving optimization problems (in many ways, the best method). This relies on the key observation that if you have a differentiable (i.e. has a derivative) function $f$ that takes a maximum value at $x_{max}$, so that $f(x) \leq f(x_{max})$ for $x$ near $x_{max}$, then the ‘slope’ of $f$ at $x_{max}$ will be $0$. In other words, $f'(x_{max}) = 0$. Why? Well, this slope describes the slope of the line tangent to $f$ at $x_{max}$, and if it’s not flat then the function is going up in one direction or the other – and so $f(x_{max})$ isn’t a max after all. An example is shown in the image below:

Similarly, if $f$ takes a minimum at $x_{min}$, then $f'(x_{min}) = 0$.

So to maximize or minimize a function, we can calculate its derivative (well, we can’t because we’re not focusing on the calculations right now, but it’s possible and you learn this in a calculus course) and try to find its zeroes. This is really used all the time, and is simple enough to be done automatically. So when companies try to maximize profits, or land use is optimized, or power consumption minimized, etc., there’s probably calculus afoot.
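As a toy illustration of this recipe (the function here is a made-up example of mine, not from the post), we can maximize $f(x) = x(10 - x)$ by hunting for the zero of its approximate derivative with bisection:

```python
# Sketch: maximize f(x) = x * (10 - x) by finding where its
# (numerically estimated) derivative is zero.

def f(x):
    return x * (10 - x)

def fprime(x, h=1e-6):
    # symmetric difference quotient, a numerical stand-in for f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def bisect_zero(g, lo, hi, tol=1e-10):
    # assumes g(lo) and g(hi) have opposite signs
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

best_x = bisect_zero(fprime, 0, 10)
print(best_x, f(best_x))   # the maximum sits at x = 5
```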

Second (although a bit complicated – if you don’t understand, don’t worry), derivatives give good ways to find zeroes of functions through something called ‘Newton’s Method.’ The idea is that derivatives give linear approximations to our function, and it’s easy to see when a line has a zero. So you try to find a point near a zero, approximate the function with a line using a derivative, and find the zero of the line. This will give an approximate zero. Repeating this process (approximate the function with a line, find the zero, plug this zero into the function and approximate with a line again) can very quickly yield zeroes. So conceivably you’ll be optimizing a function, and thus will find its derivative and want to find its zeroes. So you then use derivatives to find the zeroes of the derivative of the original function... in other words, derivatives everywhere.
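Here is a minimal Python sketch of that repeated tangent-line process (the example function is my own choice, not from the post): each step jumps to the zero of the tangent line at the current point, finding the positive zero of $f(x) = x^2 - 2$, i.e. $\sqrt{2}$:

```python
# Sketch of Newton's Method: repeat x -> x - f(x) / f'(x),
# which jumps to the zero of the tangent line at x.

def newton(f, fprime, x, steps=10):
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

# f(x) = x^2 - 2 has derivative f'(x) = 2x; start near x = 1
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x=1.0)
print(root)   # approximately 1.41421356...
```

Very few iterations are needed; the number of correct digits roughly doubles at each step.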

Third and perhaps most importantly (it’s not at all obvious how important this is, but it ends up yielding the key to the sections below) is a deceptively simple statement. In the first reason above, we talked about how the derivative of a differentiable function at a max or a min is zero. This leads to Rolle’s Theorem,

which says that if $f$ is differentiable and $f(a) = f(b)$ for some $a$ and $b$, then there is a $c$ between $a$ and $b$ such that $f'(c) = 0$. The reasoning behind Rolle’s Theorem is very simple: if $f(a) = f(b)$ and $f$ is not constant, then there is a max or a min between $a$ and $b$. At this max or min, the derivative is $0$. And if $f$ is constant, then its graph is a line of slope zero, and thus has derivative $0$ everywhere.

Using slightly more refined thinking (which amounts to ‘rotating a graph’ to be level so that we can appeal to Rolle’s Theorem), we can get a similar theorem called the Mean Value Theorem:

Theorem 2 (Mean Value Theorem) Suppose $f$ is a differentiable function and $a < b$. Then there is a point $c$ between $a$ and $b$ such that

$\displaystyle f'(c) = \frac{f(b) - f(a)}{b - a}.$

In other words, there is a point $c$ between $a$ and $b$ whose derivative, or immediate slope, is the same as the average slope of $f$ from $a$ to $b$.

Let’s use this for a moment to really prove something that we sort of know to be true, but that now we can really justify. Let’s say that you travel 30 miles in 30 minutes. If we think of your position as a function $p(t)$ of time, then we might think of $p(0) = 0$ (so at time $0$, you have gone zero distance) and $p(30) = 30$ (so 30 minutes in, you’ve gone 30 miles). Your average speed was $30$ miles per $30$ minutes, or $60$ miles per hour. By the mean value theorem, there was at least one time when you were going exactly $60$ miles per hour. If cops were to use this to measure speed, like have strips and/or cameras that record your positions at different times, then they could issue speeding tickets without ever actually measuring your speed. That’s sort of cool.
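The speeding-ticket arithmetic is tiny; here it is as a Python sketch using the 30-miles-in-30-minutes numbers above (the 55 mph speed limit is a hypothetical value of mine):

```python
# Sketch of the 'average speed camera' idea: two timestamped positions
# give an average speed, and by the Mean Value Theorem the car's
# instantaneous speed equaled that average at some moment.

t0, p0 = 0.0, 0.0     # minutes, miles
t1, p1 = 30.0, 30.0

avg_speed_mpm = (p1 - p0) / (t1 - t0)   # miles per minute
avg_speed_mph = avg_speed_mpm * 60

print(avg_speed_mph)   # 60.0

speed_limit_mph = 55   # hypothetical posted limit
if avg_speed_mph > speed_limit_mph:
    # MVT guarantees the limit was exceeded at some instant
    print("ticket: the limit was provably exceeded at some moment")
```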

There are many more reasons why derivatives are awesome – but this is what a calculus course is for.

1.3. Undoing derivatives – introducing the integral

So we’re done talking about derivatives (mostly). Two big questions motivate the ‘other half’ of calculus: if I give you the derivative $f'$ of a function, can you ‘undifferentiate’ it and find the original function $f$? (This is the intrinsic motivation, how you might motivate it yourself from learning derivatives for the first time). But there is also an extrinsic motivation, sort of in the same way that derivatives arise from wanting to talk about slopes of nonlinear functions. This extrinsic motivation is: how do you calculate the area under a function $f$? It’s not at all obvious that these are related (real math is full of these surprising and deep connections).

We will proceed with the second one: how do we calculate the area under a function $f$? For this section, we’ll start with the function $f(x) = x$.

We actually know how to calculate the area under a triangle. Suppose we want to calculate the area between the horizontal axis and $f(x) = x$, starting at $0$ and going as far right as $x$. So we have a triangle of width $x$ and height $x$, so the area is $\frac{x \cdot x}{2} = \frac{x^2}{2}$. Think back to earlier: we found the derivative of $x^2$ to be $2x$, and now the area from $0$ to $x$ under $x$ is $\frac{x^2}{2}$. They almost undid each other! This is an incredible relationship. Let’s look deeper.

Let’s think about a generic well-behaved function $f$, like the picture below.

We’re going to create a function called $A(x)$, which, for a nice always-positive function $f$, spits out the area between the function and the horizontal axis from $0$ to $x$. So this is a function in a variable, $x$ – but it’s formulated a bit differently (it’s right around here where some people may need to adjust how they think of functions). In terms of the function $f$ from the graph above, the area represented by $A(x)$ is the area of the shaded region in the picture below.

Let’s do something a bit interesting: let’s try to take the derivative of $A(x)$ at $x = 1$. Recall that a derivative of a function $g$ is gotten from looking at $\frac{g(x + h) - g(x)}{h}$ as $h$ gets smaller and smaller. So we want to try to make sense of

$\displaystyle \frac{A(1 + h) - A(1)}{h}.$

Well, $A(1 + h) - A(1)$ is finding the area from $0$ to $1 + h$ and then taking out the area from $0$ to $1$. If you think about it, we’re just left with the area from $1$ to $1 + h$. Pictorially, this picture shows the area from $0$ to $1$ in blue, and the area from $0$ to a bit more than $1$ in red (so that the overlap is purple). So in the picture, the red strip extends a width $h$ past $1$, and we’re taking the derivative at $x = 1$.

Once we remove the image in blue from the image in red (so we take away everything in purple), we are left with a strip from $1$ to a bit more than $1$ (that is, to $1 + h$), as shown here.

It’s time to appeal to a bit of intuition (or the intermediate value theorem). The area from $1$ to $1 + h$ under $f$ is the same as the area of a rectangle of width $h$ and whose height is some value of $f$ on the interval from $1$ to $1 + h$. For example, the area from the shape above (zoomed in a bit here)

is the same as the area in this rectangle.

Now as $h$ is getting smaller, the height of the rectangle must get closer and closer to $f(1)$, i.e. the value of $f$ at the point $1$. In fact, as $h$ gets smaller, the area from $1$ to $1 + h$ under $f$ gets closer and closer to $h \cdot f(1)$ (which is the same as the width of the rectangle times the approximate height), so $A(1 + h) - A(1) \approx h \cdot f(1)$. This lets us evaluate our derivative:

$\displaystyle \frac{A(1 + h) - A(1)}{h} \approx \frac{h \cdot f(1)}{h} = f(1),$

so that taking the derivative of this area function gives back the original function $f$. This is known as the First Fundamental Theorem of Calculus.

Theorem 3 (Fundamental Theorem of Calculus I) The derivative of the area under a function $f$ from $0$ up to $x$, which we write as $A(x)$, is precisely $f(x)$, the value of the function at $x$.

Aside: remember, this is just an intuitive introduction. There are annoying requirements and assumptions on the functions we’re using, and there are ways to make arguments of this style rigorous, but I sweep these under the rug for now.

There is a big caveat to what we’ve just said, and it has to do with this ‘Area function.’ When does it make sense to talk about the area under a function? For example, what if we have the following function:

Does it have an area function? What about a worse function, with points all on their own? What do we mean by area? We know how to find areas of polygons and straight-sided shapes. What about non-straight-sided shapes? Just like how we developed derivatives to talk about slopes of nonlinear functions, we will now develop a method to calculate areas of non-straight-sided functions. And just like with derivatives, we’re going to do this with approximations.

We love being asked to find areas of rectangles, because it’s so easy. So given a function and a region on the horizontal axis, say from $a$ to $b$, we can approximate the area by a rectangle. Well, how do we choose how tall to make the rectangle? Let’s compare two alternatives: using the minimum value of $f$ on the region (a minimum-rectangle approximation), and the maximum value of $f$ on the region (a maximum-rectangle approximation). Let’s return to our generic function $f$ from above. In blue is the maximum-rectangle approximation, in red is the minimum-rectangle approximation (purple is overlap).

But this is clearly a poor approximation. How can we make it better? What if we used two rectangles? Or three? Or ten? Maybe a hundred?

In the animation above, note that as the number of rectangles increases, the approximation becomes better and better, and our two alternative area methods are getting closer and closer. If, as the number of rectangles gets huge, the area given by the minimum-rectangles tends to the same number as the area given by the maximum-rectangles, then we say that the area under from to is that number that the approximations tend to (so this agrees with our intuition in the picture). This is a clear parallel to how we thought of derivatives.

Definition 4 We call the area under a function $f$ from $a$ to $b$ the number that arises as the number that both the minimum-rectangles approximation and maximum-rectangles approximation tend to as the number of rectangles increases, if there is such a number. If there is such a number, we call $f$ integrable on $[a, b]$, and we represent this area by the symbol

$\displaystyle \int_a^b f(t) \, \mathrm{d}t,$

where I used $t$ to emphasize that the area is not a function of $x$, but is just the area under $f$ over the fixed region from $a$ to $b$.

So let’s set up the parallel: if we can find the slope of a function at a point (which we call the derivative), we call the function differentiable there; if we can find the area under a function on a region (which we call the integral), we call the function integrable there.
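Definition 4 can be played with numerically. The sketch below (my own plain-Python code, not from the post) computes the minimum- and maximum-rectangle approximations for $f(x) = x^2$ on the interval from $0$ to $1$; since the function is increasing there, the minimum on each piece sits at the left endpoint and the maximum at the right:

```python
# Sketch: minimum- and maximum-rectangle approximations for the area
# under f(x) = x^2 on [0, 1], which squeeze toward 1/3.

def f(x):
    return x * x

def min_max_rectangles(a, b, n):
    h = (b - a) / n
    lefts = [a + i * h for i in range(n)]
    min_sum = sum(f(x) * h for x in lefts)        # left endpoints (minima)
    max_sum = sum(f(x + h) * h for x in lefts)    # right endpoints (maxima)
    return min_sum, max_sum

for n in [2, 10, 100, 1000]:
    print(n, min_max_rectangles(0, 1, n))
```

As the number of rectangles grows, the two sums pinch together toward a single number, so $x^2$ is integrable on this region.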

With new notation, we can phrase the first Fundamental Theorem of Calculus as follows: if $f$ is integrable, then the derivative of the area function $A(x) = \int_0^x f(t) \, \mathrm{d}t$ is $f(x)$. Said another way, we can find functions that can be differentiated to give $f$. For this reason, integrals are sometimes called anti-derivatives. There is a deeper connection here too.

Suppose $F$ is a function whose derivative is another function $f$. So $F'(x) = f(x)$. We’ve seen this relationship so far: the function $x^2$ has derivative $2x$. Let’s return to the task of finding the area under $f$, this time from $a$ to $b$. The following is a bit slick and the most non-obvious part of this post (in my opinion/memory).

Start with $F(b) - F(a)$. Let’s carve up the segment from $a$ to $b$ into many little pieces of width $h$, i.e. $a, a + h, a + 2h, \dots, b$. Then by adding and subtracting $F$ evaluated at these points, we see that

$\displaystyle F(b) - F(a) = (F(b) - F(b - h)) + (F(b - h) - F(b - 2h)) + \dots + (F(a + h) - F(a)).$

Let’s just look at the first set of parentheses for a moment: $F(b) - F(b - h)$. By the Mean Value Theorem (Theorem 2), we know that

$\displaystyle \frac{F(b) - F(b - h)}{h} = F'(c_1) = f(c_1)$

for some $c_1$ between $b - h$ and $b$, recalling that $F'(x) = f(x)$. Rearranging this, we get that

$\displaystyle F(b) - F(b - h) = h \cdot f(c_1).$

Repeating this for each set of parentheses, we see that

$\displaystyle F(b) - F(a) = h \cdot f(c_1) + h \cdot f(c_2) + \dots + h \cdot f(c_n).$

Here’s the magic. This sum has an interpretation. Since each $c_i$ lies between two of our carve-up points, and we’re multiplying by $h$, that is exactly the calculation we would do if we were approximating the area under $f$ from $a$ to $b$ with rectangles: each rectangle has width $h$, so that’s why we multiply by $h$. Then $f(c_i)$ is a reasonable height of the rectangle. So the sum of $h$ times $f(c_i)$ values on the right is an approximation of the area under $f$ from $a$ to $b$.

As we use more and more rectangles, it becomes the area, so we get that the area under $f$ from $a$ to $b$ is exactly $F(b) - F(a)$.

Stated more generally (as there was nothing special about our particular function here):

Theorem 5 (Fundamental Theorem of Calculus II) If $F$ is a differentiable function with $F'(x) = f(x)$, then

$\displaystyle \int_a^b f(t) \, \mathrm{d}t = F(b) - F(a).$

So, for example, to find the area under $x$ from $0$ to $1$, we can compute $F(1) - F(0)$ where $F(x) = \frac{x^2}{2}$, since $F'(x) = x$ is the function whose area we want to understand. This gives $\frac{1}{2}$. Although $x$ is not a hard function to compute areas for, this works for many many functions. In fact, integrals are the best tool we have to compute areas when available.
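Here is a quick numerical check of Theorem 5 in action (a sketch of mine, not from the post): the antiderivative answer for the area under $f(x) = x$ from $0$ to $1$, compared against a crude rectangle sum.

```python
# Sketch: area under f(x) = x on [0, 1] via an antiderivative
# F(x) = x^2 / 2, checked against left-endpoint rectangles.

def F(x):
    return x * x / 2

exact = F(1) - F(0)   # 0.5, the familiar triangle area

n = 100000
h = 1 / n
riemann = sum((i * h) * h for i in range(n))   # left-endpoint rectangles

print(exact, riemann)   # the rectangle sum hugs 0.5 from below
```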

One way of thinking about these big theorems is that the first fundamental theorem says that antiderivatives exist, and the second says that you can use any antiderivative to calculate the area under a function (I haven’t mentioned this, but antiderivatives are not unique! $\frac{x^2}{2} + 1$ also has derivative $x$, and you can check that it gives the same area under $x$ from $a$ to $b$). So a large part of calculus is learning how to find antiderivatives for functions you want to study/integrate. What makes this so challenging is that there is no good, general method of finding antiderivatives – so you have to learn a lot of patterns and do a lot of computations. (We don’t do any of that here)

This concludes the theoretical development of calculus in AP Calculus AB, and Math 90 at Brown for that matter. But I’d like to mention one under-emphasized fact about the material we’ve discussed here – this will be the final section.

1.4. Why do we care about integrals, other than to calculate area?

Being able to compute areas is cool and useful in its own right, but I think it’s also way over-emphasized. Integrals and derivatives, the two fundamental tools of calculus, allow an entirely different method of thinking about and solving problems. Let’s look at two examples.

1.4.1. Population growth

Let’s make a model for population growth from first principles. The great strength of calculus is that we can base our calculations only on assumptions of related rates of change. For instance, suppose that $P(t)$ is the population of bacteria in a petri dish at time $t$. We might guess that if there are twice as many bacteria, then there will be twice as much growth (since there will be twice as much bacteria splitting and doing bacteria-reproductive things). Stated in terms of derivatives, we think that the rate of change in bacteria population is proportional to the size of the population, i.e. $P'(t) = k P(t)$ for some constant $k$.

Calculus allows one to ‘undo the derivative’ on $P'(t)$ using integration (and a few things that are not in the scope of this survey), and in the process actually explicitly gives that all possibilities for $P$ are $P(t) = C e^{kt}$, where $e$ is $2.71828\ldots$, the base of the exponential. To reiterate – calculus allows us to show that the only functions whose size at $t$ is proportional to their slope at $t$ are functions of the form $C e^{kt}$. Then if we measured a bacteria population at two times, we could solve for $C$ and $k$, and have an explicit model. It also turns out that this model is really good for small bacteria populations (before limiting factors like food, etc. become an issue). But it’s possible to develop more sophisticated models too, and these are not hard to create and experiment with.
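To make this concrete, here is a sketch (the two measurements are numbers I made up for illustration) that solves for $C$ and $k$ from two measured populations and then checks the defining property $P'(t) \approx k P(t)$ numerically:

```python
# Sketch: fit P(t) = C * e^(k t) to two (hypothetical) measurements,
# then verify that P'(t) is proportional to P(t) with constant k.

import math

t1, P1 = 0.0, 100.0    # hypothetical: 100 bacteria at time 0
t2, P2 = 2.0, 400.0    # hypothetical: 400 bacteria at time 2

k = math.log(P2 / P1) / (t2 - t1)   # growth rate from the two measurements
C = P1 / math.exp(k * t1)           # here simply the initial population

def P(t):
    return C * math.exp(k * t)

# difference-quotient check of P'(t) = k * P(t)
h = 1e-6
t = 1.3
slope = (P(t + h) - P(t)) / h
print(C, k, slope / P(t))   # slope / P(t) is approximately k
```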

1.4.2. Laws of motion

Galileo famously showed (reputedly by dropping things off the Leaning Tower of Pisa) that acceleration due to gravity is a constant and is independent of the mass of the object being dropped. Well, acceleration is the rate of change of velocity. If we call $v(t)$ the velocity at time $t$ and $a(t)$ the acceleration at time $t$, then we suspect that $a(t) = a$ is a constant. Since acceleration is the rate of change of velocity, we can say that $v'(t) = a(t)$, or that $v'(t) = a$. Integration ‘undoes’ derivatives, and it turns out the antiderivatives of a constant $a$ are functions of the form $at + C$ for some constant $C$. So here, we suspect that $v(t) = at + C$ for some constant $C$.

Well, what is that constant? If we dropped the object at rest, then its initial velocity was $0$. So at time $t = 0$, we expect $v(0) = 0$. This means that $C = 0$ (if it didn’t start at rest, then we get a different story). Thus $v(t) = at$. More generally, if it had initial velocity $v_0$, then we expect that $v(t) = at + v_0$.

We can do more. Velocity is change in position per time. If $p(t)$ is the position at time $t$, then $p'(t) = v(t) = at + v_0$. It turns out that the antiderivatives of $at + v_0$ are $\frac{a}{2}t^2 + v_0 t + C$, where $C$ is some constant.
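These derived laws of motion are easy to sanity-check numerically. In the sketch below (the drop height and the value $a = -9.8 \, \mathrm{m/s^2}$ are my own illustrative choices), difference quotients confirm that $p' = v$ and $v' = a$:

```python
# Sketch: with constant acceleration a, v(t) = a t + v0 and
# p(t) = (a/2) t^2 + v0 t + p0.  Check the derivative relations numerically.

a = -9.8    # m/s^2, roughly gravity near Earth's surface
v0 = 0.0    # dropped from rest
p0 = 45.0   # meters above the ground (illustrative)

def v(t):
    return a * t + v0

def p(t):
    return 0.5 * a * t * t + v0 * t + p0

h = 1e-6
t = 2.0
assert abs((p(t + h) - p(t)) / h - v(t)) < 1e-3   # p' = v
assert abs((v(t + h) - v(t)) / h - a) < 1e-3      # v' = a

print(p(3.0))   # height after 3 seconds: 45 - 9.8 * 9 / 2 = 0.9 meters
```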

In short, we are able to derive formulae and equations that govern the laws of motion starting with simple, testable observations. What I’m trying to emphasize is that calculus is an essential tool for model-making, experimentation, and predictive/reactive analysis. And these few examples barely provide a hint of what calculus can do. It’s an interesting, powerful, expansive world.

2. Concluding remarks

I hope you made it this far. If you have any comments, questions, concerns, tips, or whatnot then feel free to leave a comment below. For additional reading, I would advise you to use only google and free materials, as everything is available for free (and I mean legally and freely available). The last section actually details first examples from a class on Ordinary Differential Equations, i.e. those equations that arise from relating the values of a function with the values of its rate of change (or rates of rates of change, etc.).

This document was written with a slightly modified latex2wp, so I have pdfs available. The pdfs do not include the gifs, and the graphics are a bit too big to be natural. If you’re nice, I also have the TeX available.

The graphics were all produced using the free mathematical software SAGE. I highly encourage people to check SAGE out.

And to my students – I look forward to seeing you in class. Our first class is this coming Thursday.