An unformed idea for teaching understanding of the chain rule

I’m soon going to embark on teaching the chain rule in calculus. I have found ways to help kids remember the chain rule (“the outer function is the mama, the inner function is the baby… when you take the derivative, you derive the mama and leave the baby inside, and then you multiply by the derivative of baby”),  ways to write things down so their information stays organized, and I have shown them enough patterns to let them see it’s true. But I have never yet found a way to conceptually get them to understand it without confusing them. (The gear thing doesn’t help me get it… Although I understand the analogy, it feels divorced from the actual functions themselves… and these functions have a constant rate of change.)

I think I now have a way that might help students conceptually understand what’s going on. I only had the insight 10 minutes ago so I’m going to use this blogpost to see if I can’t get the ideas straight in my head… The point of this post is not to share a way I’ve made the chain rule understandable. It’s for me to work through some unformed ideas. I am not yet sure if I have a way to turn this into something that my kids will understand.

So here’s where I’m starting from. Every “nice” function (and those are the functions we’re dealing with) is basically like an infinite number of little line segments connected together. Thus, when we take a derivative, we’re pretty much just asking “what’s the slope of the little line segment at x=3?” for example.

Now here’s the magic. In my class, we’ve learned that whatever transformations a function undergoes, the tangent line undergoes the same transformations! If you want to see that, you can check it out here.

For a quick example, let’s look at f(x)=\sin{x} and g(x)=2\sin{(5x)}+1.

We see that g(x) is secretly f(x) which has undergone a vertical stretch of 2, a horizontal shrink of 1/5, and has been moved up 1.

Let’s look at the tangent line to f(x) at x=\pi/3. It is approximately y=0.5x+0.34.


Now let’s put that tangent line through the transformations:


Vertical Stretch of 2: y=2(0.5x+0.34)=x+0.68

Horizontal shrink of 1/5: y=5x+0.68

Shift up 1: y=5x+1.68

Now let’s plot g(x) and our transmogrified tangent line:


Yay! It worked! (But of course we knew that would happen.)

The whole point of this is to show that tangent lines undergo the same transformations as the functions — because the functions themselves are pretty much just a bunch of these infinitely tiny tangent line segments all connected together! So it would actually be weird if the tangent lines didn’t behave like the functions.
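If you'd rather check the sine example numerically than trust the graph, here is a minimal Python sketch. It is not something from my class; the helper names (`numeric_derivative`, `tangent_line`, `transform_line`) are made up for illustration, and the slope of g is estimated with a small central difference so that the check does not quietly assume the chain rule. One detail worth noticing: because of the horizontal shrink, the point of tangency moves from x=\pi/3 on f to x=\pi/15 on g.

```python
import math

def numeric_derivative(func, a, h=1e-6):
    """Estimate the slope of func at x = a with a central difference."""
    return (func(a + h) - func(a - h)) / (2 * h)

def tangent_line(func, a):
    """Return (slope, intercept) of the tangent line to func at x = a."""
    slope = numeric_derivative(func, a)
    return slope, func(a) - slope * a

def transform_line(line, v_stretch, inside_factor, v_shift):
    """Send y = s*x + b to y = v_stretch*(s*(inside_factor*x) + b) + v_shift."""
    slope, intercept = line
    return v_stretch * slope * inside_factor, v_stretch * intercept + v_shift

def f(x):
    return math.sin(x)

def g(x):
    return 2 * math.sin(5 * x) + 1

a = math.pi / 3
line_f = tangent_line(f, a)                       # roughly y = 0.5x + 0.34
line_transformed = transform_line(line_f, 2, 5, 1)
line_g = tangent_line(g, a / 5)                   # tangent to g at the shrunken x

print("transformed tangent of f:", line_transformed)
print("tangent of g directly:   ", line_g)
```

The two printed lines should agree to several decimal places, which is the "Yay! It worked!" moment in numbers.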

My Thought For Using This for The Chain Rule

So why not look at function composition in the same way?

We can look at a composition of functions at a point as simply a composition of these little line segments. 

Let’s see if I can’t clear this up by making it concrete with an example.

Let’s look at m(x)=\sqrt{x^3+1}.

And so we can be super concrete, let’s try to find m'(2), which is simply the slope of the tangent line of m(x) at x=2.

I’m going to argue that just as \sqrt{x} and x^3+1 are composed to get our final function, we can compose the tangent lines to these two functions to get the final tangent line at x=2.

Let’s start with the x^3+1. At x=2, the tangent line is y_{inner}=12x-15 (I’m not showing the work, but you can trust me that it’s true, or work it out yourself.)

Now let’s turn to the square root function. We have to be thoughtful about this. We are dealing with m(2), which really means that we’re taking the square root of 9. So we want the tangent line to \sqrt{x} at x=9. That turns out to be (again, trust me?): y_{outer}=\frac{1}{6}x+\frac{3}{2}.

So now we have our two line segments.

We have to compose them:

y_{composed}=\frac{1}{6}(12x-15)+\frac{3}{2}

This simplifies to:

y_{composed}=2x-1

Let’s look at a graph of m(x) and our tangent line:



Where did we ultimately get the slope of 2 from? When we composed the two lines together, we multiplied the slope of the inner function (12) by the slope of the outer function (1/6). And that became our new line’s slope.

Chain rule!
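Here is a small Python sketch of that composition, in case seeing it as arithmetic helps. The function name `compose_lines` and the (slope, intercept) pair representation are my own choices for illustration. The slope of the composed line is compared against a central-difference estimate of m'(2), so the check does not rely on the chain rule itself.

```python
import math

def compose_lines(outer, inner):
    """Compose two lines given as (slope, intercept) pairs: outer(inner(x))."""
    o_slope, o_int = outer
    i_slope, i_int = inner
    # outer(inner(x)) = o_slope*(i_slope*x + i_int) + o_int
    return o_slope * i_slope, o_slope * i_int + o_int

inner_line = (12.0, -15.0)    # tangent to x^3 + 1 at x = 2
outer_line = (1 / 6, 3 / 2)   # tangent to sqrt(x) at x = 9
composed = compose_lines(outer_line, inner_line)

def m(x):
    return math.sqrt(x**3 + 1)

h = 1e-6
numeric_slope = (m(2 + h) - m(2 - h)) / (2 * h)

print("composed line (slope, intercept):", composed)
print("numeric estimate of m'(2):       ", numeric_slope)
```

The intercepts only get carried along for the ride; the slope of the composed line is just the product of the two slopes.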

How we generalize this to the chain rule

For any composition of functions, we are going to have an inner and an outer function. Let’s write c(x)=o(i(x)) where we can clearly remember which one is the inner and which one is the outer function. Let’s pick a point x_0 where we want to find the derivative.

We are going to have to find the little line segment of the inner function at x_0 and compose that with the little line segment of the outer function at i(x_0) (just as, in the example, we needed the tangent line to \sqrt{x} at x=9, not at x=2). That will approximate the function c(x) near x_0.

The line segment of the inner function is going to be y_{inner}=i'(x_0)x+blah1

The line segment of the outer function is going to be y_{outer}=o'(i(x_0))x+blah2

I am going to call those terms blah1 and blah2 because we won’t really need them. Remember, we only want the derivative (the slope of the tangent line), not the tangent line itself. So our task becomes easier.

Let’s compose them: y_{composed}=o'(i(x_0))[i'(x_0)x+blah1]+blah2

This simplifies to y_{composed}=o'(i(x_0))i'(x_0)x+blah3

And since we only want the slope of this line (the derivative is the slope of the tangent line, remember), we have:

c'(x_0)=o'(i(x_0))i'(x_0)

Of course we chose an arbitrary point x_0 to take the derivative at. So we really have:

c'(x)=o'(i(x))i'(x)

Which is the chain rule.
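To see the general statement at work on more than one example, here is a hedged Python sketch. All the names in it (`numeric_derivative`, `chain_rule_slope`, `direct_slope`) are hypothetical helpers, and every slope is estimated with a central difference rather than computed symbolically, so the agreement is approximate, not exact. The key step mirrors the argument above: the inner slope is taken at x_0, the outer slope is taken at i(x_0), and the two are multiplied.

```python
import math

def numeric_derivative(func, x, h=1e-6):
    """Estimate the slope of func at x with a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)

def chain_rule_slope(outer, inner, x0):
    """Slope from composing the two little line segments."""
    inner_slope = numeric_derivative(inner, x0)          # i'(x0)
    outer_slope = numeric_derivative(outer, inner(x0))   # o'(i(x0))
    return outer_slope * inner_slope

def direct_slope(outer, inner, x0):
    """Slope of the composed function c(x) = o(i(x)), estimated directly."""
    return numeric_derivative(lambda x: outer(inner(x)), x0)

examples = [
    ("sqrt(x^3 + 1) at x = 2", math.sqrt, lambda x: x**3 + 1, 2.0),
    ("sin(x^2) at x = 1.3", math.sin, lambda x: x**2, 1.3),
    ("exp(cos(x)) at x = 0.7", math.exp, math.cos, 0.7),
]

results = []
for label, outer, inner, x0 in examples:
    pair = (chain_rule_slope(outer, inner, x0), direct_slope(outer, inner, x0))
    results.append(pair)
    print(label, pair)
```

In each printed pair the two numbers should match to several decimal places. That is not a proof, only a sanity check that multiplying the two slopes gives the slope of the composition.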



  1. If your students are comfortable with the idea that transforming a function transforms the tangent line in the same way, then it would seem like you could follow that quite smoothly with composition of tangent lines.

    Although, I will say that I did have to read through this twice before I made real sense of it and the example was essential.

  2. Feedback from Justin L. I want to save: “Brief feedback about your post: it’s a good idea. It would be even better if you could dispense with the y-intercept bits. Not altogether, but as soon as possible. They just add clutter. You make a move in this direction with the “blah”s, but you can probably do so earlier, and more completely. But you’ve laid out the transformation idea really well!”

  3. This is getting close to the way I understand the chain rule best. Of course, the algebra will work out the same, but the mental model for “functions and their derivatives” I use here is a bit different. I tried sharing this with the last Calc 1 class I taught. I don’t think that it worked for those students, but it was a challenging bunch. I am convinced this _can_ help. So, maybe you want to try it.

    Start with a mental model and picture for functions:
    Instead of using graphs to visualize functions, think of a diagram with a separate line for each of the domain and the range of your function. Then a function is a way to associate points in the domain copy of the line with points in the range copy of your line. As a schematic, I draw two line segments, both horizontally, with an arrow curving from the one on the left (domain) up-over-and-down to the one on the right (range).

    Now, figure out how to interpret the derivative in this picture. On a really small scale, a differentiable function looks like a multiplication operator (stretch or squeeze)
    f'(a) = “infinitesimal stretch factor modeling behavior of f from points near a to points near f(a)” Note that this is describing how you stretch or shrink one piece of a line to make it match up with the other line.

    Finally, think about what the required model for composition should look like. When I draw this picture with three line segments and the arrows for f, g and “g compose f”, the chain rule feels like the only reasonable guess. Maybe it isn’t quite “obvious.”

    Anyway, this is how I understand the chain rule.

  4. Interesting as usual. I really like the approach suggested by TJ and your diagrams in the linked post. The inner function stretches or shrinks (and translates) a small interval in the domain to another small interval in the range. The outer function then further stretches or shrinks (and translates) this interval. The combined stretching/shrinking is the product of the two factors (slopes).

    This geogebra applet shows graphs for the inner, outer, composition, and relevant small intervals all at once.

    Using transformations of tangent lines is interesting, but seems less concrete and less intuitive to me. Focusing on intervals seems to be at the heart of the idea.

  5. Hi Sam, I teach math at Santa Barbara City College in CA. While I’ve always loved my job, I’ve had my passion re-ignited after discovering Robert Talbert’s blog, reading his entire archives, and now working through your entire archives (I’m in mid-2010). (Next is Dan Meyer.) Enthusiasm and ideas like yours are so infectious; they make me want to be a better teacher.
    I’m just posting to introduce myself (and to request your acceptance of me as a Twitter follower :)).
    I don’t have anything in particular to say about the content of this post other than that I sure wish I had time for stuff like that in class :/
    Sometimes I’m quite jealous of you high school teachers that:
    a) Have more (relative) class time than we do
    b) Are teaching things to students seeing them for the first time, thus not ruining the “discovery” aspect of cool new ideas for everyone else.
    Between students that have already seen the content in high school and those taking our course a second time after a failure, more than half our students in any given class are seeing the material for the second or third time, at least at the calc I level and below.

  6. Hi Jared,

    Thanks for the comment, and the compliment. WOW! I’m flattered that you are actually reading my archives. You get to see how I’ve progressed as a teacher since I started over 6 years ago :)

    As for being jealous of us HS teachers, YAY! I wouldn’t know how I would have to change what I do if some of my kids had seen the majority of the material before. As long as you don’t blame us for the students you get… How many articles have I read by college math professors blaming high school teaching?

    Thanks for reaching out! I’ve been sick all weekend so this was so sweet to read!

    1. Hi Jared. If you have the freedom to change up what you do in calculus, there is so much fun you can have! I’ve used lots of activities from Matt Boelkins’ text, Active Calculus (free online). I skip around in the textbook, to put limits later in the course, so we can get to the meaty ideas sooner. The other great blog for calculus ideas is by Bowman Dickson. My blog doesn’t have as many exciting calculus ideas, but I do talk about teaching it at college level.

  7. I like this idea, but I do worry some kids will get stuck on “Why did you substitute a y (y=12x-15) in for x (in y=(1/6)x+3/2)?” Perhaps better to call one function x(t) and the other y(x)?

    Conceptually, this distinction works great with y=sin(x), where x=2t. That’s where I start. “Look guys, we can all see that dy/dx is the same as before. But what’s happened with dx/dt, hmm? Now x is changing twice as fast as before. So that must mean that dy/dt is twice as steep as before.”

    Next, we move on to y=sin(x), where x=t^2. And then we do a bunch of examples where they have a graph of some piecewise functions y(x) and x(t) and have to calculate dy/dt.

    But the final challenge is seeing if they can look at a composed function and identify all the t’s where dy/dt=0 — and identify why there are two different types of answer (one where dy/dx=0, and the far more interesting one, where dx/dt=0). Here’s the GeoGebra I have them play around with as an example:

    Thank you for all the interesting posts!

  8. Hello Sam
    Doing this without limits is an excellent idea, but have you approached the derivative as a rate, i.e. how much does y change when x changes a bit?
    Using D instead of big or small delta (as this editor appears to be a version of notepad), we can write
    1 for y = f(g(x)) put y = f(t) and t = g(x)
    2 change in y for a change in t is Dy = f'(t) * Dt
    3 change in t for a change in x is Dt = g'(x) * Dx, hence
    4 change in y for a change in x is Dy = f'(t) * g'(x) * Dx, where t = g(x)
    This may look a bit symbol burdened, but a diagram should see to that.

    Logically the derivative is not the slope of the tangent to the curve, but the slope of the tangent to the curve can be measured by the derivative.
    Howard Phillips

  9. Hello again
    You said you were going to try your approach on the product rule. I don’t think it will work. Try the small change in x thing on y = f(x) * g(x) by looking at a rectangle whose sides are of length f(x) and g(x). The function y then describes the area of the rectangle.
    Then a small change in x, Dx as above, will increase the f side by f'(x) * Dx, giving a change in area of f'(x) * Dx * g(x) (a picture is worth a thousand words, and in this case probably worth about 200 symbols), and it will also change the g side by
    g'(x) * Dx, with a change of area of g'(x) * Dx * f(x).
    Add them up to get Dy = f'(x) * Dx * g(x) + g'(x) * Dx * f(x), which gives the standard formula Dy = (f'(x) * g(x) + g'(x) * f(x)) * Dx

  10. I like showing how the outer function gets scrunched up or flattened out when it takes a fast or slow changing inner function as its argument. That’s how I really get why the chain rule says what it does. I made a video about it:
