Why is the gradient related to the normal vector to a surface?

Today in Multivariable Calculus I was supposed to teach my students how to find the plane tangent to a surface at a point.


The book, however, was not clear about how to do this. It had an equation involving the gradient of a function, but the equation was derived via local linear approximations. Fine and dandy, but I didn’t like it. I didn’t “see” it or grasp what was going on.

What’s clear is that to find the equation for the plane — for any plane — we need a point and a vector pointed in the direction normal to the plane. We are given the point, but we need to find the direction normal to the plane. That’s the same as the direction normal to the surface!


So I set my class up with the task of doing this on their own. They’re still working on it.

But honestly, I’m not quite there yet. I don’t want to just give them the equation and method on how to apply it, but I don’t think I can explain it in any good way. I’m almost there, at a conceptual tipping point, but I need one last shove over the edge. Anyone out there ready to help?

First of all, I decided that working with surfaces is silly and I’d reduce the problem to curves. So let’s start simple.

Let’s say we have the graph of y=x^2 and we want to find vectors normal to the curve at (0,0) and (1,1) (the blue and green dots).


Well, traditionally, we’d be crazy and parametrize the parabola by creating the vector-valued function \vec{r}(t)=<t,t^2>, then calculate the unit tangent vector (\vec{T}(t)=\frac{\vec{r}'(t)}{|\vec{r}'(t)|}), and from that calculate the unit normal vector (\vec{N}(t)=\frac{\vec{T}'(t)}{|\vec{T}'(t)|}). [1] Then we’d calculate \vec{N}(0) and \vec{N}(1) to find the vectors.

But trust me, this is an awful amount of work, and \vec{N} is not a pretty function. We had to parametrize, take derivatives, and plug in values. And if you remember, we started out with such a simple equation y=x^2. Why can’t it be easier?
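For the record, here is a minimal numerical sketch of that recipe. The helper names (`r_prime`, `unit_tangent`, `unit_normal`) are made up for illustration, and the derivative of \vec{T} is approximated with a central difference rather than worked out by hand, so treat it as a rough check and not the textbook procedure.

```python
import math

def r_prime(t):
    # r(t) = <t, t^2>, so r'(t) = <1, 2t>
    return (1.0, 2.0 * t)

def unit_tangent(t):
    # T(t) = r'(t) / |r'(t)|
    dx, dy = r_prime(t)
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm)

def unit_normal(t, h=1e-6):
    # Approximate T'(t) with a central difference (an assumption made
    # here to avoid the messy symbolic derivative), then normalize.
    tx_plus, ty_plus = unit_tangent(t + h)
    tx_minus, ty_minus = unit_tangent(t - h)
    dtx = (tx_plus - tx_minus) / (2 * h)
    dty = (ty_plus - ty_minus) / (2 * h)
    norm = math.hypot(dtx, dty)
    return (dtx / norm, dty / norm)

n0 = unit_normal(0.0)  # normal at the blue dot (0, 0)
n1 = unit_normal(1.0)  # normal at the green dot (1, 1)
print(n0)
print(n1)
```

Even in this stripped-down form there are three separate steps (differentiate, normalize, differentiate and normalize again) for a curve as simple as y=x^2.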

And it can. And this is where I need your help.

Instead of considering the plain old boring function y=x^2, we turn this into a surface by introducing a z direction: z=F(x,y)=y-x^2.

The graph of z=F(x,y) is a surface. We’re only interested in one slice of the surface, when F(x,y)=0 (when the height is 0). This will then reduce to our original equation y=x^2. The set of level curves of the surface is below. Note that the level curve that goes through the origin is the level curve we’re interested in.


Remember that one important (perhaps the most important) property of the gradient is that the gradient of a function points in the direction of maximum steepness on a graph of level curves.

Let’s look at the points we’re interested in!


Just looking at the graph shows we’re onto something. Look at the blue dot. Which direction is the steepest, if you were standing at the blue dot and wanted to walk in the steepest direction? Well, clearly it would be directly north. (You want to walk the shortest distance to get to the next level curve. Since the change in heights between level curves is constant, you want to minimize the distance you’ve walked to get to the next height to have the steepest slope.) What about the green dot? Clearly, northwest.
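The "walk toward the next level curve as quickly as possible" idea can be tested by brute force. The sketch below assumes a small step size and a fixed number of sampled directions (both arbitrary choices), and `steepest_direction` is a made-up helper name. It tries many unit directions from a point and keeps the one that raises F(x,y)=y-x^2 the most.

```python
import math

def F(x, y):
    return y - x**2

def steepest_direction(x0, y0, step=1e-3, samples=3600):
    # Try evenly spaced unit directions around the point and keep the
    # one with the largest increase in F after one small step.
    best_gain, best_dir = -math.inf, None
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        u = (math.cos(theta), math.sin(theta))
        gain = F(x0 + step * u[0], y0 + step * u[1]) - F(x0, y0)
        if gain > best_gain:
            best_gain, best_dir = gain, u
    return best_dir

blue = steepest_direction(0.0, 0.0)   # roughly straight north
green = steepest_direction(1.0, 1.0)  # roughly northwest
print(blue)
print(green)
```

Because the search uses a finite step and a finite grid of angles, the directions it returns are only approximate, but they should agree with what the picture suggests.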

And actually calculating the gradient of F(x,y) gives us \nabla F(x,y)=<-2x,1>.

At the blue dot, we get \nabla F(0,0)=<0,1>, which is a vector pointing straight up.
At the green dot, we get \nabla F(1,1)=<-2,1> which is a vector pointing northwest.
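A quick way to check that these gradient vectors really are normal to the parabola: the curve y=x^2 has slope 2x, so <1,2x> points along it, and the dot product with the gradient should vanish. The sketch below does that at a few points; the helper names and the extra sample points beyond (0,0) and (1,1) are picked purely for illustration.

```python
def grad_F(x, y):
    # F(x, y) = y - x^2, so F_x = -2x and F_y = 1
    return (-2 * x, 1)

def tangent(x):
    # y = x^2 has slope 2x, so <1, 2x> points along the curve
    return (1, 2 * x)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Sample points on the curve y = x^2 (the last two are extra examples).
xs = (0, 1, -2, 3)
checks = [dot(grad_F(x, x**2), tangent(x)) for x in xs]
print(checks)
```

Each entry is -2x + 2x, so it is zero no matter which point on the curve is chosen.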

I’m plotting them below.


And without all the pesky level curves to distract us.


Clearly this method works. We take the original function y=x^2 and bring it into a higher dimension (F(x,y)=y-x^2). We use the fact that the gradient gives us the direction which is “steepest” on this surface, if we were trapped at a particular point (in this case, (0,0) or (1,1)). Notice these points lie on the level curve we care about, the level curve which actually is the equation we were initially concerned about (y=x^2). Then we recognize, somehow, that the gradient of the higher-dimensional function gives us the normal vector of the original curve we were concerned with.

The questions I have after doing this:

(1) Why did we have to change our nice curve y=x^2 into a surface F(x,y)=y-x^2 to solve this problem? And why this surface?

(2) How can we understand that the vector normal to the curve somehow is “magically” the gradient of the surface we created, one of whose level curves is the curve we’re interested in?

(3) Extending this analysis to problems where we want to find the normal vector to a surface like an ellipsoid (like 9x^2+4y^2+z^2=49) at a particular point, we’re going to be using the function F(x,y,z)=9x^2+4y^2+z^2-49, whose level sets will be surfaces, stacked one on top of another. To find the normal vector, we take the point on the “level surface” which describes our ellipsoid, and find the quickest way to get to the next “level surface”? Is that right? I think that seems right. Strange, but right.
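As a concrete sketch of (3), take the point (1,1,6). That point is chosen here only because it happens to lie on the ellipsoid (9+4+36=49); nothing above singles it out. The gradient at that point serves as the candidate normal vector, and `tangent_plane` is a made-up helper that evaluates the point-normal form of the plane.

```python
def F(x, y, z):
    return 9 * x**2 + 4 * y**2 + z**2 - 49

def grad_F(x, y, z):
    # Partial derivatives of F with respect to x, y and z
    return (18 * x, 8 * y, 2 * z)

p = (1, 1, 6)        # hypothetical point of tangency: 9 + 4 + 36 = 49
assert F(*p) == 0    # confirm p is on the ellipsoid

n = grad_F(*p)       # candidate normal vector at p

def tangent_plane(q):
    # n . (q - p); this is zero exactly when q lies on the plane
    # through p with normal n
    return sum(ni * (qi - pi) for ni, qi, pi in zip(n, q, p))

print(n)
print(tangent_plane((1, 4, 4)))
```

Here the plane is 18(x-1)+8(y-1)+12(z-6)=0, and (1,4,4) is one point that satisfies it.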

(For a picture of some level surfaces, check it out here.)

Anyway, this is just my musings, my way of thinking through this. I’m not quite there. Any help you can give, great. If not, that’s cool too.

[1] I guess to make things simpler, we could simply calculate the direction of the normal vector and not worry about making it a unit normal vector, so we could simply calculate \vec{T}'(t) only. We’re not concerned about the magnitude of the normal vector, only the fact that it’s normal.



  1. I am several decades removed from studying calculus :-) so I may be missing the point completely, but why is it any more complicated than this:
    1. Slope of curve y = x**2 is found by differentiating : dy/dx = 2x
    2. Slope of the vector normal to the curve is the negative reciprocal of this : -1/(2x).
    3. Can’t compute -1/(2x) for x=0 of course, but x=1 gives -1/2, x=2 gives -1/4 etc. which seems correct.

  2. @Will: hahaha, you’re TOTALLY right about using this method to help us with our 2D function, instead of having to go to gradients and all that other nonsense. (It’s hilarious, because when I reduced the problem in class, everyone went straight to MV Calc tools to solve it. When we all learned how to do this type of work in pre-calc.)

    But the reason I did it with gradients is that, for 3D, I don’t see a way to make your method work for surfaces. (Find the vector that points normal to a surface.)


  3. I picked this up in my reader, and came here to leave the exact same comment as Will (starting with, with no little embarrassment, “decades removed…”)

    But I see your answer. And I think, didn’t I reduce this problem twice, once to f(x) in one plane (|| to axis) containing the point, and then to f(y) in a plane containing the point, and end up with x and y components of something analogous to slope that would let us write the equation of the plane?

    Or am I miles off?


  4. @Sam,

    Ah, yes, I see the point now. I had made the erroneous assumption that there existed a technique to do ‘differentiation in 3D’ to get the gradient of the plane – without knowing whether such a technique actually existed!

    I guess that’s the difference between a mathematician and an engineer (me): a mathematician would demand to see the proof that ‘differentiation in 3D’ existed before using it, but an engineer would just try extrapolating from 2D and then try to verify empirically the results were correct (or at least close enough for the problem at hand). Which would not work in this case…

    Come to think of it (now I’m really beyond what I recall from calculus) isn’t there a notion of partial derivatives where you can get the rate of change of z with respect to x AND y? Wouldn’t that be the ‘gradient’ of the plane that is tangential to our surface? Or can partial derivatives only be done ‘one axis at a time’?

  5. @Will and Jonathan,

    In fact, you’re both heading in the same direction as the book. You do need to take partial derivatives. To refresh, f_x(x,y) is the slope of the function if you hold y constant and vary x. Similarly, f_y(x,y) is the slope of the function if you hold x constant and vary y.

    The tangent plane to a surface z=f(x,y) at a point (x_0,y_0) is z=f(x_0,y_0)+f_x(x_0,y_0)(x-x_0)+f_y(x_0,y_0)(y-y_0).

    However, when I read this, it didn’t make clear what was going on intuitively/concretely. In some sense, the first of the three terms makes sure that the point (x_0,y_0) is on the plane, and the second and third terms describe what’s happening locally when you move slightly to the right or left. This sounds like what you are talking about, Jonathan.

    But I don’t think that explanation is totally correct, or makes great sense to me. And then the book goes on to say that another way to write the tangent plane is F_x(x_0,y_0,z_0)(x-x_0)+F_y(x_0,y_0,z_0)(y-y_0)+F_z(x_0,y_0,z_0)(z-z_0)=0, for any surface written as a level surface of a function F(x,y,z). Well, that was a real challenge for me: where the heck this comes from.

    Which led to this post.
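
    One way to see that the two forms agree, as a sketch: assume the surface z=f(x,y) is rewritten as the level surface F(x,y,z)=f(x,y)-z=0. That particular choice of F is an assumption made here for illustration, not a quote from the book. Substituting its partial derivatives into the second form gives back the first.

    ```latex
    \begin{aligned}
    F(x,y,z) &= f(x,y) - z, \qquad z_0 = f(x_0,y_0) \\
    F_x &= f_x, \quad F_y = f_y, \quad F_z = -1 \\
    0 &= F_x(x_0,y_0,z_0)(x-x_0) + F_y(x_0,y_0,z_0)(y-y_0) + F_z(x_0,y_0,z_0)(z-z_0) \\
      &= f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0) - (z-z_0) \\
    \Longrightarrow\; z &= f(x_0,y_0) + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0)
    \end{aligned}
    ```

    Read this way, the second form just says that <F_x,F_y,F_z> is perpendicular to every displacement <x-x_0,y-y_0,z-z_0> within the plane.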

  6. Yeah, I don’t work with this much, and my writing is not so good. But that looks like the 3 space analog of the point-slope form of the equation of a line.


  7. I am kind of late to this discussion, but here goes…

    The rigorous proof that the gradient produces a vector normal to a given level surface is evidently pretty complex, since it is omitted from my vector analysis text. There is, however, an informal explanation.

    Imagine a function f(x,y,z) defined everywhere in a box. Pick a point (x0,y0,z0), and let’s say that f(x0,y0,z0)=C. Then it isn’t hard to believe that a smooth surface could exist where f(x,y,z)=C, which passes through (x0,y0,z0) and is continuous throughout the box.

    If you took a directional derivative while staying on the surface, it is 0, since f(x,y,z) is constant on the surface.

    Notice that the directional derivative is in a direction tangent to the surface, and its magnitude is zero. df/du = grad(f).u implies that this direction, tangent to the surface, is perpendicular to the direction of greatest change. This means that the direction of greatest change (aka the gradient vector) is normal to the level surface.
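
    The same informal argument can be written with the chain rule. In this sketch, \vec{r}(t) is a hypothetical smooth curve that stays on the level surface f=C and passes through (x_0,y_0,z_0) at t=0; it is introduced only for illustration.

    ```latex
    \begin{aligned}
    f(\vec{r}(t)) &= C \quad \text{for all } t, \qquad \vec{r}(0) = (x_0,y_0,z_0) \\
    \frac{d}{dt}\, f(\vec{r}(t)) &= \nabla f(\vec{r}(t)) \cdot \vec{r}'(t) = 0 \\
    \Longrightarrow\; \nabla f(x_0,y_0,z_0) \cdot \vec{r}'(0) &= 0
    \end{aligned}
    ```

    Since \vec{r}'(0) can be any direction tangent to the surface at that point, the gradient is perpendicular to all of them, provided it is not the zero vector.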

  8. Sorry for the terrible notation:

    Let F(x, y, z) = c define a family of (level) surfaces.
    F(x1, y1, z1) = c1 and F(x2, y2, z2) = c2 are two adjacent surfaces, where x2 = x1 + Δx, y2 = y1 + Δy, and z2 = z1 + Δz.
    Then in a small neighborhood, ΔF = F(x2, y2, z2) – F(x1, y1, z1)
    ≈ F(x1, y1, z1) + (∂F/∂x) Δx + (∂F/∂y) Δy + (∂F/∂z) Δz – F(x1, y1, z1)
    = grad(F) • Δr = c2 – c1
    If we restrict to one surface, so that the vector Δr lies within that surface (technically tangent to it) and c2 – c1 = 0, then grad(F) • Δr = 0, so the vector grad(F) must be perpendicular to the surface.

    There is a theorem that states that given any point (x0,y0) [x0 is x sub zero] and the level curve of f through this point (i.e. the level curve of f at value f(x0,y0)), the gradient of f at (x0,y0) is perpendicular to the tangent direction along the level curve of f through (x0,y0).

    Proof: Suppose that (a,b) is any vector that is tangent to the level curve of f through (x0,y0). So the derivative of f along the vector (a,b) is zero [some call this the directional derivative, but the directional derivative is along a unit vector]. But this means that dot(gradf(x0,y0), (a,b))=0 since dot(gradf(x0,y0), (a,b)) is the derivative of f along the vector (a,b). But this means that gradf(x0,y0) is perpendicular to the vector (a,b), which is what we wanted to prove.

    This generalizes when you are talking about level surfaces and the gradient there is perpendicular to the surface instead of the level curve.

    1. I think this proof is wrong. Take any directional derivative (a,b) that is not tangent to a surface, and admit that (a,b) is different from the null vector. Take the directional derivative (c,d) tangent to a curve. (c,d) equals the null vector. Thus, dot[(a,b), (c,d)] = 0, since (c,d) is the null vector. But this implies that any vector, be it the grad or not, is perp to (a,b). The flaw is that the dot product implies that the vectors are perpendicular if both vectors are different from the null vector. As in the proof it was assumed that one of the two vectors is the null vector, we can’t conclude this.

  10. Hi,

    love your visually intuitive explanation, very helpful. I look on maths as based on the physical reality, and not the other way around. What you have done is uncover this basis to this particular problem, and then the maths follows easily as it is clear what is going on.

    A simple way of rephrasing what you have illuminated above, is to say that the shortest ie the steepest, route between contours or level surfaces, is by definition, in a direction perpendicular to the plane of the contour, since this is how 2D contours are constructed ie in a plane perpendicular to the third dimension axis.

    This perpendicular direction, parallel to the 3rd dimensional axis, when projected onto the 2D contour, will necessarily be perpendicular to the 2D tangent (or gradient) of the 2D function at any point on it.

    Thus by definition, ‘grad f’ or the gradient function of a 3D shape is perpendicular to the gradient of the 2D shape. thanks again Robyn

  11. can you explain what happens if the gradient vector is not in xy plane? say i want to find a gradient vector on a sphere. the vector perpendicular to the surface would be a 3D vector, but the gradient vector contains only 2 components?

  12. Hey, these are my thoughts. Lemme know what you think of them:

    Let there be a function z=f(x,y) describing a surface C. This function can be written as F(x,y,z)=f(x,y)-z=0.
    The total differential of this equivalent function at the point (x_0,y_0,z_0) is given as F_x dx+F_y dy+F_z dz=0.
    The small incremental changes dx, dy and dz form a short resultant vector such that
    dx=x-x_0, dy=y-y_0, dz=z-z_0
    and (dx, dy, dz)=(x-x_0, y-y_0, z-z_0).
    The total differential then starts to look like the dot product of the incremental resultant vector and the gradient vector:
    (F_x, F_y, F_z) . (dx, dy, dz) = F_x dx+F_y dy+F_z dz
    Now since this is equal to 0, it follows that the gradient vector (F_x, F_y, F_z) is normal to the incremental vector (dx, dy, dz).

    Now remember that the equation of a plane is derived from a vector on the plane and its normal. Since (dx, dy, dz) is on the surface C and is a very small vector, it follows that this small incremental vector also lies on the tangent plane to the surface C at the point (x_0,y_0,z_0). Hence, we can use the incremental vector and the gradient vector, which we now know is normal to the surface at that point, to find the equation of the tangent plane. Therefore, the gradient vector is normal to the tangent plane and the differential equation
    F_x dx+F_y dy+F_z dz=0
    …is the equation of the tangent plane.
