Over a month ago, I challenged myself to explain, conceptually, where the line of best fit comes from. I ended Part I with a question:
Our key question is now:
How are we going to be able to choose one line, out of all the possible lines I could draw, that seems like it fits the data well? (One line to rule them all…)
Another way to think of this question: is there a way to measure the “closeness” of the data to the line, so we can decide if Line A or Line B is a better fit for the data? And more importantly, is there an even better line (besides Line A or Line B) that fits the data?
And now, back to our show.
So right now we’re concerned with a measure of closeness. Can we come up with a measurement, a number, which represents how close the data is to a line? And the easy answer is: yes.
The difficulty is that we can come up with a lot of different measurements.
Measurement 1: Shortest Distance
We could measure the shortest distance from each point to the line and add all those distances up.

If I add the distance of all the dashed lines together, I get
.
Now let’s try a different line (but with the same points).

If I add the distance of all the dashed lines together, I get
.
It’s obvious that the smaller the total sum of those distances is, the “better” the line fits our data. I mean, if we had a bunch of data that fit perfectly on a line, then the sum of all those distances would be 0. And clearly with our two examples, the second line is a HORRIBLE line of best fit, while the first one seems fairly okay (but not great).
So we could use the sum of the perpendicular segments as our measurement. To find the line of best fit, we would say that we have to try out ALL possible lines (there are like, what, infinity of them? hey, you have study hall…) and find the one with the lowest sum. [1]
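If you'd like to see Measurement 1 as a computation, here is a minimal Python sketch. It assumes the line is written as y = mx + b and uses the standard point-to-line distance formula, |mx - y + b| / sqrt(m^2 + 1). The points and the two lines are made up for illustration; they are not the ones in the pictures above, and the function name is my own.

```python
import math

def perpendicular_sum(points, m, b):
    """Add up the shortest (perpendicular) distance from each point to y = m*x + b."""
    return sum(abs(m * x - y + b) / math.sqrt(m * m + 1) for x, y in points)

# Made-up data, purely for illustration.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]

# A line that runs along the points vs. one that cuts across them.
print(perpendicular_sum(points, 1, 1))
print(perpendicular_sum(points, -1, 6))
```

The smaller of the two printed numbers belongs to the line that sits closer to the points under this measurement.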
But, DUM DUM DUM… there are OTHER measurements you could make.
Measurement 2: Horizontal Distance
We could measure the horizontal distance from each point to the line…

If I add the distance of all the solid lines together, I get
.
And for a different scenario:

If I add the distance of all the solid lines together, I get
.
So if we define “closeness” to be horizontal distance (instead of the closest distance) between a point and a line, then we have a different measurement.
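Here is a rough Python sketch of Measurement 2, again assuming the line is y = mx + b. To go sideways from a point (x, y) to the line, we find the x-value where the line reaches the same height y, which is (y - b) / m, and measure the gap. The points and the function name are made up for illustration. One assumption to flag: this only works when the slope is not 0, because a horizontal line never reaches the height of a point that is off it.

```python
def horizontal_sum(points, m, b):
    """Add up the horizontal distance from each point to y = m*x + b (needs m != 0)."""
    if m == 0:
        raise ValueError("a horizontal line has no horizontal distance to off-line points")
    # (y - b) / m is the x-value where the line is at the same height as the point.
    return sum(abs(x - (y - b) / m) for x, y in points)

# Made-up data, purely for illustration.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]

print(horizontal_sum(points, 1, 1))
print(horizontal_sum(points, 0.5, 2))
```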
And yet another…
Measurement 3: Vertical Distance
We could measure the vertical distance from each point to the line…

If I add the distance of all the solid lines together, I get
.
And for a different scenario:

If I add the distance of all the solid lines together, I get
.
So if we define “closeness” to be vertical distance (instead of the closest distance or the horizontal distance) between a point and a line, then we have a different measurement.
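And a matching sketch for Measurement 3, under the same assumption that the line is y = mx + b. For a point (x, y), the line's height at that same x is mx + b, so the vertical distance is just the gap between the two heights. As before, the points and the function name are invented for illustration.

```python
def vertical_sum(points, m, b):
    """Add up the vertical distance from each point to y = m*x + b."""
    # m * x + b is the height of the line directly above or below the point.
    return sum(abs(y - (m * x + b)) for x, y in points)

# Made-up data, purely for illustration.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]

print(vertical_sum(points, 1, 1))
print(vertical_sum(points, 0.5, 2))
```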
And, in fact, we will see soon (probably in Part III) that there are actually two more measurements we can use.
So which measurement is the best?
You might say: soooo, sir, we have a ton of different measurements. Which one is the right one? The short answer: all of them. Why not? I mean, we wanted to have a measure which tells us how “good” or “bad” a line is when fitting the data, and we have done just that!
It is unsatisfying, but this is how mathematics is. We now have 3 different answers (and there can be more). Each measurement has benefits and drawbacks.
- The benefit of the first measurement is that we are using the closest distance — and that feels (yes, I’m using feeling in math) like a really good thing. The downside is that calculating all those distances from the points to the line is exhausting and algebraically hard.
- The benefit of the second measurement is that calculating the distance between a point and the line is relatively easy. The downside is that the horizontal distance doesn’t feel right.
- The benefit of the third measurement is also that calculating the distance between a point and the line is relatively easy. It also is, conceptually, something deep. If the points are data that have been measured, and the line is a theoretical model for the data, then the distance is the “error” or “difference” between the measured value and the theoretical value. We are summing errors and saying that the line with the smallest sum (least total error) is the best fit. The downside is that it feels better than the second measurement, but less good than the first measurement.
But yeah, you’re upset. You wanted there to be inherently one right answer. We, using our brains, have come up with some proposals. Each has merits. We’ll soon home in on one type of measurement that we will use, and talk about its merits, and why everyone uses it so much that it has become the standard measurement to find The Line of Best Fit.
For now, relax. We’ve done something great. Say we gave two of your friends the set of points above and had each one hand draw the line of best fit. You can decide which one did a better job just by adding a bunch of little line segments together. In fact, you have three different ways of deciding, and you have a logical justification for each!
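To make that friend-versus-friend contest concrete, here is a small Python sketch that scores two hand-drawn lines with all three measurements. Everything specific in it is an assumption for illustration: the points are made up, each friend's line is stored as a (slope, intercept) pair for y = mx + b, and the horizontal measurement assumes neither slope is 0.

```python
import math

def shortest(x, y, m, b):
    """Perpendicular distance from (x, y) to y = m*x + b."""
    return abs(m * x - y + b) / math.sqrt(m * m + 1)

def horizontal(x, y, m, b):
    """Horizontal distance from (x, y) to y = m*x + b (assumes m != 0)."""
    return abs(x - (y - b) / m)

def vertical(x, y, m, b):
    """Vertical distance from (x, y) to y = m*x + b."""
    return abs(y - (m * x + b))

def total(distance, points, m, b):
    """Add up one kind of distance over all the points."""
    return sum(distance(x, y, m, b) for x, y in points)

# Made-up data and made-up guesses, purely for illustration.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
friend_a = (1, 1)    # slope 1, intercept 1
friend_b = (0.5, 2)  # slope 0.5, intercept 2

for name, distance in [("shortest", shortest),
                       ("horizontal", horizontal),
                       ("vertical", vertical)]:
    score_a = total(distance, points, *friend_a)
    score_b = total(distance, points, *friend_b)
    if score_a < score_b:
        verdict = "Friend A wins"
    elif score_b < score_a:
        verdict = "Friend B wins"
    else:
        verdict = "tie"
    print(name, round(score_a, 3), round(score_b, 3), verdict)
```

With these particular made-up numbers, Friend A wins under the shortest and horizontal measurements, but the two friends tie under the vertical one. So the choice of measurement really can change the verdict.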
[1] Of course, if you’re a super argumentative student, you might ask: “what if there are two, or even more, lines that have the same lowest measurement?” Well, I love that question. It’s a wonderful question. And worth investigating. Just not right here, right now. And yes, believe it or not, we will check all infinity lines soon enough. It’s possible. Math gives us shortcuts.