16  Extrapolation and Interpolation

Jupyter Notebook

By now we should be comfortable with representing mathematical functions on discrete grids. For example, the plot below is the location of a projectile as it flies through the air at 11 discrete points in time (red dots). There will be times when we want to find good values of a function between grid points (blue dot) or beyond the last grid point (green dot). The former is called interpolation and the latter is called extrapolation. In this chapter we will learn how to do both.

The trajectory of a particle at a discrete set of times.

16.1 Linear Interpolation and Extrapolation

The simplest way to estimate function values is to assume that the function is linear in the neighborhood of the data points. Using the fact that two points define a straight line will allow us to write a linear approximation to the function. For example, suppose that we have function values \((x_1,y_1)\) and \((x_2,y_2)\), as shown in the figure below.

Illustration of linear interpolation/extrapolation. The red dots are the known function values. The lack dot is an interpolation and the green dot is an extrapolation.

The formula for a straight line that passes through these two points is \[ y-y_1 = { (y_2-y_1) \over (x_2-x_1) } (x-x_1) \] Solving this for \(y\) gives \[ y(x) = y_1 + { (y_2-y_1) \over (x_2-x_1) } (x-x_1) \tag{16.1}\]

This function provides a good approximation to the true function \(y(x)\) in the neighborhood of the two data points. All we have to do is evaluate it for any \(x\) we please. If we evaluate it in between two data points we call it interpolation and if we evaluate it somewhere beyond the last data point we call it extrapolation.

16.1.1 Linear Interpolation

To linearly interpolate or extrapolate we simply evaluate the equation above at \(x\) values between or beyond \(x_1\) and \(x_2\).

Let’s evaluate the function above halfway in between two grid points: \(x = x_1 + {1\over 2} (x_2 - x_1)\) \[\begin{align} y(x_1 + {1\over 2} (x_2 - x_1)) &= y_1 + { (y_2-y_1) \over (x_2-x_1) } (x_1 + {1\over 2} (x_2 - x_1)-x_1) \\ &= y_1 + { (y_2-y_1) (x_2 - x_1) \over 2(x_2-x_1) } \\ &= y_1 + {1\over 2} (y_2-y_1) \\ \end{align}\]

That’s a pretty cool result that we can generalize to \[ y(x_1 + ph) = y_1 + p (y_2-y_1) \tag{16.2}\]

where p is a fraction of the grid spacing.

16.1.2 Linear Extrapolation

Now let’s evaluate the equation above beyond the last grid point at \(x_2 + h\) where \(h = (x_2 - x_1)\) is the grid spacing.
\[ \begin{align} y(x_2 + (x_2 - x_1)) &= y_1 + { (y_2-y_1) \over (x_2-x_1) } (x_2 + (x_2 - x_1)-x_1) \\ &= y_1 + { (y_2-y_1) (2x_2 - 2x_1) \over (x_2-x_1) } \\ &= y_1 + 2 (y_2-y_1) \\ &= 2 y_2 - y_1 \end{align} \tag{16.3}\]

16.1.3 Zero Finding

Sometimes you will want to find the x value that produces \(y(x) = 0\). We call that finding the zero of the function. We can do this by setting \(y =0\) in (Equation 16.1) and solving for x: \[ \begin{align} 0 &= y_1 + {y_2 - y_1 \over x_2 - x_1} (x - x_1)\\ -y_1 & = {y_2 - y_1 \over x_2 - x_1} (x - x_1)\\ -{y_1 (x_2 - x_1)\over y_2 - y_1} &= x - x_1\\ x_1 - {y_1 (x_2 - x_1)\over y_2 - y_1} &= x \\ \end{align} \tag{16.4}\]

16.2 Quadratic Interpolation and Extrapolation

A fancier technique for finding values between and beyond grid points is to use a parabola instead of a line. It takes three data points to define a parabola, so we need to start with the function values \((x_1,y_1)\), \((x_2,y_2)\), and \((x_3,y_3)\), as shown in the figure below.

Illustration of quadratic interpolation/extrapolation. The red dots are the known function values. Black dots indicate interpolations and the green dot represents extrapolation.

The general formula for a parabola is \[ y=a + bx + cx^2 \tag{16.5}\] where the coefficients \(a\), \(b,\) and \(c\) need to be chosen so that the parabola passes through our three data points. To determine these constants, you set up three equations that force the parabola to match the data points, like this: \[\begin{align} y_1 &= a + bx_1 + cx_1^2\\ y_2 &= a + bx_2 + cx_2^2\\ y_3 &= a + bx_3 + cx_3^2\\ \end{align}\]

We can solve this system of equations for \(a\), \(b\), and \(c\) and then plug these expressions back into (Equation 16.5) to get an approximation to the function in the neighborhood of these points. When we do this (it’s long and not super pretty so I’ll save you from the gory details) we find the following useful results for interpolating: \[ y(x_1 + {1\over 2} h) = {1\over 8} (3 y_1 + 6 y_2 - y_3) \tag{16.6}\] \[ y(x_2 + {1\over 2} h) = {1\over 8} (- y_1 + 6 y_2 + 3 y_3) \tag{16.7}\] and the following useful results for extrapolating \[ y(x_3 + h) = y_1 - 3 y_2 + 3y_3 \tag{16.8}\] \[ y(x_3 + {1\over 2}h) = {1\over 8}(3 y_1 - 10 y_2 + 15y_3) \tag{16.9}\]

16.3 An Example

Let’s take the example of the projectile that was mentioned at the beginning of this chapter (shown below again). The red dots are the known data points and let’s try to estimate the height of the projectile between the 3rd and 4th data points in the set (the blue dot) using both linear and quadratic methods.

Let’s evaluate (Equation 16.2) and (Equation 16.6) to perform this interpolation in the cell below.

from numpy import linspace,sin,cos,pi

vi = 30
θ = 40 * pi/180
t = linspace(0,3.5,10)
x = vi * cos(θ) * t 
y = vi * sin(θ) * t - 4.9 * t**2

yInterpLinear = y[2] + 1/2 * (y[3]-y[2])
yInterpQuadratic = 1/8 *(3*y[2] + 6 * y[3] - y[4])
print(yInterpLinear)
print(yInterpQuadratic)
13.931150961536407
14.11641330721542

Notice that the linear and quadratic interpolations are close to the same number but they aren’t equal; we wouldn’t really expect them to be. Deciding which interpolation method is the best choice is a case-by-case decision. For this case, I would favor the quadratic interpolation since the trajectory is more parabolic than linear. Now let’s use the extrapolation formulas to estimate the projectile’s height past the last data point. For the linear case, the second-to-last data point is playing the role of \((x_1,y_1)\) and the last data point is playing the role of \((x_2,y_2)\). For the quadratic case, the third-to-last data point is playing the role of \((x_1,y_1)\), the second-to-last data point is playing the role of \((x_2,y_2)\), and the last data point is playing the role of \((x_3,y_3)\). Best to use negative indices to index the data arrays for this case.

from numpy import linspace,sin,cos,pi

vi = 30
θ = 40 * pi/180
t = linspace(0,3.5,10)
x = vi * cos(θ) * t 
y = vi * sin(θ) * t - 4.9 * t**2

yExtrapLinear = 2* y[-1] - y[-2]
yExtrapQuadratic = y[-3] - 3 * y[-2] + 3 * y[-1]
print(yExtrapLinear)
print(yExtrapQuadratic)
2.3690482905900723
0.8869495251579806

Now the quadratic and the linear extrapolations differ pretty significantly. The linear version is a better value in this case due to the very course grid being used here. Finally, let’s use (Equation 16.4) to estimate the landing location for the projectile.

from numpy import linspace,sin,cos,pi

vi = 30
θ = 40 * pi/180
t = linspace(0,3.5,10)
x = vi * cos(θ) * t 
y = vi * sin(θ) * t - 4.9 * t**2

xRange = x[-2] - y[-2] * (x[-1] - x[-2])/(y[-1] - y[-2])
print(xRange)
93.52444507627926

16.4 Flashcards

  1. What is the difference between interpolation and extrapolation?
  2. What is the equation for linear interpolation?
  3. What is the equation for linear extrapolation?
  4. What is the equation(s) for quadratic interpolation?
  5. What is the equation(s) for quadratic extrapolation?
  6. What is the equation for finding the zeros of a function?
  7. Recite Moses 1:39.

16.5 Exercises

  1. In the cell below, you will find the coordinates for a batted baseball flying through the air while experiencing non-negligible air drag.
    1. Plot the data.
    2. Use the linear zero-finding formula (Equation 16.4) to estimate the landing location of the ball.
    3. Use linear interpolation to calculate the height of the projectile between the 2nd and 3rd data point. Make a visual inspection of the plot to ensure that your interpolation is reasonable.
    4. Use quadratic interpolation to calculate the height of the projectile between the 2nd and 3rd data point. Make a visual inspection of the plot to ensure that your interpolation is reasonable.
x = [0, 15.805250773081712, 29.98211261814125, 42.526480802646844, 53.54837374010167, 63.375313562732956, 72.28376367575353, 80.43524080501983, 87.9042108343639, 94.71629032437068, 100.88093230888737, 106.41217684263896] 
y = [0, 15.80525077308171, 27.532112618141245, 35.45860314481743, 39.97307745470617, 41.548109055524456, 40.52592795902555, 37.14060411207481, 31.588726752728686, 24.07513301195207, 14.825650490213738, 4.076523019892836]
  1. This file contains worldwide carbon emission data from 1958 to 2017.
    1. Read the data file using any method of your choosing.
    2. Plot the data so you can see what it looks like.
    3. Use linear extrapolation to predict what carbon emissions will be in the year 2030.
    4. Use quadratic extrapolation to predict what carbon emissions will be in the year 2030.
    Hints:
    1. Remember that our extrapolation formulas assume that the spacing between grid points is equal. Hence, to extrapolate to 2030, \(x_1\) must correspond to 2004 \(x_2\) should correspond to \(2017\) for the linear extrapolation.
    2. For the quadratic extrapolation, \(x_1\) should correspond to 1991, \(x_2\) to 2004 and \(x_3\) to 2017. You could pick other years if you like as long as they are spaced out evenly.
# Solution here
  1. This file contains the change in average global temperature from 1958 to 2017.
    1. Read the data file using any method of your choosing.
    2. Plot the data so you can see what it looks like.
    3. Use linear extrapolation to predict what average global temperature change will be in the year 2030 if this trend continues.
    4. Use quadratic extrapolation to predict what average global temperature change will be in the year 2030 if this trend continues.
# Solution here