Lots of calculators offer the capability to solve arbitrary equations. Even if they don’t, they use it implicitly for calculating square roots. Algeo, for example, also uses it to snap points on a graph to minima and maxima.
Solving an equation exactly is quite hard, and impossible in general. You can solve some special cases, like quadratic polynomials, but that covers only a fraction of practical problems. Instead, equation-solving methods approximate: they do not return an exact value, just try to find something close enough. Even so, they often can’t find a solution at all.
A very popular method, used in most calculators, is Newton’s method. As the name suggests, it was Newton who derived an early form of the algorithm. However, his version looked quite different and only worked for polynomials. There were many variations of his procedure, but the form we use today was proposed by Thomas Simpson.
The algorithm
Newton’s method is an iterative method: we start with an initial guess and improve it by repeating the same set of steps. Let’s derive the formula step by step, which will shed some light on why it works.
Assume we want to solve the equation $f(x)=0$. Let’s call our current guess $x_n$. Here is an illustration of where we are right now:
Intentionally, I haven’t drawn the full plot, because our algorithm doesn’t know what the function actually looks like. Still, we have a strong hunch: the root should be somewhere on the left-hand side, so the next guess, $x_{n+1}$, must be smaller than $x_n$. Looking at the plot, the curve is sort of straight: there are no squiggles or jumps, and it is decreasing nicely. So a good idea is to draw a straight line at $x_n$ and find the root of that line instead of the original function:
Which line should we use? The tangent, as it is the closest linear approximation of the function around $x_n$. From our calculus studies we know that the slope of the tangent at $x_n$ is given by the derivative, $f'(x_n)$. The equation of the red line is:
$$y=f'(x_n)(x-x_n)+f(x_n).$$This ensures that the slope is $f'(x_n)$ and that at $x_n$ the line passes through $f(x_n)$. $x_{n+1}$ is the point where the line crosses the $x$-axis, so we have to solve:
$$0=f'(x_n)(x_{n+1}-x_n)+f(x_n).$$Dividing by $f'(x_n)$ and reordering the equation, we get:
$$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}.$$ And this is the update step of Newton’s method for solving the equation $f(x)=0$.
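For example, applied to $f(x)=x^2-a$ the update becomes $x_{n+1}=x_n-\frac{x_n^2-a}{2x_n}=\frac{1}{2}\left(x_n+\frac{a}{x_n}\right)$, which is exactly the ancient Babylonian scheme for computing $\sqrt{a}$; this is how the square-root use case from the introduction fits in.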
We can summarize the algorithm as follows:
- Start from an initial guess $x_0$.
- Draw the tangent at $x_0$.
- Calculate the intersection of the tangent and the $x$-axis; denote it $x_1$.
- Repeat steps 2-3 until $x_i$ stops changing much.
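Here is a minimal Python sketch of this iteration, assuming we already have the derivative as a function; the names, tolerance, and step limit are illustrative choices, not what any particular calculator uses:

```python
def newton(f, f_prime, x0, tol=1e-10, max_steps=50):
    """Find a root of f, starting from the initial guess x0."""
    x = x0
    for _ in range(max_steps):
        step = f(x) / f_prime(x)  # fails if f'(x) is zero
        x -= step                 # x_{n+1} = x_n - f(x_n) / f'(x_n)
        if abs(step) < tol:       # the guess barely changed: stop
            return x
    raise ValueError("Newton's method did not converge")

# Solving x^2 - 2 = 0, i.e. computing sqrt(2):
print(newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0))
```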
How long should we run the method? It converges quite fast: usually 5-7 steps are enough to get 10 decimal digits of precision.
Limitations
You can prove that if the function $f$ is differentiable, its derivative is continuous and nonzero at the root, and you start close enough to the root, the method will find it. The problem is ‘starting close enough’: if you already knew where the root is, why would you run Newton’s method? In general, there is no way to pick a good starting point.
It’s no wonder the method can fail in practice. Sometimes it falls into an infinite loop, sometimes it just runs off to infinity. Also, if an equation has multiple roots, different initial guesses lead to different roots.
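A textbook failure case (my own illustration, not something from a calculator) is $f(x)=\sqrt[3]{x}$: the update simplifies to $x_{n+1}=x_n-3x_n=-2x_n$, so each step doubles the distance from the root at $0$, no matter how close we start:

```python
# Newton's method on the cube root: f(x) = x^(1/3), root at x = 0.
# Here f(x)/f'(x) = 3x, so the update is x - 3x = -2x.
def f(x):
    return x ** (1 / 3) if x >= 0 else -((-x) ** (1 / 3))

def f_prime(x):
    return (1 / 3) * abs(x) ** (-2 / 3)

x = 1.0
for _ in range(5):
    x = x - f(x) / f_prime(x)
    print(x)  # roughly -2, 4, -8, 16, -32: diverging
```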
Alternatives
Another drawback is that you have to calculate the derivative of the function, which simpler calculators can’t do. Instead, they track two points rather than one and draw the line through them; this is called the secant method:
As the method progresses, the two points get closer to each other, approximating the derivative better and better. In other words, the secant method estimates the derivative on the fly while it hunts for the root.
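A sketch of the secant method in the same style as before (again, names and defaults are my own choices):

```python
def secant(f, x0, x1, tol=1e-10, max_steps=100):
    """Find a root of f from two initial guesses, no derivative needed."""
    for _ in range(max_steps):
        # Slope of the line through (x0, f(x0)) and (x1, f(x1)),
        # which stands in for the derivative
        slope = (f(x1) - f(x0)) / (x1 - x0)
        x0, x1 = x1, x1 - f(x1) / slope  # root of the secant line
        if abs(x1 - x0) < tol:
            return x1
    raise ValueError("the secant method did not converge")

# sqrt(2) again, without ever computing a derivative:
print(secant(lambda x: x * x - 2, x0=1.0, x1=2.0))
```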
You may have noticed in the derivation that we used a simple linear approximation. But what if we used a higher-order polynomial, would that help? The answer is yes, and it leads to Householder’s methods. They exist for any degree $d$, and the $d=1$ case is equivalent to Newton’s method. Higher degrees require fewer steps than Newton’s, but each step is slower as it involves calculating higher-order derivatives. Usually, the total runtime of Newton’s method is shorter.
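To make this concrete, here is a sketch of the $d=2$ member of the family, better known as Halley’s method, whose update is $x_{n+1}=x_n-\frac{2f(x_n)f'(x_n)}{2f'(x_n)^2-f(x_n)f''(x_n)}$; the interface mirrors the Newton sketch above and is my own, not from any particular library:

```python
def halley(f, f1, f2, x0, tol=1e-10, max_steps=50):
    """Householder's method of degree 2 (Halley's method).

    f1 and f2 are the first and second derivatives of f.
    """
    x = x0
    for _ in range(max_steps):
        fx, f1x, f2x = f(x), f1(x), f2(x)
        step = 2 * fx * f1x / (2 * f1x ** 2 - fx * f2x)
        x -= step
        if abs(step) < tol:
            return x
    raise ValueError("Halley's method did not converge")

# sqrt(2) once more: f(x) = x^2 - 2, f'(x) = 2x, f''(x) = 2
print(halley(lambda x: x * x - 2, lambda x: 2 * x, lambda x: 2.0, x0=1.0))
```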