DinS Written on 2018/4/24

Derivatives measure how change in the input affects the output

1. What’s a derivative?

Definition: the derivative of f at point x is defined to be:

f’(x) = lim (h->0) [f(x+h) - f(x)] / h

If the derivative exists, we say f is differentiable at x.

The derivative of f at x is written as f’(x), which suggests that the derivative is a function, or as df/dx, which suggests that the derivative is a ratio.

Let’s first understand the derivative from the ratio perspective.

If we ignore the limit, [f(x+h) - f(x)] / h just calculates the slope of the line AB, where A = (x, f(x)) and B = (x+h, f(x+h)). What happens when we put the limit back? As h->0, B moves infinitely close to A. We can picture the line AB turning into the tangent line at A, and the derivative is the slope of that tangent line. Visually, that’s what a derivative is.
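To make the picture concrete, here is a minimal sketch that computes the slope of AB while B slides toward A. The curve x³, the point x = 1 and the helper name secant_slope are my own choices for illustration, not part of the original notes.

```python
def secant_slope(f, x, h):
    """Slope of the line through A = (x, f(x)) and B = (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def cube(x):
    # Example curve picked for illustration only.
    return x ** 3


# Slide B toward A by shrinking h and record the slope of AB each time.
steps = [1.0, 0.1, 0.01, 0.001]
slopes = [secant_slope(cube, 1.0, h) for h in steps]

for h, slope in zip(steps, slopes):
    print(f"h = {h:<6} slope of AB = {slope:.6f}")
```

As h shrinks, the printed slopes settle toward 3, which is the slope of the tangent line to x³ at x = 1.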

Now let’s understand the derivative from the function perspective. In the example above we got the derivative at point A, which is a number. A lies on f(x), and there are many other points on f(x). By the same reasoning, each of those points has its own derivative, which is also a number. If we map every point to its derivative, we get a new function. That’s the derivative function f’(x), which tells us the derivative at any given point on f(x).
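Programmers may recognize this as a function that takes a function and returns a function. The sketch below is only an approximation under my own assumptions: the name derivative and the fixed small step h = 1e-6 are made up for illustration, and a real limit is replaced by a tiny but nonzero h.

```python
def derivative(f, h=1e-6):
    """Return a new function that approximates f' at any point.

    The true derivative uses a limit; here h is just a small fixed number.
    """
    def f_prime(x):
        return (f(x + h) - f(x)) / h
    return f_prime


def square(x):
    return x * x


# Map several points on f(x) = x^2 to their (approximate) derivatives.
square_prime = derivative(square)
points = [-2.0, 0.0, 1.0, 3.0]
values = [square_prime(x) for x in points]

for x, v in zip(points, values):
    print(f"x = {x:5.1f}  derivative is about {v:.4f}")
```

Each printed value should land close to 2x, which matches what section 2 works out by hand.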

2. How to find derivatives?

The derivative is a very powerful tool in calculus. However, we’ll first look at how to calculate derivatives, and then move on to what derivatives can do.

Question: what is the derivative of f(x) = x²?

Let’s think about this question in three different ways. The key is to remember that the derivative measures the relationship between input and output.

(1) Algebra

The original input is x, the original output is x².

If we add a small h to the original input, it becomes (x+h), and the output becomes (x+h)², which is x² + 2xh + h². When h is very small, h² is far smaller than 2xh, so we’ll just ignore it.

The change in input is (x+h) – x = h. The change in output is x² + 2xh – x² = 2xh.

By intuition, the derivative is (change in output) / (change in input) = 2xh / h = 2x.

(2) Geometry

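Think of x² as the area of a square with side x. If we increase the side from x to (x+h), the square grows.

[Figure: a square of side x enlarged to side x+h, with the added region shaded black]

That black shape is what has changed after increasing x by h. It is made of two thin rectangles of area xh each, plus a tiny corner square of area h².

By intuition, if we ignore the tiny corner, the change in output is about 2xh, so the derivative is 2xh / h = 2x.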

(3) Proof by definition

Given f(x) = x²,

f’(x) = lim (h->0) [(x+h)² - x²] / h
      = lim (h->0) (2xh + h²) / h
      = lim (h->0) (2x + h)
      = 2x

One problem, three ways of thinking. This is actually the fun part of math.
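Here is a fourth way, for the programmers: a small sketch that checks the algebra with exact fractions. For f(x) = x² the difference quotient equals 2x + h exactly, so the only thing the limit does is drop the leftover h. The sample point x = 3 and the helper name difference_quotient are my own choices for illustration.

```python
from fractions import Fraction


def difference_quotient(x, h):
    """[(x + h)^2 - x^2] / h, computed exactly when given Fractions."""
    return ((x + h) ** 2 - x ** 2) / h


x = Fraction(3)
quotients = []
for h in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    q = difference_quotient(x, h)
    quotients.append(q)
    # The quotient is exactly 2x + h, so it approaches 2x as h shrinks.
    print(f"h = {h}  quotient = {q}  2x + h = {2 * x + h}")
```

Using Fraction instead of float avoids rounding noise, so the match with 2x + h is exact rather than approximate.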

3. Shortcut to derivatives

Every derivative can be calculated from the definition by working with the limit. However, that’s quite tedious. We can work out a few rules to make our life easier.

(1) Power rule

If f(x) = xⁿ, then f’(x) = n * xⁿ⁻¹.

The proof is very similar to the one for x², only generalized from 2 to n.
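If you would rather test the power rule than prove it, a rough numerical check might look like this. It is a sanity check under my own assumptions (sample point x = 1.5, step h = 1e-6, made-up helper names), not a proof.

```python
def numeric_derivative(f, x, h=1e-6):
    """Approximate f'(x) with the difference quotient from the definition."""
    return (f(x + h) - f(x)) / h


def power_rule(n, x):
    """Derivative of x**n according to the power rule: n * x**(n - 1)."""
    return n * x ** (n - 1)


x = 1.5
rows = []
for n in range(1, 6):
    approx = numeric_derivative(lambda t: t ** n, x)
    rows.append((n, approx, power_rule(n, x)))
    print(f"n = {n}  by definition: {approx:.5f}  by power rule: {power_rule(n, x):.5f}")
```

The two columns should agree to several decimal places; the small gap that remains comes from h being tiny rather than truly zero.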

(2) Sum rule

[f(x) + g(x)]’ = f’(x) + g’(x)

Proof:

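We construct h(x) = f(x) + g(x). Since h is now the name of a function, we write the small change in x as Δx.

h’(x) = lim (Δx->0) [h(x+Δx) - h(x)] / Δx
      = lim (Δx->0) [f(x+Δx) + g(x+Δx) - f(x) - g(x)] / Δx
      = lim (Δx->0) [f(x+Δx) - f(x)] / Δx + lim (Δx->0) [g(x+Δx) - g(x)] / Δx
      = f’(x) + g’(x)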

Here you see why the limit is the building block of the derivative. We use the limit definition to prove the sum rule, but in real problems we apply the sum rule directly to keep things simple.

(3) Product rule

[f(x) * g(x)]’ = f’(x) * g(x) + f(x) * g’(x)

(4) Quotient rule

[f(x) / g(x)]’ = [f’(x) * g(x) - f(x) * g’(x)] / g(x)², where g(x) ≠ 0
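The sum, product and quotient rules can be sanity-checked the same lazy way: compute each derivative once from the difference quotient and once from the standard form of the rule. The functions f(x) = x² and g(x) = 3x + 1, the point x = 2 and all helper names are my own picks for illustration.

```python
def numeric_derivative(func, x, h=1e-6):
    """Approximate func'(x) with the difference quotient from the definition."""
    return (func(x + h) - func(x)) / h


def f(x):
    return x ** 2


def f_prime(x):
    return 2 * x


def g(x):
    return 3 * x + 1


def g_prime(x):
    return 3


x = 2.0

# Each entry: (derivative from the definition, derivative from the rule).
checks = {
    "sum": (
        numeric_derivative(lambda t: f(t) + g(t), x),
        f_prime(x) + g_prime(x),
    ),
    "product": (
        numeric_derivative(lambda t: f(t) * g(t), x),
        f_prime(x) * g(x) + f(x) * g_prime(x),
    ),
    "quotient": (
        numeric_derivative(lambda t: f(t) / g(t), x),
        (f_prime(x) * g(x) - f(x) * g_prime(x)) / g(x) ** 2,
    ),
}

for name, (by_definition, by_rule) in checks.items():
    print(f"{name:8s} by definition: {by_definition:.5f}  by rule: {by_rule:.5f}")
```

At x = 2 the rules give 7 for the sum, 40 for the product and 16/49 for the quotient, and the numerical column should sit right next to those values.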

(5) Chain rule

This is the big part. If you’re working with something complicated, you’ll probably use this.

Question: given h(x) = g(f(x)), h’(x) = ?

Answer: h’(x) = g’(f(x)) * f’(x)

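I’ll only give an intuition to explain why this is. Reading the derivative as a ratio, we have:

h’(x) = (changes in g(f(x))) / (changes in x)

Where can we get “changes in g(f(x))”?

g’(f(x)) = (changes in g(f(x))) / (changes in f(x))

Where can we get “changes in f(x)”?

f’(x) = (changes in f(x)) / (changes in x)

So here we can guess that these three derivatives have something in common. But what’s the exact relationship? We can play with the three equations. Let’s divide h’(x) by f’(x):

h’(x) / f’(x) = (changes in g(f(x))) / (changes in f(x)) = g’(f(x))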

If we multiply both sides by f’(x) we get h’(x) = g’(f(x)) * f’(x)

Of course this is not a proof, but it gives you some insight. Let’s try to think about it in another way.

Without considering the derivative, from input x we get f(x) as output, and then we pass f(x) as input and get g(f(x)) as output. This routine describes what h(x) is. Now what if we add a small change to x? Since h is already the name of our function, let’s call the small change Δx.

If the input is (x+Δx), we get f(x+Δx) as output for sure. But that will not help us in this problem. We need to dig deeper into the meaning of the derivative. The derivative measures how change in the input affects the output. That is to say, f’(x) is the rate of change at point x. If we multiply the rate of change at x by the actual change in x, we get, roughly, the change in output. In other words, f’(x)*Δx is roughly the change in the output of f(x).

If we proceed to g(f(x)), the same logic applies. g’(f(x)) is the rate of change at point f(x), and the actual change at point f(x) is f’(x)*Δx. As a result, the actual change in the output of g(f(x)) is roughly g’(f(x))*f’(x)*Δx. Our final aim is h’(x), which is the change in output divided by the change in input:

h’(x) = g’(f(x)) * f’(x) * Δx / Δx = g’(f(x)) * f’(x)

This also gives the same answer.
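The "push a small change through both functions" idea is easy to try with numbers. This is a rough sketch under my own assumptions: the inner function x², the outer function u³, the point x = 1.2 and the nudge dx = 1e-4 are all picked for illustration, and the composition is called composed in the code.

```python
def f(x):
    return x ** 2          # inner function


def f_prime(x):
    return 2 * x


def g(u):
    return u ** 3          # outer function


def g_prime(u):
    return 3 * u ** 2


def composed(x):
    return g(f(x))         # the composition g(f(x))


x, dx = 1.2, 1e-4          # dx is the small change added to the input

change_in_f = f_prime(x) * dx                  # rough change in f(x)
change_in_g = g_prime(f(x)) * change_in_f      # rough change in g(f(x))
actual_change = composed(x + dx) - composed(x)

print(f"predicted change in g(f(x)): {change_in_g:.9f}")
print(f"actual change in g(f(x)):    {actual_change:.9f}")
print(f"predicted change / dx:       {change_in_g / dx:.6f}")
print(f"g'(f(x)) * f'(x):            {g_prime(f(x)) * f_prime(x):.6f}")
```

The predicted and actual changes should differ only slightly, and dividing the predicted change by dx gives back exactly the chain rule product.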

Thinking and playing around is the fun part of math.

I’ll be lazy enough to skip the formal proof. Programmers are always lazy.

Let’s see chain rule in action.

Given f(x) = (1 + 2x)⁵, what is f’(x)?

We let g(x) = x⁵ and h(x) = 1 + 2x, so that f(x) = g(h(x)).

Since g’(x) = 5x⁴ and h’(x) = 2, we get f’(x) = g’(h(x)) * h’(x) = 5(1+2x)⁴ * 2 = 10(1+2x)⁴
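The same example can be written as code. This is a minimal sketch, not a general differentiation tool: the helper chain_rule and the sample point x = 0.5 are my own inventions for illustration, and the derivatives of the two pieces are typed in by hand.

```python
def chain_rule(outer_prime, inner, inner_prime):
    """Build the derivative of outer(inner(x)) from its three ingredients."""
    def derivative(x):
        return outer_prime(inner(x)) * inner_prime(x)
    return derivative


def g_prime(u):
    return 5 * u ** 4      # derivative of g(u) = u^5


def h(x):
    return 1 + 2 * x


def h_prime(x):
    return 2               # derivative of h(x) = 1 + 2x


def f(x):
    return (1 + 2 * x) ** 5


def numeric_derivative(func, x, step=1e-6):
    """Approximate func'(x) with the difference quotient from the definition."""
    return (func(x + step) - func(x)) / step


f_prime = chain_rule(g_prime, h, h_prime)

x = 0.5
by_chain_rule = f_prime(x)
by_formula = 10 * (1 + 2 * x) ** 4
by_definition = numeric_derivative(f, x)

print(f"chain rule: {by_chain_rule}")
print(f"formula:    {by_formula}")
print(f"definition: {by_definition:.4f}")
```

The first two values agree exactly, and the difference quotient lands very close to them.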

With these rules you can handle many problems. Note that since this is not a math class, I didn’t cover other common derivatives, such as those of logarithmic and trigonometric functions. You can easily find them on the web. The key here is understanding the derivative itself.