Four steps

One can distinguish four successive steps:

1. In the beginning was the world: any attempt at modeling the real world must begin with reality. To solve a problem one must first understand it. But reality is complicated, so one must make choices, keeping what is relevant and discarding the superfluous.
2. Generating equations or an algorithm: one must then move from a concrete model to an abstract mathematical model (often in the form of equations).
3. Solving or simulating: these equations are then solved to find a general formula. If this is not possible, one runs numerical simulations (one then has to redo the calculations, typically with custom-made software, every time one wants to change a parameter).
4. Verification: the results of the model are compared with empirical data. For example, if the model makes a certain prediction, an experiment is carried out to confirm it.

This involves applied mathematics (2 and 3) and programming (3), but also a good knowledge of what one wants to model (1 and 4).

Example: prey and predators

In a green valley live rabbits and foxes. If there are many foxes, they decimate the rabbits, whose population falls, and the foxes then starve to death. If there are few foxes, the rabbits multiply (no comment) and provide food for the foxes, which can thrive again. The two populations therefore oscillate over time.
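The oscillation described above can be sketched numerically. The following is only an illustration, assuming the classic Lotka-Volterra equations (which this text does not name) integrated with simple Euler steps. The function `simulate` and every coefficient in it are made-up values chosen for the example, not measurements.

```python
def simulate(rabbits, foxes, steps, dt=0.01,
             birth=1.0, predation=0.02, conversion=0.01, starvation=1.0):
    """Euler integration of an assumed Lotka-Volterra predator-prey model.

    All coefficients are hypothetical:
      birth       rabbit growth rate when there are no foxes
      predation   rate at which foxes remove rabbits
      conversion  rate at which eaten rabbits become new foxes
      starvation  fox death rate when there are no rabbits
    """
    history = [(rabbits, foxes)]
    for _ in range(steps):
        d_rabbits = (birth * rabbits - predation * rabbits * foxes) * dt
        d_foxes = (conversion * rabbits * foxes - starvation * foxes) * dt
        rabbits += d_rabbits
        foxes += d_foxes
        history.append((rabbits, foxes))
    return history


# With these coefficients the equilibrium is 100 rabbits and 50 foxes;
# starting away from it makes both populations oscillate around it.
history = simulate(rabbits=200.0, foxes=50.0, steps=3000)
for step in range(0, 3001, 250):
    r, f = history[step]
    print(f"t={step * 0.01:5.1f}  rabbits={r:7.1f}  foxes={f:6.1f}")
```

Note that the populations here are decimal numbers and that there is no randomness: both are modeling choices, discussed further down.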

Describing the system

If we want to model this ecosystem, what choices do we have to make?

• Do we take natural deaths into account? (This does not complicate the model too much.) If so, do we account for age? Accounting for age has two consequences:
  • it forces us to repeatedly calculate a whole population pyramid for each species, instead of a single population figure;
  • we must know the mortality rate as a function of age (actuarial tables for rabbits and foxes).
• The foxes themselves have predators: do we include them in the model?
• As breeding requires a male and a female, do we need to know the population by sex or can we just have a global population for the species? To make a decision, we would need to know whether
  • male and female rabbits have the same probability of being eaten by a fox;
  • as many male as female rabbits are born;
  • the population is large.
  If we answer yes to every question, there will always be about as many males as females, so we do not need numbers broken down by sex (by contrast, if the population is small and odd, the sexes cannot be balanced, e.g. 3 females and 4 males).
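Why the size of the population matters for the sex question can be checked with a short calculation. This is a sketch under a stated assumption: each birth is independently male or female with probability 1/2. The function name `prob_unbalanced` and the 10-point tolerance are hypothetical choices for the illustration.

```python
from math import comb


def prob_unbalanced(n, tolerance=0.1):
    """Probability that the share of males among n animals falls outside
    0.5 +/- tolerance, assuming each animal is independently male with
    probability 1/2 (exact binomial sum)."""
    low = (0.5 - tolerance) * n
    high = (0.5 + tolerance) * n
    outside = sum(comb(n, k) for k in range(n + 1) if k < low or k > high)
    return outside / 2**n


for n in (7, 100, 1000):
    print(f"population {n:5d}: P(male share outside 40-60%) = {prob_unbalanced(n):.4f}")
```

The larger the population, the less likely a marked imbalance becomes, which is what justifies tracking a single figure per species.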

The choices depend, among other things, on

• the goal: do we want, for example, specific results over a short period of time or are we more interested in the evolution of populations over the long term?
• available data: no need to have a population pyramid if we do not have birth and death rates by age;
• resources: with a large staff and big computers one can afford a more complex model.

You may have noticed that we did not write a single equation: mathematical modeling is not just about math, it also requires a good knowledge of what we want to model (and common sense).

Mathematical choices

So far, we have only done the first step, the concrete description. We have not done any math (although we sometimes anticipated problems we might encounter later depending on the choices we made). So, what are the mathematical choices to be made?

Obviously, in real life, there are no half-foxes; but in a mathematical algorithm we can decide to treat the populations as decimal numbers. For example, if a population of 923 foxes increases by 10%, there will be 923 + 92.3 = 1,015.3 of them (with integers, there would be 1,015 or 1,016, two values that differ by barely 0.1%). If there are lots of foxes and rabbits, this is not a big problem. But with, say, elephants one cannot simply say that the population increases by 10% per year: a population of 6 elephants cannot actually change to 6.6. It can either stay at 6 or increase to 7, and the difference between these two outcomes, more than 15%, is not negligible.
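The arithmetic behind this comparison can be reproduced in a few lines. The helper `grow` is a hypothetical name used only for this illustration; it returns the decimal result together with the two whole numbers that bracket it.

```python
import math


def grow(population, rate=0.10):
    """Return (decimal result, rounded down, rounded up) after one period."""
    exact = population * (1 + rate)
    return exact, math.floor(exact), math.ceil(exact)


for population in (923, 6):
    exact, low, high = grow(population)
    gap = (high - low) / low  # relative gap between the two integer outcomes
    print(f"{population} -> {exact:.1f} (integers: {low} or {high}, gap {gap:.1%})")
```

For the foxes the two integer outcomes are practically indistinguishable; for the elephants they are not.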

With rabbits and foxes, knowing the total population of the species (or the total by sex) is sufficient. At most, there will be a population pyramid: the current number of 3-month-old female rabbits is the number of 2-month-old females in the previous month minus deaths, and the number of newborns is the sum of births over all age classes (which can be computed if the birth rate by age is known, since the age distribution is known). But with animals that are less numerous, have a long gestation and bear one young at a time with years between births (as with elephants), we will try to keep information individual by individual in the numerical model (see note 1). (In this case, the issue of non-integer populations cannot arise.)
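The pyramid update described above can be written as a short function. This is a minimal sketch, assuming monthly age classes, that animals in the oldest class do not survive the month, and that half of the newborns are female; the function name `advance_pyramid` and all the rates in the example are invented for the illustration.

```python
def advance_pyramid(females_by_age, survival_by_age, births_by_age,
                    female_share=0.5):
    """Advance a female age pyramid by one month.

    females_by_age[i]   number of females aged i months
    survival_by_age[i]  fraction of them that survive the month
    births_by_age[i]    young born per female of that age during the month
    The oldest age class is assumed to die out (hypothetical simplification).
    """
    # Newborn females: sum of births over all age classes of the previous month.
    newborns = female_share * sum(
        n * b for n, b in zip(females_by_age, births_by_age))
    # Each class moves up one month, minus deaths.
    aged = [n * s for n, s in zip(females_by_age[:-1], survival_by_age[:-1])]
    return [newborns] + aged


females = [100.0, 80.0, 60.0, 40.0]   # made-up starting pyramid
survival = [0.8, 0.9, 0.9, 0.0]       # made-up monthly survival rates
births = [0.0, 0.0, 2.0, 2.0]         # made-up monthly births per female
print(advance_pyramid(females, survival, births))
```

A population pyramid is therefore just a list of numbers per species, updated as a whole at each time step.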

Are we building a deterministic or a probabilistic (a.k.a. stochastic) model? For example, does the fox population increase by exactly 10% a year or only by 10% on average? With elephants, a probabilistic model is more suitable (see note 2), and it is easier to tune the probability of pregnancy or of death since we have individual data.
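The difference between "exactly 10%" and "10% on average" can be made concrete. The sketch below is an assumption-laden toy: it ignores deaths and supposes that each animal independently adds one offspring per year with probability equal to the growth rate. The function names and the random seed are arbitrary.

```python
import random


def deterministic_year(population, rate=0.10):
    """Exactly `rate` growth: the result may be a decimal number."""
    return population * (1 + rate)


def stochastic_year(population, rate=0.10, rng=random):
    """Growth of `rate` on average: each animal independently adds one
    offspring with probability `rate` (hypothetical toy rule, no deaths)."""
    births = sum(1 for _ in range(population) if rng.random() < rate)
    return population + births


rng = random.Random(42)  # fixed seed so that runs are repeatable
for population in (6, 923):
    outcomes = [stochastic_year(population, rng=rng) for _ in range(5)]
    print(population, deterministic_year(population), outcomes)
```

For a large population the random outcomes stay close to the deterministic figure, whereas for a handful of animals they scatter widely in relative terms.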

One can note in passing that these choices are not made in isolation: some choices are compatible, while others are not. With three binary criteria there are in principle eight combinations, but one may end up with only two workable numerical models if the choice made on one criterion determines the others. With elephants, one would surely have data per individual (and thus a whole number of animals), and the mathematical model would probably be stochastic. For rabbits and foxes, there would be no individualized information, the number of animals could be decimal and the model could be deterministic.
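For the elephant side of this contrast, an individual-based stochastic model might look like the following sketch. Everything in it is an assumption made for the illustration: the `Elephant` fields, the monthly time step, the probabilities of death and conception, the adult age, and the gestation length (set to 22 months, roughly that of an elephant).

```python
import random
from dataclasses import dataclass


@dataclass
class Elephant:
    sex: str                  # "F" or "M"
    age_months: int
    months_pregnant: int = 0  # 0 means not pregnant


def step_month(herd, rng, p_death=0.002, p_conception=0.02,
               gestation=22, adult_age_months=144):
    """Advance the herd by one month; all rates are hypothetical.
    Surviving elephants are updated in place and returned in a new list."""
    next_herd = []
    for elephant in herd:
        if rng.random() < p_death:
            continue  # this individual dies and is dropped from the herd
        elephant.age_months += 1
        if elephant.sex == "F":
            if elephant.months_pregnant > 0:
                elephant.months_pregnant += 1
                if elephant.months_pregnant >= gestation:
                    # Birth of a single calf of random sex.
                    elephant.months_pregnant = 0
                    next_herd.append(Elephant(rng.choice("FM"), 0))
            elif (elephant.age_months >= adult_age_months
                  and rng.random() < p_conception):
                elephant.months_pregnant = 1
        next_herd.append(elephant)
    return next_herd


rng = random.Random(0)
herd = [Elephant("F", 180), Elephant("F", 240), Elephant("M", 200),
        Elephant("F", 150), Elephant("M", 300), Elephant("F", 60)]
for _ in range(120):  # ten years
    herd = step_month(herd, rng)
print(len(herd), "elephants after ten years")
```

The population is a whole number by construction, and two runs with different seeds give different histories, which is the point of a stochastic model.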

Subtleties and complications

The result of the model (step 4) is seldom perfect: the model does some things well and others poorly. We therefore revisit the previous three steps, looking for ways to improve the mathematical algorithm, and proceed to the next iteration, model 2.0. Once that is done, we make further improvements, add details we had ignored the first time around, and so on.

Loops, back and forth, anticipation

The sequential aspect is also misleading because we do not carry out the steps independently, one after the other. We must anticipate the consequences of our decisions and organize our work accordingly.

We are already thinking about the next step. For example, during the modeling step, we have the mathematization step in mind: 'if I account for this phenomenon the equations will be much more complicated (perhaps unsolvable)'.

We sometimes go back. In the example, when the time came to implement the numerical model (steps 2–3), we realized that there were two possible models: with and without information about individuals. We then went back to step 1 and asked ourselves what made the most sense. In the end, we made different choices for rabbits and elephants (which we had not necessarily foreseen).

We think about the next cycle: what will we include in the first version, and what can wait until the next one? Parameters and mechanisms are ranked by priority: what is needed immediately and what can wait, what we know how to solve and what will be more troublesome.

Modeling means simplifying

And simplifying means not faithfully representing reality. A famous joke (among people who know it) mocks physicists for considering cows to be spherical. Such a model may not be very useful to a veterinarian, but if the exact shape of the cow is irrelevant then it is better not to encumber ourselves with it.

We do not always know right from the beginning what is relevant and what is not. In fact, one of the goals may be to establish a hierarchy between important and unimportant criteria. Two models that rank these criteria differently will likely give two different results (and probably have different strengths and weaknesses).

It is often better to start with an oversimplified model. For one thing, the fundamental law of mathematical modeling is that an oversimplified model generally works pretty well (it is therefore a kind of anti-Murphy's law). For another, a simple model gives results quickly and thus shows where the weaknesses of the model are. Otherwise, one can only try to improve a mathematical model without knowing which aspects really need improving.

From the outside it is difficult to tell what is simple and what is complicated

A simple linear system of two equations with two unknowns can be solved by a 12-year-old. If one extends it to a thousand equations and a thousand unknowns, the solution method does not change (I recommend using a computer, or many interns, to do the calculations). However, if we want to improve the model a little by transforming y = 2x − 5 into y = 2x − 5 + εx² (where ε is a small correction), then everything collapses: one is no longer sure that there is a solution, that this solution is unique, etc.
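A small worked example shows how the correction term changes the nature of the problem. The text gives only the line y = 2x − 5; the second line, y = −x + 1, is a hypothetical choice made here so that there is a system to solve, and the function names are likewise invented.

```python
import math


def solve_linear(a1, b1, a2, b2):
    """Intersection of y = a1*x + b1 and y = a2*x + b2 (assumes a1 != a2)."""
    x = (b2 - b1) / (a1 - a2)
    return x, a1 * x + b1


def solve_perturbed(eps, a1, b1, a2, b2):
    """Real intersections of y = a1*x + b1 + eps*x**2 and y = a2*x + b2."""
    if eps == 0:
        return [solve_linear(a1, b1, a2, b2)]
    # Equating both sides gives eps*x^2 + (a1 - a2)*x + (b1 - b2) = 0.
    a, b, c = eps, a1 - a2, b1 - b2
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real solution at all
    roots = sorted({(-b - math.sqrt(disc)) / (2 * a),
                    (-b + math.sqrt(disc)) / (2 * a)})
    return [(x, a2 * x + b2) for x in roots]


print(solve_linear(2, -5, -1, 1))           # exactly one solution
print(solve_perturbed(0.1, 2, -5, -1, 1))   # two solutions
print(solve_perturbed(-0.5, 2, -5, -1, 1))  # no real solution
```

Depending on the sign and size of ε, the same pair of equations has two solutions, one, or none, whereas the linear version always has exactly one as long as the slopes differ.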

A small change can thus make more difference than increasing the number of equations and variables 500-fold. If you do not understand applied mathematics (and especially how the equations are solved or simulated), you cannot tell what is easy, possible, difficult or impossible. A small modification may sometimes require a completely different method. So a mathematician tells you that it doubles the cost (if you ask soon enough) or that it is too late because it would require starting over (if you ask at the end), and you think they are pulling your leg.

So we must know as soon as possible what we want to do later. Initially, it is possible to choose between several techniques, each with its advantages and disadvantages. We should opt from the beginning for the one that will be best in the end. The choices made for the first version of the mathematical algorithm may therefore depend on the needs already anticipated for the fourth version.

1: For programmers: one could have an Elephant class, with one instance per individual, recording the sex, the age, whether a female is pregnant and how far along, etc.; while with foxes and rabbits one would not have one variable per animal but only tables for the population pyramids or for the population itself.

2: Especially because the deterministic model implicitly uses the law of large numbers (i.e. the equivalence between probability and proportion).