One can distinguish four successive steps:
This involves applied mathematics (2 and 3) and programming (3), but also a good knowledge of what one wants to model (1 and 4).
In a green valley live rabbits and foxes. If there are many foxes, they decimate the rabbits, whose population falls, and the foxes then starve. If there are few foxes, the rabbits multiply (no comment) and provide food for the foxes, which can thrive again. The two populations therefore oscillate over time.
If we want to model this ecosystem, what choices do we have to make?
The choices depend, amongst others, on
You may have noticed that we did not write a single equation: mathematical modeling is not just about math, it also requires a good knowledge of what we want to model (and common sense).
So far, we have only done the first step, the concrete description. We have not done any math (although we sometimes anticipated problems we might encounter later depending on the choices we made). So, what are the mathematical choices to be made?
Obviously, in real life, there are no half-foxes; but in a mathematical algorithm we can decide to treat the populations as decimal numbers. For example, if a population of 923 foxes increases by 10%, there will be 923 + 92.3 = 1 015.3 of them (with integers, there would be 1 015 or 1 016, i.e. a difference of less than 0.1%). If there are lots of foxes and rabbits, this is not a big problem. But with, say, elephants one cannot say that the population increases by 10% per year: a population of 6 elephants cannot actually change to 6.6. It can either stay at 6 or increase to 7, a jump of more than 15%, and this discrepancy is not negligible.
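The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `realized_rates` is made up for the example, and the 10% rate and the two population sizes are the ones used in the text.

```python
import math

def realized_rates(population: int, rate: float) -> tuple:
    """Growth rates actually achievable when the result must be a whole number."""
    exact = population * (1 + rate)  # decimal result, e.g. 6.6 for 6 elephants
    low, high = math.floor(exact), math.ceil(exact)
    # Convert the two neighbouring whole numbers back into growth rates
    return low / population - 1, high / population - 1

for name, n in [("foxes", 923), ("elephants", 6)]:
    low_rate, high_rate = realized_rates(n, 0.10)
    print(f"{name}: nominal 10.0%, achievable {low_rate:.1%} or {high_rate:.1%}")
```

For the foxes both achievable rates stay very close to 10%; for the elephants the only options are no growth at all or a much larger jump.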
With rabbits and foxes, knowing the total population of the species (or the total by sex) is sufficient. At most, there will be a population pyramid: the current number of 3-month-old female rabbits is the number of 2-month-old females in the previous month minus deaths, and the number of newborns is the sum of births (which is known if the birth rate is known by age, given that the distribution of ages is known). But with animals that are less numerous, have a long gestation and bear one young at a time with years between births (as with elephants), we will try to keep more information, individual by individual, in the numerical model.¹ (In that case, the issue of non-integer populations cannot arise.)
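The population-pyramid bookkeeping described above might be sketched as follows. The function name `step_pyramid`, the four age classes and every number are hypothetical, chosen only to make the update rule concrete; the populations are kept as decimals.

```python
def step_pyramid(females_by_age, survival_by_age, births_by_age):
    """Advance a monthly female population pyramid by one month.

    females_by_age[a]  : number of females aged a months
    survival_by_age[a] : fraction of age-a females still alive a month later
    births_by_age[a]   : female newborns per age-a female per month
    """
    # Newborns: sum of births over all age classes
    newborns = sum(n * b for n, b in zip(females_by_age, births_by_age))
    # Previous month's population minus deaths, class by class
    survivors = [n * s for n, s in zip(females_by_age, survival_by_age)]
    # Survivors move up one age class; the oldest class leaves the pyramid
    return [newborns] + survivors[:-1]

# Made-up numbers, for illustration only
females = [100.0, 80.0, 60.0, 40.0]
survival = [0.8, 0.9, 0.9, 0.0]
births = [0.0, 0.0, 1.5, 2.0]
print(step_pyramid(females, survival, births))
```

Calling the function repeatedly on its own output simulates the population month after month.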
Are we building a deterministic or a probabilistic (a.k.a. stochastic) model? For example, does the fox population increase by exactly 10% a year or only by 10% on average? With elephants, a probabilistic model is more suitable,² and it is easier to tune the probability of pregnancy or of death since we have individual data.
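To make the distinction concrete, here is a minimal Python sketch comparing the two approaches. It assumes a deliberately crude stochastic rule (each animal independently produces one offspring with a given probability, and nobody dies); the function names and that rule are assumptions made for the example, not part of any particular model.

```python
import random

def deterministic_step(population: float, rate: float) -> float:
    """The population grows by exactly `rate`."""
    return population * (1 + rate)

def stochastic_step(population: int, birth_prob: float, rng: random.Random) -> int:
    """Each animal independently produces one offspring with probability birth_prob."""
    births = sum(1 for _ in range(population) if rng.random() < birth_prob)
    return population + births

rng = random.Random(0)  # fixed seed, only so that runs are repeatable
for n in (6, 6000):
    rates = [stochastic_step(n, 0.10, rng) / n - 1 for _ in range(5)]
    print(n, [f"{r:.1%}" for r in rates])
```

With a large population the simulated growth rates cluster around 10%, so the deterministic shortcut is reasonable; with 6 animals they scatter widely from one run to the next.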
One can note in passing that these choices are not made in isolation: some choices are compatible, while others are not. With three criteria, it is possible to end up with only two possible numerical models, if the choice made on one criterion sets the others. With elephants, one would surely have data per individual (and thus a whole number of animals), and the mathematical model would probably be stochastic. For rabbits and foxes, there would be no individualized information, the number of animals could be decimal and the model could be deterministic.
The result of the model (step 4) is seldom perfect: there are things the model does well, and others it does not. We thus take a new look at the previous three steps, looking for ways to improve the mathematical algorithm: we proceed to the next iteration, model 2.0. Once that is done, we make further improvements, add details we had ignored the first time around, and so on.
The sequential aspect is also misleading because we do not follow the steps independently, one after the other. We must anticipate the consequences of our decisions and organize our work.
We are already thinking about the next step. For example, during the modeling step, we have the mathematization step in mind: 'if I account for this phenomenon the equations will be much more complicated (perhaps unsolvable)'.
We sometimes go back. In the example, when the time came to implement the numerical model (steps 2–3), we realized that there were two possible models: with and without information about individuals. We then went back to step 1 and asked ourselves what made the most sense. In the end, we made different choices for rabbits and elephants (which we had not necessarily foreseen).
We think about the next cycle: what will we include in the first version, and what can wait until the next one? Parameters and mechanisms are ranked by priority: what is needed immediately and what can wait, what we know how to solve and what will be more troublesome.
And simplifying means not faithfully representing reality. A famous joke (among people who know it) mocks physicists for considering cows to be spherical. Such a model may not be very useful to a veterinarian, but if the exact shape of the cow is irrelevant then it is better not to encumber ourselves with it.
We do not always know from the beginning what is relevant and what is not. In fact, one of the goals may be to establish a hierarchy between important and unimportant criteria. Two models that rank these criteria differently will likely give two different results (and probably have different strengths and weaknesses).
It is often better to start with an oversimplified model. For one thing, the fundamental law of mathematical modeling is that an oversimplified model generally works pretty well (it is therefore a kind of anti-Murphy's law). For another, a simple model gives results quickly and thus shows where its weaknesses are. Otherwise, one can only try to improve a mathematical model without knowing which aspects really need improving.
A simple linear system of two equations with two unknowns can be solved by a 12-year-old. If one extends it to a thousand equations and a thousand unknowns, the solution method does not change (I recommend using a computer, or many interns, to do the calculations). However, if we want to improve the model a little by transforming y = 2x − 5 into y = 2x − 5 + εx² (where ε is a small correction), then everything collapses: one is no longer sure that there is a solution, that this solution is unique, etc.
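A small Python sketch illustrates how the quadratic correction changes the picture. The text only gives one equation, so the second one (y = x − 6) is a hypothetical choice made for the example, as is the function name `solve_system`.

```python
import math

def solve_system(eps: float) -> list:
    """Solve y = 2x - 5 + eps*x**2 together with y = x - 6.

    The second equation is a hypothetical example, not taken from the text.
    Returns the list of (x, y) solutions, which may be empty.
    """
    # Subtracting the two equations gives eps*x**2 + x + 1 = 0.
    if eps == 0:
        xs = [-1.0]                      # linear case: exactly one solution
    else:
        disc = 1 - 4 * eps               # discriminant of the quadratic
        if disc < 0:
            xs = []                      # no real solution at all
        elif disc == 0:
            xs = [-1 / (2 * eps)]        # one (double) solution
        else:
            root = math.sqrt(disc)
            xs = [(-1 + root) / (2 * eps), (-1 - root) / (2 * eps)]
    return [(x, x - 6) for x in xs]

for eps in (0.0, 0.1, 0.25, 0.5):
    print(eps, solve_system(eps))
```

Depending on the value of ε, this particular system has one, two or zero solutions, whereas the linear version always has exactly one.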
A small change can thus make more difference than increasing the number of equations and variables 500-fold. If you do not understand applied mathematics (and especially how the equations are solved or simulated), you cannot tell what is easy, possible, difficult or impossible. A small modification may sometimes require a completely different method. So when a mathematician tells you that it doubles the cost (if you ask soon enough) or that it is too late because it would require starting over (if you ask at the end), you may think they are pulling your leg.
So we must know as soon as possible what we want to do later. Initially, it is possible to choose between several techniques, each with its advantages and disadvantages. Ideally, we opt from the beginning for the one that will be best in the end. The choices made for the first version of the mathematical algorithm may therefore depend on the needs already anticipated for the fourth version.
1: For programmers: one could have an Elephant class, with one instance per individual, recording the sex, the age, whether a female is pregnant and how far along, etc.; with foxes and rabbits one would not have one variable per animal but only arrays for the population pyramids or for the population itself.
2: Especially because the deterministic model implicitly uses the law of large numbers (i.e. the equivalence between probability and proportion).
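For readers who program, the contrast drawn in note 1 might look like the following Python sketch. Everything in it is an assumption made for illustration: the field names, the breeding age, the gestation length (roughly 22 months for elephants) and the update rule, which ignores ageing and deaths.

```python
import random
from dataclasses import dataclass

GESTATION_MONTHS = 22   # approximate value, used here as an assumption
MIN_BREEDING_AGE = 12   # hypothetical threshold, in years

@dataclass
class Elephant:
    sex: str                  # "F" or "M"
    age_years: int
    months_pregnant: int = 0  # 0 means not pregnant

def step_month(herd, conception_prob, rng):
    """Advance the herd by one month (ageing and deaths are ignored here)."""
    calves = []
    for e in herd:
        if e.sex != "F":
            continue
        if e.months_pregnant > 0:
            e.months_pregnant += 1
            if e.months_pregnant > GESTATION_MONTHS:
                # Birth: one calf at a time, added as a new individual
                calves.append(Elephant(sex=rng.choice("FM"), age_years=0))
                e.months_pregnant = 0
        elif e.age_years >= MIN_BREEDING_AGE and rng.random() < conception_prob:
            e.months_pregnant = 1
    herd.extend(calves)

# Elephants: one object per individual, so the count is always a whole number
herd = [Elephant("F", 20), Elephant("F", 8), Elephant("M", 25)]
step_month(herd, conception_prob=0.05, rng=random.Random(0))

# Rabbits: no individual objects, just an array of (possibly decimal) counts per age class
rabbit_females_by_age = [170.0, 80.0, 72.0, 54.0]
```

The herd is a list of objects whose length is the population, while the rabbits are described by a handful of numbers regardless of how many animals there are.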