Leading Ones in Population Distribution Numbers

Swedish Locality Population Distribution

This is a follow-up to my first post on leading ones in number sets and Benford’s law. To see what I mean by leading ones, please have a look at it. In this post we will take a look at Swedish locality population numbers, and at how leading ones in the population distribution follow from growth and decline.

Locality is the translation of the Swedish word tätort used by Statistiska Centralbyrån (SCB), a government agency responsible for producing statistics. In this post I have used Swedish locality population and area data updated by SCB as of 31 December 2017.

The histogram below shows the distribution of Swedish locality populations. As described in my first post on leading ones, numbers sort into “bins” according to both their leading digit and order of magnitude. The SCB data only include population sizes above 200, which is why the first bar at [100, 200) is missing.
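As a minimal sketch of that binning, assuming integer populations, the helper below returns the half-open bin a number falls into. The name `leading_digit_bin` is hypothetical and only used for illustration; it is not taken from SCB or from the original analysis.

```python
def leading_digit_bin(n: int) -> tuple[int, int]:
    """Return the half-open bin [lower, upper) for a positive integer n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    digits = str(n)
    leading = int(digits[0])              # leading digit, 1..9
    magnitude = 10 ** (len(digits) - 1)   # order of magnitude, e.g. 1000 for 4321
    lower = leading * magnitude
    return lower, lower + magnitude


for population in (250, 1999, 4321):
    print(population, leading_digit_bin(population))
```

A locality with 250 inhabitants lands in [200, 300), while one with 1999 lands in the much wider bin [1000, 2000).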

Figure: Swedish locality populations

Plotting population sizes (e.g. countries, cities) is a common way to illustrate Benford’s law, and we can see why in the figure above. Though the exact levels differ and the first bar is missing, the overall behavior clearly shows the overrepresentation of leading ones.
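For reference, Benford’s law puts the expected share of leading digit d at log10(1 + 1/d), so leading ones should make up about 30% of the numbers. A short sketch that tabulates these levels (the function name `benford_share` is only for illustration):

```python
import math


def benford_share(d: int) -> float:
    """Expected share of leading digit d (1..9) under Benford's law."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(f"{d}: {benford_share(d):.1%}")
```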

Leading Ones From Growth and Decline

As I pointed out in my first post on this subject, we expect to see Benford’s law in action when growth is a deciding factor. A similar argument holds for decline. But what happens when there is both growth and decline? After all, populations change both up and down over time.

To exaggerate a bit, consider alternating between growth and decline in such a way that the overall trend stays stationary. I showed that kind of behavior in a recent post on percentage change asymmetry. If you think about it for a while you can probably figure out what will happen. But here we will test this using a simple computer simulation.

A Population Distribution Simulation

The simulation starts with a set of population numbers. Each number is then repeatedly multiplied by a random growth/decline factor, for as many iterations as desired. The starting set consists of every integer from 100 to 9999. The histogram below shows the corresponding flat distribution, using a bin size of 200. The horizontal scale runs from 0 to 100000 to make it easier to compare with the final distribution shown further below.

Figure: Initial population distribution

During the simulation, every individual number is multiplied at each iteration by a random factor between 0.91 and 1.10. The limits are asymmetric to avoid a bias in either direction, following my discussion of percentage change asymmetry: a 10% increase is undone by a decrease of roughly 9%, not 10% (1.10 × 0.91 ≈ 1.00). The multiplication step repeats 400 times, resulting in the distribution below.

Figure: Final population distribution

The graphs above illustrate the change in population number distribution. But the interesting result is the leading digit distribution. The video below shows the development over time as we step through the 400 iterations. The initial flat distribution quickly skews towards leading ones, and in the end closely resembles Benford’s distribution.
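The simulation described above can be sketched in a few lines of Python. This is a minimal reconstruction under stated assumptions, not the code used for the figures: it assumes that every number gets its own independently drawn factor, that the factor is uniformly distributed between the two limits, and it fixes an arbitrary seed for reproducibility. The names `simulate`, `leading_digit` and `leading_digit_shares` are hypothetical.

```python
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """Leading digit of a positive number, read off its scientific notation
    (which rounds to 7 significant digits)."""
    return int(f"{x:e}"[0])


def simulate(iterations: int = 400, low: float = 0.91, high: float = 1.10,
             seed: int = 1) -> list[float]:
    """Multiply every starting number by a fresh random factor, repeatedly."""
    rng = random.Random(seed)  # assumption: fixed seed, uniform factors
    populations = [float(n) for n in range(100, 10000)]  # 100 to 9999
    for _ in range(iterations):
        populations = [p * rng.uniform(low, high) for p in populations]
    return populations


def leading_digit_shares(values) -> dict[int, float]:
    """Share of values starting with each digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


start_shares = leading_digit_shares(range(100, 10000))
final_shares = leading_digit_shares(simulate())

for d in range(1, 10):
    print(f"{d}: start {start_shares[d]:.1%}  final {final_shares[d]:.1%}")
```

The starting set has exactly 1100 numbers per leading digit, so every digit begins at about 11.1%. Printing the shares after fewer iterations (for example `simulate(iterations=50)`) gives a rough feel for how quickly the distribution skews.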

Closing Notes

Why does this happen, even though there is no favored direction, and numbers are random? Why does the leading digit distribution still favor the lower digits? It has everything to do with percentage change. In absolute terms, a given percentage change of something big is larger than the same percentage change of something small. That holds for both growth and decline, and results in a low leading digit “staying” longer than a high one.
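A bit of arithmetic makes this concrete. The sketch below computes, for each leading digit, the relative growth needed to cross its whole range upwards and the relative decline needed to cross it downwards (the function name `crossing_changes` is only for illustration):

```python
def crossing_changes(d: int) -> tuple[float, float]:
    """Relative change needed to cross the range of leading digit d.

    Growth: from the bottom of the range (d) up to the next digit (d + 1).
    Decline: from the top of the range (d + 1) down to the bottom (d).
    """
    growth = (d + 1) / d - 1
    decline = 1 - d / (d + 1)
    return growth, decline


for d in range(1, 10):
    growth, decline = crossing_changes(d)
    print(f"digit {d}: +{growth:.1%} to cross upwards, -{decline:.1%} downwards")
```

Getting from 1000 to 2000 takes a doubling, and falling from 2000 back to 1000 takes a 50% drop, whereas the stretch between 9000 and 10000 is crossed by a change of only about 10% in either direction. With each step limited to roughly ±10%, a number therefore needs many more steps to get through the leading-one range than through the leading-nine range.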

So what can we take away from this little exercise? Well, that it is almost impossible to avoid Benford’s law when considering growth and decline mechanisms.

What about other types of number sets, not actually growing or declining, but still following Benford’s law? In a follow-up blog post I look at Swedish locality area sizes.
