Over two-thirds of all English surnames have become defunct in the 30 or so generations since 1350. Given the big E, expired, gone to meet their maker, over, no more.....
Historically, there have been two schools of thought about the eventual outcome of this process:
| Heat Death | All surnames will eventually expire except 1 |
| Steady State | The rate of extinction plateaus out |
The Heat Death approach can be traced back to a mathematical puzzle posed by Francis Galton in the Educational Times in 1873. It was prompted by a controversy over how many names of the 'quality' were disappearing at the time.
| Galton's Puzzle |
| "A large nation, of whom
we will only concern ourselves with the adult males, N in
number, and who each bear separate surnames, colonise a
district. Their law of population is such that, in each
generation, P0 per cent of the adult males
have no male children who reach adult life; P1
have only one such male child; P2 have 2, and
so on up to P5 who have 5. Find |
| I have now discovered an excellent introductory analysis to Branching Theory and surname extinction in Chapter 10.2 of Grinstead and Snell's Introduction to Probability |
| There is a more complex commentary at: http://fauam2.am.uni-erlangen.de/teaching/lectures/graef/wr2/wr 2.pdf |
A solution was proffered by the Rev. Henry William Watson, and from
his 1874 joint paper with Galton emerged the mathematical tool of
branching, the Galton-Watson Process.
(Strictly speaking, this is now more properly known as the
Galton-Watson-Bienaymé process, since it has since been discovered
that the French mathematician Bienaymé had discussed this problem
some 30 years earlier than Galton.)
| Branching-process theory ... is
that part of mathematics which deals with the growth and
decay of populations of objects which multiply and
replace one another, generation by generation, according
to rules in which chance plays a prominent part. (Kendall, 1974) |
The mathematics of branching theory is somewhat beyond me, but the following outline thankfully involves no theory:
| Call the probability of having no sons $p_0$, of having 1 son $p_1$, of having 2 sons $p_2$, and so on, these probabilities applying independently to all males in all generations, and to be interpreted as sons living to maturity. Designate the chance of extinction of a male line starting with 1 person as $x$. Then the chance of extinction of two separately and independently developing male lines must be $x^2$, of three lines $x^3$, and so on. If a man has no sons, his (conditional) chance of extinction is 1, if he has one son, his chance of extinction is $x$, if he has two sons his chance of extinction is $x^2$, and so on. (Keyfitz, Applied Mathematical Demography, p332) |
Hence this man's whole (unconditional) chance of extinction (x) through his sons is :-
| $x = p_0 + p_1 x + p_2 x^2 + \cdots$ |
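The equation can be solved numerically by repeated substitution. What follows is only a minimal sketch, not anything taken from Watson or Keyfitz: starting from x = 0 and repeatedly applying the right-hand side settles on the smallest non-negative solution, which is the chance of extinction. The function name and the offspring probabilities (0.2, 0.3, 0.5) are invented purely for illustration.

```python
def extinction_probability(p, tol=1e-12, max_iter=100_000):
    """Smallest non-negative solution of x = p[0] + p[1]*x + p[2]*x**2 + ...

    p[k] is the probability that a man has exactly k sons who reach maturity.
    """
    x = 0.0
    for _ in range(max_iter):
        # Evaluate the right-hand side of the equation at the current x.
        new_x = sum(pk * x**k for k, pk in enumerate(p))
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical offspring distribution (an assumption, not from any source):
# 20% have no sons, 30% have one son, 50% have two sons.
example = [0.2, 0.3, 0.5]
print(round(extinction_probability(example), 6))
```

For these made-up figures the equation is 0.5x^2 - 0.7x + 0.2 = 0, whose roots are 0.4 and 1, and the iteration settles on 0.4.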
(...more to add here, when I get to grips with the mathematics!)
Watson -in a specific example- produced probabilities of extinction in the first ten generations as:-
| .237 | .347 | .410 | .450 | .478 | .497 | .511 | .521 | .528 | .534 |
This means that the disappearance of surnames is very rapid in the first generations: 237 out of 1000 in the 1st generation and 110 in the 2nd, but only 28 in the 5th step, and only 6 in the 10th step.
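The losses in each generation are simply the differences between successive entries in Watson's table. A short sketch of that arithmetic (the variable names are mine):

```python
# Watson's cumulative extinction probabilities for generations 1 to 10,
# copied from the table above.
cumulative = [0.237, 0.347, 0.410, 0.450, 0.478,
              0.497, 0.511, 0.521, 0.528, 0.534]

# Surnames lost in each generation, per 1000 starting surnames: the first
# generation loses the first cumulative figure, later ones the increase.
previous = [0.0] + cumulative[:-1]
lost_per_1000 = [round((now - before) * 1000)
                 for now, before in zip(cumulative, previous)]

for generation, lost in enumerate(lost_per_1000, start=1):
    print(f"generation {generation:2d}: {lost:3d} per 1000")
```

The ten figures add up to 534, the cumulative loss after ten generations.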
The question is: will this process lead to extinction after innumerable generations?
Watson came to this conclusion, arguing:-
| All the surnames, therefore, tend to extinction in an indefinite time, and this result might have been anticipated generally, for a surname once lost can never be recovered, and there is an additional chance of loss in every successive generation |
However, Watson's equation has another
solution besides 1: the root .553. (He either overlooked this, or
discounted it under the influence of Newtonian certainty.) Under
this model, only 55% of his surnames will ever go extinct,
commencing from a single originator for each surname at time
zero.
A surname has then a sporting chance of survival. Its extinction
is not pre-ordained.
Whether it survives depends on the criticality theorem, which
in turn depends on the average number of offspring produced by
one individual. If this number is less than one then, as the
population is contracting, it is certain that the surname-holders
will eventually die out. If the average is more than 1 then, with
the population expanding, the surname has a chance of survival.
But, a high birth rate in your line does not necessarily provide
an advantage over a low birth rate, in avoiding extinction. The
chance of extinction depends on the variation in
the number of children in later generations, and especially on
the chance of having zero children. As a counter example, a "population in which everyone married and
each couple had exactly one son (and one daughter) surviving to
maturity would be stationary, but the probability of extinction
of either line would be zero" (Keyfitz, Applied Mathematical Demography).
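To make the point concrete, here is a small sketch for families limited to 0, 1 or 2 sons. In that case the extinction equation x = p0 + p1x + p2x^2 factorises as (x - 1)(p2x - p0) = 0, so the chance of extinction is p0/p2 when that is below 1, and 1 otherwise. The two "lines" compared are invented figures, not historical data, and the function names are mine.

```python
def extinction_two_sons(p0, p1, p2):
    """Chance of extinction when every man has 0, 1 or 2 surviving sons.

    x = p0 + p1*x + p2*x**2 rearranges to (x - 1)*(p2*x - p0) = 0
    (using p0 + p1 + p2 = 1), so the roots are 1 and p0/p2.
    """
    if p2 == 0:
        # No one has two sons: the line dies out unless p0 is also zero.
        return 1.0 if p0 > 0 else 0.0
    return min(1.0, p0 / p2)


def mean_sons(p0, p1, p2):
    """Average number of surviving sons per man."""
    return p1 + 2 * p2


# Hypothetical lines (assumed figures, for illustration only).
prolific_but_risky = (0.2, 0.0, 0.8)   # mean 1.6 sons, 20% chance of none
modest_but_steady = (0.0, 0.8, 0.2)    # mean 1.2 sons, never childless

for name, p in [("prolific", prolific_but_risky), ("modest", modest_but_steady)]:
    print(name, round(mean_sons(*p), 2), round(extinction_two_sons(*p), 2))
```

The more prolific line has a one-in-four chance of dying out, while the less prolific line, which never has a childless generation, cannot die out at all. Keyfitz's case of exactly one son per couple corresponds to `extinction_two_sons(0, 1, 0)`, which gives zero.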
Another point to note about an expanding population, is that each of us will either have zero descendants or many - the chance that any one of us will have exactly 1 descendant after 10 generations is remote.
On the other hand, the demographers James E Smith and Philip R Kunz have calculated that "in the United States in 1960 there was only a 25% chance that a man would have descendants with his last name 13 generations later".
| The surname paradox
How do the predictions of simple branching theory relate to the mix of surnames today? Well, at first sight, not at all. The branching process predicts that after k generations (from a single ancestor), the number of descendants will either be zero or a large number. But, as investigations of GRO registers and telephone directories in the UK and US show, 40% of the surnames listed are singletons. (In Ken Tucker's survey of US surnames, he found that, excluding the surname Smith, one was more likely to bear a rare surname than a leading surname.)
And would a branching process that takes into account the interplay between a mixture of both multi- and single- originators be more successful in predicting the current distribution-profile of surnames? |
Marbles, surnames and a random walk
A drunk leans against a wall. He can move left or right with each step with equal probability. If he keeps at it long enough, he will return to his starting place, and indeed every point of the wall. Take the wall away, the drunk can now step north, south, east and west with equal probability. Again- given enough time- he is sure to return to his starting position.
If instead, I bet on whether I draw a red or black marble out of a pot with just 1 of each colour (and returning the chosen marble each time) then my winnings or losings take a similar 1 dimensional random walk. Given enough time, my winnings and losses cancel each other out.
But say I commence with a pot containing just 1 red and 1 black marble, still replacing the marble chosen, but this time also adding a marble of the same colour. I have introduced positive feedback. Success will breed success, and the random drift will regulate itself. But what will be the outcome in the long run? The probability of success in the first marble scenario was 1/2; what would it be in this one? Will it be the same, 0 or 1? The answer is that at the outset all values between 0 and 1 are equally likely.
"The first steps of this random walk settle its fate : the initial fluctuations lock themselves in. The random events of the past affect those in the future. What started as a fair contest between Black and Red can end up as a very one-sided affair. In every single experiment, a law seems to emerge: after a few thousand draws, we can very confidently predict that the chances for Red to turn up are such and such. But for every repetition of the experiment (starting anew with one red and one black marble in the urn, a different law emerges: the such and such isn't the same" Sigmund, Karl. Games of Life : Explorations in Ecology, Evolution and Behaviour
[Figure: proportion of red marbles (vertical axis, from 0 to 1) against time, for repeated trials]
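The marble experiment is easy to try for oneself. The following is a rough simulation sketch of the pot described above (1 red and 1 black to start, each drawn marble returned with one extra of the same colour); the function name, the number of draws and the seeds are arbitrary choices of mine.

```python
import random


def polya_urn(draws, seed):
    """Proportion of red marbles after each draw, starting from 1 red, 1 black.

    Each drawn marble is returned together with one extra marble of the
    same colour, which is the positive feedback described in the text.
    """
    rng = random.Random(seed)
    red, total = 1, 2
    proportions = []
    for _ in range(draws):
        # Red is drawn with probability equal to its current share.
        if rng.random() < red / total:
            red += 1
        total += 1
        proportions.append(red / total)
    return proportions


# Repeat the experiment a few times; the seeds are arbitrary.
for seed in range(5):
    history = polya_urn(5000, seed)
    print(f"trial {seed}: after 100 draws {history[99]:.3f}, "
          f"after 5000 draws {history[-1]:.3f}")
```

In each trial the proportion should be seen to move about early on and then barely change, with different trials tending to settle at different levels.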
Does this marble model help us
understand the current distribution of surnames?
I like to think of a pot containing thousands of colours of
different hues, each representing a surname. The starting
position being that some hues are better represented than others.
Some hues will change as surname variations set in.
In this model, will those names which start well-represented
always be the winners, however many times the starting position
is re-commenced?
With the marble model, whatever happened
in the past affects the present.
Whatever decisions our ancestors took in making bynames
hereditary, together with chance and fertility over the first few
crucial generations, locked in that surname's future.
Back to branching (and name survival)
Example 1
| Z0 = 1 | O |
| Z1 = 2 | O O |
| Z2 = 3 | O O O |
| Z3 = 5 | O O O O O |
| | end, end |
| Z4 = 5 | O O O O O |
| | end |
| Z5 = 4 | O O O end O |
| Z is the generation number, running from the founder (Z0) through 5 generations to Z5. These have been colour-banded to show clearly the number of offspring at each point. For example, Z4=5 means the fourth generation has 5 members (a member being denoted by an O symbol). Individual lines might become extinct, as denoted by tramlines. |
In this example, the number of offspring per individual has been restricted to two at most.
The founder has 19 descendants (4 of whom do not reproduce).
The mean of the offspring distribution equals 19/16 (1.1875), so the line will
have a positive chance of persisting (ignoring gender).
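The figure of 19/16 can be checked from the generation sizes alone. This sketch rests on my own reading of the diagram: that the mean is the total number of children divided by the number of individuals in the generations that have had the chance to reproduce (Z0 to Z4).

```python
# Generation sizes Z0..Z5 read off the diagram in Example 1.
sizes = [1, 2, 3, 5, 5, 4]

parents = sum(sizes[:-1])    # everyone in Z0..Z4 had the chance to reproduce
children = sum(sizes[1:])    # everyone in Z1..Z5 is somebody's child

mean_offspring = children / parents
print(children, parents, mean_offspring)   # 19 16 1.1875
```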
Example 2
In this example (from Keyfitz p332), the probabilities of having 0, 1, 2 or 3 sons are each set at 0.25.
As discussed earlier, the equation for x, the probability of extinction, is
$x = 0.25 + 0.25x + 0.25x^2 + 0.25x^3$,
which simplifies to
$x^3 + x^2 - 3x + 1 = 0$
This equation has three roots. Setting
aside the root x=1 by dividing by x-1
results in the quadratic
$x^2 + 2x - 1 = 0$
This resolves by completing the square to:-
$x = -1 + \sqrt{2} \approx 0.414$
(the other root, $-1 - \sqrt{2}$, is negative and so cannot be a probability).
This hypothetical population has net reproduction rate of 1.5
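For anyone wanting the intermediate steps, the working can be set out as follows. This is a sketch of the standard algebra, assuming that each of 0, 1, 2 and 3 sons has probability 0.25 (as the equation for x implies) and writing m for the mean number of sons:

```latex
\begin{align*}
x &= \tfrac{1}{4}\left(1 + x + x^2 + x^3\right) \\
0 &= x^3 + x^2 - 3x + 1 = (x - 1)(x^2 + 2x - 1) \\
x^2 + 2x - 1 = 0 \;&\Longrightarrow\; (x + 1)^2 = 2 \;\Longrightarrow\; x = -1 \pm \sqrt{2} \\
m &= 0 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{4} + 2 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{1}{4} = 1.5
\end{align*}
```

Of the three roots, only $-1 + \sqrt{2} \approx 0.414$ lies strictly between 0 and 1, so that is the chance of extinction; the last line is the net reproduction rate of 1.5.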
(Editor's note: at this point, I hope you understand that I hardly understand a word of all this, but it feels as if it ought to be fascinating)
Historical probability rates
The following are reproduced to give a broad idea, and should be used with the greatest of circumspection
1) Lotka in his study of the child-bearing chances of the wives of white males in the 1920 US census, derived:-
2) Sturges and Hackett derived the following table from a study of English genealogical records
| Number of males who marry | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| Probability | .317 | .364 | .209 | .080 | .023 | .005 | .001 |
e.g. the probability is 0.364 that a family will produce 1 son destined to marry
Obviously these rates are not comparable between countries, eras, social or ethnic groups. And individual families will have different fertility rates. Presumably S & H was a limited study, and consequently needs qualifying.
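With those caveats in mind, the Sturges and Hackett figures can be fed into the same kind of calculation as Watson's, to give a rough feel for what they would imply. This is only a sketch: the probabilities as tabulated sum to 0.999 (presumably through rounding), so I have rescaled them to sum to 1, and the function and variable names are mine.

```python
# Sturges and Hackett's probabilities of 0..6 sons who marry, as tabulated
# above. They sum to 0.999, so they are rescaled to sum to 1; that rescaling
# is an assumption made for this sketch.
raw = [0.317, 0.364, 0.209, 0.080, 0.023, 0.005, 0.001]
total = sum(raw)
p = [value / total for value in raw]

mean_sons = sum(k * pk for k, pk in enumerate(p))


def extinct_by_generation(p, generations):
    """Chance that a line founded by one man has died out by each generation."""
    x, results = 0.0, []
    for _ in range(generations):
        # One more generation: substitute the previous chance into
        # x = p0 + p1*x + p2*x**2 + ...
        x = sum(pk * x**k for k, pk in enumerate(p))
        results.append(x)
    return results


print(f"mean number of marrying sons: {mean_sons:.3f}")
for generation, chance in enumerate(extinct_by_generation(p, 10), start=1):
    print(f"extinct by generation {generation:2d}: {chance:.3f}")
```

The mean comes out a little above 1, so under this simple model a line founded by one man would have some chance of never dying out.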
Modelling surname extinction
The ESRC Surnames Project is attempting to :-
"build a model of how an initial population of surnames grows, diffuses and mutates through many generations"; a model that will produce results consistent with known national and local scalings. This model would be based on a random walk with diffusion on a large lattice, but some of the random steps would be large leaps - a Levy flight- in order to simulate the diffusion caused through migration.
Summary
Based on the work of :-
D.G. Kendall, Keyfitz, K. Sigmund, (Taneyhill, Dunn &
Hatcher), the Galton website, ESRC Surnames Project Symposium
paper (unpublished)
| Rain, midnight rain,
nothing but the wild rain On this bleak hut and solitude and me Remembering again that I shall die And neither hear the rain nor give it thanks For washing me cleaner than I have been Since I was born into this solitude........ |
.......... What should we do to rate the long alas But skeeter down a steeper gradient? And then some falls are still more fortunate, The meteors spent , the tragic heroes stunned Who go out like a light. But here the Chip,chip,chip will flake the stone by slow degrees For hour on hour, the fire will gutter down, The bird will call at longer intervals |
| extracts from the first and last stanzas of two favourite poems of melancholy | |
The question "How many surnames are
there?" has little meaning, since unique surnames are
constantly appearing and disappearing. Some unique names
disappear through marriage: others (e.g. the recent fashion of
combining each partner's name into a new double-barrelled
equivalent) are created at marriage. Emigration and immigration
also play their parts.
There is a complex relationship between the ever-changing size of
the population and the number of names borne.
Perhaps the best way is to illustrate this at a manageable level. You want to get some idea of the extent of surnames in your locality. You stand in your high street or on your village green, and sample/interview a selection of people who pass you by. In recording the results, you decide to work with 2 quantities:
s is the number of different surnames in a sample; n is the number of persons.
The first 20 people encountered all have different names (so s/n= 1). As you count more, one starts to get duplicates, so that after 200 people there might be three duplicated names, say, giving n= 200, s= 197; s/n= 0.985. After 1000, one might have only 950 names, giving s/n= 0.95.
What if one continues sampling day after
day? What happens to the relationship between s/n
and n? Will s/n
always decrease as n increases? Perhaps,
because the stock of surnames is limited, as one counted more
and more people one would eventually reach a stage at which one
had met all the surnames before. s
would then remain constant, even if one continually included more
people (increased n).
Or is the stock of surnames so large that one will always
encounter new surnames, however many people one samples? In
effect, this would mean that s/n is then
constant as n increases. And if this
option is correct, how quickly does one approach a steady
state?
This is a fundamental question in the mass study of surnames, and relates to how many unique surnames ('singletons') there are in the total population.
S and N
s is the number of different surnames in
a sample : n is the number of persons.
Thus, in a sample of 100 people, each with a different surname,
S/N = 100/100 = 1
or each with the same surname
S/N = 1/100 = 0.01
The ratio S/N is thus an indicator of the diversity of the surname pool. This might be on a local or regional level. However, with the advent of nationally available data, e.g. telephone directories and electoral rolls on CD, it is now possible to take samples indicative of the national level. (Beware: sampling correctly at a national level is a task that needs some thought.)
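For a sample held as a simple list of names, the running value of s/n can be worked out as each person is added. A minimal sketch follows; the function name and the street sample are made up for illustration.

```python
def diversity_curve(names):
    """Running s/n after each person sampled.

    s is the number of different surnames met so far, n the number of people.
    """
    seen = set()
    curve = []
    for n, name in enumerate(names, start=1):
        seen.add(name)
        curve.append(len(seen) / n)
    return curve


# Hypothetical street sample (made-up names, for illustration only).
sample = ["Smith", "Jones", "Ashby", "Smith", "Patel", "Jones", "Quick", "Smith"]
curve = diversity_curve(sample)
print([round(value, 3) for value in curve])
```

Plotting such a curve against n for a real sample is one way of seeing whether s/n keeps falling or levels off.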
The above is based on Trevor's work (often verbatim)
We are entering a new era in surname study. Previously surnames were studied as individual entities: now they can be studied as a mass phenomenon.
Let's pose some big questions
that might now be answerable (and indeed are being tackled by
amateurs).
I am sure that you could add a few of your own. Here is one that
may concern you first:-
Last revised: October 03, 2004