The survival of surnames

 

Surname extinction

Surname pool

Other questions

 


Surname extinction

Over two-thirds of all English surnames have become defunct in the 30 or so generations since 1350. Given the big E, expired, gone to meet their maker, over, no more.....

Historically, there have been two schools of thought about the eventual outcome of this process:

Heat Death: all surnames but one will eventually expire
Steady State: the rate of extinction plateaus out

 

The Heat Death approach can be traced back to a mathematical puzzle posed by Francis Galton in the Educational Times in 1873. This was instigated by a controversy over how many names of the 'quality' were disappearing at the time.

Galton's Puzzle
"A large nation, of whom we will only concern ourselves with the adult males, N in number, and who each bear separate surnames, colonise a district. Their law of population is such that, in each generation, P0 per cent of the adult males have no male children who reach adult life; P1 have only one such male child; P2 have 2, and so on up to P5 who have 5.

Find
(1) What proportion of the surnames will have become extinct after r generations; and
(2) how many instances there will be of the same surname being held by m persons"

I have now discovered an excellent introductory analysis of branching theory and surname extinction in Chapter 10.2 of Grinstead and Snell's Introduction to Probability.
There is a more complex commentary at: http://fauam2.am.uni-erlangen.de/teaching/lectures/graef/wr2/wr 2.pdf

A solution was proffered by the Rev. Henry William Watson, and from his 1874 joint paper with Galton the mathematical tool of branching emerged: the Galton-Watson process.
(Strictly speaking, this is now more properly known as the Galton-Watson-Bienaymé process, since it has been discovered that the French mathematician Bienaymé had discussed this problem some 30 years earlier than Galton.)

Branching-process theory ... is that part of mathematics which deals with the growth and decay of populations of objects which multiply and replace one another, generation by generation, according to rules in which chance plays a prominent part
Kendall, 1974

 

The mathematics of branching theory is somewhat beyond me, but the following outline thankfully involves no theory

Call the probability of having no sons $P_0$, of having 1 son $P_1$, of having 2 sons $P_2$, and so on, these probabilities applying independently to all males in all generations, and to be interpreted as sons living to maturity. Designate the chance of extinction of a male line starting with 1 person as $x$. Then the chance of extinction of two separately and independently developing male lines must be $x^2$, of three lines $x^3$, and so on.
If a man has no sons, his (conditional) chance of extinction is 1; if he has one son, his chance of extinction is $x$; if he has two sons, his chance of extinction is $x^2$, and so on.
Keyfitz Applied Mathematical Demography p332

Hence this man's whole (unconditional) chance of extinction (x) through his sons is :-

$x = P_0 + P_1 x + P_2 x^2 + \cdots$

(...more to add here, when I get to grips with the mathematics!)

Watson, in a specific example, produced the cumulative probabilities of extinction by each of the first ten generations as:-

.237 .347 .410 .450 .478 .497 .511 .521 .528 .534

This means that the disappearance of surnames is very rapid in the first generations: 237 out of 1000 in the 1st generation and 110 in the 2nd, but only 28 in the 5th step, and only 6 in the 10th step.
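The per-generation losses can be recovered by differencing Watson's cumulative figures. The short sketch below does only that arithmetic; the variable names are illustrative and not taken from any source.

```python
# Watson's cumulative extinction probabilities for generations 1 to 10,
# copied from the list above.
cumulative = [0.237, 0.347, 0.410, 0.450, 0.478,
              0.497, 0.511, 0.521, 0.528, 0.534]

# Surnames lost in each generation, per 1000 founding surnames:
# the difference between successive cumulative figures.
previous = [0.0] + cumulative[:-1]
lost_per_1000 = [round((now - before) * 1000)
                 for now, before in zip(cumulative, previous)]

for generation, lost in enumerate(lost_per_1000, start=1):
    print(f"generation {generation}: {lost} per 1000 surnames lost")
```

The losses shrink quickly from one generation to the next, which is the point being made about the early generations.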

The question is: will this process lead to extinction after innumerable generations?

Watson came to this conclusion, arguing:-

All the surnames, therefore, tend to extinction in an indefinite time, and this result might have been anticipated generally, for a surname once lost can never be recovered, and there is an additional chance of loss in every successive generation

However, Watson's equation has another solution besides 1: the root .553. (The figure cannot be lower than this, since the extinction probabilities listed above are still rising at .534.) He either overlooked this root, or discounted it under the influence of Newtonian certainty. Under this model, only 55% of his surnames will ever go extinct, commencing from a single originator for each surname at time zero.
A surname then has a sporting chance of survival. Its extinction is not pre-ordained.
Whether it survives or not depends on the criticality theorem, which in turn depends on the average number of offspring produced by one individual. If this number is less than one then, as the population is contracting, it is certain that the surname-holders will eventually die out. If the average is more than 1 then, with the population expanding, the surname has a chance of survival.
But a high birth rate in your line does not necessarily provide an advantage over a low birth rate in avoiding extinction. The chance of extinction also depends on the variation in the number of children in later generations, and especially on the chance of having zero children. As a counter-example, a "population in which everyone married and each couple had exactly one son (and one daughter) surviving to maturity would be stationary, but the probability of extinction of either line would be zero" (Keyfitz, Applied Mathematical Demography).

Another point to note about an expanding population, is that each of us will either have zero descendants or many - the chance that any one of us will have exactly 1 descendant after 10 generations is remote.

On the other hand, the demographers James E Smith and Philip R Kunz have calculated that "in the United States in 1960 there was only a 25% chance that a man would have descendants with his last name 13 generations later".

The surname paradox
How do the predictions of simple branching theory relate to the mix of surnames today? Well, at first sight, not at all. The branching process predicts that after k generations (from a single ancestor), the number of descendants will either be zero or a large number. But, as investigations of GRO registers and telephone directories in the UK and US show, 40% of the surnames listed are singletons.
(In Ken Tucker's survey of US surnames, he found that, excluding the surname Smith, one was more likely to bear a rare surname than a leading surname.)
  • Does this imply that the ratio of singletons was once extremely high, and is gradually declining?
  • Or is the decline being arrested by the influx of new names or the creation of double-barrelled names?

And would a branching process that takes into account the interplay between a mixture of both multi- and single- originators be more successful in predicting the current distribution-profile of surnames?

 

Marbles, surnames and a random walk

A drunk leans against a wall. With each step he can move left or right with equal probability. If he keeps at it long enough, he will return to his starting place, and indeed reach every point of the wall. Take the wall away, and the drunk can now step north, south, east or west with equal probability. Again, given enough time, he is sure to return to his starting position.

If instead I bet on whether I draw a red or black marble out of a pot with just 1 of each colour (returning the chosen marble each time), then my winnings or losses take a similar one-dimensional random walk. Given enough time, my winnings and losses cancel each other out.

But say I commence with a pot containing just 1 red and 1 black marble, still replacing the marble chosen, but this time also adding a marble of the same colour. I have introduced positive feedback. Success will breed success, and the random drift will regulate itself. But what will be the outcome in the long run? The probability of success in the first marble scenario was 1/2; what would it be in this one? Will it be the same, 0 or 1? The answer is that at the outset all values between 0 and 1 are equally likely.

"The first steps of this random walk settle its fate: the initial fluctuations lock themselves in. The random events of the past affect those in the future. What started as a fair contest between Black and Red can end up as a very one-sided affair. In every single experiment, a law seems to emerge: after a few thousand draws, we can very confidently predict that the chances for Red to turn up are such and such. But for every repetition of the experiment (starting anew with one red and one black marble in the urn), a different law emerges: the such and such isn't the same" (Sigmund, Karl. Games of Life: Explorations in Ecology, Evolution and Behaviour)

                 
                 
[Figure: proportion of red marbles (vertical axis, 0 to 1) against time (horizontal axis)]

Proportion of red marbles from 0 to 1 over time, for repeated trials
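The marble experiment with positive feedback can be tried out in a few lines. What follows is a minimal sketch, assuming one red and one black marble at the start and one extra marble of the drawn colour added after each draw; the function name, the number of draws and the seeds are illustrative choices, not part of the original description.

```python
import random


def polya_urn(draws, seed):
    """Run one trial and return the final proportion of red marbles."""
    rng = random.Random(seed)      # seeded so that each trial is repeatable
    red, black = 1, 1              # start with one marble of each colour
    for _ in range(draws):
        # Draw a marble at random, put it back, and add one of the same colour.
        if rng.random() < red / (red + black):
            red += 1
        else:
            black += 1
    return red / (red + black)


# Ten independent trials of 5000 draws each.
final_proportions = [polya_urn(5000, seed) for seed in range(10)]
for trial, proportion in enumerate(final_proportions):
    print(f"trial {trial}: proportion of red after 5000 draws = {proportion:.3f}")
```

Each trial settles on its own proportion of red, and different trials generally settle on different values, which is the behaviour Sigmund describes.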

 

Does this marble model help us understand the current distribution of surnames?
I like to think of a pot containing thousands of colours of different hues, each representing a surname. The starting position being that some hues are better represented than others. Some hues will change as surname variations set in.
In this model, will those names which start well-represented always be the winners, however many times the starting position is re-commenced?

With the marble model, whatever happened in the past affects the present.
Whatever happened over the first few crucial generations (the decisions our ancestors took in making bynames hereditary, chance events, and fertility) locked in that surname's future.


Back to branching (and name survival)

Example 1

[Diagram: family tree of the line, one row per generation, each member shown as an O; lines that die out are marked "end"]

Z0 = 1
Z1 = 2
Z2 = 3
Z3 = 5
Z4 = 5
Z5 = 4
Z denotes the number of members in a generation, shown for the founder and the five generations that follow (Z0 to Z5).
These have been colour-banded to show clearly the number of offspring at each point. For example, Z4=5 means the fourth generation has 5 members (a member being denoted by an O symbol).
Individual lines might become extinct, as denoted by tramlines.

In this example, the number of offspring per person has been restricted to at most two.

The founder has 19 descendants over the five generations (4 of whom do not reproduce).

These 19 were born to the 16 individuals in generations Z0 to Z4, so the mean of the offspring distribution equals 19/16 (1.1875). As this exceeds 1, the line will have a
positive chance of persisting (ignoring gender).
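The figure of 19/16 can be checked directly from the generation sizes in Example 1. This is only the arithmetic, with illustrative variable names: every member of generations 1 to 5 is somebody's child, and every member of generations 0 to 4 is a potential parent.

```python
from fractions import Fraction

# Generation sizes Z0 to Z5, as read from Example 1.
generation_sizes = [1, 2, 3, 5, 5, 4]

children = sum(generation_sizes[1:])    # everyone born in generations 1 to 5
parents = sum(generation_sizes[:-1])    # everyone in generations 0 to 4

mean_offspring = Fraction(children, parents)
print(f"{children} children from {parents} parents: "
      f"mean = {mean_offspring} = {float(mean_offspring)}")
```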

Example 2

In this example (from Keyfitz p332), the probabilities of having 0, 1, 2 or 3 sons are each set to 0.25.

As discussed earlier, the equation for x, the probability of extinction, is

$x = 0.25 + 0.25x + 0.25x^2 + 0.25x^3$

which simplifies to

$x^3 + x^2 - 3x + 1 = 0$

This cubic has three roots. Setting aside the root x=1 and dividing by x-1
results in the quadratic

$x^2 + 2x - 1 = 0$

Completing the square gives $x = -1 \pm \sqrt{2}$, of which only the positive root can be a probability:-

$x = -1 + \sqrt{2} \approx 0.414$

This hypothetical population has a net reproduction rate of 1.5
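For anyone who would rather check Example 2 numerically than algebraically, here is a minimal sketch. It assumes the four offspring probabilities are 0.25 each (as the equation for x implies) and finds the extinction probability by starting at 0 and repeatedly applying the right-hand side of that equation; the function names are illustrative.

```python
import math

# Probabilities of 0, 1, 2 and 3 sons, as implied by the equation for x.
offspring_probs = [0.25, 0.25, 0.25, 0.25]


def next_extinction_estimate(x, probs):
    """Right-hand side of x = P0 + P1*x + P2*x^2 + ..."""
    return sum(p * x ** k for k, p in enumerate(probs))


def extinction_probability(probs, tolerance=1e-12, max_steps=100_000):
    """Iterate from x = 0 until the value stops changing."""
    x = 0.0
    for _ in range(max_steps):
        updated = next_extinction_estimate(x, probs)
        if abs(updated - x) < tolerance:
            return updated
        x = updated
    return x


x_iterated = extinction_probability(offspring_probs)
x_closed_form = -1 + math.sqrt(2)
mean_sons = sum(k * p for k, p in enumerate(offspring_probs))

print(f"iterated:    {x_iterated:.6f}")
print(f"closed form: {x_closed_form:.6f}")
print(f"mean number of sons: {mean_sons}")
```

Starting the iteration from 0 matters: it homes in on the smaller root rather than on the root x=1.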

(Editor's note: at this point, I hope you understand that I hardly understand a word of all this, but it feels as if it ought to be fascinating)

Historical probability rates

The following are reproduced to give a broad idea, and should be used with the greatest of circumspection

1) Lotka in his study of the child-bearing chances of the wives of white males in the 1920 US census, derived:-

2) Sturges and Hackett derived the following table from a study of English genealogical records:

Number of sons who marry   0     1     2     3     4     5     6
Probability               .317  .364  .209  .080  .023  .005  .001

e.g. the probability is 0.364 that a family will produce 1 son destined to marry

Obviously these rates are not comparable between countries, eras, social or ethnic groups. And individual families will have different fertility rates. Presumably S & H was a limited study, and consequently needs qualifying.
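With all those caveats in mind, the Sturges and Hackett figures can be fed into the same extinction equation to see how a single male line might fare generation by generation. This is a rough sketch under stated assumptions: the tabulated probabilities are treated as the offspring distribution for every generation, and they are rescaled to sum to 1 (the table sums to .999 because of rounding). The names used are illustrative.

```python
# Sturges and Hackett probabilities of 0 to 6 sons who marry, as tabulated above.
table_probs = [0.317, 0.364, 0.209, 0.080, 0.023, 0.005, 0.001]

# Assumption: rescale so the probabilities sum to exactly 1.
total = sum(table_probs)
probs = [p / total for p in table_probs]


def next_extinction_estimate(x):
    """Right-hand side of x = P0 + P1*x + P2*x^2 + ..."""
    return sum(p * x ** k for k, p in enumerate(probs))


def extinction_by_generation(generations):
    """Cumulative probability that the line has died out by each generation."""
    x = 0.0
    results = []
    for _ in range(generations):
        x = next_extinction_estimate(x)
        results.append(x)
    return results


cumulative = extinction_by_generation(10)
mean_sons = sum(k * p for k, p in enumerate(probs))

print(f"mean number of sons who marry: {mean_sons:.3f}")
for generation, chance in enumerate(cumulative, start=1):
    print(f"extinct by generation {generation}: {chance:.3f}")
```

The first figure is simply the chance of having no sons who marry; later figures creep upward in the same way as Watson's list.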

Modelling surname extinction

The ESRC Surnames Project is attempting to:-

"build a model of how an initial population of surnames grows, diffuses and mutates through many generations"; a model that will produce results consistent with known national and local scalings. This model would be based on a random walk with diffusion on a large lattice, but some of the random steps would be large leaps (a Lévy flight) in order to simulate the diffusion caused through migration.

Summary

 

Based on the work of :-
D.G. Kendall, Keyfitz, K. Sigmund, (Taneyhill, Dunn & Hatcher), the Galton website, ESRC Surnames Project Symposium paper (unpublished)

Rain, midnight rain, nothing but the wild rain
On this bleak hut and solitude and me
Remembering again that I shall die
And neither hear the rain nor give it thanks
For washing me cleaner than I have been
Since I was born into this solitude........
..........
What should we do to rate the long alas
But skeeter down a steeper gradient?
And then some falls are still more fortunate,
The meteors spent , the tragic heroes stunned
Who go out like a light. But here the
Chip,chip,chip will flake the stone by slow degrees
For hour on hour, the fire will gutter down,
The bird will call at longer intervals
extracts from the first and last stanzas of two favourite poems of melancholy

The surname pool

The question "How many surnames are there" has little meaning, since unique surnames are constantly appearing and disappearing. Some unique names disappear through marriage: others (e.g. the recent fashion of combining each partner's name into a new double-barrelled equivalent) are created at marriage. Emigration and immigration also play their parts.
There is a complex relationship between the ever-changing size of the population and the number of names borne.

Perhaps the best way is to illustrate this at a manageable level. You want to get some idea of the extent of surnames in your locality. You stand in your high street or on your village green, and sample/interview a selection of people who pass you by. In recording the results, you decide to work with two quantities:

s is the number of different surnames in a sample; n is the number of persons.

The first 20 people encountered all have different names (so s/n= 1). As one counts more, one starts to get duplicates, so that after 200 people there might be three duplicated names, say, giving n= 200, s= 197; s/n= 0.985. After 1000, one might have only 950 names, giving s/n= 0.95.

What if one continues sampling day after day? What happens to the relationship between s/n and n? Will s/n always decrease as n increases? Perhaps, because the stock of surnames is limited, as one counted more and more people one would eventually reach a stage at which one had met all the surnames before. s would then remain constant, even if one continually included more people (increased n).
Or is the stock of surnames so large that one will always encounter new surnames, however many people one samples? In effect, this would mean that s/n becomes constant as n increases. And if this option is correct, how quickly does one approach that steady state?

This is a fundamental question in the mass study of surnames, and relates to how many unique surnames ('singletons') there are in the total population.


S and N

S is the number of different surnames in a sample; N is the number of persons.
Thus, in a sample of 100 people, each with a different surname,
S/N = 100/100 = 1

or, if all 100 share the same surname,
S/N = 1/100 = 0.01

The ratio S/N is thus an indicator of the diversity of the surname pool. This might be on a local or regional level. However, with the advent of nationally available data, e.g. telephone directories and electoral rolls on CD, it is now possible to take samples indicative of the national level. (Beware: sampling correctly at a national level is a task that needs some thought.)
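The S/N calculation is easy to try on a list of names. The sketch below is one possible way of doing it, not an established method: the function names are illustrative and the short street sample is entirely made up.

```python
def surname_diversity(surnames):
    """Return S/N: the number of different surnames divided by the number of persons."""
    if not surnames:
        raise ValueError("the sample must contain at least one person")
    return len(set(surnames)) / len(surnames)


def running_diversity(surnames):
    """Return S/N after each successive person is added to the sample."""
    seen = set()
    ratios = []
    for count, name in enumerate(surnames, start=1):
        seen.add(name)
        ratios.append(len(seen) / count)
    return ratios


# The two extreme cases described above.
all_different = [f"Surname{i}" for i in range(100)]
all_same = ["Smith"] * 100
print(surname_diversity(all_different))   # 100 different names among 100 people
print(surname_diversity(all_same))        # 1 name among 100 people

# A hypothetical street sample, to show S/N falling as duplicates turn up.
street_sample = ["Smith", "Jones", "Patel", "Smith", "Brown", "Jones"]
print(running_diversity(street_sample))
```

Tracking the ratio person by person, as running_diversity does, is the small-scale version of asking how S/N behaves as N grows.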

The above is based on Trevor's work (often verbatim)



Other Big Questions

We are entering a new era in surname study. Previously surnames were studied as individual entities: now they can be studied as a mass phenomenon.

Let's pose some big questions that might now be answerable (and indeed are being tackled by amateurs).
I am sure that you could add a few of your own. Here is one that may concern you first:-

Last revised: October 03, 2004