| Macro Level Sampling | Trevor covers the mathematics of surnames at a national level; estimates of total size; surname extinction |
| Micro Level Sampling | An analysis of surnames at a discrete level (parish, ward or town), in outline at the moment |
Perhaps the most surprising thing about surnames is how many very rare names there are.
We are well aware of the Smiths and Joneses, but there are very
few such common names. On the UK Info Disc 2000, which claims to
include the UK electoral roll, about 42% of names occur once, 16%
of names occur twice, 7% occur three times, and so on with ever
decreasing numbers. Phonebooks are not such a good sample of the
population, but they show a similar pattern. In the telephone
directories for all England and Wales from about 1980 (when there
were far fewer ex-directory subscribers than today), about 45% of
names occur just once, which agrees well with the Info Disc
figure. Similar results have been found in a study of names in the
Swiss telephone directories (Barrai, I., Scapoli, C.,
Beretta, M., Nesti, C., Mamolini, E., and Rodriguez-Larralde, A.,
Annals of Human Biology (1996), vol 23, pp 431-455).
The reason we are aware of the common names is that there are more people with them. There are more than half a million Smiths in Britain. However, as far as we can estimate, there are perhaps 200 000 people with names which occur only once in this country, around 140 000 people whose names occur twice, and about 110 000 people whose names occur three times.
Mathematics - and the total number of names
If we plot the logarithm of the number of surnames (n) occurring once, twice, three times, etc., against the logarithm of the number of times (x) a surname occurs, we get close to a straight line.
The graph shows the distribution of a thousand names selected at
random from the UK Info Disc 2000.
Leading names, like Smith and Jones, are positioned close to where the graph line dips to the horizontal x axis; conversely, the hundreds of names which occur only once are clustered near the vertical axis.
Mathematically, a straight line on this log-log plot, $\log n = \log b - a \log x$, means that n is related to x by an expression of the form

$$n = b\,x^{-a} \qquad \text{(expression 1)}$$

where b and a are constants.
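A minimal Python sketch of the step from the straight-line graph to expression 1, assuming the sample is held as a list giving, for each sampled surname, its number of holders: count how many surnames occur x times, then fit a least-squares line to log n against log x, so that the slope gives -a and the intercept gives log b. The function names are illustrative, and the check uses synthetic points generated from b = 324 and a = 1.49, not the Info Disc sample itself.

```python
import math
from collections import Counter


def frequency_of_frequencies(holders_per_name):
    """Map x -> n, the number of sampled surnames held by exactly x people."""
    return dict(sorted(Counter(holders_per_name).items()))


def fit_power_law(freq_of_freq):
    """Least-squares fit of log n = log b - a * log x; returns (b, a)."""
    xs = [math.log(x) for x in freq_of_freq]
    ys = [math.log(n) for n in freq_of_freq.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic check: points lying exactly on n = 324 * x**-1.49.
synthetic = {x: 324 * x ** -1.49 for x in range(1, 30)}
b, a = fit_power_law(synthetic)
print(f"b = {b:.1f}, a = {a:.2f}")  # b = 324.0, a = 1.49
```

With a real sample the points at large x are sparse and noisy (most values of x have no surname at all and simply drop out of the fit), so the fitted constants are only approximate.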
It can be seen from the graph that the 1000-name sample is fitted well if b = 324 and a = 1.49 approximately. However, if we try to match this up to the frequencies of the commoner surnames, we find that there are fewer common names than we would expect from this expression. Matching up the 1000-name sample shown in the graph with the common-names information suggests that overall the frequency distribution of names in Britain is given by something like
$$n = 200000\,x^{-1.5}\,1.025^{-c} \qquad \text{(expression 2)}$$

where $c = x^{0.4}$.
Expression 2 fits the data well all the way from the 200000 or so unique names to the half a million people called Smith. It is just a curve fitted to the data and has no fundamental significance, but it enables us to estimate the total number of surnames in Britain, something which is surprisingly hard to measure directly.
Summing expression 2 over all values of x suggests that there are a little under half a million surnames in Britain.
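The total can be checked numerically. This Python sketch evaluates expression 2 for each x and adds up the results; the cut-off at 600 000 holders is an assumption (no British surname has many more holders than Smith, and the terms beyond it are negligible), and the function names are illustrative.

```python
def expression_2(x):
    """Fitted number of surnames held by exactly x people (expression 2)."""
    c = x ** 0.4
    return 200_000 * x ** -1.5 * 1.025 ** -c


def total_surnames(max_holders=600_000):
    """Sum expression 2 over x = 1 .. max_holders (the cut-off is an assumption)."""
    return sum(expression_2(x) for x in range(1, max_holders + 1))


print(f"names held once : {expression_2(1):,.0f}")  # 195,122
print(f"all surnames    : {total_surnames():,.0f}")
```

The second figure should come out a little under half a million, in line with the estimate above.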
All of the above assumes that we define every spelling variation as a different surname. If we try to group variants together, for example regarding Clark and Clarke as the same name, then the number of surnames depends on how we group the names.
The problem of getting a sample
The above results depend on taking a random sample of names in a population, so that each name, however common or rare, has an equal chance of being selected. Having selected the name, we count how many people have that name. Random sampling of names is difficult. It is easier to pick people at random from a population to form a sample, and then count how many times each surname occurs in the sample, but this gives a different result: a person drawn at random is far more likely to carry a common name than a rare one, so the method does not generally give a true picture of the frequency distribution of names in the population. The random-person method gives a form of distribution usually called the Yule distribution. It crops up in various other systems, such as the distribution of word frequency in texts, and was explained on the basis of very simple assumptions by Herbert Simon, winner of the Nobel Prize in economics (Simon, H.A., Biometrika (1955), vol 42, pp 425-440).
I discussed this distribution in relation to surnames in an
article in the Journal of One-Name Studies vol 6 No.6, pp 119-124
(April 1998), although various aspects of that article have been
overtaken by later work summarised above.
How does the distribution arise?
It seems incredible that, after 25 generations or so of hereditary surnames in England, there are so many unique or very rare surnames. One factor is that surname generation is still going on, through immigration, double-barrelling, or deliberate creation of new spelling variants. It ought to be possible to explain the form of the distribution shown in the graph in terms of generation and extinction processes, but this has not yet been done convincingly.
Probability theory has tackled one aspect: the chance that a name with just one holder will become extinct. The probability of extinction turns out to be an astonishing 89%, although the exact figure depends on the assumptions made about the probability of having no sons, one son, two sons, etc. This calculation is given in various texts on advanced probability, such as 'Probability' by Peter Whittle (John Wiley, 1970), ISBN 0-471-01657-8, pp 124-125. This means that it is very probable that any one male living 25 generations ago has no male descendants today, so a name with just one holder would have become extinct.
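The textbook calculation treats each male line as a branching (Galton-Watson) process: if f(s) is the probability generating function of the number of sons, the chance that the line has died out within g generations is f applied g times to 0, and the eventual extinction probability is the smallest root of q = f(q). The Python sketch below follows that idea; the son distribution in it is invented for illustration and gives 0.5, not 89%, because the distribution Whittle assumes is not reproduced here.

```python
def sons_pgf(son_probs, s):
    """Probability generating function f(s) = sum_k P(k sons) * s**k."""
    return sum(p * s ** k for k, p in enumerate(son_probs))


def extinct_within(son_probs, generations):
    """Probability that one man's male line has died out within the given generations."""
    q = 0.0
    for _ in range(generations):
        q = sons_pgf(son_probs, q)
    return q


def extinction_probability(son_probs, tol=1e-13, max_iter=100_000):
    """Eventual extinction probability: iterate q <- f(q) from 0 until it settles."""
    q = 0.0
    for _ in range(max_iter):
        new_q = sons_pgf(son_probs, q)
        if abs(new_q - q) < tol:
            return new_q
        q = new_q
    return q


# Hypothetical family pattern: P(0 sons) = 0.25, P(1) = 0.25, P(2) = 0.5.
sons = [0.25, 0.25, 0.5]
print(extinct_within(sons, 25), extinction_probability(sons))  # eventual value is 0.5
```

When the average number of sons is exactly one, the iteration creeps towards 1 very slowly, so `max_iter` may be reached before the tolerance is met.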
If there is indeed a 0.89 probability of extinction of a name with just one holder, then, treating each holder's male line as independent, the probability that a name with n holders becomes extinct is $0.89^n$. For example, if there were five holders of a name, the chance of none of them eventually having any male descendants is $0.89^5$, which equals about 0.56, so in this case there is still a better than even chance that the name will become extinct. However, if there are 50 holders of a name in the first generation, the probability that the name will become extinct is only 0.3%, a 99.7% chance that the name will survive.
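The arithmetic in the previous paragraph is easy to tabulate. This Python sketch, which assumes the holders' male lines die out independently, prints $0.89^n$ for a few values of n and finds the smallest number of holders that gives a name at least a 99% chance of survival; the function names are illustrative.

```python
import math


def extinction_with_holders(q_single, holders):
    """Chance that every holder's male line dies out, assuming the lines are independent."""
    return q_single ** holders


def holders_needed(q_single, max_extinction):
    """Smallest number of holders n with q_single**n <= max_extinction."""
    return math.ceil(math.log(max_extinction) / math.log(q_single))


for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} holders: extinction {extinction_with_holders(0.89, n):.4f}")
print("holders for a 99% survival chance:", holders_needed(0.89, 0.01))  # 40
```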
The theoretical result has been roughly confirmed by Christopher Sturges and Brian Haggett, who did a computer simulation to calculate the number of descendants from each member of an original population. They found that after 23 generations, about 76% of the original population had no male descendants, so if a name had just one holder in the original population, there was a 76% probability that it would become extinct - the difference from 89% is probably mainly because they made different assumptions about the chances of having particular numbers of sons in each family. This work was published as Sturges and Haggett, 'Inheritance of English Surnames' (Hawgood Computing, 1987) ISBN 0-948151-02-1.
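A simulation of the kind Sturges and Haggett describe can be sketched in a few lines of Python. This is not their program: it simply follows each founder's male line for a number of generations, drawing each man's number of sons from an assumed distribution, and reports the fraction of founders with no male descendants at the end. The son distribution, the founder count and the `cap` shortcut (a line that reaches `cap` living males is counted as surviving) are all assumptions for illustration.

```python
import random


def simulate_extinct_fraction(son_probs, founders, generations, seed=0, cap=50):
    """Fraction of founders with no male descendants after `generations` generations.

    son_probs[k] is the assumed probability that a man has exactly k sons.
    A line reaching `cap` living males is treated as surviving; this changes the
    answer only slightly when such a large line is very unlikely to die out.
    """
    rng = random.Random(seed)
    sizes = range(len(son_probs))
    extinct = 0
    for _ in range(founders):
        males = 1
        for _ in range(generations):
            if males == 0 or males >= cap:
                break
            # Each living male independently has 0, 1, 2, ... sons.
            males = sum(rng.choices(sizes, weights=son_probs, k=males))
        if males == 0:
            extinct += 1
    return extinct / founders


print(simulate_extinct_fraction([0.25, 0.25, 0.5], founders=1000, generations=23))
```

As the text notes, the answer is sensitive to the assumed son distribution, which is the likely source of the gap between 76% and 89%.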
This contribution is dedicated to a Miss Smith I know who, at the time of writing, is about to marry a Mr Fegyveres. Perhaps they will have many sons!
© 2000 Trevor Ogden
The Micro-Level
Parish Level
Rex Watson, in Local Population Studies 15, sampled a group of 8 contiguous Cambridgeshire parishes. He did this by extracting the occurrences of the top 50 surnames in each parish and allocating these to three separate periods: 1538-1640, 1641-1740 and 1741-1840.
It was then possible to compare the number of surnames common to the different parishes, and to compare periods. The result was an idea of how names survived or spread over time in these parishes. A comparison was also made with the Lancashire chapelry of Colne. Another technique used in this article was to count the names common to each pair of parishes and graph this number against the distance between them. Not surprisingly, it was found that the number of names common to a pair of parishes tends to decrease as the distance between them increases.
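The pairwise comparison can be sketched in Python, assuming each parish is represented by the set of its leading surnames and a grid position in kilometres. The parishes, name lists and positions below are invented; in a real study they would come from the registers and a map.

```python
from itertools import combinations
from math import dist


def shared_names_by_distance(parishes):
    """parishes: parish -> (set of surnames, (x, y) position in km).

    Returns (distance, parish_a, parish_b, names in common) for every pair,
    nearest pairs first, ready to plot the count against distance.
    """
    rows = []
    for (a, (names_a, pos_a)), (b, (names_b, pos_b)) in combinations(parishes.items(), 2):
        rows.append((dist(pos_a, pos_b), a, b, len(names_a & names_b)))
    return sorted(rows)


# Invented example data: three parishes with made-up name lists and positions.
parishes = {
    "Parish A": ({"Prime", "Taylor", "Fuller", "Newman"}, (0.0, 0.0)),
    "Parish B": ({"Prime", "Taylor", "Fuller", "Rogers"}, (3.0, 4.0)),
    "Parish C": ({"Prime", "Barnes", "Collis", "Gillson"}, (12.0, 0.0)),
}
for d, a, b, shared in shared_names_by_distance(parishes):
    print(f"{a} / {b}: {d:.1f} km apart, {shared} names in common")
```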
| Comparative cumulative frequency table for 2 parishes |
| Rank | Colne (Lancs), 1 chapelry, 1599-1653 | % of total | Cum. freq. | South Cambs., 8 parishes, 1539-1640 | % of total | Cum. freq. |
| 1 | Hartley | 9.9 | 9.9 | Prime | 1.6 | 1.6 |
| 2 | Hargreaves | 4.2 | 14.1 | Taylor | 1.3 | 2.9 |
| 3 | Smith | 3.9 | 18 | Fuller | 1.3 | 4.2 |
| 4 | Emmott | 3.5 | 21.5 | Rayment | 1.2 | 5.4 |
| 5 | Robinson | 2.7 | 24.2 | Newman | 1.1 | 6.5 |
| 6 | Blakey | 2.5 | 26.7 | Beavis | 1.1 | 7.6 |
| 7 | Baldwin | 2.1 | 28.8 | Rogers | 1.1 | 8.7 |
| 8 | Walton | 1.9 | 30.7 | Gillson | 1.0 | 9.7 |
| 9 | Hogate | 1.7 | 32.4 | Collis | 1.0 | 10.7 |
| 10 | Wilson | 1.7 | 34.1 | Barnes | 0.95 | 11.65 |
| 50 | | | c65% | | | c35% |
| LPS 15: Rex Watson, 'A study of surname distribution in a group of Cambridgeshire parishes, 1538-1840' |
Note the difference between Lancashire and Cambridgeshire. The table reveals that there are far more names in these Cambridgeshire parishes than in the Lancashire equivalent, but that "a name of the Lancashire area was on the whole likely to be possessed by more people than was the case in Cambridgeshire". This can perhaps best be seen in the following ogive.
This higher number of holders of a particular name probably meant a higher survival rate in Lancashire.
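An ogive is a plot of cumulative percentage against rank. This Python sketch recomputes the "Cum. freq." columns of the Watson table from its "% of total" columns; plotting the two lists against rank gives the ogive for the top ten names. The variable names are illustrative.

```python
from itertools import accumulate

# "% of total" for the ten leading names, copied from the table above.
colne = [9.9, 4.2, 3.9, 3.5, 2.7, 2.5, 2.1, 1.9, 1.7, 1.7]
south_cambs = [1.6, 1.3, 1.3, 1.2, 1.1, 1.1, 1.1, 1.0, 1.0, 0.95]


def cumulative(percentages):
    """Running total of the percentages, rounded to 2 dp to hide float noise."""
    return [round(total, 2) for total in accumulate(percentages)]


for rank, (a, b) in enumerate(zip(cumulative(colne), cumulative(south_cambs)), start=1):
    print(f"{rank:>2}  Colne {a:5.1f}%   South Cambs {b:5.2f}%")
```

The steep Colne curve against the shallow Cambridgeshire one is the visual form of the contrast quoted above.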

Is the inference that Lancashire names are more stable than Cambridgeshire ones?
Grace Wyatt has reconstituted the parish of Nantwich and has studied the relative persistence of its surnames.
| Period | Number of names | Names carried forward | New names | Losses |
| 1680-9 | 347 | - | - | - |
| 1740-9 | 422 | 175 | 247 | - |
| 1800-9 | 418 | 173 | 245 | 169 |
| 102 surnames were present in all 3 periods: 8% of the total |
| Grace Wyatt, 'Population change and stability in a Cheshire parish during the eighteenth century', Local Population Studies 43 (1989) |
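The columns of a persistence table like this one can be produced with set operations. This Python sketch uses assumed definitions (carried forward = names also present in the previous period, new = names absent from it, losses = names of the previous period no longer present), which may not be exactly the ones Wyatt used; the name lists are invented.

```python
def persistence(periods):
    """periods: list of (label, set of surnames) in date order.

    Assumed definitions (they may differ from Wyatt's):
      carried forward = names also present in the previous period
      new             = names not present in the previous period
      losses          = names of the previous period no longer present
    Returns (label, names, carried forward, new, losses) for each later period.
    """
    rows = []
    for (_, before), (label, now) in zip(periods, periods[1:]):
        rows.append((label, len(now), len(now & before), len(now - before), len(before - now)))
    return rows


def present_throughout(periods):
    """Surnames recorded in every period."""
    return set.intersection(*(names for _, names in periods))


# Invented name lists, for illustration only.
periods = [
    ("1680-9", {"Walker", "Dutten", "Smith", "Lea"}),
    ("1740-9", {"Walker", "Dutten", "Davies", "Bowker"}),
    ("1800-9", {"Walker", "Davies", "Wilkinson"}),
]
for row in persistence(periods):
    print(row)
print(present_throughout(periods))
```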
Surname losses (other than by migration):-
"...for most of the period under review, the population was barely replacing itself. The average family size was only 4.33 children, and as infant and child mortality together was approaching 500 per thousand in each generation, it may be predicted that something like one quarter of all families would not have had sons of marriageable age to carry on the name. However, this event would not automatically mean the disappearance of a name as more than one family might have possessed it."
Conversely, some names proliferated:-
| Name | 1680-9 | 1740-9 | 1800-9 |
| Walker | 1 | 5 | 14 |
| Dutten | 3 | 8 | 12 |
| Wilkinson | 1 | 6 | 12 |
| Davies | 1 | 8 | 11 |
| Bowker | 1 | 4 | 8 |
Conclusion:-
"The study of surnames [in
this parish] has shown that whilst there is a great deal of
change among surnames over time, some surnames persist over long
periods".
Such a stable core, though not very large, could still be detected.
Core Families - External influences
Betty Halse is researching population mobility in the moorland village of Levisham in North Yorkshire. Local Population Studies 65 contains an interim report on this community, historically one of small farmers. A by-product of her research has been the identification of core families in the parish over a long period, 1541 to 1881.
She has found that over the period 1541-1850, of the original 17 core family names, only 6 survived to be recorded in the period 1801-1850, and only 3 after that.
The number of new names entering the parish records in each 50-year period from 1541 to 1800 ranged from 17 to 24; for 1801-1850 it roughly doubled, to 45. New names appeared in the parish following the 1770 Enclosure award, as "a new style of farming developed which required both new capital and new ideas, and this attracted new men who were not rooted in the village in the way their predecessors had been".
Core families -Personal influences
At the other end of England, Barry Stapleton, in his study of the Hampshire parish of Odiham, discerns a tendency for low fertility and high infant mortality to give a single surviving offspring a higher chance of succeeding to the whole of the parents' property. High fertility and low infant mortality might result in partible inheritance and, over several generations, in a decline in the family's fortunes, which might propel the family out of the area.
(B. Stapleton, 'Family strategies: patterns of inheritance in Odiham, Hampshire, 1525-1850', Continuity and Change, 1999, vol 14, pp 385-402)
| Which of the above factors most influenced the stability of a core family in a region: the flexibility to respond to outside factors, e.g. a changing economic environment, or the more personal factor of fertility rate? |
Town
Michael Williams conducted a name analysis of the Monmouthshire town of Caerleon in his 'Researching Local History'. His purpose was to study the relationship between traditional Welsh patronymics and the incursion of English surnames.
Williams started with the town's baptismal parish register in 1700 and finished in 1899 (for completeness, chapel baptisms were also included).
For each block of 25 years, the following was collected:-
From the resulting data, graphs were charted of:-
The resulting graphs show that before 1750, Caerleon had a largely hereditary patronymic profile. Further local history research showed that a marked fluctuation in the graphs was accounted for by:-
Metal workers arriving from the West Midlands to work in a new iron forge and tin works in 1750-1760.
| Michael Williams' study shows the importance of not treating surname analysis in isolation, but in conjunction with the history of a community |
Surname profile: a city, 1871
This table is meant as an example only. Treat it with care, as the column headings are my interpretation.
| % | City: 1871 Census, District A? | City: 1871 Census, District B? |
| Number of names | Households sampled | Households sampled |
| 10 | 12.6 | 15.9 |
| 20 | 36.6 | 50.2 |
| 30 | 72.2 | 109.3 |
| 40 | 123 | 200 |
| 50 | 185 | |
| 60 | 278 | |
| Source | 1200 households in frame | 13200 households in frame |
More help is required here. Has anyone performed a surname profile analysis of an urban enumeration district, and correlated the results with other information, e.g. age, birthplace or occupation?
Further Work
Possible group projects might be:-
Even more intricate work could be done on the number of surnames of each type (where identifiable as locative, personal, etc.) and on the percentage influx of new names.
At present I am starting a study of the surnames listed in the Portsmouth Burgess Rolls of 1900/01. Will Portsmouth, with its strong naval tradition, be a melting pot of UK names? The Burgess Rolls will reveal only a section of the population, but that section will be one that has put down roots, so the surnames are less likely to be held by transients. Ideally, I should compare the results with a sample taken from the 1881 Portsmouth census.
This will be a testbed of the techniques already discussed to reveal:-
Also, I am interested in how the initial results for the starting ward are affected as the area grows larger by feeding in the results from more wards. Will more previous singletons merge than are brought in by the new ward? How will S/N (the number of distinct surnames divided by the number of householders) change?
If a name predominates in 1 or 2 wards, does that suggest a localised kinship?
Initial results
As these are the results for the first 2 wards, they are meant just to illustrate the techniques rather than to be definitive. They will change as I include neighbouring wards.
There is a remarkable consistency between the figures for the 2 wards. The number of surnames is perhaps 10% higher than it should be, because of spelling inconsistencies: e.g. Cleal and Cleall are treated as separate names.
| | Mile End | | | St Mary's | | |
| S (surnames) | 1329 | | | 1309 | | |
| N (householders) | 2291 | | | 2237 | | |
| S/N | 0.58 | | | 0.57 | | |
| Occurrences | Names | Householders | % of names | Names | Householders | % of names |
| Ones | 917 | 917 | 69% | 926 | 926 | 70% |
| Twos | 222 | 444 | 17% | 208 | 416 | 16% |
| Threes | 82 | 246 | 6% | 77 | 231 | 6% |
| Fours | 34 | 136 | 2.6% | 27 | 108 | 2.1% |
| Fives | 25 | 125 | 1.9% | 17 | 85 | 1.3% |
| Sixes | 14 | 84 | 1.0% | 12 | 72 | 0.9% |
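The ward summary can be generated directly from a list of householders' surnames. This Python sketch computes S, N, S/N and the rows of the table above; the function name and the short list of householders are invented. Exact string matching is used, so Cleal and Cleall count as two names, just as in the ward figures.

```python
from collections import Counter


def ward_profile(householder_surnames, max_x=6):
    """Summarise a ward in the layout of the table above.

    Returns S (distinct surnames), N (householders), S/N, and for x = 1..max_x
    a tuple (names held by exactly x householders, householders with those
    names, % of names).
    """
    counts = Counter(householder_surnames)
    s, n = len(counts), sum(counts.values())
    fof = Counter(counts.values())
    rows = {x: (fof[x], x * fof[x], round(100 * fof[x] / s, 1)) for x in range(1, max_x + 1)}
    return s, n, round(s / n, 2), rows


# Invented householders, for illustration only.
ward = ["Cleal", "Cleall", "White", "White", "Edney", "Smith", "Smith", "Smith", "Stares"]
s, n, ratio, rows = ward_profile(ward)
print(f"S = {s}, N = {n}, S/N = {ratio}")
for x, (names, householders, pct) in rows.items():
    print(x, names, householders, f"{pct}%")
```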
480 surnames are common to both Mile End and St Mary's Ward, of which 220 are singletons. In other words, over 60% of surnames differ between the two contiguous wards (whose centres are only a couple of miles apart). Combining the 2 wards brings in more singletons than are deleted by merger (deleted: 220; influx: 706). Some of these 706 unite with multiples in the core ward. Result: the percentage of singletons in the combined ward declines to 64%.
| Combined wards | Names | % of names |
| Ones | 1385 | 64% |
| Twos | 262 | 12% |
| Threes | 153 | 7% |
| Fours | 91 | 4% |
| Fives | 46 | 2% |
| Sixes | 23 | 1% |
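Merging a new ward into the core ward is a matter of adding the two sets of counts. This Python sketch adopts one reading of the tallies in the text, which is an assumption: "deleted" names are singletons in both wards (they merge into a pair), and the "influx" is the new ward's remaining singletons, some of which join multiples already in the core ward. The ward data and function names are invented.

```python
from collections import Counter


def merge_wards(core, new):
    """Combine two wards' surname counts (surname -> householders).

    One reading of the tallies in the text (an assumption):
      deleted = names that are singletons in both wards, so they merge into a pair
      influx  = the new ward's other singletons; some join multiples in the core ward
    """
    merged = core + new
    new_singletons = {name for name, c in new.items() if c == 1}
    deleted = sum(1 for name in new_singletons if core.get(name) == 1)
    influx = len(new_singletons) - deleted
    return merged, deleted, influx


def singleton_share(counts):
    """Fraction of distinct surnames held by just one householder."""
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Invented ward data, for illustration only.
core = Counter({"White": 2, "Edney": 1, "Cawte": 1})
new = Counter({"Edney": 1, "Stares": 1, "Privett": 1, "White": 1})
merged, deleted, influx = merge_wards(core, new)
print(deleted, influx, f"{singleton_share(core):.0%} -> {singleton_share(merged):.0%}")  # 1 3 67% -> 60%
```

Whatever definitions are used for the tallies, the singleton share of the merged counts is computed directly and does not depend on them.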
Core Ward: Mile End
Cumulative frequency graph of the 1329 names. Note the straight line where the singletons commence.

Graph 2: the top names in Mile End compared to their 1853 England and Wales equivalents.
Welsh names seem to be under-represented in this ward; this may be compensated for in other wards. Portsmouth has had an association with South Wales, e.g. migration from Pembroke Dock.

Graph 3: the baseline is the leading names of England and Wales in 1853, against which the ward is being compared (assuming stability in surname percentages over the intervening 50 years). Columns: Mile End, 1900.
Again, Welsh names are under-represented, whilst the name White is present at almost double its national percentage.

Possible Extinctions
Chigwidgeon
Janenay
Local Names (25 mile radius)
Cawte, Edney, Oakshott, Privett, Stares, Brading, Burt, Damerum, Mengham,
Sheffield 1841: A Comparison
David Hey, in 'A History of Sheffield', devotes a few pages to the surnames of the city. (Incidentally, this is the only instance that I know of a study of the surnames of an urban area, rather than a county or hundred. More, please.)
In the 1841 Sheffield census, he finds that the following are the most common surnames:
| 1841 Census, Sheffield: most common surnames |
| Rank | Name | Occurrences | Rank | Name | Occurrences | Rank | Name | Occurrences |
| 1 | Smith | 1624 | 9 | Johnson | 580 | 17 | Green | 521 |
| 2 | Taylor | 1022 | 10 | Ward | 573 | 18 | Marshall | 503 |
| 3 | Wilson | 925 | 11 | Thompson | 572 | 19 | White | 486 |
| 4 | Walker | 678 | 12 | Rodgers | 571 | 20 | Lee | 485 |
| 5 | Hall | 660 | 13 | Brown | 567 | 21 | Barker | 466 |
| 6 | Turner | 628 | 14 | Haigh | 563 | 22 | Clark | 452 |
| 7 | Shaw | 605 | 15 | Jackson | 541 | 23 | Roberts | 450 |
| 8 | Wright | 586 | 16 | Parkin | 531 | 24 | Robinson | 443 |
An unrealistic comparison, I know, not comparing like with like in time, but indulge me.
The surname profile is markedly different from that of Portsmouth (Mile End) in 1900. Only 3 names are common to both: Smith, Taylor and Brown. Note the complete absence of Jones, which does not even reach the top 50 names; Welsh migration to Sheffield came later. Wilson and Walker are northern names and would be expected to have a high occurrence, as would Parkin. Johnson and Haigh suggest migration from Scotland.
A more contemporary, post-war analysis can be found in Johnston (1971), 'Resistance to migration and the mover/stayer dichotomy: aspects of kinship and population stability in an English rural area'.
Johnston was examining kinship as a deterrent to migration, through a study of the common surname groups of Nidderdale in Yorkshire in the period 1951-1961, using the electoral rolls. As a check on the localisation of the surnames under consideration, he counted the number of entries in the following telephone directories and adjusted the resulting counts with a weighting factor.
The following is a much pared-down version of his results, concentrating on urban areas in a general north-to-south swathe and on the less localised of his surnames.
| Name | Middlesbrough | Leeds | Leicester | London | Portsmouth | Exeter | Mean |
| Ashby | 26 | 38 | 68 | 54 | 32 | 46 | 44 |
| Barker | 722 | 571 | 439 | 200 | 164 | 168 | 377 |
| Clough | 64 | 143 | 17 | 9 | 7 | 14 | 42 |
| Fawcett | 268 | 202 | 32 | 17 | 28 | 20 | 95 |
| Harris | 232 | 357 | 756 | 704 | 857 | 114 | 503 |
| Hudson | 352 | 619 | 207 | 130 | 15 | 114 | 240 |
| Pratt | 225 | 155 | 293 | 84 | 207 | 177 | 190 |
| Simpson | 768 | 524 | 463 | 235 | 285 | 257 | 422 |
| Smith | 2857 | 3024 | 4219 | 2000 | 2857 | 1857 | 2802 |
| Thompson | 1857 | 1083 | 732 | 416 | 500 | 457 | 841 |
The weighted counts that are above the median are highlighted. Nearly all the names seem to exhibit a north or south bias. The surnames Ashby and Smith appear to be names predominantly found in the Midlands. (The repeat of 2857 in the Smith row is not a typo). It will be interesting to see if the high occurrence of Harris is repeated in my Portsmouth study.
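The highlighting is easy to regenerate from the table's own figures. The text speaks of counts above the median, while the table's summary column is the mean, so this Python sketch flags, for each name, the directories whose weighted count exceeds the row average, using the mean by default; passing `statistics.median` switches to the other threshold. The constant and function names are illustrative.

```python
import statistics

CITIES = ["Middlesbrough", "Leeds", "Leicester", "London", "Portsmouth", "Exeter"]
# Weighted directory counts, copied from the table above.
WEIGHTED = {
    "Ashby": [26, 38, 68, 54, 32, 46],
    "Barker": [722, 571, 439, 200, 164, 168],
    "Clough": [64, 143, 17, 9, 7, 14],
    "Fawcett": [268, 202, 32, 17, 28, 20],
    "Harris": [232, 357, 756, 704, 857, 114],
    "Hudson": [352, 619, 207, 130, 15, 114],
    "Pratt": [225, 155, 293, 84, 207, 177],
    "Simpson": [768, 524, 463, 235, 285, 257],
    "Smith": [2857, 3024, 4219, 2000, 2857, 1857],
    "Thompson": [1857, 1083, 732, 416, 500, 457],
}


def above_average(counts, average=statistics.mean):
    """Cities whose weighted count exceeds the row average (the mean by default)."""
    threshold = average(counts)
    return [city for city, c in zip(CITIES, counts) if c > threshold]


for name, counts in WEIGHTED.items():
    print(f"{name:<9} {', '.join(above_average(counts))}")
```

For most rows the two thresholds pick out the same cities; Hudson is one where they differ, because Leicester lies between the median and the mean.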
Last revised: March 25, 2001.