Surname profiling

 

 

Macro Level Sampling: Trevor covers the mathematics of surnames at a national level: estimates of total size; surname extinction.
Micro Level Sampling: an analysis of surnames at a discrete level - parish, ward or town (in outline at the moment).

Most names are very rare

 

Perhaps the most surprising thing about surnames is how many very rare names there are.
We are well aware of the Smiths and Joneses, but there are very few such common names. On the UK Info Disc 2000, which claims to include the UK electoral roll, about 42% of names occur once, 16% of names occur twice, 7% occur three times, and so on with ever decreasing numbers. Phonebooks are not such a good sample of the population, but they show a similar pattern. In the telephone directories for all England and Wales from about 1980 (when there were far fewer ex-directory subscribers than today), about 45% of names occur just once, which agrees well with the Info Disc figure. Similar results have been found in a study of names in the Swiss telephone directories (Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., and Rodriguez-Larralde, A., Annals of Human Biology (1996), vol 23, pp 431-455).

 

The reason why we are aware of the common names is that there are more people with them. There are more than half a million Smiths in Britain. However, as far as we can estimate, there are perhaps 200 000 people with names which only occur once in this country, around 140 000 people whose names occur twice, and about 110 000 people whose names occur three times.

 

Mathematics - and the total number of names


If we plot the logarithm of the number of surnames (n) occurring once, twice, three times, etc., against the logarithm of the number of times (x) a surname occurs (once, twice, etc.), we get close to a straight line.
The graph shows the distribution of a thousand names selected at random from the UK Info Disc 2000.

Leading names, like Smith and Jones, are positioned close to where the graph line dips to the horizontal x axis; conversely, the hundreds of names which occur only once are clustered near the vertical axis.


Mathematically this means that n must be related to x by an expression of the form

 

n = b x^{-a}         - expression 1

 

where b and a are constants.
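
Because expression 1 is a straight line on log-log axes (log n = log b - a log x), the two constants can be estimated by an ordinary least-squares line through the logged points. The following is only a minimal sketch of that idea, not the procedure actually used for the graph; the function name fit_power_law and the synthetic test values (b = 300, a = 1.5) are assumptions chosen for illustration.

```python
import math

def fit_power_law(xs, ns):
    """Least-squares fit of log n = log b - a * log x. Returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_n = [math.log(n) for n in ns]
    mean_x = sum(log_x) / len(log_x)
    mean_n = sum(log_n) / len(log_n)
    # Slope of the regression line through the logged points
    slope = (sum((u - mean_x) * (v - mean_n) for u, v in zip(log_x, log_n))
             / sum((u - mean_x) ** 2 for u in log_x))
    intercept = mean_n - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic check: counts generated exactly from n = 300 * x^-1.5
# (hypothetical constants, not the values read off the graph).
xs = [1, 2, 3, 4, 5, 10, 20]
ns = [300 * x ** -1.5 for x in xs]
a, b = fit_power_law(xs, ns)
print(round(a, 3), round(b, 1))
```

With real counts, classes containing no names have to be left out before taking logarithms, and the sparse classes at the right-hand end of the graph are noisy, so a fit of this kind is only a rough guide.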

It can be seen from the graph that the 1000-name sample is fitted well if b = 324 and a = 1.49 approximately. However, if we try to match this up to the frequencies of the commoner surnames, we find that there are fewer common names than we would expect from this expression. Matching up the 1000-name sample shown in the graph with the common-names information suggests that overall the frequency distribution of names in Britain is given by something like

 

n = 200000 x^{-1.5} \cdot 1.025^{-c}    - expression 2

where  c = x^{0.4}

 

Expression 2 fits the data well all the way from the 200000 or so unique names to the half a million people called Smith. It is just a curve fitted to the data and has no fundamental significance, but it enables us to estimate the total number of surnames in Britain, something which is surprisingly hard to measure directly.

Expression 2 suggests that there are a little under half a million surnames in Britain.
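
The estimate can be reproduced by summing expression 2 over every possible value of x. The sketch below does this directly; the cut-off X_MAX is an assumption (any value comfortably above the number of Smiths gives practically the same total, because the terms become negligible). It also multiplies n by x for the first three classes to give the number of people, rather than names, in each class.

```python
def expression_2(x):
    """Estimated number of surnames held by exactly x people (expression 2)."""
    c = x ** 0.4
    return 200000 * x ** -1.5 * 1.025 ** -c

# Assumed cut-off: larger than the number of holders of the commonest name.
X_MAX = 1_000_000

total_names = sum(expression_2(x) for x in range(1, X_MAX + 1))

# People in a class = (names in the class) * (holders per name)
people_once = 1 * expression_2(1)
people_twice = 2 * expression_2(2)
people_thrice = 3 * expression_2(3)

print(round(total_names))
print(round(people_once), round(people_twice), round(people_thrice))
```

The three people counts come out close to the rounded figures of 200 000, 140 000 and 110 000 quoted earlier, which is a useful consistency check on the reconstructed expression.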

All of the above assumes that we define every spelling variation as a different surname. If we try to group variants together, for example regarding Clark and Clarke as the same name, then the number of surnames depends solely on how we group the names.

 

The problem of getting a sample

 

The above results depend on taking a random sample of names in a population, so that each name, however common or rare, has an equal chance of being selected. Having selected the name, we count how many people have that name. Random sampling of names is difficult. It is easier to pick people at random from a population to form a sample, and then count how many times each surname occurs in the sample, but this gives a different result, and does not generally give a true picture of the frequency distribution of names in the population. The random-person method gives a form of distribution usually called the Yule distribution. It crops up in various other systems, such as the distribution of word frequency in texts, and was explained on the basis of very simple assumptions by Herbert Simon, a Nobel Prize-winner for economics. (Simon, H.A., Biometrika, (1955) vol 42, pp 425-440)
I discussed this distribution in relation to surnames in an article in the Journal of One-Name Studies vol 6 No.6, pp 119-124 (April 1998), although various aspects of that article have been overtaken by later work summarised above.
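
The difference between the two sampling methods can be seen in a toy calculation. In the sketch below the population and its counts are invented purely for illustration: under random-name sampling every surname has the same chance of selection, while under random-person sampling a surname's chance is proportional to its number of holders.

```python
def name_sampling_chance(counts):
    """Random-name sampling: each distinct surname is equally likely."""
    return {name: 1 / len(counts) for name in counts}

def person_sampling_chance(counts):
    """Random-person sampling: chance proportional to number of holders."""
    total = sum(counts.values())
    return {name: k / total for name, k in counts.items()}

# Hypothetical population of 100 people sharing four surnames
toy = {"Smith": 90, "Ogden": 8, "Fegyveres": 1, "Chigwidgeon": 1}

by_name = name_sampling_chance(toy)
by_person = person_sampling_chance(toy)
for name in toy:
    print(name, by_name[name], by_person[name])
```

In this toy population half of the surnames are unique, yet a person drawn at random has only a 2% chance of bearing one of them, which is why a sample of people under-represents the rare names.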

How does the distribution arise?

 

It seems incredible that, after 25 generations or so of hereditary surnames in England, there are so many unique or very rare surnames. One factor is that surname generation is still going on, through immigration, double-barrelling, or deliberate creation of new spelling variants. It ought to be possible to explain the form of the distribution shown in the graph in terms of generation and extinction processes, but this has not yet been done convincingly.
Probability theory has tackled one aspect - the chance that a name with just one holder will become extinct. The probability of extinction turns out to be an astonishing 89%, although the exact figure depends on the assumptions made about the probability of having no sons, one son, two sons, etc. This calculation is given in various texts on advanced probability, such as 'Probability' by Peter Whittle (John Wiley, 1970) ISBN 0-471-01657-8, pp 124-125. This means that it is very probable that any one male 25 generations ago will have no male descendants today, and so a name with just one holder would have become extinct.
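
In the standard branching-process treatment, the extinction probability q for a single holder is the smallest solution of q = p0 + p1 q + p2 q^2 + ..., where pk is the probability of having k sons, and it can be found by repeatedly applying the right-hand side starting from q = 0. The sketch below illustrates this; the son probabilities in HYPOTHETICAL_SONS are invented for the example and are not the assumptions behind the 89% figure, so the result is different.

```python
def extinction_probability(sons, generations=None, tol=1e-12, max_iter=10_000_000):
    """sons[k] is the probability that a man has exactly k sons.

    With generations=None, returns the eventual extinction probability;
    otherwise the probability that the line is extinct within that many
    generations.
    """
    def pgf(s):
        return sum(p * s ** k for k, p in enumerate(sons))

    q = 0.0
    if generations is not None:
        for _ in range(generations):
            q = pgf(q)
        return q
    for _ in range(max_iter):
        nxt = pgf(q)
        if abs(nxt - q) < tol:
            return nxt
        q = nxt
    return q

# Hypothetical distribution: 25% no sons, 25% one son, 50% two sons
HYPOTHETICAL_SONS = [0.25, 0.25, 0.5]

eventual = extinction_probability(HYPOTHETICAL_SONS)
within_25 = extinction_probability(HYPOTHETICAL_SONS, generations=25)
print(eventual, within_25)
```

For this made-up distribution the eventual extinction probability is one half; most of the risk is incurred in the first few generations, so the figure after 25 generations is already very close to the eventual one.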

If there is indeed a 0.89 probability of extinction of a name with just one holder, the probability of a name with n holders instead of just one becoming extinct is 0.89^n. For example, if there were five holders of a name, the chance of none of them eventually having any male descendants is 0.89^5, which equals about 0.56, so in this case there is still a better than even chance that the name will become extinct. However, if there are 50 holders of a name in the first generation, the probability that the name will become extinct is only 0.3% - a 99.7% chance that the name will survive.
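
The arithmetic in the paragraph above can be checked in a couple of lines. The only assumption, already implicit in the text, is that the male lines descending from each holder die out independently of one another; the function name is made up for the example.

```python
Q_ONE_HOLDER = 0.89  # extinction probability for a single holder, as quoted above

def extinction_for_holders(n, q=Q_ONE_HOLDER):
    """Probability that all n independent male lines die out."""
    return q ** n

five = extinction_for_holders(5)    # about 0.56
fifty = extinction_for_holders(50)  # about 0.003, i.e. 0.3%
print(round(five, 2), round(100 * fifty, 1))
```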

The theoretical result has been roughly confirmed by Christopher Sturges and Brian Haggett, who did a computer simulation to calculate the number of descendants from each member of an original population. They found that after 23 generations, about 76% of the original population had no male descendants, so if a name had just one holder in the original population, there was a 76% probability that it would become extinct - the difference from 89% is probably mainly because they made different assumptions about the chances of having particular numbers of sons in each family. This work was published as Sturges and Haggett, 'Inheritance of English Surnames' (Hawgood Computing, 1987) ISBN 0-948151-02-1.
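
A simulation of this general kind can be sketched as follows. This is not Sturges and Haggett's program, and the son probabilities are again invented for illustration, so the fraction it produces should not be compared with their 76%. Each trial follows one man's male-line descendants for 23 generations; a line that grows beyond an arbitrary cap is treated as having survived, since its chance of dying out is then negligible.

```python
import random

# Illustrative only: 25% no sons, 25% one son, 50% two sons
HYPOTHETICAL_SONS = [0.25, 0.25, 0.5]

def line_dies_out(rng, generations, sons=HYPOTHETICAL_SONS, cap=200):
    """Follow one founder's male line; True if it is extinct by the end."""
    males = 1
    for _ in range(generations):
        if males == 0:
            return True
        if males >= cap:
            return False  # large enough that extinction is negligible
        # Draw a number of sons for every living male and add them up
        males = sum(rng.choices(range(len(sons)), weights=sons, k=males))
    return males == 0

def extinct_fraction(trials=2000, generations=23, seed=1):
    rng = random.Random(seed)
    extinct = sum(line_dies_out(rng, generations) for _ in range(trials))
    return extinct / trials

fraction = extinct_fraction()
print(fraction)
```

Changing the son probabilities changes the answer considerably, which is consistent with the suggestion above that the gap between 76% and 89% comes mainly from different assumptions about family size.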

This contribution is dedicated to a Miss Smith I know who at the time of writing is about to marry a Mr Fegyveres.
Perhaps they will have many sons!

© 2000 Trevor Ogden

 

 

 

 


The Micro-Level

Parish Level

Rex Watson in Local Population Studies 15 sampled a group of 8 contiguous Cambridgeshire parishes. He did this by extracting the occurrences of the top 50 surnames in each parish and allocating these to three separate periods: 1538-1640, 1641-1740 and 1741-1840.
It was then possible to compare the number of surnames common to each parish, and to compare periods. The result was an idea of how names survived or spread over time in these parishes. A comparison was also made with the Lancashire chapelry of Colne. Another technique used in this article was to take names common to pairs of parishes and graph these against the distance apart. Not surprisingly, it was found that the number of names common to pairs of parishes tends to decrease as the distance apart increases.
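
The pairs-of-parishes technique amounts to a set intersection for every pair, tabulated against distance. A minimal sketch follows, assuming each parish's leading surnames are held as a set; the parish labels, the allocation of names to parishes and the distances are all hypothetical.

```python
from itertools import combinations

def shared_names_by_distance(parish_names, distance_km):
    """Return (distance, parish, parish, names in common), nearest pair first."""
    rows = []
    for a, b in combinations(sorted(parish_names), 2):
        shared = parish_names[a] & parish_names[b]
        rows.append((distance_km[(a, b)], a, b, len(shared)))
    return sorted(rows)

# Hypothetical data: three parishes and their leading surnames
parish_names = {
    "A": {"Prime", "Taylor", "Fuller", "Rayment"},
    "B": {"Prime", "Taylor", "Newman"},
    "C": {"Taylor", "Beavis", "Rogers"},
}
distance_km = {("A", "B"): 2.0, ("A", "C"): 9.0, ("B", "C"): 7.0}

rows = shared_names_by_distance(parish_names, distance_km)
for row in rows:
    print(row)
```

Plotting the last column against the first gives the kind of graph described above.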

Comparative cumulative frequency table for 2 parishes

        Colne (Lancs) 1599-1653               South Cambs. 1539-1640
Rank    1 Chapelry   % of total   Cum Freq    8 Parishes   % of total   Cum Freq
1       Hartley      9.9          9.9         Prime        1.6          1.6
2       Hargreaves   4.2          14.1        Taylor       1.3          2.9
3       Smith        3.9          18          Fuller       1.3          4.2
4       Emmott       3.5          21.5        Rayment      1.2          5.4
5       Robinson     2.7          24.2        Newman       1.1          6.5
6       Blakey       2.5          26.7        Beavis       1.1          7.6
7       Baldwin      2.1          28.8        Rogers       1.1          8.7
8       Walton       1.9          30.7        Gillson      1.0          9.7
9       Hogate       1.7          32.4        Collis       1.0          10.7
10      Wilson       1.7          34.1        Barnes       0.95         11.65

50                                c65%                                  c35%

Source: Rex Watson, 'A study of surname distribution in a group of Cambridgeshire parishes, 1538-1840', Local Population Studies 15.
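
The cumulative frequency column is simply a running total of the percentage column. As a check on the figures for Colne, the sketch below rebuilds it from the ten published percentages (the variable names are my own):

```python
from itertools import accumulate

# '% of total' for the ten leading Colne surnames, from the table above
colne_pct = [9.9, 4.2, 3.9, 3.5, 2.7, 2.5, 2.1, 1.9, 1.7, 1.7]

# Running total, rounded to one decimal place to remove floating-point noise
colne_cum = [round(total, 1) for total in accumulate(colne_pct)]
print(colne_cum)
```

Plotting these running totals against rank gives the ogive.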

Note the difference between Lancashire and Cambridgeshire. The table reveals that there are far more names in these Cambridgeshire parishes than in the Lancashire equivalent, but that "a name of the Lancashire area was on the whole likely to be possessed by more people than was the case in Cambridgeshire". This can perhaps be best seen in the following ogive.
This higher number of holders of a particular name probably meant a higher survival rate in Lancashire.

 

Is the inference that Lancashire names are more stable than Cambridgeshire ones?

.......................................................................................................................................................................

 

Grace Wyatt has reconstituted the parish of Nantwich and has studied the relative persistence of its surnames.

 

Period    Number of names   Names carried forward   New names   Losses
1680-9    347               -                       -           -
1740-9    422               175                     247         -
1800-9    418               173                     245         169

102 surnames were present in all 3 periods: 8% of the total.
(Grace Wyatt, 'Population change and stability in a Cheshire parish during the Eighteenth Century', Local Population Studies 43, 1989)
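
The columns of a table like this can be derived from the set of surnames recorded in each period. The sketch below shows the set arithmetic, using small made-up surname sets rather than Wyatt's data. Note that names carried forward plus new names must equal the number of names in the later period, as it does in the published table (175 + 247 = 422).

```python
def turnover(earlier, later):
    """Compare the surname sets of two successive periods."""
    return {
        "names": len(later),
        "carried_forward": len(earlier & later),
        "new": len(later - earlier),
        "losses": len(earlier - later),
    }

# Hypothetical surname sets for three periods (not the Nantwich data)
p1 = {"Walker", "Dutten", "Bowker", "Davies"}
p2 = {"Walker", "Dutten", "Davies", "Wilkinson", "Hall"}
p3 = {"Walker", "Davies", "Wilkinson", "Shaw"}

step_1 = turnover(p1, p2)
step_2 = turnover(p2, p3)
in_all_periods = p1 & p2 & p3  # the stable core

print(step_1)
print(step_2)
print(sorted(in_all_periods))
```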

 

Surname losses (other than migration) :-

"...for most of the period under review, the population was barely replacing itself. The average family size was only 4.33 children, and as infant and child mortality together was approaching 500 per thousand in each generation, it may be predicted that something like one quarter of all families would not have had sons of marriageable age to carry on the name. However, this event would not automatically mean the disappearance of a name as more than one family might have possessed it."

Conversely, some names proliferated

Name        1680-9   1740-9   1800-9
Walker      1        5        14
Dutten      3        8        12
Wilkinson   1        6        12
Davies      1        8        11
Bowker      1        4        8


Conclusion:-

"The study of surnames [in this parish] has shown that whilst there is a great deal of change among surnames over time, some surnames persist over long periods".
Such a stable core, though not very large, could still be detected.

..................................................................................................................................................................................


Core Families - External influences

Betty Halse is researching population mobility in the moorland village of Levisham in North Yorkshire.
Local Population Studies (65) contains an interim report on this historic community of small farmers. A by-product of her research has been the identification of core families in the parish over the long period 1541 to 1881.
She has found that, of the original 17 core family names, only 6 survived to be recorded in the period 1801-1850, and only 3 after that.
The rate of introduction of new names into the parish records for each 50-year period (1541-1800) ranged from 17 to 24. For 1801-1850, it doubled to 45. New names appeared in the parish following the 1770 Enclosure award, as "a new style of farming developed which required both new capital and new ideas, and this attracted new men who were not rooted in the village in the way their predecessors had been".


Core families -Personal influences

At the other end of England, Barry Stapleton, in his study of the Hampshire parish of Odiham, discerns a tendency for low fertility and high infant mortality to result in a higher chance of a single surviving offspring succeeding to the whole of their parents' property. High fertility and low infant mortality might result in a partible inheritance and, over several generations, a decline in that family's fortunes, which might in turn propel the family out of an area.
(B. Stapleton, 'Family strategies: patterns of inheritance in Odiham, Hampshire, 1525-1850', Continuity and Change, 1999, vol 14, pp 385-402)

Which of the above factors most influenced the stability of a core family in a region:
the flexibility to respond to outside factors, e.g. a changing economic environment,
or the more personal factor of fertility rate?


Town
Michael Williams conducted a name analysis of the Monmouthshire town of Caerleon in his 'Researching local history'. His purpose was to study the relationship between traditional Welsh patronymics and the incursion of English surnames.
Williams commenced with the town's baptismal parish register in 1700, and ended in 1899.
(For completeness, chapel baptisms were also included.)

For each block of 25 years, the following was collected:-

From the resulting data, graphs were charted of:-

The resulting graphs show that before 1750, Caerleon had a largely hereditary patronymic profile. On further local history research, a marked fluctuation in the graphs was accounted for by metal workers arriving from the West Midlands to work in a new iron forge and tin works in 1750-1760.

Michael Williams' study shows the importance of not treating surname analysis in isolation, but in conjunction with the history of a community.

 

 

Surname profile- A City- 1871

Table is meant as an example only; treat with care, as this is my interpretation of the column headings.

%                  City                       City
                   1871 Census, District A?   1871 Census, District B?
Number of names    Households sampled         Households sampled
10                 12.6                       15.9
20                 36.6                       50.2
30                 72.2                       109.3
40                 123                        200
50                 185
60                 278
Source             1200 households in frame   13200 households in frame

More help required here. Has anyone performed a surname profile analysis of an urban enumeration district, and correlated the results with other information, e.g. age, birthplace, occupation?

 

 

Further Work
Possible group projects might be:-

Even more intricate work could be done on the numbers of each type of surname (if identifiable as locative, personal, etc.), and on the percentage influx of new names.

 


Bringing it all together

At present I am commencing a study of the surnames listed in the Portsmouth Burgess Rolls of 1900/01. Will Portsmouth, with its strong naval tradition, be a melting pot of UK names? The Burgess Rolls will reveal only a section of the population, but that section will be one that has put down roots, and consequently the surnames are less likely to be held by transients. Ideally, I should compare the results with a sample taken from the 1881 Portsmouth census.

This will be a testbed of the techniques already discussed to reveal:-

Also, I am interested in how the initial results of the starting ward are affected as the area grows larger by feeding in the results from more wards. Will more previous singletons merge than are brought in by the new ward? How will S/N change?
If a name predominates in 1 or 2 wards, does that suggest a localised kinship?

 

Initial results

As these are the results for the first 2 wards, they are meant just to illustrate the techniques, rather than being definitive. They will change as I include neighbouring wards.
There is a remarkable consistency between the figures for the 2 wards. The number of surnames is perhaps 10% larger than it should be, due to spelling inconsistencies, e.g. Cleal and Cleall are treated as individual names.

          Mile End                            St Mary's
S         1329                                1309
N         2291                                2237
S/N       0.58                                0.57

          names   householders   names%       names   householders   names%
Ones      917     917            69%          926     926            70%
Twos      222     444            17%          208     416            16%
Threes    82      246            6%           77      231            6%
Fours     34      136            2.6%         27      108            2.1%
Fives     25      125            1.9%         17      85             1.3%
Sixes     14      84             1.0%         12      72             0.9%
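
A table of this kind can be produced from a plain list of householders' surnames. The sketch below assumes that S is the number of distinct surnames and N the number of householders, which is how I read the S/N ratio in the table; the function name and the sample list are invented for the example.

```python
from collections import Counter

def ward_profile(householder_surnames):
    """Return S, N, S/N and one row per class: (k, names, householders, names%)."""
    holders = Counter(householder_surnames)   # surname -> number of householders
    S = len(holders)
    N = len(householder_surnames)
    classes = Counter(holders.values())       # k -> number of surnames with k holders
    table = [(k, classes[k], k * classes[k], round(100 * classes[k] / S))
             for k in sorted(classes)]
    return S, N, S / N, table

# Hypothetical extract from a roll; Cleal and Cleall are kept as separate names
sample = ["Cleal", "Cleall", "White", "White", "Stares", "Burt", "Burt", "Burt"]

S, N, ratio, table = ward_profile(sample)
print(S, N, ratio)
for row in table:
    print(row)
```

Grouping spelling variants before counting would lower S and therefore S/N, which is the effect of the spelling inconsistencies mentioned above.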

 

480 surnames are common to both Mile End and St Mary's wards, of which 220 are singletons. In other words, over 60% of surnames are different in the contiguous wards (whose centres are only a couple of miles apart). Combining the 2 wards results in a larger influx of singletons than the number deleted by merger (deleted: 220; influx: 706). Some of these 706 unite with multiples in the core ward. Result: the percentage of singletons in the combined ward declines to 64%.

 

          names   names%
Ones      1385    64%
Twos      262     12%
Threes    153     7%
Fours     91      4%
Fives     46      2%
Sixes     23      1%
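
Combining wards is a matter of adding the per-ward counts surname by surname. The sketch below shows this with two tiny made-up wards, then checks the arithmetic of the published figures: the combined number of surnames is the two ward totals less the 480 shared names, and the 1385 singletons are about 64% of that.

```python
from collections import Counter

def merge_wards(ward_a, ward_b):
    """Combine two surname -> householders counters."""
    combined = ward_a + ward_b
    shared = set(ward_a) & set(ward_b)
    singletons = sum(1 for k in combined.values() if k == 1)
    return combined, shared, singletons / len(combined)

# Hypothetical wards
ward_a = Counter({"White": 3, "Cleal": 1, "Burt": 1})
ward_b = Counter({"White": 2, "Cleal": 1, "Stares": 1})
combined, shared, singleton_share = merge_wards(ward_a, ward_b)
print(dict(combined), sorted(shared), singleton_share)

# Check on the published Mile End and St Mary's figures
combined_S = 1329 + 1309 - 480       # distinct surnames in the combined area
published_share = 1385 / combined_S  # singletons as a share of all surnames
print(combined_S, round(100 * published_share))
```

In the toy example Cleal is a singleton in each ward but not in the combined area, which is the "deleted by merger" effect described above.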

 

Core Ward: Mile End
Cumulative Frequency graph of the 1329 names
Note the straight line where the singletons commence

 

 

Graph 2
The top names in Mile End compared to 1853 England and Wales equivalents
Welsh names seem to be under-represented in this ward: this may be compensated for in other wards.
Portsmouth has had an association with South Wales, e.g. migration from Pembroke Dock.

 

 

Graph 3
The baseline is the leading names of England and Wales in 1853, against which the ward is being compared.
(Assuming a stability in surname percentages in the intervening 50 years)
Columns= Mile End 1900
Again, Welsh names are under-represented; whilst the name White is present in almost double its national percentage.
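
The comparison in Graph 3 comes down to dividing each name's local percentage by its national percentage: a ratio near 1 means the name is present in its national proportion, near 2 means double, and below 1 means under-represented. A minimal sketch follows; all the percentages in it are placeholders, not the 1853 or Mile End figures.

```python
def representation(local_pct, national_pct):
    """Ratio of local to national percentage for names present in both."""
    return {name: local_pct[name] / national_pct[name]
            for name in national_pct if name in local_pct}

# Placeholder percentages, for illustration only
national_pct = {"Smith": 1.5, "Jones": 0.8, "White": 0.5}
local_pct = {"Smith": 1.5, "Jones": 0.4, "White": 1.0}

ratios = representation(local_pct, national_pct)
print(ratios)
```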

 

Possible Extinctions

Chigwidgeon
Janenay

Local Names (25 mile radius)

Cawte, Edney, Oakshott, Privett, Stares, Brading, Burt, Damerum, Mengham

 


Sheffield 1841 : A Comparison
David Hey in 'A History of Sheffield' devotes a few pages to the surnames of the city. (Incidentally, this is the only instance that I know of the study of the surnames of an urban area, rather than a county or hundred. More, please.)

In the 1841 Sheffield census, he finds that the following are the most common surnames:

1841 Census Sheffield

Rank   Name       Occurrences
1      Smith      1624
2      Taylor     1022
3      Wilson     925
4      Walker     678
5      Hall       660
6      Turner     628
7      Shaw       605
8      Wright     586
9      Johnson    580
10     Ward       573
11     Thompson   572
12     Rodgers    571
13     Brown      567
14     Haigh      563
15     Jackson    541
16     Parkin     531
17     Green      521
18     Marshall   503
19     White      486
20     Lee        485
21     Barker     466
22     Clark      452
23     Roberts    450
24     Robinson   443

An unrealistic comparison, I know, not comparing like with like timewise, but indulge me.

The surname profile is markedly different from that of Portsmouth (Mile End) 1900.
Only 3 names are common to both: Smith, Taylor and Brown.
Note the complete absence of Jones (it does not even reach the top 50 names). Welsh migration to Sheffield came later.
Wilson and Walker are northern names, and would be expected to have a high occurrence, as would Parkin. Johnson and Haigh suggest migration from Scotland.

 

 

A more contemporary post-war analysis can be found in Johnston (1971), 'Resistance to migration and the mover/stayer dichotomy: aspects of kinship and population stability in an English rural area'.
Johnston examined the kinship factor as a deterrent to migration through a study of the common surname groups of Nidderdale in Yorkshire in the period 1951-1961, using the electoral rolls. As a check on the localisation of the surnames under consideration, he counted the number of entries in the following telephone directories, and adjusted the resulting count with a weighting factor.
The following is a much pared version of his results, concentrating on urban areas in a general north to south swathe, and on the less localised of his surnames.

Name       Middlesbrough   Leeds   Leicester   London   Portsmouth   Exeter   Mean
Ashby      26              38      68          54       32           46       44
Barker     722             571     439         200      164          168      377
Clough     64              143     17          9        7            14       42
Fawcett    268             202     32          17       28           20       95
Harris     232             357     756         704      857          114      503
Hudson     352             619     207         130      15           114      240
Pratt      225             155     293         84       207          177      190
Simpson    768             524     463         235      285          257      422
Smith      2857            3024    4219        2000     2857         1857     2802
Thompson   1857            1083    732         416      500          457      841

The weighted counts that are above the median are highlighted. Nearly all the names seem to exhibit a north or south bias. The surnames Ashby and Smith appear to be names predominantly found in the Midlands. (The repeat of 2857 in the Smith row is not a typo). It will be interesting to see if the high occurrence of Harris is repeated in my Portsmouth study.
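
The Mean column, and the flagging of the above-median counts, can be recomputed from the weighted counts in the table. In the sketch below the median is taken separately for each surname's row, which is my assumption about how the highlighting was done; means are rounded half up so that 94.5 becomes 95 as in the table.

```python
import math
from statistics import median

CITIES = ["Middlesbrough", "Leeds", "Leicester", "London", "Portsmouth", "Exeter"]

# Weighted directory counts, copied from the table above
COUNTS = {
    "Ashby": [26, 38, 68, 54, 32, 46],
    "Barker": [722, 571, 439, 200, 164, 168],
    "Clough": [64, 143, 17, 9, 7, 14],
    "Fawcett": [268, 202, 32, 17, 28, 20],
    "Harris": [232, 357, 756, 704, 857, 114],
    "Hudson": [352, 619, 207, 130, 15, 114],
    "Pratt": [225, 155, 293, 84, 207, 177],
    "Simpson": [768, 524, 463, 235, 285, 257],
    "Smith": [2857, 3024, 4219, 2000, 2857, 1857],
    "Thompson": [1857, 1083, 732, 416, 500, 457],
}
PUBLISHED_MEAN = {"Ashby": 44, "Barker": 377, "Clough": 42, "Fawcett": 95,
                  "Harris": 503, "Hudson": 240, "Pratt": 190, "Simpson": 422,
                  "Smith": 2802, "Thompson": 841}

def row_summary(name):
    """Return (mean rounded half up, cities whose count exceeds the row median)."""
    row = COUNTS[name]
    mean = math.floor(sum(row) / len(row) + 0.5)
    mid = median(row)
    return mean, [city for city, value in zip(CITIES, row) if value > mid]

for name in COUNTS:
    print(name, *row_summary(name))
```

For Smith, only Leeds and Leicester come out above the row median on this reading, since the two counts of 2857 are themselves the median.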


Last revised: March 25, 2001