Surname Distribution Studies


 

 

 

 

Brief Checklist

Not an exhaustive list; just the basics

 

Preface: Cautionary Note
Background: Essential background reading :-
  Colin Rogers, The Surname Detective
  David Hey, Family Names and Family History
  Oxford Companion to Local and Family History
Contemporary distribution: Consult Telephone Directories, Electoral Registers and the Survey of Contemporary Names (the distribution of 16,000 names available here)
1881 Census: Extraction from census, manually or with LDS Companion
1851 Census: No national index, but most Family History Societies have indexed 'their' county
GRO data: Births, marriages and deaths by registration district
Hearth Tax: Many counties available in print, or will be through the Roehampton program
Poll Tax: The real thing: back almost to the era when hereditary surnames were formed

 

 

 

Preface

Before plunging enthusiastically into this topic, it may be salutary to read the following preface, which sets out the case against.

A) Variant or Not?

In the 1960s, using the GRO birth indexes for 1850, Francis Leeson mapped the distribution of his name and what he regarded as its variants: Lee, Lees, Leigh, Lea, Ley, Leese, Leeson, Leason. The resulting plots revealed discrete areas, which remained the same even when compared with a 1960s telephone survey.

A reply was made to this article by Dr Reaney, who criticised the distributions on several fronts:-

1) A plot of the modern spelling does not necessarily equate with the original form or distribution: "Lee, Lea, Ley, Lay and Leigh are all one surname. They all go back ultimately to OE leah and both surnames and place-names have a variety of forms; the different modern spellings may be partly due to ME grammar, partly due to the local dialect or simply to mere chance...Parish Registers did not begin until long after surnames became fixed; they are not necessarily proof of the original distribution."
It would have taken just one fertile family migrating in 1530 to give a false impression of the home of a name, especially if that name did not appear elsewhere in mediaeval documents.

2) A plot cannot be made comparing a root name and its variants unless one is totally sure that the supposed variant did derive etymologically from the root name. Reaney points out that, in his opinion, Leese (from OE laes, 'pasture') and Leeson (derived from 'son of Lece') are not variants of the root names Lee, Lea, Ley, Lay and Leigh.
Reaney has subsequently been criticised for over-reliance on etymology, but I think his general points should be borne in mind by anyone plotting any kind of surname distribution.
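
Reaney's second point can be put into practice when totting up occurrences before plotting: only spellings that are known to share a root are added together. The sketch below is merely illustrative. The grouping follows Reaney's opinion as reported above; the function name and the counts in the usage example are hypothetical, and any spelling the text does not classify (such as Lees or Leason) is deliberately left on its own.

```python
from collections import Counter

# Grouping taken from Reaney's opinion as reported in the text.
# Spellings not classified there are intentionally absent.
ROOT_OF = {
    "Lee": "leah", "Lea": "leah", "Ley": "leah", "Lay": "leah", "Leigh": "leah",
    "Leese": "laes",
    "Leeson": "son of Lece",
}


def totals_by_root(counts):
    """Sum occurrence counts per etymological root (hypothetical helper).

    A spelling with no confirmed root is kept as its own group rather
    than being merged into a supposed parent name.
    """
    totals = Counter()
    for spelling, count in counts.items():
        root = ROOT_OF.get(spelling, spelling)
        totals[root] += count
    return dict(totals)


# Hypothetical counts for one registration district, for illustration only.
example = {"Lee": 30, "Leigh": 12, "Leese": 5, "Leeson": 4, "Leason": 2}
print(totals_by_root(example))
```

Keeping unclassified spellings separate errs on the side of caution: they can always be merged later if the genealogy supports it.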

B) Surname corruption

George Redmonds, in his study of Yorkshire surnames, has shown the amazing variability of surnames.
"In addition to the obvious variations associated with the distortion of vowel sounds and the confusion when pronouncing consonants, the author draws attention to the remarkably high incidence of elision and truncation, as well as the introduction of so-called prosthetic consonants such as Y, W or S to preface some surnames beginning with a vowel. He also notes that the final consonant of a first name may transfer to the surname, citing Thomas Anderson alias Saunderson and John Nellis alias Ellis."
Book Review in The Escutcheon of - Surnames and Genealogy
Dramatic changes could also occur to the final syllable of surnames, for example Whithalghe/Whitalk/Whitack and Astmough/Astmall/Asman/Asmond. Surnames such as these seem to have had very little stress on the final syllable; it was left to listeners to decide their own interpretation, often in perpetuity.

If you collect the occurrences of a name from, say, the Hearth Tax, how do you know that the name is what you think it is, unless you investigate the genealogy of each bearer?
Surname dictionaries will be of little help, because they tend to ignore local corrupted forms. They concentrate on the earliest form of a name; surname corruption comes much later.

As George Redmonds says, each occurrence of a surname should be treated as being unique.

End of the cautionary preface

 

Snapshots


Part 1 - Contemporary


Some of the potentially really useful national and comprehensive sources are inaccessible to us - such as the National Health Service Central Register at Southport or the Social Security Central Register at Newcastle upon Tyne.

Plotting by postcode

  example number mailboxes households covered
All unit postcodes PO1 2ST 1.7 million 24 million 15-16 average
Postcode sector PO1 2 9,100 2,000  
Postcode districts PO1 2,900 20,000  
Postcode area PO 125 200,000  
The above figures are not exact: check how many delivery points your own postcode covers.
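
For plotting at the different levels in the table above, a unit postcode has to be cut down to its sector, district or area. The following is a minimal sketch of one way to do that. The function name and the regular expression are my own assumptions, not part of any official postcode tool, and the pattern covers only the ordinary mainland formats.

```python
import re

# Assumed pattern: area letters, district digits (with optional extra
# character, as in EC4Y), then the sector digit and two unit letters.
_UNIT_POSTCODE = re.compile(r"^([A-Z]{1,2})(\d[A-Z\d]?)\s*(\d)([A-Z]{2})$")


def postcode_levels(unit_postcode):
    """Split a unit postcode into area, district, sector and unit."""
    match = _UNIT_POSTCODE.match(unit_postcode.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable unit postcode: {unit_postcode!r}")
    area, district_part, sector_digit, unit_letters = match.groups()
    district = area + district_part
    sector = f"{district} {sector_digit}"
    return {
        "area": area,
        "district": district,
        "sector": sector,
        "unit": sector + unit_letters,
    }


print(postcode_levels("PO1 2ST"))
# {'area': 'PO', 'district': 'PO1', 'sector': 'PO1 2', 'unit': 'PO1 2ST'}
```

Counting surname occurrences by the "district" or "area" value then gives totals that can be set against the population figures for each postcode area.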

The following gives the approximate percentage of the Scotland/Wales/England population in each postcode area and the proportion aged under 18. Non-mainland postcodes not yet included are for Belfast, Jersey, Guernsey and the Isle of Man. The figures should be taken as a rough guide. They were compiled prior to the publication of the ONS Census 2001 postcode area figures, but still appear to be in line.

P code P area % GB pop % aged 0-17
AB Aberdeen 0.8  
AL St Albans 0.4 23
B Birmingham 3.0 25
BA Bath 0.7 22
BB Blackburn 0.8 26
BD Bradford 0.9 26
BH Bournemouth 0.9 19
BL Bolton 0.6 24
BN Brighton 1.4 15
BR Bromley 0.5 22
BS Bristol 1.6 22
CA Carlisle 0.5 21
CB Cambridge 0.7 21
CF Cardiff 1.7 24
CH Chester 1.1 23
CM Chelmsford 1.0 23
CO Colchester 0.7 21
CR Croydon 0.6 25
CT Canterbury 0.8 22
CV Coventry 1.3 23
CW Crewe 0.5 23
DA Dartford 0.7 24
DD Dundee 0.5  
DE Derby 1.2 23
DG Dumfries 0.3  
DH Durham 0.5 21
DL Darlington 0.6 22
DN Doncaster 1.2 24
DT Dorchester 0.4 21
DY Dudley 0.7 22
E London E 1.3 26
EC London EC 0.05 16
EH Edinburgh 1.4  
EN Enfield 0.5 23
EX Exeter 0.9 20
FK Falkirk 0.4 -
FY Blackpool 0.5 21
G Glasgow 2.1  
GL Gloucester 1.0 22
GU Guildford 1.2 23
HA Harrow 0.7 23
HD Huddersfield 0.4 23
HG Harrogate 0.2 22
HP Hemel Hempstead 0.8 24
HR Hereford 0.3 22
HS Harris 0.05  
HU Hull 0.7 23
HX Halifax 0.3 24
IG Ilford 0.5 25
IP Ipswich 1.0 22
IV Inverness 0.3  
KA Kilmarnock 0.6  
KT Kingston-upon-Thames 0.9 22
KW Kirkwall 0.1  
KY Kirkcaldy 0.6  
L Liverpool 1.5 24
LA Lancaster 0.6 21
LD Llandrindod Wells 0.1 22
LE Leicester 1.6 23
LL Llandudno 0.9 22
LN Lincoln 0.5 22
LS Leeds 1.3 22
LU Luton 0.5 26
M Manchester 1.8 23
ME Medway 0.9 24
MK Milton Keynes 0.8 25
ML Motherwell 0.6  
N London N 1.3 23
NE Newcastle-upon-Tyne 2.0 22
NG Nottingham 1.9 22
NN Northampton 1.0 24
NP Newport 0.8 24
NR Norwich 1.2 21
NW London NW 0.9 21
OL Oldham 0.8 26
OX Oxford 1.0 22
PA Paisley 0.6  
PE Peterborough 1.4 22
PH Perth 0.3  
PL Plymouth 0.9 22
PO Portsmouth 1.4 21
PR Preston 0.9 22
RG Reading 1.3 23
RH Redhill 0.8 17
RM Romford 0.8 17
S Sheffield 2.3 22
SA Swansea 1.2 22
SE London SE 1.5 23
SG Stevenage 0.6 24
SK Stockport 1.0 23
SL Slough 0.6 23
SM Sutton 0.4 23
SN Swindon 0.7 23
SO Southampton 1.1 22
SP Salisbury 0.4 22
SR Sunderland 0.4 22
SS Southend-on-Sea 0.9 23
ST Stoke-on-Trent 1.1 22
SW London SW 1.5 18
SY Shrewsbury 0.6 22
TA Taunton 0.5 21
TD Galashiels 0.2 20
TF Telford 0.3 24
TN Tunbridge Wells 1.1 23
TQ Torquay 0.5 20
TR Truro 0.5 21
TS Cleveland 1.0 24
TW Twickenham 0.8 22
UB Southall 0.6 25
W London W 0.9 18
WA Warrington 1.0 23
WC London WC 0.07 15
WD Watford 0.4 23
WF Wakefield 0.8 24
WN Wigan 0.5 23
WR Worcester 0.5 22
WS Walsall 0.7 24
WV Wolverhampton 0.6 23
YO York 0.9 21
ZE Lerwick 0.04  


Scotland

The population of Scotland at the 2001 Census was 5,062,011.
As a percentage of that total of 5,062,011, the population of each Scottish postcode area was about:-

AB 9.12   KA 7.23
DD 5.36   KW 0.99
DG 2.92   KY 6.86
EH 16.01   ML 7.23
FK 5.13   PA 6.38
G 23.01   PH 3.00
HS 0.52   TD 1.76
IV 4.05   ZE 0.43
Scottish Sector postcode populations : 2001 Census
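
The Scottish percentages above can be turned back into approximate head counts, which is what is needed as the local population when working out a surname frequency. A minimal sketch follows, using only the figures given in the table; the function name is hypothetical, and the results are no more exact than the rounded percentages they come from.

```python
SCOTLAND_2001 = 5_062_011  # population at the 2001 Census, as stated above

# Percentage of the Scottish population in each postcode area (table above).
SHARE = {
    "AB": 9.12, "DD": 5.36, "DG": 2.92, "EH": 16.01,
    "FK": 5.13, "G": 23.01, "HS": 0.52, "IV": 4.05,
    "KA": 7.23, "KW": 0.99, "KY": 6.86, "ML": 7.23,
    "PA": 6.38, "PH": 3.00, "TD": 1.76, "ZE": 0.43,
}


def area_population(area):
    """Approximate population of a Scottish postcode area."""
    return round(SCOTLAND_2001 * SHARE[area] / 100)


# Sanity check: the sixteen percentages should account for everybody.
print(round(sum(SHARE.values()), 2))  # 100.0
print(area_population("G"))           # roughly 1.16 million for Glasgow
```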

 

Postcode Atlases
  • Geoplan Postcode Atlas -Geoplan (1997) isbn:0952761815 -also on CD-Rom
  • Postcode Atlas of Great Britain and Northern Ireland -Collins (Dec 2004) isbn 0007191979



UK Electoral Rolls on CDROM

Pluses

Minuses

Although the disk is expensive to purchase, there is a fee-based extraction service available from People Finders UK.

The Ward is a unit common both to Electoral and contemporary Census geography. To learn how the modern census is administered, plus a list of all hierarchical divisions (county, district, ward, enumeration district), visit the Census Dissemination Unit.
(1) "Only 85% of those who said they did not vote in the 2001 general election were actually registered to do so and 29% of young people aged 18-24 and 19% of minority ethnic groups indicated in a sample survey that the reason for not voting was that they were not registered"
(2) "Looking at ethnic minority communities, 27% of black non-voters and 15% of Asian non-voters reported that they were not registered, although these figures were drawn from a small base-size"

UK parliamentary elections- numbers registered to vote
2001 44,403,238
1997 43,846,152
1992 43,275,316

Changes to the register tend to affect between 0.1% and 0.5% of the electorate in any given month.
Sources: (1) The Electoral Registration Process: Report and Recommendations (The Electoral Commission 2003) and
(2) Election 2001: the official results (Politico's 2001)

 

UK-INFO Disk

Pluses

Minuses

Up to now, the UK-Info disk could not be recommended for surname distribution analysis, where accuracy in the totality of numbers is so important. The latest disk seems at first to have a much better coverage as a percentage of the population. This is due however to the many duplications in entries caused by Postcode changes. Ensuring that the source is one of 'clean data' is vital in our study.

Telephone Directories

These now come in a variety of formats: online, CD-ROM and printed. However, the telephone directory, whatever its format, suffers from a major drawback: the increasing number of unlisted telephone numbers.

"Although the national average for ex-dir is about 37% the figures do vary enormously between counties, being lowest in northern England and *much* higher in southern England.  So for any surname you will get perhaps 80% listed if they live in a northern county, but less than 50% listed in southern counties, especially East/West Sussex, Hampshire, Surrey, Kent etc.    This imbalance in ex-dir status can be significant in surnames with small numbers, but probably less so with the more common surnames." (John Wynn)
 

The latest online version, PhoneNetUk, is extremely disappointing for our purposes. A regional qualifier is mandatory (under the terms of the licensing authority), so no national searches are possible. The inclusion of postal codes is erratic, and where they do appear they are truncated to the outward code alone.
With the CD, national searches are allowed, but only the first 200 entries are displayed (with full postcode). A tweak is possible to derive statistics of a surname by region if the number of occurrences exceeds 200. A visit to the local library will probably be required to consult the printed telephone directories.
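
The regional imbalance in ex-directory numbers quoted above suggests one rough correction: divide the listed count for a county by the proportion of subscribers thought to be listed there. This is only a sketch of the arithmetic, not an established method. The function name is hypothetical, the listing rates are the approximate ones in John Wynn's remark, and it assumes that bearers of a given surname go ex-directory at the same rate as their neighbours, which may well not hold for rare names.

```python
def estimate_total(listed_count, listing_rate):
    """Scale a directory count up by the share of subscribers listed.

    listing_rate is a proportion between 0 and 1, e.g. 0.8 where about
    80% of subscribers appear in the directory.
    """
    if not 0 < listing_rate <= 1:
        raise ValueError("listing_rate must be greater than 0 and at most 1")
    return listed_count / listing_rate


# The same 40 directory entries imply quite different totals
# in a northern county (about 80% listed) and a southern one (about 50%).
print(estimate_total(40, 0.8))
print(estimate_total(40, 0.5))
```

With small counts the corrected figures are very soft, so they are better used to judge whether a north/south contrast is real than to quote as totals.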

Colin Rogers has listed the disadvantages of using printed telephone directories:-

 

He adds:-

"British Telecom has an Archives and Historical Information Centre at 2-4 Temple Avenue, London EC4Y OHL which is open to the public...it holds an almost complete set of telephone directories from 1879 when the first publically available system was introduced into Great Britain."

Mr Rogers is sceptical about the usefulness of pre-1950 telephone directories for our purposes, the coverage of the population being so small. However, they might be useful as pointers for the study of relatively high-frequency names.

 

National Health Service Central Register [NHSCR]

This database of 60 million names is not available in its entirety - but you can look at an individual frequency. The NHS Central Register is prone to list inflation, and some of the results are surprising, so treat with extreme caution. The whole database does have linguistic possibilities. For a paraphrased potted history of the NHSCR

Survey of Contemporary Surnames

Despite these limitations, a major and significant survey was conducted of the surnames of Britain, using the printed telephone directories 1980-1996. The survey was led by Patrick Hanks and Kate Hardcastle in order to establish those names deemed to be of significance for 'A Dictionary of Surnames' OUP, 1988. The result was 16,000 surnames with a frequency of more than 20 occurrences in any particular directory.
A full listing of the distribution of all the names can be found by following this link

This is a major survey, whose results are important to anyone wishing to compare surname frequencies and distributions, especially between 1881 and today. It is of particular use in identifying homophonic surnames that have completely different distributions, e.g. Adie and Adey: one is Scottish, the other West Midlands.

 



International data sources
The publication of national telephone directories on CD has been used by geneticists to study isonymic rates for individual countries. Onomastic studies based on national datasets are much rarer, but hopefully will increase.
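
For readers unfamiliar with the term, isonymy is the sharing of a surname. In its simplest form, the random isonymy of a population is the chance that two entries drawn at random (with replacement) bear the same surname, which is the sum of the squared relative frequencies of all the surnames. The sketch below shows only that basic calculation. It is my own illustration and is not taken from any of the papers listed here, which may apply further corrections.

```python
from collections import Counter


def random_isonymy(surnames):
    """Sum of squared surname proportions in a list of entries."""
    counts = Counter(surnames)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no surnames supplied")
    return sum((count / total) ** 2 for count in counts.values())


# Two names shared equally between four entries: a 1 in 2 chance of a match.
print(random_isonymy(["Smith", "Smith", "Jones", "Jones"]))  # 0.5
```

A population dominated by a few names gives a high value; one with many rare names gives a value close to zero.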

  Format dataset size (names) Publication based on data source
Austria 1996 telephone CD 4 million Barrai I and others. 'Elements of the Surname Structure of Austria.' Annals of Human Biology 27, no. 6 (November 2000-December 2000): 607-22.
Belgium telephone CD [future online source]   Barrai I.; Rodriguez-Larralde A.; Manni F.; Ruggiero V.; Tartari D.; Scapoli C. 'Isolation by Language and Distance in Belgium' Annals of Human Genetics, January 2003, vol. 68, no. 1, pp. 1-16(16)
Canada 1996 telephone CD 12 million D K Tucker 'Distribution of forenames, surnames and forename pairs in Canada' Names 50 no. 2 (June 2002), 105-132
Denmark Danish Central Civil Register 6.5+ million Sondergaard, Georg. 'Computer Databank of Danish Names' Names , no. 38(1990): 21-30.
Estonia Corpus Nominum Gentilium Estonicorum [online] c 74,000  
Finland     Poyhonen, Juhani. Suomalainen Sukunimikartasto . [Atlas of Finnish Surnames]. Helsinki: Suomalaisen Kirjallisuuden Seura, 1998.
France Insee datasets of births 1891-1915 and 1916-1940   Darlu, Pierre, Anna Degioanni, and Jacques Ruffie. 'Quelques Statistiques Sur La Distribution Des Patronymes En France.' Population [Paris]52, no. 3(1997): 607-34.
Germany telephone CD ?   Rodriguez-Larralde, A.; Barrai, I.; Scapoli, C. 'Isonymy and Isolation by Distance in Germany'. Human biology, 1998, vol. 70, no. 6, pp. 1041
Israel   4 million+ Eliassaf, Nissim. 'Names Survey in the Population Administration : State of Israel.' Names , no. 29 (1981): 273- 84
Italy telephone CD ?   Barrai, I.; Rodriguez-Larralde, A.; Scapoli, C. 'Isonymy and Isolation by Distance in Italy'. Human biology, 1999, vol. 71, no. 6, pp. 947
Italy- Sicily telephone CD ?   Rodriguez Larralde, A. and others. 'Isonymy and the Genetic Structure of Sicily.' Journal of Biosocial Science 26, no. 1(1994): 9-24.
Japan     Miyazima S and others. 'Power-Law Distribution of Family Names in Japanese Societies.' Physica A 278, no. 1-2(April 2000): 282-88.
Netherlands Instituut Meertens [online] 27,000 'Grinding one's teeth. Linkage of surnames in the Database of Surnames in The Netherlands' by Leendert Brouwer 21st International Congress of Onomastic Sciences Uppsala, August 19-24, 2002
Norway      
New Zealand      
Russia     Balanovsky O.P., Buzhilova A.P., and Balanovskaya E.V. 'The Russian Gene Pool: Gene Geography of Surnames.' Russian Journal of Genetics 37, no. 7 ( July 2001 )
Spain telephone CD ?   Rodriguez-Larralde, A.; Gonzales-Martin, A.; Scapoli, C.; Barrai, I. 'The Names of Spain: A Study of the Isonymy Structure of Spain'. American Journal of Physical Anthropology, 2003, vol. 121, no. 3, pp.280-292
Switzerland 1994 Helvetic Telephone Directory   Barrai, I. and others. 'Isonymy and the Genetic Structure of Switzerland .1. The Distributions of Surnames.' Annals of Human Biology 23, no. 6(1996): 431-55
USA 1997 telephone directory CD 100 million D K Tucker 'Distribution of forenames, surnames and forename pairs in the USA' Names 49, no. 2 (2001): 69-96.
Venezuela telephone CD ?   Rodriguez-Larralde, Alvaro; Morales, Jorge; Barrai, Italo 'Surname Frequency and the Isonymy Structure of Venezuela'. American Journal of Human Biology, 2000, vol. 12, no. 3, pp. 352

Isonymic tables

 

 


Part 2 - Censuses

 

1881 Distribution

The 1881 census transcription -despite its known faults- is a marvellous tool for considering the frequency and distribution of names in the late nineteenth century.

The Guild of One-Name Studies has done important work in establishing baselines upon which to commence a study of individual names. The following table of conventions is based on the work of the 1881 Project, co-ordinated by Geoff Riggs.

slt The number of surname occurrences at a sub-national level local
Snt The National total of surname occurrences National
n The population size of the area under study local
N The National Population size National
     
slt/Snt The percentage of occurrences local
slt/n The frequency : usually expressed per 1,000 or per 10,000 local
Snt/N The overall frequency National
(slt/Snt)/(n/N) The Density National

The density is an important indicator. If a surname was evenly distributed it would have a density of 1.
Geoff Riggs shows in his articles that reliance merely on the number of occurrences (s) is a misleading indicator.
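
The conventions above reduce to a few lines of arithmetic. The sketch below works them through for the Herefordshire (HEREF) figures in the county table: 160 occurrences out of 2,514 nationally, in a county population of 121,062. The function name is hypothetical. The national population N is not stated on this page; the value used is the 1881 census total for England and Wales, which is my assumption about the base behind the table, so the density should be read as approximate.

```python
def surname_stats(s_local, s_national, n_local, n_national):
    """Percentage, frequency and density for one area.

    s_local / s_national : surname occurrences (slt, Snt)
    n_local / n_national : population sizes (n, N)
    """
    share = s_local / s_national
    return {
        "percent_of_total": 100 * share,              # slt/Snt as a percentage
        "per_1000": 1000 * s_local / n_local,         # local frequency per 1,000
        "density": share / (n_local / n_national),    # (slt/Snt)/(n/N)
    }


N_1881 = 25_974_439  # assumed: England and Wales, 1881 census

heref = surname_stats(160, 2514, 121_062, N_1881)
print(round(heref["percent_of_total"], 2))  # 6.36
print(round(heref["per_1000"], 3))          # 1.322
print(round(heref["density"], 1))           # close to the 13.65 in the table
```

An area holding the same share of the surname as it does of the population returns a density of exactly 1, which is the even-distribution case mentioned above.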

For example, below are the 1881 county figures for my own name :-

County 1881 Population Number Total Occur % of 2514   significance   per 1000 Rank
  n s S = 2514 100 x s/S s/S (s/S)/(n/N)   1000 x s/n  
HEREF 121,062 160 2514 6.36 0.06 13.65   1.322 1
BERKS 218,363 227 2514 9.03 0.09 10.10   1.040 2
WILTS 258,965 105 2514 4.18 0.04 3.94   0.405 3
GLOS 572,433 194 2514 7.72 0.08 3.29   0.339 4
HANTS 593,470 181 2514 7.20 0.07 2.96   0.305 5
WORCS 380,283 113 2514 4.49 0.04 2.89   0.297 6
SURREY 1,436,899 341 2514 13.56 0.14 2.31   0.237 7
RUTLAND 21434 4 2514 0.16 0.00 1.81   0.187 8
WARWICK 737,339 116 2514 4.61 0.05 1.53   0.157 9
NOTTS 391,815 50 2514 1.99 0.02 1.24   0.128 10
BUCKS 176,323 22 2514 0.88 0.01