Sources/Methods/Tools
Of Boundaries and maps
The study of English local surnames
If you came to this page directly, then please access
Modern British Surnames
Not an exhaustive list; just the basics
| Preface | Cautionary Note |
| Background | Essential background reading: Colin Rogers, The Surname Detective; David Hey, Family Names and Family History; Oxford Companion to Local and Family History |
| Contemporary distribution | Consult Telephone Directories, Electoral Registers, Survey of Contemporary Names (the distribution of 16,000 names available here) |
| 1881 Census | Extraction from census, manually or with LDS Companion |
| 1851 Census | No National Index; but most Family History Societies have indexed 'their' county |
| GRO data | Births, marriages and deaths by registration district |
| Hearth Tax | Many counties available in print or will be through the Roehampton program |
| Poll Tax | The real thing: back almost to the era when hereditary surnames were formed |
Before plunging enthusiastically into this topic, the following preface, setting out the case against, might be salutary.
A) Variant or Not?
In the 1960s, using the GRO birth indexes for 1850, Francis Leeson mapped the distribution of his name and what he regarded as its variants: Lee, Lees, Leigh, Leigh, Lea, Ley, Leese, Leeson, Leason. The resulting plots revealed discrete areas, which remained the same even when compared with a 1960s telephone survey.
A reply was made to this article by Dr Reaney, who criticised the distributions on several fronts:-
1) A plot of the modern spelling does not necessarily equate with the original form or distribution: "Lee, Lea, Ley, Lay and Leigh are all one surname. They all go back ultimately to OE leah and both surnames and place-names have a variety of forms; the different modern spellings may be partly due to ME grammar, partly due to the local dialect or simply to mere chance...Parish Registers did not begin until long after surnames became fixed; they are not necessarily proof of the original distribution."
It would have taken just one fertile family migrating in 1530 to give a false impression of the home of a name, especially if that name did not appear elsewhere in mediaeval documents.
2) A plot cannot be made comparing a root name and its variants unless one is totally sure that the supposed variant did derive etymologically from the root name. Reaney points out that, in his opinion, Leese (from OE laes, 'pasture') and Leeson (derived from 'son of Lece') are not variants of the root names Lee, Lea, Ley, Lay and Leigh.
Reaney has subsequently been criticised for over-reliance on
etymology, but I think his general points should be borne in mind
by anyone plotting any kind of surname distribution.
B) Surname corruption
George Redmonds, in his study of Yorkshire surnames, has shown the amazing variability of surnames.
"In
addition to the obvious variations associated with the distortion
of vowel sounds and the confusion when pronouncing consonants,
the author draws attention to the remarkably high incidence of
elision and truncation, as well as the introduction of so-called
prosthetic consonants such as Y, W or S to preface some surnames
beginning with a vowel. He also notes that the final consonant of
a first name may transfer to the surname, citing Thomas Anderson
alias Saunderson and John Nellis alias Ellis."
Book review in The Escutcheon of Surnames and Genealogy
Dramatic changes could also occur to the final syllable of surnames, for example Whithalghe/Whitalk/Whitack and Astmough/Astmall/Asman/Asmond. Surnames such as these seem to have had very little stress on the final syllable: it was left to the listener to decide on an interpretation, often in perpetuity.
If you collect the occurrences of a name from, say, the Hearth Tax, how do you know that the name is what you think it is, unless you investigate the genealogy of each bearer?
Surname dictionaries will be of little help, because they tend to ignore local corrupted forms. They concentrate on the earliest form of a name; surname corruption comes much later.
As George Redmonds says, each occurrence of a surname should be treated as being unique.
End of the cautionary preface
Snapshots
Some of the
potentially really useful national and comprehensive sources are
inaccessible to us - such as the National Health Service Central
Register at Southport or the Social Security Central Register at
Newcastle upon Tyne.
Plotting by postcode
| Level | Example | Number | Mailboxes | Households covered |
|---|---|---|---|---|
| All unit postcodes | PO1 2ST | 1.7 million | 24 million | 15-16 average |
| Postcode sector | PO1 2 | 9,100 | 2,000 | |
| Postcode districts | PO1 | 2,900 | 20,000 | |
| Postcode area | PO | 125 | 200,000 | |
The above figures are not exact: check how many delivery points your own postcode covers.
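For anyone plotting by postcode with a script, the levels of the hierarchy can be read straight off the unit postcode. The following is only a minimal sketch in Python: the function name `postcode_levels` is my own invention, and it assumes the postcode is written with a single space between the outward and inward parts (as in PO1 2ST).

```python
import re


def postcode_levels(unit_postcode):
    """Split a unit postcode such as 'PO1 2ST' into its four levels.

    Assumes (for illustration) that the outward and inward parts
    are separated by whitespace.
    """
    outward, inward = unit_postcode.upper().split()
    # The postcode area is the run of letters that opens the outward code.
    area = re.match(r"[A-Z]+", outward).group()
    return {
        "area": area,                         # e.g. PO
        "district": outward,                  # e.g. PO1
        "sector": f"{outward} {inward[0]}",   # e.g. PO1 2
        "unit": f"{outward} {inward}",        # e.g. PO1 2ST
    }


levels = postcode_levels("PO1 2ST")
print(levels["area"], "|", levels["district"], "|", levels["sector"])
```

Counting surname occurrences by `area` or `district` rather than by unit keeps the numbers in each cell large enough to be worth plotting.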
The following is a rough guide to the percentage of the population of Scotland, Wales and England in each postcode area, and the proportion aged under 18. Non-mainland postcodes not yet included are those for Belfast, Jersey, Guernsey and the Isle of Man. The figures were compiled prior to the publication of the ONS Census 2001 postcode area figures, but still appear to be in line with them.
| Postcode | Postcode area | % GB pop | % aged 0-17 |
|---|---|---|---|
| AB | Aberdeen | 0.8 | |
| AL | St Albans | 0.4 | 23 |
| B | Birmingham | 3.0 | 25 |
| BA | Bath | 0.7 | 22 |
| BB | Blackburn | 0.8 | 26 |
| BD | Bradford | 0.9 | 26 |
| BH | Bournemouth | 0.9 | 19 |
| BL | Bolton | 0.6 | 24 |
| BN | Brighton | 1.4 | 15 |
| BR | Bromley | 0.5 | 22 |
| BS | Bristol | 1.6 | 22 |
| CA | Carlisle | 0.5 | 21 |
| CB | Cambridge | 0.7 | 21 |
| CF | Cardiff | 1.7 | 24 |
| CH | Chester | 1.1 | 23 |
| CM | Chelmsford | 1.0 | 23 |
| CO | Colchester | 0.7 | 21 |
| CR | Croydon | 0.6 | 25 |
| CT | Canterbury | 0.8 | 22 |
| CV | Coventry | 1.3 | 23 |
| CW | Crewe | 0.5 | 23 |
| DA | Dartford | 0.7 | 24 |
| DD | Dundee | 0.5 | |
| DE | Derby | 1.2 | 23 |
| DG | Dumfries | 0.3 | |
| DH | Durham | 0.5 | 21 |
| DL | Darlington | 0.6 | 22 |
| DN | Doncaster | 1.2 | 24 |
| DT | Dorchester | 0.4 | 21 |
| DY | Dudley | 0.7 | 22 |
| E | London E | 1.3 | 26 |
| EC | London EC | 0.05 | 16 |
| EH | Edinburgh | 1.4 | |
| EN | Enfield | 0.5 | 23 |
| EX | Exeter | 0.9 | 20 |
| FK | Falkirk | 0.4 | - |
| FY | Blackpool | 0.5 | 21 |
| G | Glasgow | 2.1 | |
| GL | Gloucester | 1.0 | 22 |
| GU | Guildford | 1.2 | 23 |
| HA | Harrow | 0.7 | 23 |
| HD | Huddersfield | 0.4 | 23 |
| HG | Harrogate | 0.2 | 22 |
| HP | Hemel Hempstead | 0.8 | 24 |
| HR | Hereford | 0.3 | 22 |
| HS | Harris | 0.05 | |
| HU | Hull | 0.7 | 23 |
| HX | Halifax | 0.3 | 24 |
| IG | Ilford | 0.5 | 25 |
| IP | Ipswich | 1.0 | 22 |
| IV | Inverness | 0.3 | |
| KA | Kilmarnock | 0.6 | |
| KT | Kingston-upon-Thames | 0.9 | 22 |
| KW | Kirkwall | 0.1 | |
| KY | Kirkcaldy | 0.6 | |
| L | Liverpool | 1.5 | 24 |
| LA | Lancaster | 0.6 | 21 |
| LD | Llandrindod Wells | 0.1 | 22 |
| LE | Leicester | 1.6 | 23 |
| LL | Llandudno | 0.9 | 22 |
| LN | Lincoln | 0.5 | 22 |
| LS | Leeds | 1.3 | 22 |
| LU | Luton | 0.5 | 26 |
| M | Manchester | 1.8 | 23 |
| ME | Medway | 0.9 | 24 |
| MK | Milton Keynes | 0.8 | 25 |
| ML | Motherwell | 0.6 | |
| N | London N | 1.3 | 23 |
| NE | Newcastle-upon-Tyne | 2.0 | 22 |
| NG | Nottingham | 1.9 | 22 |
| NN | Northampton | 1.0 | 24 |
| NP | Newport | 0.8 | 24 |
| NR | Norwich | 1.2 | 21 |
| NW | London NW | 0.9 | 21 |
| OL | Oldham | 0.8 | 26 |
| OX | Oxford | 1.0 | 22 |
| PA | Paisley | 0.6 | |
| PE | Peterborough | 1.4 | 22 |
| PH | Perth | 0.3 | |
| PL | Plymouth | 0.9 | 22 |
| PO | Portsmouth | 1.4 | 21 |
| PR | Preston | 0.9 | 22 |
| RG | Reading | 1.3 | 23 |
| RH | Redhill | 0.8 | 17 |
| RM | Romford | 0.8 | 17 |
| S | Sheffield | 2.3 | 22 |
| SA | Swansea | 1.2 | 22 |
| SE | London SE | 1.5 | 23 |
| SG | Stevenage | 0.6 | 24 |
| SK | Stockport | 1.0 | 23 |
| SL | Slough | 0.6 | 23 |
| SM | Sutton | 0.4 | 23 |
| SN | Swindon | 0.7 | 23 |
| SO | Southampton | 1.1 | 22 |
| SP | Salisbury | 0.4 | 22 |
| SR | Sunderland | 0.4 | 22 |
| SS | Southend-on-Sea | 0.9 | 23 |
| ST | Stoke-on-Trent | 1.1 | 22 |
| SW | London SW | 1.5 | 18 |
| SY | Shrewsbury | 0.6 | 22 |
| TA | Taunton | 0.5 | 21 |
| TD | Galashiels | 0.2 | 20 |
| TF | Telford | 0.3 | 24 |
| TN | Tunbridge Wells | 1.1 | 23 |
| TQ | Torquay | 0.5 | 20 |
| TR | Truro | 0.5 | 21 |
| TS | Cleveland | 1.0 | 24 |
| TW | Twickenham | 0.8 | 22 |
| UB | Southall | 0.6 | 25 |
| W | London W | 0.9 | 18 |
| WA | Warrington | 1.0 | 23 |
| WC | London WC | 0.07 | 15 |
| WD | Watford | 0.4 | 23 |
| WF | Wakefield | 0.8 | 24 |
| WN | Wigan | 0.5 | 23 |
| WR | Worcester | 0.5 | 22 |
| WS | Walsall | 0.7 | 24 |
| WV | Wolverhampton | 0.6 | 23 |
| YO | York | 0.9 | 21 |
| ZE | Lerwick | 0.04 | |
Scotland
The population of Scotland at the 2001 Census was 5,062,011.
As percentages of that total, the populations of the Scottish postcode areas were about:-
| AB | 9.12 | KA | 7.23 | |
| DD | 5.36 | KW | 0.99 | |
| DG | 2.92 | KY | 6.86 | |
| EH | 16.01 | ML | 7.23 | |
| FK | 5.13 | PA | 6.38 | |
| G | 23.01 | PH | 3.00 | |
| HS | 0.52 | TD | 1.76 | |
| IV | 4.05 | ZE | 0.43 | |
| Scottish Sector postcode populations : 2001 Census | ||||
Postcode Atlases
|
UK
Electoral Rolls on CDROM
Pluses
Minuses
Although the disk
is expensive to purchase, there is a fee-based extraction service
available from People
Finders UK.
| The Ward is a unit common both to Electoral and contemporary Census geography. To learn how the modern census is administered, plus a list of all hierarchical divisions -county, district, ward, enumeration district, visit the Census Dissemination Unit |
| 1 "Only 85% of those who said they did not vote in the 2001 general election were actually registered to do so and 29% of young people aged 18-24 and 19% of minority ethnic groups indicated in a sample survey that the reason for not voting was that they were not registered."
2 "Looking at ethnic minority communities, 27% of black non-voters and 15% of Asian non-voters reported that they were not registered, although these figures were drawn from a small base-size."
UK parliamentary elections: numbers registered to vote
Changes to the register tend to affect between 0.1% and 0.5% of the electorate in any given month. |
UK-INFO Disk
Pluses
Minuses
Up to now, the UK-Info disk could not be recommended for surname distribution analysis, where accuracy in the totality of numbers is so important. The latest disk seems at first to have a much better coverage as a percentage of the population. This is due however to the many duplications in entries caused by Postcode changes. Ensuring that the source is one of 'clean data' is vital in our study.
Telephone Directories
These now come in a variety of formats: online, CD-ROM, and printed. However, the telephone directory, whatever its format, suffers from a major drawback: the increasing number of unlisted telephone numbers.
"Although the national average for ex-dir is about 37% the figures do vary enormously between counties, being lowest in northern England and *much* higher in southern England. So for any surname you will get perhaps 80% listed if they live in a northern county, but less than 50% listed in southern counties, especially East/West Sussex, Hampshire, Surrey, Kent etc. This imbalance in ex-dir status can be significant in surnames with small numbers, but probably less so with the more common surnames." (John Wynn)
The latest online version, PhoneNetUk, is extremely disappointing for our purposes. A regional qualifier is mandatory (under the terms of the licensing authority), so no national searches are possible. The inclusion of postal codes is erratic, and where they do appear they are truncated to the outward code alone. With the CD, national searches are allowable, but only the first 200 entries are displayed (with full postcode). A tweak is possible to derive statistics of a surname by region if the number of occurrences exceeds 200. A visit to the local library will probably be required to consult the printed telephone directories.
Colin Rogers has listed the disadvantages of using printed telephone directories:-
He adds:-
"British Telecom has an Archives and Historical Information Centre at 2-4 Temple Avenue, London EC4Y OHL which is open to the public...it holds an almost complete set of telephone directories from 1879 when the first publically available system was introduced into Great Britain."
Mr Rogers is sceptical about the usefulness of pre-1950 telephone directories for our purposes, the coverage of the population being so small. However, they might be useful as pointers for the study of relatively high-frequency names.
National Health Service Central Register [NHSCR]
This database of 60 million names is not available in its entirety, but you can look at an individual frequency. The NHS Central Register is prone to list inflation, and some of the results are surprising, so treat them with extreme caution. The whole database does have linguistic possibilities.
For a paraphrased potted history of the NHSCR
Survey of Contemporary Surnames
Despite these
limitations, a major and significant survey was
conducted of the surnames of Britain, using the printed telephone
directories 1980-1996. The survey was led by Patrick Hanks and
Kate Hardcastle in order to establish those names deemed to be of
significance for 'A Dictionary of Surnames' OUP, 1988. The result was 16,000 surnames
with a frequency of more than 20 occurrences in any particular
directory.
A full
listing of the distribution of all the names can be found by
following this
link
| This is a major survey, whose results are important to anyone wishing to compare surname frequencies and distributions, especially between 1881 and today. Of particular use in identifying homophonic surnames that have completely different distributions e.g. Adie and Adey. One Scottish: the other West Midlands. |
International
data sources
The publication of national
telephone directories on CD has been used by geneticists to study
isonymic rates for individual countries. Onomastic studies based
on national datasets are much rarer, but hopefully will increase.
| Country | Format | Dataset size (names) | Publication based on data source |
|---|---|---|---|
| Austria | 1996 telephone CD | 4 million | Barrai I and others. 'Elements of the Surname Structure of Austria.' Annals of Human Biology 27, no. 6 (November 2000-December 2000): 607-22. |
| Belgium | telephone CD [future online source] | | Barrai I.; Rodriguez-Larralde A.; Manni F.; Ruggiero V.; Tartari D.; Scapoli C. 'Isolation by Language and Distance in Belgium'. Annals of Human Genetics, January 2003, vol. 68, no. 1, pp. 1-16 |
| Canada | 1996 telephone CD | 12 million | D K Tucker 'Distribution of forenames, surnames and forename pairs in Canada' Names 50, no. 2 (June 2002): 105-132 |
| Denmark | Danish Central Civil Register | 6.5+ million | Sondergaard, Georg. 'Computer Databank of Danish Names' Names, no. 38 (1990): 21-30. |
| Estonia | Corpus Nominum Gentilium Estonicorum [online] | c 74,000 | |
| Finland | | | Poyhonen, Juhani. Suomalainen Sukunimikartasto [Atlas of Finnish Surnames]. Helsinki: Suomalaisen Kirjallisuuden Seura, 1998. |
| France | Insee datasets of births 1891-1915 and 1916-1940 | | Darlu, Pierre, Anna Degioanni, and Jacques Ruffie. 'Quelques Statistiques Sur La Distribution Des Patronymes En France.' Population [Paris] 52, no. 3 (1997): 607-34. |
| Germany | telephone CD ? | | Rodriguez-Larralde, A.; Barrai, I.; Scapoli, C. 'Isonymy and Isolation by Distance in Germany'. Human Biology, 1998, vol. 70, no. 6, pp. 1041 |
| Israel | | 4 million+ | Eliassaf, Nissim. 'Names Survey in the Population Administration: State of Israel.' Names, no. 29 (1981): 273-84 |
| Italy | telephone CD ? | | Barrai, I.; Rodriguez-Larralde, A.; Scapoli, 'Isonymy and Isolation by Distance in Italy'. Human Biology, 1999, vol. 71, no. 6, pp. 947 |
| Italy - Sicily | telephone CD ? | | Rodriguez Larralde, A. and others. 'Isonymy and the Genetic Structure of Sicily.' Journal of Biosocial Science 26, no. 1 (1994): 9-24. |
| Japan | | | Miyazima S and others. 'Power-Law Distribution of Family Names in Japanese Societies.' Physica A 278, no. 1-2 (April 2000): 282-88. |
| Netherlands | Instituut Meertens [online] | 27,000 | 'Grinding one's teeth. Linkage of surnames in the Database of Surnames in The Netherlands' by Leendert Brouwer, 21st International Congress of Onomastic Sciences, Uppsala, August 19-24, 2002 |
| Norway | | | |
| New Zealand | | | |
| Russia | | | Balanovsky O.P., Buzhilova A.P., and Balanovskaya E.V. 'The Russian Gene Pool: Gene Geography of Surnames.' Russian Journal of Genetics 37, no. 7 (July 2001) |
| Spain | telephone CD ? | | Rodriguez-Larralde, A.; Gonzales-Martin, A.; Scapoli, C.; Barrai, I. 'The Names of Spain: A Study of the Isonymy Structure of Spain'. American Journal of Physical Anthropology, 2003, vol. 121, no. 3, pp. 280-292 |
| Switzerland | 1994 Helvetic Telephone Directory | | Barrai, I. and others. 'Isonymy and the Genetic Structure of Switzerland. 1. The Distributions of Surnames.' Annals of Human Biology 23, no. 6 (1996): 431-55 |
| USA | 1997 telephone directory CD | 100 million | D K Tucker 'Distribution of forenames, surnames and forename pairs in the USA' Names 49, no. 2 (2001): 69-96. |
| Venezuela | telephone CD ? | | Rodriguez-Larralde, Alvaro; Morales, Jorge; Barrai, Italo 'Surname Frequency and the Isonymy Structure of Venezuela'. American Journal of Human Biology, 2000, vol. 12, no. 3, pp. 352 |
Part 2 - Censuses
The 1881 census transcription -despite its known faults- is a marvellous tool for considering the frequency and distribution of names in the late nineteenth century.
The Guild of One-Name Studies has done important work in establishing baselines upon which to commence a study of individual names. The following table of conventions is based on the work of the 1881 Project, co-ordinated by Geoff Riggs.
| Symbol | Meaning | Level |
|---|---|---|
| Slt | The number of surname occurrences at a sub-national level | local |
| Snt | The National total of surname occurrences | National |
| n | The population size of the area under study | local |
| N | The National Population size | National |
| Slt/Snt | The percentage of occurrences | local |
| Slt/n | The frequency: usually expressed per 1,000 or per 10,000 | local |
| Snt/N | The overall frequency | National |
| (Slt/Snt)/(n/N) | The Density | National |
The density is an
important indicator. If a surname was evenly distributed it would
have a density of 1.
Geoff Riggs shows in his articles that reliance merely on the
number of occurrences (s) is a misleading indicator.
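The conventions above translate directly into a few lines of Python. This is just a sketch: the function and key names are hypothetical, and the national population used in the example (25,974,439, the 1881 census total for England and Wales) is my own assumption. Any N will do, provided it covers the same territory as the national surname total.

```python
def surname_measures(s_local, s_national, n_local, n_national):
    """Basic measures for one surname in one area.

    s_local / s_national : local and national counts of the surname
    n_local / n_national : local and national population
    """
    share = s_local / s_national  # local share of all name-holders
    return {
        "percent_of_name": 100 * share,
        "per_1000": 1000 * s_local / n_local,
        "national_per_1000": 1000 * s_national / n_national,
        # Density is 1 when the name is spread exactly in line
        # with the general population.
        "density": share / (n_local / n_national),
    }


N_ASSUMED = 25_974_439  # assumption: England and Wales, 1881 census

# Herefordshire: 160 of 2514 name-holders in a population of 121,062
heref = surname_measures(160, 2514, 121_062, N_ASSUMED)
print(round(heref["percent_of_name"], 2))  # 6.36
print(round(heref["per_1000"], 3))         # 1.322
print(heref["density"])
```

The percentage and the per-1,000 frequency do not depend on N at all; only the density does, so it is worth recording which national figure was used.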
For example, below are the 1881 county figures for my own name :-
| County | 1881 Population (n) | Number (s) | Total Occur (S) | % of 2514 | s/S | Significance (s/S)/(n/N) | per 1000 (s/n) | Rank |
|---|---|---|---|---|---|---|---|---|
| HEREF | 121,062 | 160 | 2514 | 6.36 | 0.06 | 13.65 | 1.322 | 1 |
| BERKS | 218,363 | 227 | 2514 | 9.03 | 0.09 | 10.10 | 1.040 | 2 |
| WILTS | 258,965 | 105 | 2514 | 4.18 | 0.04 | 3.94 | 0.405 | 3 |
| GLOS | 572,433 | 194 | 2514 | 7.72 | 0.08 | 3.29 | 0.339 | 4 |
| HANTS | 593,470 | 181 | 2514 | 7.20 | 0.07 | 2.96 | 0.305 | 5 |
| WORCS | 380,283 | 113 | 2514 | 4.49 | 0.04 | 2.89 | 0.297 | 6 |
| SURREY | 1,436,899 | 341 | 2514 | 13.56 | 0.14 | 2.31 | 0.237 | 7 |
| RUTLAND | 21,434 | 4 | 2514 | 0.16 | 0.00 | 1.81 | 0.187 | 8 |
| WARWICK | 737,339 | 116 | 2514 | 4.61 | 0.05 | 1.53 | 0.157 | 9 |
| NOTTS | 391,815 | 50 | 2514 | 1.99 | 0.02 | 1.24 | 0.128 | 10 |
| BUCKS | 176,323 | 22 | 2514 | 0.88 | 0.01 | 1.21 | 0.125 | 11 |
| MDSX | 2,920,485 | 311 | 2514 | 12.37 | 0.12 | 1.03 | 0.106 | 12 |
| OXON | 179,559 | 19 | 2514 | 0.76 | 0.01 | 1.03 | 0.106 | 13 |
| The data in the tables above and below is derived from figures for 1) counties and 2) the smaller registration districts. I have used Steve Archer's LDS Companion to extract the number of references and their location in both tables from the 1881 Census CD-ROM. Alternatively, I could have collected them manually from the fiche version and, using M Bryant Rosier's Index to Census Registration Districts, assigned them to their correct area. The former is far simpler. Population figures are taken from the statistics section in this site. |
Surrey has the highest absolute number, but if one considers the density, then Herefordshire is the leading county, with Berkshire close behind. There is a wide margin to the next county, Wiltshire.
I found this surprising, as a contemporary survey indicates Berkshire as the main county, whilst the IGI favours Worcestershire. The name Dance seems not to have a discrete source, but to have arisen independently in several counties, from Worcestershire through Gloucestershire, Wiltshire and Berkshire to Hampshire.
But then, in surname distribution studies, little is ever clear cut, as the following table of data arranged by registration district reveals:-
| Regn Dist | Regn Cnty | Count | 1881 Population | s/n | s/S | Density | Significance |
| Marlborough | Wiltshire | 47 | 9,588 | 0.02 | 0.03 | 4.9 | 51.47 |
| Ledbury | Herefordshire | 53 | 12,691 | 0.01 | 0.02 | 4.2 | 43.85 |
| Wokingham | Berkshire | 53 | 15,996 | 0.01 | 0.01 | 3.3 | 34.79 |
| Bradfield | Berkshire | 46 | 16,719 | 0.00 | 0.01 | 2.8 | 28.89 |
| Newent | Gloucestershire | 25 | 11,030 | 0.00 | 0.01 | 2.3 | 23.80 |
| Catherington | Hampshire | 6 | 2,747 | 0.00 | 0.01 | 2.2 | 22.93 |
| Castle Ward | Northumberland | 43 | 19,720 | 0.00 | 0.01 | 2.2 | 22.89 |
| Andover | Hampshire | 47 | 15,700 | 0.00 | 0.02 | 2.1 | 22.07 |
| Hartley Wintney | Hampshire | 44 | 21,326 | 0.00 | 0.01 | 2.1 | 21.66 |
| Cirencester | Gloucestershire | 42 | 21,125 | 0.00 | 0.02 | 2.0 | 20.87 |
| Hungerford | Berkshire | 33 | 17,802 | 0.00 | 0.02 | 1.9 | 19.46 |
The distribution is best seen as a map
| This map shows the
heartland of the Dance surname. Three foci can be discerned:-
The map was created with
Genmap v2 |
Marlborough is the 1881 Dance hotspot. Within the Registration District, the name is located in just 2 parishes:-
Map created from the HDS Historic Parishes of England and Wales CD |
GRO data
One-namers collect
the GRO data for their name as a matter of course. An examination
of members' pages on the Guild site reveals many excellent
examples.
Professor David Hey has built a database from the death registrations for the years 1842-1846 for surnames beginning with the letters A, E, K and R. This has resulted in a computer database of over 220,000 surnames, covering an estimated 12.5% of the whole set of surnames. He has published his results in his book Family Names and Family History.
An examination of the GRO data for my name 1840-45 shows the main counties to be Berkshire, Hampshire and Worcestershire.
Part 3
| This
section considers the geographical tools available to
analyse the spatial dispersion of surnames. Amongst those introduced are:-
|
-Index numbers
In the section above, an example was given of the Density of a particular surname (my own). This could be applied to every surname in the national database under study, to produce an index value for each name. It is the norm, however, to express index numbers around a base of 100. One can either multiply the Significance value (see above) by that factor, or use the following equation to produce the same result:
$$S_i = \frac{S_{lt}}{(S_{nt}/N) \times n} \times 100$$
where
| Slt | The local count of your name |
| Snt | The sum of all the local counts |
| n | The population of the local area |
| N | The national population (the sum of all the local populations) |
An index number of
200 would indicate that for that surname there are twice as many
surname-holders in that area, than one would expect given the
total number nationally.
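As a rough sketch of the same calculation in Python (the function names and the round numbers in the example are hypothetical, chosen only to make the arithmetic easy to follow):

```python
def expected_count(s_national, n_local, n_national):
    """Name-holders one would expect locally if the surname were
    spread evenly through the national population."""
    return (s_national / n_national) * n_local


def index_number(s_local, s_national, n_local, n_national):
    """Index around base 100: the observed local count as a
    percentage of the expected local count."""
    return 100 * s_local / expected_count(s_national, n_local, n_national)


# Invented figures: 1,000 holders nationally in a population of
# 1,000,000, so an area of 10,000 people should hold about 10.
# Finding 20 there gives an index of 200.
print(round(index_number(20, 1_000, 10_000, 1_000_000)))  # 200
```

An index of 100 means the area holds exactly its expected share; values below 100 mean the name is under-represented there.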
High frequency surnames exhibit a very constricted range of index values. For example, in the late 1990s, the surname Smith ranged from a minimum value of 50 to a maximum value of 249.
This should be compared with low frequency names, which have ranges of 0 - 3,000.
At the extreme, some names with very small populations have very high index scores of c. 9,000.
If one looks at the areas (in this case, postcodes) in which the index values reach a peak, the results seem inconsistent.
Postcodes with the highest number of peaks |
|||||||
| London WC | London EC | Norwich | York | Hull | Ipswich | Truro | Taunton |
| 727 | 703 | 517 | 381 | 371 | 360 | 355 | 353 |
Postcodes with the lowest number of peaks |
|||||||
| Kingston upon Thames | London SW | Manchester | Llandudno | Blackpool | Cardiff | Leeds | Reading |
| 45 | 46 | 60 | 62 | 67 | 94 | 98 | 98 |
Why do London postcodes have some of the highest and some of the lowest numbers of surname peaks?
Experienced users of Surname Atlas may have noticed that some surnames seem to display unaccountably heavy concentrations in the Isle of Man or Jersey. This is a distortion that is probably introduced through the large population ranges of geographical areas, as well as the large surname ranges. If you are working with contemporary data, and therefore postcodes, please be aware that postcode area populations vary from 3% (Birmingham) down to 0.04% (Lerwick). (I must do a similar exercise on 1881 Registration district areas.)
|
| Least resident-populated
Postcode areas % UK population |
||
| KW | Kirkwall | 0.09 |
| LD | Llandrindod Wells | 0.09 |
| WC | London WC | 0.07 |
| EC | London EC | 0.05 |
| HS | Harris | 0.05 |
| ZE | Lerwick | 0.04 |
In the following grid, column b represents a matrix of postcode areas and clusters of similar surnames, large and small. Thus the top left-hand cell represents frequently occurring names in highly populated postcode areas (the Smiths etc in Birmingham etc); and conversely, the bottom right-hand cell represents low frequency surnames in sparsely populated postcode areas (e.g. London EC).
The key represents the standard deviations. Most surnames fall within an irregular but graduated range of 40-420 standard deviations. Those in islands or 'pockets' have much wider ranges; and the standard deviations for low frequency names in low-populated areas are excessive.
In effect, a 'cluster' of small names is far more likely to appear significant than a 'cluster' of large names. The size of the postcode area in which the cluster appears can also bolster this bias.
| a | b | c | key |
|---|---|---|---|
| Surnames, large to small | Postcode areas, large to small | Islands, Orkney etc, Shetland etc, Outer Hebrides, London EC, London WC | 45-49, 50-99, 100-149, 150-199, 200-249, 250-299, 300-349, 350-399, 400-450, 500-549, 1000+, 1500+ |
For this reason, the index value ideally needs to be standardised. An equation has been formulated that does this, but it is not as yet in the public domain.
(This section is based upon elements from an unpublished UCL symposium paper by D Lloyd)
- Mean Separation Distance
This is a measure of how dispersed your name is.
The clearest way that I can think of understanding how to apply the formula is through the following example.
Consider 4 places (parishes, registration districts) A, B, C, D, with holders of your surname numbering 100, 50, 20 and 10 respectively. Enter these numbers into the following grid, together with the distance of each place from every other (noted here as dBA, dCA, dDA = distance of B, C and D from A). The distances occupy the third, fourth and fifth columns below.
| Place | Numbers | distA | distB | distC |
|---|---|---|---|---|
| A | 100 | - | | |
| B | 50 | 15 | - | |
| C | 20 | 25 | 10 | - |
| D | 10 | 30 | 7 | 5 |
Formula 1 = The numerator
| (B x dBA) + (C x dCA) + (D x dDA) + (C x dCB) + (D x dDB) + (D x dDC)
To enter the relevant numbers, simply start at the number 15 in the grid and work down each column in turn:
(B x 15) + (C x 25) + (D x 30) + (C x 10) + (D x 7) + (D x 5)
By substitution of the nameholders in each place, this comes to
(50 x 15) + (20 x 25) + (10 x 30) + (20 x 10) + (10 x 7) + (10 x 5) = 1870 |
This is known as the Total Separation Distance. This figure must now be divided by the Total of Separated Persons, given by the denominator, to yield the Mean Separation Distance of the name. Each place's number is counted once for every time it was used in the first formula, and the surname number in place A is included as well.
Formula 2= The denominator= Total of Separated Persons
| (A + B + C + D) + C + (D +D) = 100 + 50 + 20 + 10 + 20 + 10 + 10= 220 |
The Mean Separation Distance in this case is 1870/220 = 8.5 km
| You might like to copy the grid and formulae into a spreadsheet and experiment with the numbers. For example, if the distances remain the same, but the number in place A were much higher, say a concentration of 1000 (and not 100), the MSD would drop to 1.67. If the numbers were equal (say 10) in each place, then the MSD = 13.14. If the numbers remain unaltered, but the distances from A are increased by 100 km each, then the resulting MSD = 44.86. And if the numbers of nameholders are the same (10 in each place), and every distance is increased by 100 km, then the MSD = 98.86. |
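For those who prefer a script to a spreadsheet, the two formulae can be sketched in Python as follows. The function name and the way the distances are laid out (one list per place, giving its distance from each earlier place) are my own choices for illustration; the arithmetic simply follows Formula 1 and Formula 2.

```python
def mean_separation_distance(counts, dist):
    """counts: name-holders in each place, in the order A, B, C, ...
    dist[j][i]: distance between place j and an earlier place i (i < j),
    i.e. the lower triangle of the grid, row by row.
    """
    total_distance = 0.0      # Formula 1: Total Separation Distance
    separated = counts[0]     # Formula 2 starts with the count in place A
    for j in range(1, len(counts)):
        for i in range(j):
            total_distance += counts[j] * dist[j][i]
        # Place j is used j times in Formula 1, so it is counted j times.
        separated += counts[j] * j
    return total_distance / separated


counts = [100, 50, 20, 10]        # places A, B, C, D
dist = [
    [],                           # A
    [15],                         # B: distance from A
    [25, 10],                     # C: distances from A, B
    [30, 7, 5],                   # D: distances from A, B, C
]
print(mean_separation_distance(counts, dist))  # 8.5
```

Changing the entries in `counts` or `dist` and re-running is the quickest way to see how sensitive the MSD is to a single large concentration or a single distant outlier.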
| A problem will be how to measure accurately (and in a consistent fashion) the distances between places, when those places are areal units, like parishes and registration districts. How does one determine where the centroid of each is? GenMap has a tool to measure straight-line distances between registration districts (but you will still have to guesstimate where the centroid is). |
-Mean Separation Distance of the Place
The above allows you to compare the dispersion of one surname with another, but does not give a national perspective of the relative dispersion of all names by place. To do so, one would have to feed all the MSDs back into the original locations.
For example, Parish A has a total population of 90 people, comprising just 3 surnames (p, q, r) with occurrences of 16, 8 and 3 and associated Mean Separation Distances of 19, 16 and 6.
The MSD of the whole parish would then be calculated as:
(p x 19 + q x 16 + r x 6) / total parish population
by substitution
((16 x 19) + (8 x 16) + (3 x 6)) / 90 = (304 + 128 + 18) / 90 = 450 / 90 = MSD of the parish = 5
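The parish calculation is easily scripted once the MSD of each surname is known. The sketch below is hypothetical in its naming, and follows the worked example in dividing by the total parish population.

```python
def place_msd(surnames, population):
    """surnames: (occurrences in the place, MSD of that surname) pairs.
    population: total population of the place.
    """
    weighted = sum(count * msd for count, msd in surnames)
    return weighted / population


# Parish A: surnames p, q, r
parish_a = [(16, 19), (8, 16), (3, 6)]
print(place_msd(parish_a, 90))  # 5.0
```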
This is a daunting exercise for anyone without access to large computing power, but it has been done by the University of Essex for each parish of England and Wales in the 1881 census, and the output plotted onto a map, and published in Local Population Studies no 72 (2004).
This map reveals certain broad belts of low separation distances.
Surname density seems to cut across these areas, except that a core area of South Lancashire seems to fit within the 1st belt. This seems to be an area of low surname density, in which the holders ramified greatly within the same region; as opposed, say, to North Wales, where high migration to England co-existed with low surname density.
-Nearest Neighbour Analysis
I have in mind here the comparison of the dispersal of 2 or more names in a modern context. For example:-
NNA is a measure of distribution, and not of 'pattern' and does have limitations. The higher the number of points, the higher the reliability of the result; and 30 surname plots would be considered the minimum.
Procedure
$$R_n = \frac{D_{obs}}{r_e}$$
The value of the nearest neighbour statistic (Rn) can range from 0 (extremely clustered) to 2.15 (an ordered and uniform distribution). A value of 1 would suggest a random distribution.
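As a rough illustration only, the statistic can be computed as below. The sketch assumes the usual Clark and Evans form of the index, in which the observed mean distance to the nearest neighbour is divided by the distance expected for a random pattern, 0.5 x sqrt(area / number of points). The function name, the coordinates and the study area are all hypothetical.

```python
import math


def nearest_neighbour_index(points, area):
    """points: (x, y) coordinates of the surname plots.
    area: size of the study area, in the square of the same units.
    """
    n = len(points)
    nearest = []
    for i, (x1, y1) in enumerate(points):
        # Distance from this point to its closest other point.
        nearest.append(min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points) if j != i
        ))
    observed = sum(nearest) / n            # mean nearest-neighbour distance
    expected = 0.5 * math.sqrt(area / n)   # expectation for a random pattern
    return observed / expected


# Four plots on a regular 1 km grid inside a 2 km x 2 km square.
grid = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
print(nearest_neighbour_index(grid, area=4.0))  # 2.0
```

Four points are far too few for a real study (see the minimum of 30 above); they are used here only because the answer can be checked by hand.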
It is now up to the surname analyst to explain the resulting distribution.
Be aware that the above equations are based on 2 assumptions:-
These are severe restrictions in the case of surname study, as quite a few factors come into play: propinquity of kin, economic conditions, lines of transport, geomorphology. And if the area under study changes, then this affects the density of the points.
So this technique may perhaps be useful for comparing the temporal change in a surname distribution within a specified area, such as a parish or registration district (provided its boundaries have not changed in the interim), or for comparing the distribution of 2 surnames in the same area, or just perhaps the distribution of a widely-dispersed surname, using the area of England, Scotland, or Wales as a baseline.
But identical index numbers for two surnames do not imply that their distributions are the same, as it is "possible for arrangement of points which are very dissimilar to have identical mean nearest-neighbour distances"
-Lorenz Curves
These are plots of cumulative percentage (normally used in economic and social history to plot accumulated wealth). However, they can be used here to plot accumulated name frequency (x axis) against accumulated area (y axis). A surname that aligned on the diagonal (a-d) would have a perfectly even distribution of name-holders, such that 10% of the area sampled contains 10% of the surname-holders and 50% of the area contains 50% of the name-holder population. Any curve that tends towards corner b would illustrate a name in which a large % of the name-holders are concentrated in a small % of the land.
[Diagram: Lorenz plot with corners a (bottom left), b (bottom right), c (top left) and d (top right). x axis: cum % of name-holders; y axis: cum % of area; the diagonal runs from a to d.]
All Lorenz curves should be compared against the diagonal. It is possible, using the Gini coefficient, to measure the area between the diagonal and the curve, as a fraction of the area below the diagonal. A high concentration of surname-holders in a small area would yield a high Gini coefficient.
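As a rough sketch of how the curve and coefficient might be computed, the Python below takes one (area, name-holders) pair per areal unit, such as a parish or registration district. The function names and the input layout are my own assumptions; the units are ordered densest first so that the curve bows towards corner b, and the area under the curve is measured with the trapezium rule.

```python
def lorenz_points(units):
    """units: list of (area, name_holders) pairs, one per areal unit.

    Returns (x, y) points where x is the cumulative share of
    name-holders and y the cumulative share of area.
    """
    total_area = sum(area for area, _ in units)
    total_names = sum(count for _, count in units)

    # Densest units first, so the curve falls below the diagonal.
    ordered = sorted(units, key=lambda u: u[1] / u[0], reverse=True)

    points = [(0.0, 0.0)]
    cum_area = 0
    cum_names = 0
    for area, count in ordered:
        cum_area += area
        cum_names += count
        points.append((cum_names / total_names, cum_area / total_area))
    return points


def gini(points):
    """Area between diagonal and curve, as a fraction of the area below the diagonal."""
    under_curve = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return (0.5 - under_curve) / 0.5


even = lorenz_points([(10, 5), (10, 5)])          # same density everywhere
concentrated = lorenz_points([(1, 100), (99, 0)])  # everyone in 1% of the land
print(gini(even), gini(concentrated))
```

An evenly spread name gives a coefficient of 0, while the concentrated example approaches 1.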
Lorenz curves are useful for comparing the differential growth or decline of any two features over time so these curves could be used to compare :-
Comprehensive area values can be obtained from the Census Abstracts, and individual volumes of the Victoria County History,
and selected values from this site
Comparing Census surname distributions over time
There were 383
changes to the boundaries of registration districts from 1841 to
1911, and almost 20,000 to parishes from 1876 to 1972. "These changes mean
it is very difficult to compare one census with its predecessor
and make the creation of long run time series of raw data
impossible"
This problem
might not affect the study of a single name, but would have to be
taken into consideration by those who are studying the varying
distribution of a class of names.
This would apply, for example, if one wished to study how the age-distribution of Welsh surnames in a London registration district varied between 1841 and 1901.
Gregory and Ell consider possible ways to overcome the problem for census geographers.
Source: Ian Gregory and Paul Ell, Breaking the boundaries: geographical approaches to integrating 200 years of the census, Journal of the Royal Statistical Society A 168(2) 2005, pp419-437
Part 4 - Socio-economic
-Geodemographics and surnames
Lots of potential for analysis here- though I am not entirely convinced of the validity of this approach on the microscale
"There is no formal proof and no "theory of geodemographics" either, only the concept that "birds of a feather flock together". All the evidence is empirical..the systems are used simply because they do work.."
R Flowerdew / B Leventhal - Under the microscope (Market Research Society symposium paper)
"Some of the most persuasive evidence that geodemographic mapping does affect perceptions is the condemnation of this work by other researchers"
D Dorling Mapping p13
Geodemographic schemes use census and private data to create a
profile of a neighbourhood. These profiles serve as a likely
indication of the area's relative affluence, and the possible
life-style of its inhabitants. A classification scheme is used to
assign profiles into a hierarchical order
Two well-known geodemographic products are:-
Acorn - A Classification of Residential Neighbourhoods (UK 2001 Classification)
| Main class | Sub-group | Est % UK pop |
| Wealthy achievers | Wealthy executives | 8.6 |
| | Affluent greys | 7.7 |
| | Flourishing families | 8.8 |
| Urban prosperity | Prosperous Professionals | 2.2 |
| | Educated Urbanites | 4.6 |
| | Aspiring singles | 3.9 |
| Comfortably Off | Starting out | 2.5 |
| | Secure families | 15.5 |
| | Settled suburbia | 6.0 |
| | Prudent pensioners | 2.6 |
| Moderate Means | Asian communities | 1.6 |
| | Post Industrial families | 4.8 |
| | Blue collar roots | 8.0 |
| Hard Pressed | Struggling families | 14.1 |
| | Burdened singles | 4.5 |
| | High rise hardship | 1.6 |
| | Inner-city adversity | 2.1 |
| Unclassified | | 0.3 |
Census variables: {Age, sex, socioeconomic status, occupation, tenure}
Source used for table %

Mosaic (used by the credit agency, Experian)
| Main class (52 sub-groups in all) | Description |
| High income families | Professionals and wealthy people living in very affluent suburbs |
| Suburban semis | Includes satellite villages as well as suburbs |
| Blue Collar | Least expensive owner-occupied housing; includes junior white-collar |
| Low rise council | Local authority or housing association tenants |
| Council flats | Includes municipal overspill estates |
| Victorian low status | Wide mix of lifestyles for mainly young families and childless elderly |
| Town houses/flats | Lower and middle income; typically junior admin grades |
| Stylish singles | Typically inner-city; well-educated occupants |
| Independent elders | Owner-occupiers or sheltered accommodation: low incomes |
| Mortgaged families | Typically newly-built private housing; young families on town peripheries |
| Country dwellers | Outside the commuter belt; wide range of lifestyles & affluence |
| Institutional areas | A catch-all category for military housing, boarding schools, hospitals etc |
Mosaic has recently been revised, e.g. to accommodate changing affluence/lifestyles, e.g. in the Asian community
Census variables: {Age, marital status, recent movers, household composition & size, employment type, travel to work, unemployment, car ownership, housing tenure, amenities, housing type, socioeconomic status}
Non-census variables: {County Court Judgements, Credit activity, Electoral Roll, Postcode Address File, Directors, Retail accessibility}. c 350 variables (census and non-census) in all
Photographs that illustrate areas deemed to be typical in Mosaic.
On average there are 3.1 different household level Mosaic types in a postcode. Only 22% of postcodes consist entirely of 1 Mosaic type at household level. A maximum of 18 different types in a postcode (Source: Richard Webber)

More detailed classification for both schemes on their websites
Useful source: Presentations to the MRS Census and Geodemographics Group
However, Acorn is the more usable to the surname analyst, as the profile assigned to a unit postcode is readily available.
A one-namer could obviously tabulate the current overall socio-economic status of their name, although a minimum number of name-holders would be needed (100+?). If a name is still predominantly located within a specific region, such an analysis would divulge what percentage are rural/urban, associated with town centres, suburbs etc.
This is such a new area that I am wary: there must be pitfalls in applying a scheme to find the socio-economic value of a surname. And would such a profile have any validity?
Perhaps of more significance would be to define a group of names, and to perform the same profiling. This has been done for the names traditionally associated with one small region (i.e. 'local' surnames). The analysis (of this unpublished academic study) found that 'local' names were more associated with lower status profiles and neighbourhoods.
To see what can be achieved in this area of name pattern analysis using geodemographics, download Richard Webber, 'Neighbourhood segregation and social mobility among the descendants of Middlesbrough's 19th century immigrants' (CASA Working Paper 88). Note: this type of approach needs to be repeated for other parts of the country.
| The following are possible areas for socio-economic surname studies | |
| Above-average concentrations of financially privileged and socially excluded, in close proximity | Camden, Haringey, Westminster, Aberdeen, Edinburgh, Stirling |
| High proportions of elderly people living in council accommodation | Nottingham, Barking, Dagenham |
| Eclectic ethnic mixes (London postcodes) | E7, E12, EC1N, W2, W3, W1B, N17, N15, SE15, SE8, SW9, SW5, SW7, SW8, UB1 |
| Source: J. of targeting, measurement and analysis for marketing (2001) vol 10, 1 p64 | |
Portsmouth would be an interesting case-study. It is unique in the UK in being an island city, with a strong sense of place, and a clannishness associated with long-established families.
-Household Composition and surnames
It is possible to glean further information from the nature of the household entry, by analysing the possible combinations of gender and surnames.
Possible household categories
| Family | 1 male: 1 female, sharing the same surname |
| Extended Family | Family with at least one other adult of the same surname |
| Pseudo Family | 1 male: 1 female, but with different surnames |
| Single male | |
| Male homesharers | 2 or more males with 2 or more surnames |
| Multi-occupancy dwelling | More than 5 surnames at one address |
This is a simplified version of the household analysis in Mosaic. The use of such a simplified system does have drawbacks, e.g. a brother living with a widowed sister would appear to be a pseudo family. Given name frequencies could be used to help decide if extended families consist of parents or offspring (Mosaic classifies your forename into 50 clusters, each with a similar age distribution).
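The household categories listed above could be applied mechanically to electoral-roll style entries. The Python below is only a sketch under my own assumptions: each household is a list of (surname, gender) pairs for its adults, the order in which the rules are tested is arbitrary, an extended family is taken to be three or more adults of both genders all sharing one surname, and anything that fits none of the listed categories is returned as "Unclassified". The surnames in the usage lines are made up.

```python
def classify_household(members):
    """members: list of (surname, gender) pairs, gender given as 'M' or 'F'."""
    surnames = {surname.strip().lower() for surname, _ in members}
    males = sum(1 for _, gender in members if gender == "M")
    females = sum(1 for _, gender in members if gender == "F")

    if len(surnames) > 5:
        return "Multi-occupancy dwelling"
    if len(members) == 1 and males == 1:
        return "Single male"
    if len(members) == 2 and males == 1 and females == 1:
        # Same surname: family; different surnames: pseudo family.
        return "Family" if len(surnames) == 1 else "Pseudo Family"
    if len(members) >= 3 and males >= 1 and females >= 1 and len(surnames) == 1:
        return "Extended Family"
    if females == 0 and males >= 2 and len(surnames) >= 2:
        return "Male homesharers"
    return "Unclassified"


print(classify_household([("Dyer", "M"), ("Dyer", "F")]))
print(classify_household([("Dyer", "M"), ("Marsh", "F")]))
```

The brother and widowed sister mentioned above would come out as a pseudo family here too, so the output still needs checking by eye.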
.."it would seem that type of neighbourhood, age and gender represent three items of information which are 'orthogonal', ie complementary to each other in that they operate in three quite independent domains. Given that both gender and age can be inferred from a person's first name with a fair degree of reliability (especially when also using public information such as years at their current address on the electoral roll and the presence and name (if present) of a partner) then it would seem that most behaviours could be predicted for any consumer from their name and address with a fairly high degree of success "
R Webber "father of UK Geodemographics"
and in the USA
"The development process also uncovered a correlation between cluster membership and given names. In the 35 million-name database, there were many names that appeared with unusual frequency in only one cluster. For that reason, all of the clusters were given high-indexing first names, resulting in titles like "Jules & Roz" (affluent and physically active urbanites with children), "Denise" (single mothers on a tight budget), and "Elmer" (very sedentary older men)...
[Certain] people defy categorization and have been lumped into a potpourri group known as the "Omegas." Nearly 9 percent of all U.S. households are Omegas."
Source : J Bickert, 1995
Names 'typical' of their age groups
| Core age | Female | Male |
| 38-44 | Michelle, Sharon | Kevin, Gary |
| 44-64 | Pamela, Janet | Philip, Brian |
| 65-84 | Sylvia, Brenda | Kenneth, Raymond |
| 85+ | Hilda, Ethel | Percy, Herbert |
Source: 'Geodemographics, GIS and neighbourhood targeting' Wiley, 2005 p 72
Female names are more fashion-driven than male names. If they are combined with a male partner name, then geodemographers are pretty confident in their estimates of that couple's age-range. This is reinforced by the length of residency, a statistic that is consistently lower where lower age groups are involved.
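To show the idea of reading an age band off a forename, here is a toy Python lookup built only from the eight pairs of names in the table above. It is an illustration, not how any commercial system works: the function names are hypothetical, a real classifier would need frequency data for thousands of forenames, and a couple is only given a band here when both names point to the same one.

```python
# Core age bands and 'typical' names, copied from the table above.
TYPICAL_NAMES = {
    "38-44": {"michelle", "sharon", "kevin", "gary"},
    "44-64": {"pamela", "janet", "philip", "brian"},
    "65-84": {"sylvia", "brenda", "kenneth", "raymond"},
    "85+": {"hilda", "ethel", "percy", "herbert"},
}


def age_band(forename):
    """Return the core age band for a forename, or None if it is not listed."""
    key = forename.strip().lower()
    for band, names in TYPICAL_NAMES.items():
        if key in names:
            return band
    return None


def couple_age_band(forename_a, forename_b):
    """Return a band only when both partners' names agree on it."""
    band_a = age_band(forename_a)
    band_b = age_band(forename_b)
    if band_a is not None and band_a == band_b:
        return band_a
    return None


print(couple_age_band("Sylvia", "Kenneth"))  # 65-84
```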
As for surnames -
"...in Scotland the percentages of electors with self-evidently Scottish names is significantly higher among consumers in highland and island communities than among consumers in student areas, defence establishments and areas of high-incomes singles and families in inner areas of Glasgow. Indeed the percentage with Scottish names has proved a more effective indicator than the Census indicator 'speaking Gaelic' in identifying areas with the most traditionally Scottish way of life"
source: R Webber Designing geodemographic classifications to meet contemporary business needs Interactive Marketing 5(3) 2004, p 233-234
Example
I have just played around and collected household data for name D in the PO postcode area
| Postcode | a | b | c | d | e | f | g | h | i |
| Households | 1 | 2 | 3 | 1 | 1 | 4 | 5 | 3 | 2 |
| Main types | 1 | 2 | 1 | 1 | 1 | 2 | 2 | 3 | 2 |
| Breakdown | 1 moderate means | 1 moderate means, 1 hard-pressed | 3 moderate means | 1 comfortably-off | 1 moderate means | 3 wealthy achievers, 1 comfortably-off | 4 wealthy achievers, 1 comfortably-off | 1 wealthy achievers, 1 comfortably-off, 1 hard-pressed | 1 comfortably-off, 1 hard-pressed |
| sub-types | 1 | 2 | 3 | 2 | 1 | 4 | 3 | 3 | 2 |
| Postcode | j | k | l | m | n | o | p | q | r |
| Households | 1 | 1 | 2 | 2 | 1 | 9 | 1 | 1 | 1 |
| Main types | 1 | 1 | 2 | 1 | 1 | 3 | 1 | 1 | 1 |
| Breakdown | 1 moderate means | 1 comfortably-off | 1 urban prosperity, 1 hard-pressed | 2 comfortably-off | 1 wealthy achievers | 6 comfortably-off, 1 moderate means, 2 hard-pressed | 1 hard-pressed | 1 urban prosperity | 1 wealthy achievers |
| sub-types | 1 | 1 | 1 | 1 | 3 | 1 | 1 | 1 |
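| Acorn category | Number | % | National Average | Postcodes involved | Households |
| Wealthy achievers | 11 | 25.6 | 25.1 | 5 | 10 |
| Urban prosperity | 2 | 4.7 | 10.7 | 2 | 2 |
| Comfortably Off | 14 | 32.6 | 26.6 | 8 | 14 |
| Moderate means | 8 | 18.6 | 14.5 | 6 | 8 |
| Hard-Pressed | 8 | 18.6 | 22.4 | 6 | 7 |

| Household types | Numbers |
| Singles (Young) | 6 |
| Singles (Mature) | 17 |
| Doubles | 17 |
| 3+ | 2 |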
- This name is a broad mid-southern name
- In this Postal area, it is associated with the suburban sprawl, rather than urban heartlands
- Mature residents are to the north of Portsmouth, or Bognor. There is negligible movement to retire to the IOW
- Judging from the forenames, and residential areas, the age-profile is quite high. Many of the 2 person households would appear to be pensioners. The young singles are in a minority
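The percentage column in the summary above is simple to reproduce, and an index against the national figure makes over- and under-representation easier to see. The Python below is a sketch using the Number and National Average figures from the example; the function name and the index (100 = in line with the national share) are my own additions rather than anything in the Acorn documentation.

```python
def acorn_profile(counts, national_pct):
    """Return {category: (percentage, index)} for one surname.

    counts: number of name-holders per Acorn category.
    national_pct: national percentage for the same categories.
    index: 100 means the surname matches the national share.
    """
    total = sum(counts.values())
    profile = {}
    for category, number in counts.items():
        pct = 100 * number / total
        index = 100 * pct / national_pct[category]
        profile[category] = (round(pct, 1), round(index))
    return profile


# Figures for name D in the PO postcode area, from the table above
counts = {
    "Wealthy achievers": 11,
    "Urban prosperity": 2,
    "Comfortably Off": 14,
    "Moderate means": 8,
    "Hard-Pressed": 8,
}
national = {
    "Wealthy achievers": 25.1,
    "Urban prosperity": 10.7,
    "Comfortably Off": 26.6,
    "Moderate means": 14.5,
    "Hard-Pressed": 22.4,
}

for category, (pct, index) in acorn_profile(counts, national).items():
    print(f"{category}: {pct}% (index {index})")
```

With only 43 name-holders the indices are very rough, which is the reason for suggesting a minimum of 100 or so before drawing conclusions.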
It might be a fruitful exercise to correlate geodemographic status against household composition for a name/class of names
Another possible broader-brush classification scheme at local authority level
Does the above have implications for the philosophy of identity? The standard position is that a name has reference but no meaning, i.e. it is a label that refers to one object. If it refers to more than one, then its usage is that of a common noun. For example, if I talk about a polar bear, the mind conjures up a class of bear with all its associations: snow, whiteness, polar region. If I say 'John' there is no equivalent class of 'Johns' sharing all the same attributes, into which my one John neatly fits.
But with geodemographic clusters, someone with a distinctive name might group into defined socio-economic, lifestyle groups. Not everyone, but a significant number.
So are geodemographic clusters "common nouns", or does the lifestyle communality imply meaning?
This is in all probability early-morning tosh- I certainly have not thought it through
Other Spatial Analysis Tools
-Index of concentration
-Location quotient
-Cluster Analysis
Still to be written (sometime):-
Last revised: February 13, 2006.