Statistics and Surnames

 


 


Modern British Surname Studies






Official statistics reproduced here are Crown Copyright, and are reproduced with permission.



Introduction
Interest in the statistical aspects of surnames centres on either:-

The names themselves: The emphasis is on the individual name, and on counting the actual number of occurrences of that name by area and time, so as to determine its incidence. Members of the Guild of One-Name Studies have collected data on rarer names. There is still much research to conduct on the incidence of leading surnames.
Surname profiling: The actual names themselves are immaterial. What is being determined is how many names occur once, twice, thrice and so on.
From this comes an estimate of the average number of names per unit, and therefore (perhaps) the size of the surname stock. However, this whole area is fraught with difficulties: how to determine a representative sample when the pool of surnames is far from homogeneous, what the minimum size of the sample should be, and so on.
Trevor Ogden, a leading researcher in this field, takes us further into this topic.
Local history of surnames: The marriage of quantitative and qualitative techniques through a case study of Portsmouth surnames in 1900.
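
Surname profiling, as described above, amounts to counting how many distinct names occur once, twice, thrice and so on. A minimal sketch follows. The sample list is invented purely for illustration, and the "average occurrences per distinct surname" is only one possible reading of the average mentioned above.

```python
from collections import Counter

def surname_profile(surnames):
    """Return {k: number of distinct surnames that occur exactly k times}."""
    occurrences = Counter(surnames)               # surname -> times it appears
    return dict(Counter(occurrences.values()))    # k -> surnames appearing k times

# Hypothetical sample, not real data.
sample = ["Smith", "Jones", "Smith", "Dance", "Fidler", "Jones", "Smith"]
profile = surname_profile(sample)

# One reading of the "average" (an assumption): entries per distinct surname.
distinct_surnames = sum(profile.values())
average_per_surname = len(sample) / distinct_surnames

print(profile, average_per_surname)
```

In this toy sample two names occur once, one occurs twice and one occurs three times. A real profile would need a sample chosen with the difficulties noted above in mind.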

 

 

 

 

The names themselves

Initially I hoped that someone far more experienced would produce such a website to help a beginner like myself.
Hopefully that will eventually happen. In the meantime, I have collated various statistical methods useful for a one-name study, from a variety of sources. I trust that others will advise on their strengths and weaknesses.
As they say, it has been an educative experience.

Background official statistics for use with the methods mentioned can be found in the data pages.

A Surname signature
I have often wondered if it would be possible to condense the distribution and incidence data of a surname into a formula, say, so that comparisons could be made against other names, or against the same name at various periods.

One possibility advanced has been the Smallshaw Name Identification Factor.

The aim of Name Identification is simplicity of method. The GRO (England and Wales) birth registrations are summed for 1870 and 1970, then averaged. The leading county is noted. Thus:-

Smallshaw 8 Lancashire
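
A minimal sketch of the calculation as described above: total the birth registrations for 1870 and for 1970, average the two, and note the leading county. The county counts below are invented for illustration (they are not the real Smallshaw figures), and taking the leading county from the two years combined is an assumption, since the description does not say how it is chosen.

```python
def name_identification_factor(births_1870, births_1970):
    """Each argument maps county -> GRO birth registrations for that year."""
    total_1870 = sum(births_1870.values())
    total_1970 = sum(births_1970.values())
    average = (total_1870 + total_1970) / 2

    # Assumption: the leading county is the one with most registrations
    # over the two years combined.
    combined = {}
    for year_counts in (births_1870, births_1970):
        for county, count in year_counts.items():
            combined[county] = combined.get(county, 0) + count
    leading_county = max(combined, key=combined.get)

    return round(average), leading_county

# Hypothetical counts, chosen only to show the shape of the output.
factor, county = name_identification_factor(
    {"Lancashire": 7, "Cheshire": 2},
    {"Lancashire": 5, "Yorkshire": 2},
)
print("Smallshaw", factor, county)
```

The same function would serve for a ten-year run if each argument held the summed registrations for the decade and the average were divided by the number of years.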

The Smallshaw factor has been criticised for producing misleading results in the case of rare names, or of names that migrated to England whilst still having a formidable presence in Scotland, say. The single year 1870 might not be statistically representative for a name, either; a 10-year run (1860-69 and 1970-79) would be more accurate.
Might a simple suffix denoting the year range and area help?
Smallshaw 8 Lancashire (10EWS)

Another possibility is a visual representation. The following is the Jones pair, from the graphs and distribution maps of 100 common surnames that appear in G. W. Lasker (1985) 'Surnames and genetic structure'. Lasker, as previously mentioned, surveyed all the GRO (England and Wales) marriage entries in the March 1975 quarter.

The graphs vividly depict the predominance of the name Jones in Wales and the Midlands, and its relatively low occurrence in the north of England. Presumably this technique could be extended to include Scotland as well?

How the data behind the graphs was actually formulated is not covered in as much step-by-step detail as I would like:-

"The graphs depict for each surname, the probabilities of local excesses or deficiencies in frequency from west to east and from south to north. The number of occurrences of each surname expected in each district if all surnames had a uniform geographic distribution was subtracted from the observed number of occurrences and a measure of the probability of each deviation occurring by chance was recorded. The west-east and south-north distributions of these values were fitted by the curves..The degree of departure of points on these curves from zero is thus an indication of the probability of increased (or decreased) frequency of the surname at the longitude or latitude at the number of kilometres east (left-hand diagram) or north (right hand diagram) of the Ordnance reference point."

One-namers habitually collect all the GRO references to their 'name'. Is there any way a data template could be programmed for the automatic creation of graphs such as the above?
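
As a starting point for such a template, here is a minimal sketch of the first step in the quoted passage: the expected count per district under a uniform geographic distribution, subtracted from the observed count. The passage does not say which probability measure Lasker used, so the standardised residual computed here (deviation divided by the square root of the expected count) is an assumption. Fitting the west-east and south-north curves is not attempted.

```python
import math

def district_deviations(surname_counts, district_totals):
    """
    surname_counts:  district -> observed entries for the surname
    district_totals: district -> entries for all surnames in that district
    Returns district -> (observed, expected, z).
    """
    surname_total = sum(surname_counts.get(d, 0) for d in district_totals)
    grand_total = sum(district_totals.values())
    result = {}
    for district, total in district_totals.items():
        observed = surname_counts.get(district, 0)
        # Expected count if the surname were spread evenly across all entries.
        expected = surname_total * total / grand_total
        # Assumed measure of chance deviation: a standardised residual.
        z = (observed - expected) / math.sqrt(expected) if expected > 0 else 0.0
        result[district] = (observed, expected, z)
    return result

# Hypothetical districts "A" and "B" with invented counts.
deviations = district_deviations({"A": 30, "B": 10}, {"A": 1000, "B": 1000})
for district, (observed, expected, z) in deviations.items():
    print(district, observed, expected, round(z, 2))
```

A positive value marks a local excess of the name and a negative value a deficiency. Plotting these values against each district's easting and northing would give the raw points behind graphs of this kind.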

The Smallshaw factor is suited to localised surnames; the Lasker graphs to distributed surnames.
Would the marriage of the two methods create an acceptable surname signature?

 

Estimating the total size of a surname

Can one estimate how many bearers of a name have existed since parish records began? The seminal work on local demography is Wrigley and Schofield's monumental 'The Population History of England 1541-1871'. This tome rests on the analysis of the registers of 404 English parishes to produce estimates of population growth, fertility, age at marriage and the like. Graham Fidler in 'How Big is your One-Name Study' condensed some of the essential figures into the following table, which gives the estimated number of births, marriages and deaths for each period of roughly 50 years since 1541.

Period       Crude total births   Crude total deaths   Crude total marriages
             (millions)           (millions)           (millions)
1541-1599     6.8                  5.2                  2.0
1600-1649     7.4                  5.9                  1.9
1650-1699     6.9                  6.8                  1.7
1700-1749     8.8                  7.8                  2.2
1750-1799    12.1                  9.0                  2.9
1800-1849    21.9                 13.8                  4.7
1850-1880    22.4                 13.8                  5.2
Total        86.2                 62.4                 20.8

Graham found that there are 1,625 Fidlers in the current telephone directories, equivalent to 101 entries per million.
(The article was written before the appearance of the 1881 Census index, whose figures would have been a preferable baseline.)
Multiplying this rate by the 86.2 million total births in the above table suggests that there have been about 8,700 Fidler births since 1541 and, as the work of Martin Ecclestone indicates, the IGI should contain on average 50% of these.

In my own case (Dance being a predominantly English surname), there were 2,515 Dances in England and Wales in 1881. The relevant national population was 26,046,142, resulting in a ratio of 96.56 per million. Multiplying by the total of 86.2 million births results in 8,323 total births since 1541. Contrast this with the result by the Bardsley method below.

Provisos:-
Assumes that the present rate has been consistent in the past
Applies only to a name that is predominantly distributed in the English counties
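
The arithmetic of this method can be sketched as follows, using only the figures quoted above. The function name is hypothetical, and the sketch inherits both provisos.

```python
# Total crude births 1541-1880, in millions, from the table above.
TOTAL_BIRTHS_MILLIONS = 86.2

def births_since_1541(bearers, population):
    """Return (bearers per million, estimated total births of the name)."""
    rate_per_million = bearers / population * 1_000_000
    # Births in millions times bearers per million gives births of the name.
    return rate_per_million, rate_per_million * TOTAL_BIRTHS_MILLIONS

# Dance in the 1881 census of England and Wales.
dance_rate, dance_births = births_since_1541(2515, 26_046_142)
print(round(dance_rate, 2), int(dance_births))

# Fidler: the article gives the rate directly as 101 per million.
fidler_births = 101 * TOTAL_BIRTHS_MILLIONS
print(round(fidler_births))
```

The Fidler result of roughly 8,706 is the source of the rounded figure of 8,700 quoted above.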

The Wrigley and Schofield figures were further refined by Alan Bardsley in 'How many Smiths are there?' JOONS April 1996. This article attempts to both provide a longer time frame by adding GRO data post 1871, and to provide factors for the estimation of a name population in any decade. How the factors are derived is not explained in detail, so I will quote verbatim from the article:-

"The basic information needed for each year is the annual population and birth rate. From these a multiplication factor can be calculated which gives the relationship between the births in any one year and the total for all time. Similarly a factor for the total alive in any one year and the total for all time can be derived.
The former can be used with the annual birth registers and the latter with the census records.
...To find the total number of births from 1541 to 1996 multiply the number (of births) for any one year by the factor given for Births.
Similarly if you know the total number of individuals in any one year multiply by the factor given for Totals to give a total of individuals from 1541 to 1996."

Year   Births factor   Total factor      Year   Births factor   Total factor
1541   1901            64.6              1771   789             27.8
1551   1502            59.5              1781   716             25.4
1561   1602            60.1              1791   602             23.1
1571   1644            54.8              1801   606             20.5
1581   1472            49.7              1811   449             18.0
1591   1551            45.9              1821   372             15.2
1601   1388            43.5              1831   373             13.1
1611   1185            40.5              1841   319             11.5
1621   1107            38.2              1851   280             10.2
1631   1322            36.6              1861   253              9.1
1641   1092            35.2              1871   229              8.0
1651   1189            34.2              1881   207              7.0
1661   1227            34.8              1891   203              6.3
1671   1303            36.0              1901   196              5.6
1681   1207            36.3              1911   207              5.1
1691   1131            36.3              1921   212              4.8
1701   1035            35.4              1931   289              4.6
1711   1185            34.2              1941   301              4.4
1721   1059            33.5              1951   267              4.2
1731    935            34.0              1961   225              4.0
1741   1028            32.1              1971   239              3.7
1751    907            31.0              1981   286              3.7
1761    837            29.1              1991   271              3.6

Alan continues:-

"For example..the number of Bardsleys in 1851, from a census count, is about 2,500 and from the multiplier of 10.2 for the population in that year I would conclude that there have been about 25,000 Bardsleys in the UK since 1541 and from the 1991 ratios working backwards that there should be currently about 92 births a year and a current population of 6,900."

In my own case, there were 2,515 Dances enumerated in 1881 in England and Wales. So by the above table, multiplying by the 1881 Total factor of 7.0 leads to a total of 17,605 live births since 1541.

Alan does warn that "there will be large errors for individual years when you try to apply the calculations to small groups" and later that "significant differences are bound to occur when the number of births per year is small, say less than ten per annum. The vagaries of procreation and pestilence play havoc with the statistics. The greater the number of years that births can be counted over the more accurate will be the estimate. "

I know from the 1996 Electoral Rolls that there are currently some 4,500 Dances alive in the UK. Multiplying by the latest listed factor (3.6, for 1991) results in a figure of about 16,200, which fits in with the range.
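
A minimal sketch of how the Bardsley factors are applied, using three rows copied from the table above (1851, 1881 and 1991). The function names are hypothetical. The two "implied" functions simply run the factors backwards, which appears to be how the figures of about 92 births a year and a population of 6,900 in the quoted example were reached.

```python
# (Births factor, Total factor) for selected years, from the table above.
FACTORS = {
    1851: (280, 10.2),
    1881: (207, 7.0),
    1991: (271, 3.6),
}

def total_from_births(year, births_in_year):
    """All-time total of individuals, from one year's birth registrations."""
    return births_in_year * FACTORS[year][0]

def total_from_census(year, alive_in_year):
    """All-time total of individuals, from a census count for that year."""
    return alive_in_year * FACTORS[year][1]

def implied_births(year, all_time_total):
    """Working backwards: expected births in a year for a given total."""
    return all_time_total / FACTORS[year][0]

def implied_population(year, all_time_total):
    """Working backwards: expected number alive in a year for a given total."""
    return all_time_total / FACTORS[year][1]

dance_total = total_from_census(1881, 2515)        # Dance, 1881 census
bardsley_births = implied_births(1991, 25000)      # Bardsley example
bardsley_alive = implied_population(1991, 25000)
print(round(dance_total), round(bardsley_births), round(bardsley_alive))
```

Extending the dictionary to every row of the table would let any census or birth-index year be used as the starting point, subject to the warning about small numbers quoted above.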

Provisos:-

The Wrigley and Schofield data is based on English counties; the RG figures are for England and Wales. This may cause some slight discrepancy.


Estimating at a single point, post 1801

Clive Essery details on his site the possible causes of any errors, along with some interesting Excel graphs.

A further refinement would be to compare the birth rate of your name against the national crude birth rate.

 


How many alive today?

1) Find some organisation with access to the Electoral Registers on CD-Rom, and plead with them. Remember that the results will be for those of voting age alone, and need to be adjusted by a percentage of those nationally under-17.

2) The UK-Info disc claims to contain the Electoral Register -but it is full of duplications. It may not be practicable to eliminate these.

3) Scale up the 1881 figure in proportion to the growth in the national population since 1881.

4) If you have the GRO data, use Clive Essery's approach

5) Donovan Murrells has devised the following technique, from a full set of GRO data:-

For the last 10 years of available GRO data, calculate the average age at death.
Subtract this figure from the last year of GRO data available, to give the start of the period to examine.
Count the number of births in this period, less the number of deaths.
This figure will give you almost the total alive.
You just need to add a factor for those who have lived past the average age at death, and are consequently still alive. This factor can be derived from ONS population data.


For example

For 1980-1990, the average age at death is 80.
In the period 1910-1990, there have been 5,000 births, but 1,000 have died in the period.
This leaves 4,000, plus another factor for those born before the period who are still alive in 1990. In this case, I have decided the factor is 250.
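
The steps above can be sketched as follows. The function name is hypothetical, and the yearly figures are invented so that they sum to the totals in the example (5,000 births and 1,000 deaths within 1910-1990). The allowance of 250 for older survivors is supplied by hand, as in the example.

```python
def murrells_estimate(last_year, average_age_at_death, births, deaths,
                      older_survivors):
    """
    births, deaths: year -> number of GRO index entries for the name.
    older_survivors: allowance for those who have outlived the average age.
    Returns (start of period, estimated number alive in last_year).
    """
    start_year = last_year - average_age_at_death
    births_in_period = sum(n for year, n in births.items()
                           if start_year <= year <= last_year)
    deaths_in_period = sum(n for year, n in deaths.items()
                           if start_year <= year <= last_year)
    return start_year, births_in_period - deaths_in_period + older_survivors

# Hypothetical yearly data; entries before 1910 fall outside the period.
births = {1905: 70, 1910: 2000, 1950: 2000, 1990: 1000}
deaths = {1900: 10, 1950: 400, 1985: 600}

start_year, alive = murrells_estimate(1990, 80, births, deaths, 250)
print(start_year, alive)
```

With a full set of GRO data the two dictionaries would hold one entry per year, and the result would stand or fall on how well the allowance for older survivors is chosen.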

5b) Rex Leaver in his 'Families on the move' article takes an even more direct approach:-

"The population bearers of a particular surname can be estimated for any date after, say 1966 (100 years after the index began to show age at death). From each cohort of births in the previous 100 years the appropriate deaths can be subtracted year by year until the survivors are identified at the date in question. This total is then the base from which the population in any earlier year is calculated by adding deaths and subtracting births in the relevant interval. (Deaths abroad through emigration, war service and other causes can undermine the precision of these calculations, but they still leave the estimates reasonably accurate" (Local Historian, May 1990)

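The approach in the quotation above might be sketched as below, assuming that each death has already been assigned to a birth cohort (year of death less age at death, which is only approximate). All names and figures are hypothetical, and emigration and deaths abroad are ignored, as the quotation warns.

```python
def survivors_at(date, births_by_cohort, deaths_by_cohort):
    """
    births_by_cohort: birth year -> births of the name
    deaths_by_cohort: birth year -> deaths, up to `date`, of people born then
    Sums the survivors of every cohort born in the 100 years before `date`.
    """
    cohorts = range(date - 100, date + 1)
    return sum(births_by_cohort.get(year, 0) - deaths_by_cohort.get(year, 0)
               for year in cohorts)

def population_in_earlier_year(base_population, base_year, earlier_year,
                               births_by_year, deaths_by_year):
    """Work back from the base year by adding deaths and subtracting births."""
    interval = range(earlier_year + 1, base_year + 1)
    births = sum(births_by_year.get(year, 0) for year in interval)
    deaths = sum(deaths_by_year.get(year, 0) for year in interval)
    return base_population + deaths - births

# Invented figures for illustration.
base = survivors_at(1970, {1900: 10, 1950: 20}, {1900: 8, 1950: 1})
earlier = population_in_earlier_year(base, 1970, 1960, {1965: 3}, {1962: 2})
print(base, earlier)
```

The second function needs deaths and births by year of event rather than by cohort, so in practice the death index entries would be tabulated both ways.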

J D Porteous

These are the methods used in J. Douglas Porteous, 'Surname Geography', Transactions of the Institute of British Geographers. Professor Porteous attempts to marry data derived from the IGI, the GRO and a current questionnaire. Whether this data nests together without weighting might be a subject for discussion. Nonetheless, the principle of using a variety of sources to produce a long timeline is judicious. The article has some excellent graphs, but the author does not always explain how he derived his data, so it is difficult to fully evaluate the effectiveness of his methods.
For example, he talks of pre-censal crude county birth rates with no indication of where this data exists.

illustrative figures

1) Vital statistics graphs - Plot the birth/death figures on the same graph. Add the median line for both.
Is there the normal rapid increase in the number of events before circa 1910, followed by an equally rapid decline?

2) Ranking - Sort the births by county, and then by period. Rank the counties for each period, and draw a line graph for each of the leading counties.

3) Barcharts - Convert the above data to barchart format. A county may remain predominant throughout, or be displaced by others.

4) Relative birth rates (births per million population) by county by period. Numbers placed on a series of county maps, and shaded accordingly. Time frames: 1538-1637, 1638-1737, 1738-1837, 1838-1865, 1866-1894, 1895-1923, 1924-1952, 1953-1979
Source of County population estimates pre-1801? Rickman? Mitchell?
illustrative example needed

5) Location quotient
Comparison of the birth rate of your family name against the county birth rate, plotted on a graph over time.
(Unity refers to a county birth rate equalling the national rate.)
Porteous cites as his sources the IGI and the civil registration indexes. It is not evident how he derives a county birth rate pre-1837 from the IGI (and should it not rather be the crude baptism rate?).
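
Since the article does not spell out the calculation, the following is only one plausible reading: the name's births per million population in a county, divided by the name's births per million nationally, so that unity means the county rate equals the national rate. The function names and all figures are hypothetical.

```python
def births_per_million(name_births, population):
    """Births of the name per million of the given population."""
    return name_births / population * 1_000_000

def location_quotient(county_births, county_population,
                      national_births, national_population):
    """Assumed definition: county rate for the name over the national rate."""
    county_rate = births_per_million(county_births, county_population)
    national_rate = births_per_million(national_births, national_population)
    return county_rate / national_rate

# Invented example: 30 births in a county of 500,000 people,
# against 300 births in a national population of 20,000,000.
lq = location_quotient(30, 500_000, 300, 20_000_000)
print(round(lq, 2))
```

Computing this for each county and period, and plotting the values over time, would give a graph of the kind described, provided county population estimates can be found for the earlier periods.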

6) Timelines
For each significant county, create a box with space for one number per period (nine in the list below).
Enter in the box the number of each period in which at least one birth event is recorded, and a dash for each of the other periods.

0 1538-1637
1 1638-1737
2 1738-1837
3 1838-1865
4 1866-1894
5 1895-1923
6 1924-1952
7 1953-1979
8 1980-2000

Thus:-

North Riding - - 2 - - - 6 7 8
West Riding 0 1 2 3 4 5 6 7 8
Lincolnshire - 1 2 3 - - - - 8
London - - - - - 5 6 7 8

Each row could be positioned on a map, or presented as an overall table.

For a particular area, substitute the number of events for the period numbers in the boxes.

               1538-1637  1638-1737  1738-1837  1838-1865  1866-1894  1895-1923  1924-1952  1953-1979  1980-2000
North Riding   -          -          5          -          -          -          10         8          6
West Riding    5          5          7          7          9          15         8          7          4
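
Both kinds of timeline row can be generated from a list of event years. A minimal sketch, using the nine periods listed above, with hypothetical function names and invented event years:

```python
# The nine periods listed above, numbered 0 to 8 by position.
PERIODS = [(1538, 1637), (1638, 1737), (1738, 1837), (1838, 1865),
           (1866, 1894), (1895, 1923), (1924, 1952), (1953, 1979),
           (1980, 2000)]

def period_code(year):
    """Return the period number for a year, or None if it falls outside."""
    for code, (start, end) in enumerate(PERIODS):
        if start <= year <= end:
            return code
    return None

def timeline_row(event_years, counts=False):
    """Build a row of period numbers (or event counts), dashes where empty."""
    tally = [0] * len(PERIODS)
    for year in event_years:
        code = period_code(year)
        if code is not None:
            tally[code] += 1
    if counts:
        cells = [str(n) if n else "-" for n in tally]
    else:
        cells = [str(code) if n else "-" for code, n in enumerate(tally)]
    return " ".join(cells)

# Invented event years for one county.
presence_row = timeline_row([1750, 1930, 1960, 1985])
count_row = timeline_row([1750, 1751, 1930], counts=True)
print("North Riding", presence_row)
print("North Riding", count_row)
```

The first call reproduces the pattern of the North Riding row in the first table above; the second shows the count form.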

 

Timelines can be in-depth for a county. As an example, here are the Worcestershire IGI Dances, expressed as an all-event timeline. A red box signifies an event. A grey box signifies that there is no IGI coverage of the parish for that interval, and that the actual registers may hold further Dances. Each time interval is 20 years.

 

7) Correlation

Rank correlation of x major counties in terms of births, 1838-1979

            1838-1865  1866-1894  1895-1923  1924-1952  1953-1979
1838-1865   -          0.429      0.943*     0.943*     0.873
1866-1894              -          0.929*     0.929*     0.771
1895-1923                         -          1.000*     0.995*
1924-1952                                    -          0.995*
1953-1979                                               -

df=4; * = significant at the 0.05 level

An explanation of how this correlation table is derived is needed.
Its significance is that it reveals, in the case of this name, a persistent place-loyalty at county level, rather than long-distance migration.
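
One plausible derivation, offered as an assumption since the article does not confirm it, is Spearman's rank correlation: rank the same set of counties by births in each of two periods, then correlate the two sets of ranks. A minimal sketch with invented birth counts for six counties:

```python
def average_ranks(values):
    """Rank from 1 (largest) downwards; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical births for the same six counties in two periods.
period_a = [120, 90, 60, 40, 30, 10]
period_b = [100, 110, 50, 45, 20, 5]
rho = spearman(period_a, period_b)
print(round(rho, 3))
```

Here only the top two counties change places between the periods. Repeating the calculation for every pair of periods would fill in a table of the kind shown above.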

8) Time-scale paths
A series of bifurcating lines (x axis = time; y axis = distance from origin in miles, on a logarithmic scale), showing how far migration has carried the holders of a surname who descend from a common ancestor.
Illustrative graph

 


To come:-

1881 census
Geoff Riggs on the Goons 1881 Census project
-
definition of Density


Last revised: October 08, 2000