Tips from the Experts
Understanding the Limitations of Soundex
The Soundex is a surname indexing system used for the 1880, 1900,
1910, and 1920 United States censuses. New York passenger arrivals
after 1910 and other records are also indexed with the Soundex
system. Legacy [Note: Legacy's Genealogical Software] includes a
Soundex calculator. [.....]
The Soundex is a phonetically coded surname index based on the way
a surname sounds rather than the way it is spelled. Same-sounding
surnames, like SMITH, SMITHE, SCHMIDT and SMYTH, are coded identically
and are filed together. This aids the researcher in finding names
that sound alike, but have different spellings. The Soundex is not
a perfect system and there are variations in how a surname can be
coded. This is critical to know when you don't find the person you
expected in the Soundex.
The American Soundex code consists of the first letter of the
name followed by three digits. The three digits are determined by
dropping the letters a, e, i, o, u, h, w and y and assigning digits
to the remaining letters of the name according to the table below.
There are only two additional rules. (1) If two or more consecutive
letters have the same code, they are coded as one letter. (2) If there
are too few letters to make three digits, the remaining digits are
set to zero.
Soundex Table
1. b, f, p, v
2. c, g, j, k, q, s, x, z
3. d, t
4. l
5. m, n
6. r
Examples:
Miller M460
Peterson P362
Peters P362
Auerbach A612
Uhrbach U612
Moskowitz M232
Moskovitz M213
Ashcroft A226
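The rules and table above can be sketched in a few lines of Python. This is only an illustration, not the code used by Legacy or by the census indexers. The names soundex, GROUPS and CODES are placeholders chosen here. The sketch assumes that only letters standing directly next to each other are merged, so any uncoded letter between them (a vowel, H, W or Y) keeps them apart.

```python
# Letter groups from the Soundex table; the position in the list gives the digit.
GROUPS = ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"]
CODES = {letter: str(digit)
         for digit, group in enumerate(GROUPS, start=1)
         for letter in group}


def soundex(name):
    """Return a code such as 'M460' (hypothetical helper, simplified rules)."""
    # Keep only the letters a-z; spaces, punctuation and other marks are skipped.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    digits = []
    previous = CODES.get(letters[0], "")  # code of the letter just before
    for ch in letters[1:]:
        code = CODES.get(ch, "")          # "" for a, e, i, o, u, h, w, y
        if code and code != previous:     # rule 1: same-coded neighbours count once
            digits.append(code)
        previous = code                   # assumption: uncoded letters separate
    # Rule 2: pad with zeros, then keep one letter and three digits.
    return (letters[0].upper() + "".join(digits) + "000")[:4]


for name in ["Miller", "Peterson", "Auerbach", "Moskovitz", "Ashcroft", "Lee"]:
    print(name, soundex(name))
# Miller M460, Peterson P362, Auerbach A612, Moskovitz M213, Ashcroft A226, Lee L000
```

Names in which H or W sits between two letters with the same code, such as Ashcroft, may have been indexed differently in the census, so a single code from a sketch like this is not always the one to search under.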
To Calculate a Soundex Code by Hand:
1. Print name on a piece of paper.
2. Cross out spaces, punctuation, accents and other marks.
3. Cross out any of the following characters A, E, I, O, U, H, W,
Y (unless first letter of surname).
4. Cross out the second letter of duplicate characters.
5. Cross out the second letter of adjacent characters with the same
Soundex number.
6. Convert characters in positions 2 to 4 to a number.
B, P, F, V = 1
C, S, K, G, J, Q, X, Z = 2
D, T = 3
L = 4
M, N = 5
R = 6
7. Fill any unused positions with zeros, e.g., Lee is L000 and Bailey
is B400. There is always one letter followed by three digits.
Soundex Limitations:
· Names that sound alike do not always have the same Soundex
code. For example, Lee (L000) and Leigh (L200) are pronounced identically,
but have different Soundex codes because the silent g in Leigh is
given a code.
· Names that sound alike but start with a different first
letter will always have a different Soundex code. Thus, names such
as Carr (C600) and Karr (K600) should be calculated separately.
· Soundex is based on English pronunciation, so European names
may not be coded correctly. For example, some French surnames with
silent last letters will not code according to pronunciation. This
is true of a French name such as Beaux, where the x is silent. Sometimes
this surname is also spelled Beau (B000) and is pronounced identically
to Beaux (B200), yet the two have different Soundex codes. Although
I have given only a French example, this could be true of any name
that does not use English pronunciation.
· Sometimes names that don't sound alike have the same Soundex
code. When I am searching for the surname Powers (P620), I have to
wade through Pierce, Price, Perez and Park which all have the same
Soundex code. Yet Power (P600), a common way to spell Powers 100 years
ago, has a different Soundex code.
· Surnames with prefixes were usually coded without the prefix,
but not always. If you are searching for a surname such as DiCaprio
or LaBianca, you should try the Soundex both with and without
the prefix.
· US Census Soundex confusion arises with names such as Ashcraft.
When the original Soundex coder didn't code the H and didn't treat
the H as a separator between S and C, which share the same code,
the S and C were considered adjacent letters to be coded only once,
and the Soundex is A261. In the 1920 NY Census, Ashcraft is found
under A261.
Those who coded the Soundex for the United States censuses may or
may not have used this rule. They sometimes considered the H as a
separator, and did not code the S and C as adjacent letters that would
only be assigned one letter, but rather gave a number code to each
letter. When coding a name like Ashcraft, the Soundex calculator in
Legacy recognizes these variations in approaches and displays both
A226 and A261.
The important thing to know is that the US Census was not consistent
in using the letters H and W as separators between adjacent letters.
If you are trying to calculate the Soundex for a name in which W or H
separates two letters with the same code, it is best to calculate
the Soundex using both methods to locate the name in
the US census. This would be true of any name that has any of the
letters C, S, G, J, K, Q, X, Z on both sides of the letter H or W, such as
SHC, SHS, CHS, KHZ, SWS, KWS, CWK.
· A surname of more than one word, or a surname that commonly comes
before a given name, such as Native American and Chinese surnames,
may have been coded under the name which appears last, even though
it might not be the actual surname. In the case of multi-word surnames,
only the last word may have been coded.
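Several of the limitations above come down to searching under more than one code. The Python sketch below collects the candidates for a surname: both treatments of H and W, with and without a leading prefix. It is an illustration only and not Legacy's implementation. The names soundex, candidate_codes and hw_separates, and the short PREFIXES list, are assumptions made up for this example, and the prefix matching is deliberately naive.

```python
GROUPS = ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"]
CODES = {letter: str(digit)
         for digit, group in enumerate(GROUPS, start=1)
         for letter in group}

# Hypothetical prefix list for illustration; extend it for the names you research.
PREFIXES = ("di", "la", "de", "van", "von")


def soundex(name, hw_separates=False):
    """Soundex code; hw_separates picks how H and W between letters are treated."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    digits = []
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw" and not hw_separates:
            continue  # H/W ignored: the letters on either side count as adjacent
        code = CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # uncoded letters reset this and so act as separators
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def candidate_codes(surname):
    """Codes worth checking: both H/W treatments, with and without a prefix."""
    forms = [surname]
    lowered = surname.lower()
    for prefix in PREFIXES:
        # Naive test: any name that merely starts with these letters also matches.
        if lowered.startswith(prefix) and len(surname) > len(prefix) + 1:
            forms.append(surname[len(prefix):])
    return sorted({soundex(form, flag) for form in forms for flag in (False, True)})


print(candidate_codes("Ashcraft"))   # ['A226', 'A261']
print(candidate_codes("DiCaprio"))   # ['C160', 'D216']
print(candidate_codes("LaBianca"))   # ['B520', 'L152']
```

A sketch like this cannot help with the other limitations: Lee and Leigh, Carr and Karr, or Power and Powers still have to be looked up as separate spellings.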
The National Archives offers a free brochure titled "Using the
Census Soundex," General Information Leaflet 55 (Washington,
DC: National Archives and Records Administration, 1995). The brochure
is available by sending a message to [email protected]
(include your name, postal address, and "GIL 55 please").
Here are three recommended articles on the subject:
National Archives: The Soundex Indexing System
http://www.archives.gov/research_room/genealogy/census/soundex.html
Kathi Reid: Surname to Soundex Converter
http://www.geocities.com/Heartland/Hills/3916/soundex.html
Gary Mokotoff: Soundexing and Genealogy
http://www.avotaynu.com/soundex.html