In the early 1990s, Mir Aimal Kansi, a known terrorist, was issued a visa and entered the United States from Pakistan. He purchased an AK-47, and on Jan. 25, 1993, he gunned down two CIA agents and wounded three others in front of the agency’s headquarters. He then hopped a flight out of the country, dodging authorities until his capture in 1997. How did Kansi elude government watch lists, obtain a visa, and buy both a weapon and plane tickets? Easy: He dropped a single letter from his name.
In Urdu, Pakistan’s national language, Kasi is a common variation of Kansi. The Soundex system used by the U.S. government failed to consider this nuance when checking his documentation. Soundex is the name-searching system still used by 90 percent of American businesses, almost every government department and major airline, even though it was originally developed for the 1890 census. It takes a person’s last name, strips out the vowels and assigns codes to similar-sounding consonants to create a four-character code: the first letter followed by three digits that represent the consonants. Under the standard coding, the n in Kansi earns a digit of its own, so Kansi becomes K520 while Kasi becomes K200, and a search on one spelling never turns up the other.
But many perfectly innocent travelers have the same Soundex codes as dangerous felons. For example, internationally sought-after terrorist mastermind Osama Bin Laden has the same Soundex code (L350) as Johnny “Rotten” Lydon, former lead singer of British punk rock group The Sex Pistols — whose only crimes have been against music and fashion.
Another reason Soundex is so inaccurate is that it’s blind to the cultural differences between names around the world. It treats three-syllable Asian names in the same manner as it treats eight-syllable Arabic or Hispanic names. The last name Zhang in China becomes Chang in Taiwan, Khiu in Thailand, Cheung in Singapore and Teoh in Malaysia. Soundex is incapable of recognizing that those names may indicate the same person.
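The coding scheme described above is short enough to sketch in a few lines of Python. What follows is a minimal illustration, not the code any agency or airline actually runs: it assumes the commonly documented American Soundex rules (the letter groups in the table, H and W ignored, repeated codes collapsed), which the article does not spell out, and the function name soundex is simply a label chosen here.

```python
# Minimal sketch of American Soundex, assuming the commonly documented rules.
# Assumption: real screening systems may use a variant of these rules.

# Letter groups that share a digit; vowels (and Y) get no digit at all.
CODES = {}
for digit, group in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return the four-character code: first letter plus three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same digit
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated consonant is coded again
    return (code + "000")[:4]  # pad with zeros, then cut to four characters


# A missed match: one dropped letter changes the code.
print(soundex("Kansi"), soundex("Kasi"))    # K520 K200

# A false positive: two unrelated surnames collide.
print(soundex("Laden"), soundex("Lydon"))   # L350 L350

# Regional spellings of one family name scatter across several codes.
for variant in ["Zhang", "Chang", "Khiu", "Cheung", "Teoh"]:
    print(variant, soundex(variant))
```

Because the first letter is kept as-is and vowels are discarded, the scheme is most forgiving of exactly the variations common in English surnames and least forgiving of a changed initial or a dropped consonant, which is where the transliterated names above fall apart.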
Language Analysis Systems Inc. (LAS) is trying to solve this vexing issue. Its name-recognition technology is being used by the Department of Homeland Security, the FBI and banks. LAS customers can identify the cultural classification of a name, explore a phonology-based search engine that ranks results based on similarity of pronunciation, and generate alternative romanized spellings for a name.
With a database of one billion names collected from around the world, LAS believes that it can cut down on the problem of false positives. That should be good news for the security personnel at airports who are sick of asking Johnny Rotten to remove all of those chains.