Spammers have begun slipping their junk past optical character recognition software through a variety of animated .GIF cut-and-paste techniques, according to John Graham-Cumming, an antispam activist who maintains The Spammers’ Compendium and also founded Electric Cloud.
In a recent blog post, Graham-Cumming explains one of the OCR-evading methods, first brought to his attention by Nick FitzGerald, a New Zealand antivirus consultant and regular contributor to The Spammers’ Compendium.
“I don’t know how widespread it is,” Graham-Cumming told me. “It’s probably pretty new.”
From the blog post: “The first image is the .GIF’s background and is displayed for 10ms then the second image is layered on top with a transparent background so that the two images merge together and the image the spammer wants you to see appears. That image remains on screen for 100,000 ms (or 1 minute 40 seconds). After that the image is completely blanked out by the third frame.
“My favorite touch is that it’s not the entire image that’s transparent, not even the white background, but just those pixels necessary to make the black pixels underneath show through. If you look carefully above you can see that some of the pixels appear yellow (which is the background color of this site), indicating where the transparency is.”
In our interview, Graham-Cumming acknowledged more than begrudging admiration for what this spammer has achieved.
“What’s really neat about what this guy has done is that he takes a piece of text and he randomly kills pixels in it so that each frame is unreadable,” he told me. “But when you merge them, you get a readable piece of text. It is immensely clever. He’s used animation with transparency in .GIF so what happens is that although this is actually animated you don’t see the animation because the two frames which have got the pixels killed on them are animated together so fast…that it looks like a static image.”
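A rough sketch of the pixel-killing idea Graham-Cumming describes, written in Python. Everything here is hypothetical: the 50/50 random split, the tiny "HI" bitmap, and the helper names `split_into_frames` and `merge` are assumptions for illustration, and the actual tool the spammer used is not known.

```python
import random

# Assumed pixel encoding: 1 = black, 0 = white, None = transparent.
GLYPHS = [
    "#.#.###",
    "#.#..#.",
    "###..#.",
    "#.#..#.",
    "#.#.###",
]
BITMAP = [[1 if ch == "#" else 0 for ch in row] for row in GLYPHS]

def split_into_frames(bitmap, rng):
    """Send each black pixel to one frame and kill it in the other."""
    frame1, frame2 = [], []
    for row in bitmap:
        row1, row2 = [], []
        for pixel in row:
            if pixel == 1 and rng.random() < 0.5:
                row1.append(1)     # kept in the background frame
                row2.append(None)  # transparent so it shows through
            elif pixel == 1:
                row1.append(0)     # killed in the background frame
                row2.append(1)     # drawn by the overlay frame
            else:
                row1.append(0)
                row2.append(0)     # opaque white in both frames
        frame1.append(row1)
        frame2.append(row2)
    return frame1, frame2

def merge(bottom, top):
    """Overlay `top` on `bottom`, honoring transparent pixels."""
    return [
        [b if t is None else t for b, t in zip(bottom_row, top_row)]
        for bottom_row, top_row in zip(bottom, top)
    ]

def render(frame):
    """Text preview: '#' black, '.' white, ' ' transparent."""
    return "\n".join(
        "".join({1: "#", 0: ".", None: " "}[p] for p in row) for row in frame
    )

first, second = split_into_frames(BITMAP, random.Random())
print(render(first), render(second), render(merge(first, second)), sep="\n\n")
```

Each printed frame is a partial scatter of the original pixels, while the merged result always matches the original bitmap, which is the effect he describes as looking like a static image.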
Although Graham-Cumming headlined his blog item “Why OCRing spam images is useless,” he tempered that assessment in our talk.
“Saying OCR is useless is an overstatement, of course,” he said. “There will be some value in OCRing because the history of spam shows that there are bleeding-edge spammers who fight to get through every filter and there’s a large pool of spammers who use out-of-date software, essentially, so it’s always worth going with techniques that worked a few months ago.”
And so the arms race continues.