I was recently redesigning my homepage, and I wanted to include my email address. I knew that only n00b looz3rz display their addy in plain sight for spambots to harvest, so I applied a little light obfuscation, like they do on php.net and a million other sites: “myname at jasonpriem dot com.”
“Take that, spammer scum!” I thought as I finished, basking in my newfound invulnerability to the v1@gr@-hawking vermin. After all, if lots of people use address munging, it must work, right?
Darn it, now I’ve got to start reading about it. So I did. And after a few hours of reading blogs and writing code, I am now an Expert With Advice (hey, this is the internet). And the advice is this:
Stop trying to obfuscate your email address. Stop now.
Spam is a problem for you–obfuscation makes it a problem for your users.
After all, they’re the ones who are going to have to do all the de-munging. Are they always going to notice that they have to remove “.invalid” from the end? Do they all know that the English “at” means “@”? Do they have time to edit text in their address lines? Address munging is fundamentally inelegant, because it intentionally works against clarity.
People have been making this argument for a very long time. It’s particularly relevant nowadays, though, because of the growing promise of the semantic web. We want data to be machine readable, because then we can do cool stuff with it. FOAF and the hCard microformat are pretty pointless if they don’t have real email addresses to work with. “Hide the data from the machines” is a good strategy for fighting Skynet, but not for the future of the web. Ok, reason two:
Address munging just doesn’t work.
It can’t. It’s putting glasses on Superman. Although in theory a valid email address can be pretty hard to identify, in practice, email addresses use a very limited vocabulary–and computers are good at identifying limited vocabularies. Don’t forget, everyone has been using the same old [at] and “dot” tricks for decades–this is security through obscurity at its very worst.
But don’t take my word for it. I took a couple hours and worked up a demo email obfuscation decoder that breaks the vast majority of text-based obfuscations; it’s also got an input field for you to test out your own munges (some other people have built similar demos, too). It’s not perfect, but it correctly decodes most obfuscations–and remember that this is a novice programmer, working for an afternoon. It’s that easy. Supporters of obfuscation argue that spammers will go after the low-hanging fruit; folks, text-based obfuscation is the low-hanging fruit.
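To give a feel for how little code this takes, here is a minimal Python sketch of the idea. It is not the demo decoder mentioned above; the patterns and the `demunge` function are my own illustrative assumptions, and they only cover the most common “at”/“dot” spellings.

```python
import re

# Hypothetical patterns for the usual spellings of "@" and ".".
# A real harvester would presumably cover many more variants.
AT = r"\s*(?:\(at\)|\[at\]|\{at\}|\bat\b)\s*"
DOT = r"\s*(?:\(dot\)|\[dot\]|\{dot\}|\bdot\b)\s*"

# local part, an "at" token, then two or more domain labels joined by "dot" tokens
MUNGED = re.compile(
    r"([A-Za-z0-9._%+-]+)" + AT + r"([A-Za-z0-9-]+(?:" + DOT + r"[A-Za-z0-9-]+)+)",
    re.IGNORECASE,
)


def demunge(text):
    """Return every address in text that was munged with at/dot tokens."""
    found = []
    for match in MUNGED.finditer(text):
        local, domain = match.group(1), match.group(2)
        # Collapse each "dot" token (and the whitespace around it) to a real dot.
        domain = re.sub(DOT, ".", domain, flags=re.IGNORECASE)
        found.append(f"{local}@{domain}")
    return found


print(demunge("myname at jasonpriem dot com"))
print(demunge("foo [at] bar [dot] com"))
```

A sketch this crude will throw false positives on ordinary prose (“look at the dot com”), but a spammer can afford to send a few messages to addresses that don’t exist.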
There’s not really much I can say about putting your address in an image, save this: making content completely opaque to visually-impaired users simply shouldn’t be an option. And of course, spammers can still OCR your images.
Obviously, something like
foo@bar<span style="display:none">NULL</span>.com is silly; the spambot can filter out “display:none” spans pretty easily, or even just discard everything in a span.
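Filtering out inline-hidden spans really is a few lines of work. The following is a rough Python sketch under my own assumptions: the `visible_text` name is hypothetical, it uses regular expressions rather than a proper HTML parser, and it does not handle nested spans.

```python
import re

# Matches a <span> whose inline style contains display:none, plus its contents.
HIDDEN_SPAN = re.compile(
    r"<span[^>]*style\s*=\s*[\"'][^\"']*display\s*:\s*none[^\"']*[\"'][^>]*>.*?</span>",
    re.IGNORECASE | re.DOTALL,
)
# Matches any remaining tag so it can be stripped from the visible text.
TAG = re.compile(r"<[^>]+>")


def visible_text(html):
    """Drop inline-hidden spans, then strip whatever tags are left."""
    without_hidden = HIDDEN_SPAN.sub("", html)
    return TAG.sub("", without_hidden)


print(visible_text('foo@bar<span style="display:none">NULL</span>.com'))
```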
<span class='a'>foo</span><span class='b'>bar</span>@<span class='c'>foo</span><span class='d'>bar</span>.com at least requires the bot to open your stylesheet to see which spans are hidden. But remember, your server will happily dish out your easily-parsed CSS to anyone who asks for it; this is not a good place to hide secrets.
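Here is roughly what “opening your stylesheet” could look like from the bot’s side, again as a hedged Python sketch. The function names are hypothetical, the stylesheet in the example is one I made up (I am assuming classes `b` and `c` are the hidden ones), and the regex-based CSS handling only copes with simple, flat rules.

```python
import re

SPAN = re.compile(
    r"<span\s+class\s*=\s*[\"']([^\"']+)[\"']\s*>(.*?)</span>",
    re.IGNORECASE | re.DOTALL,
)


def hidden_classes(css):
    """Return the class names whose rule body sets display:none."""
    hidden = set()
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        if re.search(r"display\s*:\s*none", body, re.IGNORECASE):
            hidden.update(re.findall(r"\.([A-Za-z_][\w-]*)", selector))
    return hidden


def visible_text_with_css(html, css):
    """Keep the contents of visible spans and drop hidden spans entirely."""
    hidden = hidden_classes(css)

    def keep_or_drop(match):
        return "" if match.group(1) in hidden else match.group(2)

    return SPAN.sub(keep_or_drop, html)


# Assumed stylesheet: the decoy spans are the ones with class b and c.
css = ".b, .c { display: none; }"
html = (
    "<span class='a'>foo</span><span class='b'>bar</span>@"
    "<span class='c'>foo</span><span class='d'>bar</span>.com"
)
print(visible_text_with_css(html, css))
```

Fetching the stylesheet is one extra HTTP request, which is exactly the kind of thing a crawler is already built to do.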
Sure, you can get pretty clever with this technique (I particularly like the idea of decoding not on the onload event, but on a click event), but you can’t change the fact that ultimately the bad guys can do everything with your code that a browser does–and eventually, they will.
Now, I can’t go so far as to condemn anyone who obfuscates an address; I get that spam is a pain, and filters aren’t perfect. Sometimes an ugly, hackish solution is the only way. But I’m suggesting that you think twice before you give in to the spammers and obfuscate, especially given the relative ineffectiveness of many commonly-used methods. The Web reaches its full promise when information is made easier to find, not harder.