Obfuscate no more: why your email address should go au naturale

I was recently redesigning my homepage, and I wanted to include my email address. I knew that only n00b looz3rz display their addy in plain sight for spambots to harvest, so I applied a little light obfuscation, like they do on php.net and a million other sites: “myname at jasonpriem dot com.”

“Take that, spammer scum!” I thought as I finished, basking in my newfound invulnerability to the v1@gr@-hawking vermin.  After all, if lots of people use address munging, it must work, right?

Right?

Darn it, now I’ve got to start reading about it.  So I did.  And after a few hours of reading blogs and writing code, I am now an Expert With Advice (hey, this is the internet).  And the advice is this:

Stop trying to obfuscate your email address.  Stop now.

I’ve got two reasons (and for a few more, some other folks have blogged about this, too).  First, the more theoretical one:

Spam is a problem for you–obfuscation makes it a problem for your users.

After all, they’re the ones who are going to have to do all the de-munging.  Are they always going to notice that they have to remove “.invalid” from the end?  Do they all know that the English “at” means “@”?   Do they have time to edit text in their address lines?   Address munging is fundamentally inelegant, because it intentionally works against clarity.

People have been making this argument for a very long time. It’s particularly relevant nowadays, though, because of the growing promise of the semantic web.  We want data to be machine readable, because then we can do cool stuff with it.  FOAF and the hCard microformat are pretty pointless if they don’t have real email addresses to work with.  “Hide the data from the machines” is a good strategy for fighting Skynet, but not for the future of the web.  Ok, reason two:

Address munging just doesn’t work.

It can’t. It’s putting glasses on Superman. Although in theory a valid email can be pretty hard to identify, in practice, email addresses use a very limited vocabulary–and computers are good at identifying limited vocabularies. Don’t forget, everyone has been using the same old [at] and “dot” tricks for decades–this is security through obscurity at its very worst.

But don’t take my word for it.  I took a couple hours and worked up a demo email obfuscation decoder that breaks the vast majority of text-based obfuscations; it’s also got an input field for you to test out your own munges (some other people have built similar demos, too).  It’s not perfect, but it correctly decodes most obfuscations–and remember that this is a novice programmer, working for an afternoon.  It’s that easy. Supporters of obfuscation argue that spammers will go after the low-hanging fruit; folks, text-based obfuscation is the low-hanging fruit.
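
To give a flavor of how little it takes, here is a minimal Python sketch of a text-munge decoder. It is not the demo's actual code: the function name `demunge` and the handful of patterns it recognizes (bracketed or spaced "at" and "dot", plus a trailing ".invalid") are assumptions chosen for illustration, and a real harvester would likely cover many more variants.

```python
import re

# Hypothetical patterns for illustration only; not taken from the demo.
AT = re.compile(r"\s*(?:\[at\]|\(at\)|\{at\}|\s+at\s+)\s*", re.IGNORECASE)
DOT = re.compile(r"\s*(?:\[dot\]|\(dot\)|\{dot\}|\s+dot\s+)\s*", re.IGNORECASE)
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def demunge(text):
    """Return plausible addresses hidden behind simple 'at'/'dot' munges."""
    text = AT.sub("@", text)    # "name at host" -> "name@host"
    text = DOT.sub(".", text)   # "host dot com" -> "host.com"
    text = re.sub(r"\.invalid\b", "", text, flags=re.IGNORECASE)
    return EMAIL.findall(text)


print(demunge("myname at jasonpriem dot com"))
```

The substitutions are deliberately greedy: ordinary prose containing " at " gets mangled too, but the final address pattern only keeps strings that end up looking like a real address, so the noise is mostly discarded.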

Now, the Alert Reader has by this time noticed that I’ve limited my critique to text-based munging.  “What about more sophisticated methods,” the Alert Reader now asks?  “What about using an image, or CSS, or Javascript to hide addresses?”  Good questions, Alert Reader; you are very alert.  Alright, let’s take a quick look at these, too:

Images

There’s not really much I can say about this one, save this: making content completely opaque to visually-impaired users simply shouldn’t be an option. And of course, spammers still can OCR your images.

CSS

Obviously, something like foo@bar<span style="display:none">NULL</span>.com is silly; the spambot can filter out “display:none” spans pretty easily, or even just discard everything in a span. <span class='a'>foo</span><span class='b'>bar</span>@<span class='c'>foo</span><span class='d'>bar</span>.com at least requires the bot to open your stylesheet to see which spans are hidden. But remember, your server will happily dish out your easily-parsed CSS to anyone who asks for it; this is not a good place to hide secrets.
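
As a rough illustration of the first case, the Python sketch below drops any span hidden with an inline "display:none" and then strips the remaining tags. The names `HIDDEN_SPAN` and `visible_text` are made up for this example, and the regex approach is an assumption about how a simple bot might work; it does not handle nested spans or class-based hiding, which would require reading the stylesheet as well.

```python
import re

# Hypothetical pattern: a <span> hidden via an inline style, with its contents.
HIDDEN_SPAN = re.compile(
    r"<span[^>]*style\s*=\s*[\"'][^\"']*display\s*:\s*none[^\"']*[\"'][^>]*>"
    r".*?</span>",
    re.IGNORECASE | re.DOTALL,
)
TAG = re.compile(r"<[^>]+>")


def visible_text(html):
    """Remove inline-hidden spans, then strip all remaining tags."""
    html = HIDDEN_SPAN.sub("", html)
    return TAG.sub("", html)


print(visible_text('foo@bar<span style="display:none">NULL</span>.com'))
```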

Javascript

There are too many js methods to cover in any detail here.  Some are better than others; a few try to degrade gracefully for users without Javascript support.  All of them, though, share the same weakness as CSS: everyone can read your Javascript.  And you certainly don’t need a browser to run it; there are lots of JS interpreters that are more than happy to run on a spammer’s server.

Sure, you can get pretty clever with this technique (I particularly like the idea of decoding not on the onload event, but on a click event), but you can’t change the fact that ultimately the bad guys can do everything with your code that a browser does–and eventually, they will.

Now, I’ll admit that images, CSS, and Javascript approaches are more effective than text-based ones. All of them (when done properly) require the spammer to pay for more bandwidth and/or processor cycles. But they all also inconvenience some or all of your users, and none of them are compatible with the semantic web. They all give you a false sense of security, and they’re ugly, hackish solutions. True, some obfuscations have performed well empirically–but keep in mind that these (pretty informal) experiments are years old. As more people have adopted these measures, be sure that more spammers are spending the time to counter them, as well.

Now, I can’t go so far as to condemn anyone who obfuscates an address; I get that spam is a pain, and filters aren’t perfect.  Sometimes an ugly, hackish solution is the only way.  But I’m suggesting that you think twice before you give in to the spammers and obfuscate, especially given the relative ineffectiveness of many commonly-used methods.  The Web reaches its full promise when information is made easier to find, not harder.

23 Comments

  1. Posted June 23, 2009 at 1:41 pm | Permalink

    Your points on the failure of address munging are well-taken, but I would like to object to the general thrust of your argument: “Spam is a problem for you – obfuscation makes it a problem for your users.”

    The thing is, spam *filtering* is a problem for your users, too. As a responsible webmaster, I would rather provide as many readers as possible with a way to contact me that they can be certain will actually *work*. The “take all of the problems onto yourself – just post it unobfuscated” path leads either to a mailbox full of crap that users’ letters get lost in, or to aggressive filtering that prevents you from seeing their messages in the first place.

    Obfuscation (and here I speak of the image/CSS/Javascript variety) seems to me to be the least bad of the available options. I say this as a websurfer as well. If I’m going to take the time to send a private message to someone whose work I have just read, I’m already going out of my way to open up a communication channel, and a few seconds of checking their address is inconsequential compared to the time I’m going to spend typing at them anyway.

  2. A Guy Who Uses Gmail
    Posted June 24, 2009 at 12:25 am | Permalink

    I use GMail. Despite the fact that I get several spams an hour none appear in my inbox. All appear in the spam folder and I never look at them. To my knowledge I have never missed a legit email (except for credit card statements from one company which GMail marked as phishing attempts until I whitelisted them by clicking “this is not a phishing attempt”).
    So I’m quite happy to post my email anywhere. Heck I’d write it on a bathroom wall.
    Hey spammers: it’s angusgraham@gmail.com – go nuts.

  3. Posted June 24, 2009 at 7:57 am | Permalink

    I’m with Jason; the burden should be on the spammers, not the users. I am worried about false identification – I tend to regularly scan my caught spam as I do find too much real stuff trapped. However, that’s *my* burden, not the sender’s burden.

    Having managed Tech Support and been engaged in customer service and marketing, it is hard enough to get users to contact with queries, problems and purchases. Anything that gets in the way, including a few seconds of checking and rekeying, is too much.

  4. jason
    Posted June 25, 2009 at 3:57 pm | Permalink

    Baxil, I see where you are coming from; I staked out a pretty extreme position in my original post partly for the sake of argument. That said, though, I’m not very convinced by your case.

    You say that “spam *filtering* is a problem for your users;” I don’t see how that’s true. You offer two choices:

    either to a mailbox full of crap that users’ letters get lost in, or to aggressive filtering that prevents you from seeing their messages in the first place.

    But I see at least two other options:

    • better spam filtering or
    • more permissive spam filter settings and more “eyeball filtering”

    True, manually scanning through a lot of spam is a pain for you–but that was my whole point. It’s a pain for you, not your users.

  5. jason
    Posted June 25, 2009 at 4:01 pm | Permalink

    @Gmail guy: now that’s some impressive putting your money where your mouth is.

    @Jeremy: “It is hard enough to get users to contact…anything that gets in the way…is too much.” Exactly. Because contact is so important, I would think this is the last place you want to put any kind of obstacle in front of the user.

  6. Posted July 9, 2009 at 10:41 pm | Permalink

    Yes!! Stop obfuscating and
    Yes!! Use Gmail… I left Gmail for a short period for various reasons (I am using IMAP to have the best of both worlds, and temporarily unforwarded my personal domain from my gmail for a while) – after a week or so of dealing with the spam, I was running back to gmail – it is far and away the most accurate spam detector around.

  7. Posted September 13, 2009 at 12:48 pm | Permalink

    Just like everyone else, I got attacked by unwanted emails on a daily basis, so I went on a quest to find a solution to stop it.
    Part 1
    The first step I took was to jump to the other side of the river and think like a spammer. I started to search for software that harvests emails on the internet.
    Using keywords such as “emails, harvest and extract” on Google, I ended up looking at hundreds of software listings, offering an easy way to attack unprotected emails in a few steps…
    I picked one program, called EmailSpiderGold, to test. Within a couple of hours I had harvested 15000 webmasters’ emails to use at my discretion.
    Along the way I learned that there are several openly available ways to verify that those emails are active: the very same developers also offer Email Verifiers which, among other features, check the validity of recipients’ e-mail addresses by connecting to SMTP servers and simulating the sending of a message. They are pretty smart about it, too, as they disconnect as soon as the mail server informs the program whether the address exists or not. From this we conclude that once the email is out there everyone can harvest it and use it indiscriminately for their own purposes.
    Part 2
    Solutions…
    I came across several solutions being offered to protect emails from harvesting campaigns. Amongst them I found some interesting ones using JavaScript to obfuscate the code on the page.
    Strangely, I didn’t come across anyone using their own encryption to publish their email on the web page.
    Their lack of confidence was the answer for me.
    By chance I got in touch with a longtime software developer named Peter Johansson who shared the same frustration; together we joined forces and experience to develop a shield against the problem. Only recently we had a winner called ATG, an Anti-Spam Tag Generator with advanced features that hides the real address from robotic harvesters. We tested it and it has proved to work.

    E.Hoxha

  8. jason
    Posted September 13, 2009 at 8:02 pm | Permalink

    Elton, I agree with you that “once the email is out there everyone can harvest it.” In fact, my point is that we should be trying to make it easy to get. Most obfuscations challenge users more than spambots.

    I also agree that, for the time being, Javascript-based obfuscation holds the most promise. It’s not a silver bullet, though, as I discuss in my post. The ATG product you mentioned (and sell on your site as a downloadable exe) is a good example. Let’s take a look at what ATG cranks out:

    <script type="text/javascript">
    function SLMEJMBF(A){
    var S = String.fromCharCode(109,97,105,108,116,111,58,116,101,115,
    116,64,101,120,97,109,112,108,101,46,99,111,109);
    A.href = S;
    }
    </script>
    <a href="#" onmouseover="SLMEJMBF(this);" onfocus="SLMEJMBF(this);">mail example</a>

    For starters, if the client has javascript disabled, it breaks completely. That means tough luck, NoScript user: no email for you. This isn’t an insurmountable problem, though; check out Philip Hutchison’s gracefully-degrading script, for example.

    Second, the “encryption” you use is pretty trivial. You rely on Javascript’s “fromCharCode” method to read the munged address–so can the harvester. I added a simple function to my de-obfuscator demo to show how easy this is (it’s example 11).

    If I can break this munge with a 10-line function in a few minutes, trust me: someone else already has. Granted, this gets a lot harder to beat if you get just a little trickier; for instance, you might try breaking the address down into 10 strings and then concatenating them out of order–now a simple regex isn’t enough.
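
    For a sense of scale, a decoder along those lines could look roughly like the Python sketch below. It is not the code from the demo: the name `decode_char_codes` and the regex are assumptions for illustration, and it only handles literal lists of numbers passed straight to String.fromCharCode, not the trickier out-of-order variants.

```python
import re

# Hypothetical pattern: a String.fromCharCode(...) call with literal numbers.
CALL = re.compile(r"String\.fromCharCode\(([\d,\s]+)\)")


def decode_char_codes(source):
    """Decode every literal String.fromCharCode(...) call in page source."""
    decoded = []
    for match in CALL.finditer(source):
        codes = [int(n) for n in match.group(1).split(",") if n.strip()]
        decoded.append("".join(chr(c) for c in codes))
    return decoded


page = """var S = String.fromCharCode(109,97,105,108,116,111,58,116,101,115,
116,64,101,120,97,109,112,108,101,46,99,111,109);"""
print(decode_char_codes(page))
```

    Run against the ATG output above, this yields the mailto: link for the example address without ever executing the page's script.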

    But the basic problem hasn’t gone away: your server dishes out your unencrypted Javascript to anyone who wants it, no questions asked. That makes it a fundamentally bad place to put secrets.

    Thanks for your comment, and good luck with ATG!

  9. Posted November 13, 2009 at 11:04 am | Permalink

    with gmail, any periods in the address are ignored, as well as anything after the plus (+), so

    c.a.r.t.e.r@cartercole.com is the same as
    carter.@cartercole.com is the same as
    c.a.rter+spam@cartercole.com

    so i can filter or send any form of my address to spam and know where it was harvested from

    to get around this you could find addresses with the gmail domain and remove the periods and everything after the + so it’s the clean version (unless the normal address has a period like carter.cole@dadada.com)
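
    A minimal Python sketch of that clean-up step, assuming the harvester only normalizes domains it knows are Gmail-backed. The function name `normalize_gmail` and the default domain list are illustrative assumptions; addresses on other domains are returned untouched, since a period there may be significant.

```python
def normalize_gmail(address, gmail_domains=("gmail.com", "googlemail.com")):
    """Strip periods and any +tag from the local part of Gmail addresses."""
    local, _, domain = address.rpartition("@")
    if domain.lower() not in gmail_domains:
        return address  # unknown domain: periods may matter, leave it alone
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}".lower()


print(normalize_gmail("c.a.rter+spam@gmail.com"))
```

    A domain hosted on Gmail under its own name (like the examples above) would have to be added to `gmail_domains` by hand, because nothing in the address itself reveals where the mail is handled.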

  10. Posted November 13, 2009 at 11:04 am | Permalink

    and the fact that foo@cool.com auto links as a mailto: anchor really helps…

  11. Rory
    Posted December 14, 2009 at 7:11 am | Permalink

    Another thumbs up for gmail. On average a lottery win or African millionaire only gets through about once a month and to my knowledge (I do check my spam folder periodically), I have not missed any valid emails.

  12. Bobby
    Posted December 20, 2009 at 5:10 pm | Permalink

    I’m extremely inexperienced in javascript and web programming in general, so forgive me if this sounds dumb, but what do you think about using a text field and emailing yourself whatever text the user has entered? Is there a way to make this method more impregnable (as far as maintaining email address secrecy is concerned)? I know it doesn’t take much to write a bot that spams such a system with messages, but methinks this presents a more controllable environment.

    On a different note, is there some way to use raw IP addresses instead of URLs that could throw a bot scanning for “x”@”y”.”z” off balance without overly confusing a desirable emailer?

    Another possibility is the use of email aliasing. Maintain a single account for which you keep the direct address secret. Create an alias on an email server and use that alias “au naturale” in your sites, forwarding messages sent to it on to your central account. When you start to get too much spam through that alias, create a new one and repeat. Is this a reasonable solution, or am I misunderstanding some easy-to-foil step in this process?

    Finally, a comment: I notice you keep mentioning that even the most complex obfuscation methods are easily discovered and routed if someone just looks at the code. Well… since there are so many possibilities, any spammer (hell, any programmer) would be hard-pressed to write a bot that could break them all by automation. A spammer would have to look at the code personally for each potential address to be certain of even (I’m guessing) 50% success… why bother, when the spammer could just look at the email address directly as the browser renders it on the page? What I’m trying to say is, I don’t see how “they can figure out how your code works by looking at it” is a reasonable argument against javascript-powered address obfuscation. The goal of obfuscation is to necessitate a human individual’s involvement in the identification of your email address with minimal confusion to that individual… methinks a sophisticated javascript obfuscation method accomplishes that goal.

  13. Posted March 28, 2011 at 11:25 pm | Permalink

    @Bobby
    finally somebody that understands the problem :)

    “NEVER obfuscate” can’t be the solution – actually, it’s the problem.
    Like Bobby said, a good obfuscation prevents 99% of all bots (and people behind it) from harvesting your email.
    simply because it needs manpower and money to manually adjust the bot to your algorithm.

    of course something like “me at domain dot com” is stupid as hell. of course any parser in the world can translate hex-coded emails.
    it has to be a mix of different things combined with some robust javascript algorithm and you are fine.
    fallbacks for non js users included.
    maybe even a changing algorithm (based on the time of day/month).

    but NOT obfuscating is worse!

  14. jason
    Posted March 29, 2011 at 2:17 am | Permalink

    @Bobby:

    I don’t see how “they can figure out how your code works by looking at it” is a reasonable argument against javascript-powered address obfuscation.

    Well, it’s a reasonable argument because “encoding” an email with javascript is like sending a coded letter with the codebook in the same envelope. You are including the instructions on how to decode what you sent. As you point out (and I point out in my post), there are plenty of ways to interpret javascript on the server, and then simply read the resulting page. There’s no need for the spammer to “look at the code personally.”

    Email aliasing is a cute idea, but what about the legit user who saves your alias in her address book, only to have her emails bounced later when you discard that address? Is your spam filter really so ineffective that you have to resort to this shell game to read your email?

    @mark, I agree that “a mix of different things combined with some robust javascript algorithm,” plus a viable fallback for users without js, plus a changing, time-based salting function…well, that’s likely to keep you spammer-free, for the most part. For now. I envy you for the time you have to put into all of this.

    But you haven’t solved the problem that you still send spammers the codebook along with the code. Eventually, someone’s going to write programs to read what you’re sending them, and then your spaghetti-code, security-through-obscurity “encryption” will have to be revised. Surely this sort of pointless spy-vs-spy is not the best use we have for our time?

    As I wrote in the original post, I’m not trying to throw the book at anyone here. Spam is a pain, I get it. And obfuscation can cut down on it. My points are that:

    1. The common, text-based techniques are kind of stupid.
    2. JS techniques can help, but they are by their nature both vulnerable and inelegant.
    3. The future of the Web is in making things more machine-readable, not less. Munging is on the wrong side of history.

    If your spam filter doesn’t work, by all means: do what you got to do. You’ll get no condemnation from me. But I’ve not heard anything yet to make me revise these three points.

  15. Jonathan
    Posted June 1, 2011 at 6:10 pm | Permalink

    Another systemic flaw with JavaScript obfuscation techniques is that people end up lazily copying each other.

    First, someone writes a “good obfuscation” program. Then, they blog about it to the world, make it into a downloadable plugin, or start selling it. Even a really clever obfuscation code, once slavishly copied by millions of web sites, will become worth the manpower and money for spammers to adjust their bots to intercept it.

    If you are going to use JavaScript, at least put some effort in to make up your own completely unique code. Otherwise you are hanging the fruit lower than you need to.

    And if you are one of the millions of developers who are fans of basic ROT13 email obfuscation, god help you.
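
    To show how thin that protection is, here is a short Python sketch; the function name `unrot13` and the sample string are made up for illustration. ROT13 is its own inverse, so a single call to the standard library's codec undoes it.

```python
import codecs


def unrot13(text):
    """ROT13 is symmetric: applying it again recovers the original text."""
    return codecs.decode(text, "rot13")


print(unrot13("grfg@rknzcyr.pbz"))
```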

  16. Wouter Bosgra
    Posted October 3, 2011 at 4:14 pm | Permalink

    I totally agree with this blogpost, and to my surprise most of my clients do as well – I guess spamfilters have evolved.

    However there are always exceptions to the rule, and I’m dealing with one now (which brought me here): Email-addresses posted in reactions by your visitors. These folks DO need to be protected against themselves. Except for bold people like ‘A Guy Who Uses Gmail’ maybe.

  17. jason
    Posted October 3, 2011 at 5:47 pm | Permalink

    A fair point, Wouter. People should be able to make their own informed choice about who has their email, and how it’s displayed. It’s probably not called for to make that choice for visitors.

  18. Joe Hourclé
    Posted May 31, 2012 at 4:14 am | Permalink

    There’s no reason to make it *too* easy for spam harvesters by posting naked e-mail addresses. I’ve been directly placing ‘mailto’ links into websites for years, and on one website with a half-dozen e-mail links, only *one* gets spam. (about 3-5 per day, and that address is the only one that’s been posted non-obfuscated when the site first went live a few years back, and gets used when posting on other websites). I’ve had to stop using the ‘+comment’ hack, as my ISP’s outsourced spam-filtering won’t deliver those to me. (and unlike others — it *does* get false positives, so I shut it off for some e-mail addresses)

    I use a mix of URI escaping + HTML entity encoding. I posted the technique to one of the BareBones mailing lists (bbedit-talk? web-authoring?) years ago, but I rarely see it in use in the wild (at least, not so much that the spam bots seem to use it). See the following perl script, which if fed an e-mail address will generate a user-friendly clickable (but obfuscated) mailto link:

    #!/usr/local/bin/perl –

    local $/ = undef;
    my @temp = split (m//, shift @ARGV) ;

    print ‘
    , join (”, map { sprintf (‘&#%d;’, (unpack ‘C’, $_)) } @temp)
    , ‘

    ;

  19. Joe Hourclé
    Posted May 31, 2012 at 5:22 am | Permalink

    And of course the blog software completely ate that script … and the sites it used to be posted on have since moved, and aren’t in archive.org …

    Here’s what I *think* will come out as something copy and pastable from the way it did things … if not, email me: oneiros@annoying.org

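    #!/usr/local/bin/perl --

    local $/ = undef;
    # Split the address given on the command line into single characters.
    my @temp = split (m//, shift @ARGV) ;

    # href: "mailto:" plus the %-escaped address, all written as HTML entities.
    print '<a href="'
    , join ('', map { sprintf ('&#%d;', (unpack 'C', $_)) }
    ( qw ( m a i l t o : ), split ( //,
    join('', map{ sprintf ( '%%%x', unpack('C', $_) ) } @temp ))))
    , '">'
    # Link text: the address itself as HTML entities.
    , join ('', map { sprintf ('&#%d;', (unpack 'C', $_)) } @temp)
    , '</a>'
    ;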
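
    For comparison, reversing this kind of encoding takes only standard-library calls. The Python sketch below is an illustration under assumptions, not a known harvester: the name `decode_mailto` and the simple href regex are made up here, and it relies on `html.unescape` for the entities and `urllib.parse.unquote` for the percent-escapes.

```python
import html
import re
from urllib.parse import unquote

# Hypothetical pattern: any double-quoted href attribute value.
HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def decode_mailto(markup):
    """Undo HTML entity encoding, then URI escaping, for each href."""
    addresses = []
    for raw in HREF.findall(markup):
        target = unquote(html.unescape(raw))
        if target.lower().startswith("mailto:"):
            addresses.append(target[len("mailto:"):])
    return addresses


# "mailto:%61%40%62%2e%63%6f" with "mailto:" written as entities
sample = '<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;%61%40%62%2e%63%6f">mail</a>'
print(decode_mailto(sample))
```

    Whether real harvesters bother with this step is a separate question; the sketch only shows that the decoding itself is cheap.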

  20. Craig
    Posted November 2, 2012 at 1:53 pm | Permalink

    @mark

    How can you bet everything on a JavaScript “solution” and then suggest providing a “fallback for non js users”?

    That’s akin to saying “we’ll lock all this money in Fort Knox but if a bank robber comes in complaining about how hard it is to steal, we’ll fall back on just handing it over to him.”

    Also, content should never, NEVER rely on the use of JavaScript. That’s the worst possible non-solution to an otherwise barely annoying problem.

    Using a combination of a few primitive obfuscation methods is much more effective than you might think. Writing a matcher for a bunch of known addresses is brain-dead easy but all this proves is that you can write a regular expression to match an address you wrote yourself.

    By using something like hex entities combined with span wrappers, you start to get into the territory of forcing people to write a full blown HTML parser. People who have written a HTML parser know what the edge-cases are and what causes a naive parser to puke. All you need to do is add some of that stuff into your span wrappers and add a few dummy HTML entities inside class attributes.

    Unless someone really does use a correct, normalising HTML parser, it becomes very easy to start mismatching hundreds of false-positives and missing hundreds of real addresses. Why would someone make their matcher much more complex and much slower for non-deterministic results? Every false-positive address that gets added to their database is extra overhead for their mail senders. If they do it recklessly to prove a point, they either go bald from all the extra work they have to do or their database gets filled up with time-wasting junk, which may never get purged.

    @Jason, I challenge you to write a script that matches the output of my (yet unwritten) random address generator with greater than 50% reliability. No JavaScript, just plain HTML. Accept?

  21. Brian Teller
    Posted February 28, 2013 at 11:07 am | Permalink

    As much as it’s a pain… it sounds like Mail Forms are the only solution. The “e-mailing” and “verifying” are all done server side (via Perl or PHP) so the “codebook” is never seen by the end user (or spammer). Since a true e-mail address is only e-mailed if the mail form is validated (through whatever process you want to use), the end result should be a non-spam, valid e-mail 99% of the time.

  22. Posted May 20, 2013 at 4:47 pm | Permalink

    Thanks for the post and constructive discussion. The original post presumes that only the webmanager or content creator will have their address online. Many sites need to post a variety of contacts for staff, committee members, event hosts, etc. Not all of these are technically savvy, nor facile with email. Posting such addresses should be transparent to naive computer users, but not trigger harvesting.

    It seems that an unobfuscated, linked, primary contact address such as , changing as spam rates increase, and applying more aggressive spam filtering to retired variants, would be manageable and not complicate normal contact.

    For other addresses that may appear on the site, I would simply avoid clues that an address is present (e.g. “email”, “mailto:”, “@”, “.at.”, “(a)”, “-DOT-”, or “..”). Inserting a generically named image in place of the @ may be sufficient to keep spam harvesters from recognising an address, while remaining simple to copy and fix for most visitors. Of course, this would not be effective for long lists or collections of emails.

  23. Posted October 18, 2013 at 2:28 am | Permalink

    Hello Jason
    I do agree with all the points that you mentioned about the relative ineffectiveness of many commonly-used methods to fight spam.
    Considering all that, I came up with a creative solution:

    The Solution – General Description

    This plugin is based on the mcrypt PHP library. Both the encryption and decryption occur on the server. JavaScript is used (an AJAX POST request) to dynamically contact the server, where the e-mail decryption happens, and send back the results. Regardless of how many e-mails are on a webpage, only ONE AJAX request takes place. It utilizes the load event, which means that the request and response happen only when the page is fully loaded (it will not slow down your page rendering). The actual display happens when you hover over the e-mail. Native JavaScript is used (no library dependencies). It is lightning fast and only 4 kb small.

    Major Functionality

    Encrypt and decrypt linked (mailto) e-mails and/or plaintext e-mails. You have the ability to activate and deactivate these choices. It also has shortcode functionality in case you want to do it manually.
    Graceful degradation: Disabled JavaScript? No problem – if you click the e-mail link you will go to a page to fill in a question form. If you answer the question right the email will appear. You have the ability to set your own question and answer. This is great for browsers that do not support JavaScript (visually impaired people).
    For extra security the encrypter uses cookies – If cookies are disabled the e-mail(s) will not be revealed. According to projecthoneypot website: http://www.projecthoneypot.org/how_to_avoid_spambots_4.php “robots typically do not handle cookies. While it would be possible for spambots to deal with cookies as they traverse the web, it would add substantially to their overhead and, in turn, increase the costs to spammers stealing addresses. Again, we suggest that if a visitor to your site does not accept cookies you consider hiding the addresses displayed or restricting access to your contact page.”

    Please have a look at the website, test it and let me know your opinion:
    http://the-never-never-land.com/encrypt-email-address-wordpress-plugin/

2 Trackbacks

  1. By Why obfuscate email addresses? on November 3, 2012 at 3:22 am

    [...] some may argue that email obfuscation isn’t worth the hassle, I’m not so sure. I don’t like techniques that force the end user to do extra work, but [...]

  2. [...] through obscurity, but they also defeat the purpose of having user-friendly contact details. It is sometimes argued that this method is no longer worth the effort, but here are some statistics. Busy people will be [...]
