Use Zotero in a separate window

As I’ve written before, I love the free citation manager Zotero.   And the group and sharing features that just dropped as part of v2.0b7, while still a little buggy, are taking the awesomeness up another level.

But one thing about Zotero has always really annoyed me: the horizontally split screen. I never feel like I have enough vertical context for either my Zotero library or the web page I’m viewing. Meanwhile, I’ve got a whole ‘nother monitor just sitting there empty. Some other folks have complained about this too, suggesting a sidebar view for Zotero.

Today, though, I realized that there’s a really obvious solution: just open up a new Firefox window (ctrl+n), put it on my other monitor, and display Zotero full-screen there.  Dual-monitor workflow bliss.

Obfuscate no more: why your email address should go au naturel

I was recently redesigning my homepage, and I wanted to include my email address. I knew that only n00b looz3rz display their addy in plain sight for spambots to harvest, so I applied a little light obfuscation, like they do on php.net and a million other sites: “myname at jasonpriem dot com.”

“Take that, spammer scum!” I thought as I finished, basking in my newfound invulnerability to the v1@gr@-hawking vermin.  After all, if lots of people use address munging, it must work, right?

Right?

Darn it, now I’ve got to start reading about it.  So I did.  And after a few hours of reading blogs and writing code, I am now an Expert With Advice (hey, this is the internet).  And the advice is this:

Stop trying to obfuscate your email address.  Stop now.

I’ve got two reasons (and for a few more, some other folks have blogged about this, too).  First, the more theoretical one:

Spam is a problem for you–obfuscation makes it a problem for your users.

After all, they’re the ones who are going to have to do all the de-munging.  Are they always going to notice that they have to remove “.invalid” from the end?  Do they all know that the English “at” means “@”?   Do they have time to edit text in their address lines?   Address munging is fundamentally inelegant, because it intentionally works against clarity.

People have been making this argument for a very long time. It’s particularly relevant nowadays, though, because of the growing promise of the semantic web.  We want data to be machine readable, because then we can do cool stuff with it.  FOAF and the hCard microformat are pretty pointless if they don’t have real email addresses to work with.  “Hide the data from the machines” is a good strategy for fighting Skynet, but not for the future of the web.  Ok, reason two:

Address munging just doesn’t work.

It can’t. It’s putting glasses on Superman. Although in theory a valid email can be pretty hard to identify, in practice, email addresses use a very limited vocabulary–and computers are good at identifying limited vocabularies. Don’t forget, everyone has been using the same old [at] and “dot” tricks for decades–this is security through obscurity at its very worst.

But don’t take my word for it.  I took a couple hours and worked up a demo email obfuscation decoder that breaks the vast majority of text-based obfuscations; it’s also got an input field for you to test out your own munges (some other people have built similar demos, too).  It’s not perfect, but it correctly decodes most obfuscations–and remember that this is a novice programmer, working for an afternoon.  It’s that easy. Supporters of obfuscation argue that spammers will go after the low-hanging fruit; folks, text-based obfuscation is the low-hanging fruit.
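To make the "limited vocabulary" point concrete, here is a minimal Python sketch of a text-based de-munger. It is not the demo's actual code: the patterns, the `decode` name, and the choice to strip a trailing ".invalid" are all assumptions made up for illustration, and it will happily produce false positives on ordinary sentences.

```python
import re

# Assumed vocabulary: "@" or the word "at", "." or the word "dot",
# each optionally wrapped in brackets and/or spaces.
AT = r"\s*[\[\(\{]?\s*(?:@|\bat\b)\s*[\]\)\}]?\s*"
DOT = r"\s*[\[\(\{]?\s*(?:\.|\bdot\b)\s*[\]\)\}]?\s*"
WORD = r"[A-Za-z0-9_+-]+"

# local part, an "at" separator, then a domain with at least one "dot".
MUNGED = re.compile(
    rf"(?P<local>{WORD}(?:{DOT}{WORD})*?){AT}(?P<domain>{WORD}(?:{DOT}{WORD})+)",
    re.IGNORECASE,
)


def decode(text):
    """Return the email addresses recovered from munged text."""
    found = []
    for match in MUNGED.finditer(text):
        # Collapse every spelled-out or bracketed dot back to ".".
        local = re.sub(DOT, ".", match.group("local"), flags=re.IGNORECASE)
        domain = re.sub(DOT, ".", match.group("domain"), flags=re.IGNORECASE)
        # Drop the ".invalid" suffix some people append by hand.
        domain = re.sub(r"\.invalid$", "", domain, flags=re.IGNORECASE)
        found.append(f"{local}@{domain}".lower())
    return found


print(decode("myname at jasonpriem dot com"))  # ['myname@jasonpriem.com']
```

Three short patterns cover the word, bracket, and ".invalid" variants, which is about as much effort as a harvester needs to spend.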

Now, the Alert Reader has by this time noticed that I’ve limited my critique to text-based munging. “What about more sophisticated methods?” the Alert Reader now asks. “What about using an image, or CSS, or Javascript to hide addresses?” Good questions, Alert Reader; you are very alert. Alright, let’s take a quick look at these, too:

Images

There’s not really much I can say about this one, save this: making content completely opaque to visually-impaired users simply shouldn’t be an option. And of course, spammers can still OCR your images.

CSS

Obviously, something like foo@bar<span style="display:none">NULL</span>.com is silly; the spambot can filter out “display:none” spans pretty easily, or even just discard everything in a span. <span class='a'>foo</span><span class='b'>bar</span>@<span class='c'>foo</span><span class='d'>bar</span>.com at least requires the bot to open your stylesheet to see which spans are hidden. But remember, your server will happily dish out your easily-parsed CSS to anyone who asks for it; this is not a good place to hide secrets.
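As a rough illustration of how little work this takes, here is a Python sketch that undoes both tricks. Everything in it is an assumption for the sake of the example: the function names, the regex-only "parsing" (no nested spans, simple class selectors only), and especially the stylesheet in the usage line, which pretends that classes `b` and `c` are the hidden ones.

```python
import re

SPAN = re.compile(r"<span\b([^>]*)>(.*?)</span>", re.IGNORECASE | re.DOTALL)
CLASS_ATTR = re.compile(r"class\s*=\s*[\"']([^\"']*)[\"']", re.IGNORECASE)
INLINE_HIDE = re.compile(r"display\s*:\s*none", re.IGNORECASE)
HIDDEN_RULE = re.compile(
    r"\.([\w-]+)\s*\{[^}]*display\s*:\s*none[^}]*\}", re.IGNORECASE
)


def hidden_classes(css):
    """Return class names whose rule sets display:none (simple selectors only)."""
    return set(HIDDEN_RULE.findall(css))


def visible_text(html, css=""):
    """Return the text a sighted reader would see, dropping hidden spans."""
    hidden = hidden_classes(css)

    def replace_span(match):
        attrs, body = match.group(1), match.group(2)
        classes = CLASS_ATTR.search(attrs)
        names = set(classes.group(1).split()) if classes else set()
        if INLINE_HIDE.search(attrs) or names & hidden:
            return ""   # hidden from readers, so throw it away
        return body     # visible span: keep its text

    text = SPAN.sub(replace_span, html)
    return re.sub(r"<[^>]+>", "", text)  # strip any tags that are left


# Hypothetical stylesheet: assumes .b and .c are the decoy classes.
css = ".b { display: none } .c { display:none; }"
html = ("<span class='a'>foo</span><span class='b'>bar</span>@"
        "<span class='c'>foo</span><span class='d'>bar</span>.com")
print(visible_text(html, css))  # foo@bar.com
```

The only extra cost to the bot over the inline-style version is one more HTTP request for the stylesheet.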

Javascript

There are too many JS methods to cover in any detail here. Some are better than others; a few try to degrade gracefully for users without Javascript support. All of them, though, share the same weakness as CSS: everyone can read your Javascript. And you certainly don’t need a browser to run it; there are lots of JS interpreters that are more than happy to run on a spammer’s server.

Sure, you can get pretty clever with this technique (I particularly like the idea of decoding not on the onload event, but on a click event), but you can’t change the fact that ultimately the bad guys can do everything with your code that a browser does–and eventually, they will.

Now, I’ll admit that images, CSS, and Javascript approaches are more effective than text-based ones. All of them (when done properly) require the spammer to pay for more bandwidth and/or processor cycles. But they all also inconvenience some or all of your users, and none of them are compatible with the semantic web. They all give you a false sense of security, and they’re ugly, hackish solutions. True, some obfuscations have performed well empirically–but keep in mind that these (pretty informal) experiments are years old. As more people have adopted these measures, be sure that more spammers are spending the time to counter them, as well.

Now, I can’t go so far as to condemn anyone who obfuscates an address; I get that spam is a pain, and filters aren’t perfect.  Sometimes an ugly, hackish solution is the only way.  But I’m suggesting that you think twice before you give in to the spammers and obfuscate, especially given the relative ineffectiveness of many commonly-used methods.  The Web reaches its full promise when information is made easier to find, not harder.

Prezi: presentation junk 2.0

It’s 2009. I think everyone out there knows that Powerpoint is, at best, overused (at worst: Stalin). Particularly gruesome is the animated slide-transition “feature,” which I think most agree has the same communication effectiveness and subtle charm as “<blink>” tags, mouse-cursor trails, and hilarious animated gifs of cats.

So how is it that presentation tool Prezi is suddenly the toast of the town?  The quick sell looks like this:

“Prezi allows anyone who can sketch an idea on a napkin to create and perform stunning non-linear presentations with relations, zooming into details, and adjusting to the time left without the need to skip slides.”

I love how the first phrase suggests that there’s this great mass of napkin-sketching geniuses out there who can’t get their ideas out (until now!).  I mean, I like mind maps, but turning one into an outline is pretty easy.   So the presentations are “non-linear.”  Does that mean the audience can interact with them, zooming in on sub-points of interest?  If it does, let me show you this thing called “hyperlinks.”   And is skipping slides really this tremendous problem?

When it comes down to it, the real selling point of Prezi is just the “stunning” presentation.  Now, perhaps I’m jaded, but “zoom-in/zoom-out” leaves me unstunned.  More importantly, though, this seems a textbook example of chartjunk: a “really great” visual effect that serves only to obscure or distract from real information.  I think (hope) it’ll have the lasting appeal of Powerpoint’s racecar-noise-with-flying-in-bullet-point.

Perhaps I’m missing something (feel free to correct me in the comments) or just being curmudgeonly, but I think Prezi is vastly overhyped.  Powerpoint is bad enough.  Also: I like how the Prezi logo, by mixing case, suggests that the product may in fact be called “Pretzl.”  Ok, now that’s definitely being curmudgeonly.

Quick book review: Dreaming in Code

I imagine Scott Rosenberg reckoned he’d picked a winner when he started Dreaming in Code, his 2007 book chronicling the development of the Chandler personal information manager. The project seemed to have everything going for it. It had all the fashionable features: GTD! Open Source! Peer-to-peer! Level the silos! It was headed by software legend Mitch Kapor. It had infinite funding. It had talented programmers with impeccable resumes—decades upon decades of successful experience creating good software.

Over the course of Dreaming, though, we see this elite team gradually self-destruct. We see vague specs. We see unrealistic deadlines. We see huge mid-stream course changes. As Rosenberg writes, “By now, I know, any software developer reading this volume has likely thrown it across the room in despair, thinking, ‘Stop the madness! They’re making every mistake in the book!’” Dreaming finally ends four years into Chandler’s development, with version 1.0 still a distant vision (it was finally released, mostly to yawns, last August).

Rosenberg, though, is savvy enough to turn the Chandler team’s failure into his own success.  Not only does he use the story to anchor an excellent (if basic) introduction into the practices and quirks of the industry as a whole, he weaves an engrossing and deeply human narrative.

Aristotle said tragedy should evoke fear and pity in the viewer, and Rosenberg deftly supplies us with both. On the one hand, Dreaming reads like watching a horror movie: “No! Why are you splitting up to explore the house!? Why do you keep changing the UI every 6 months!? Noooo!!!!” At the same time, Rosenberg does a pretty good job of making us really like many of the characters. Kapor, in particular, comes off as both an intelligent visionary and a genuinely good guy. Watching Chandler implode, I feel bad for him.

In interviews, Rosenberg shows again and again how the characters, all experienced programmers, understand the Classic Mistakes. Then he describes with agonizing clarity how they turn right around and proceed to make just those mistakes. I think it’s this quality that put me so in mind of classical tragedy, where the noble hero is undone by just these sorts of tragic flaws or mistakes.

Rosenberg resists the temptation to write another Lessons From Software Failure manual. Instead he shows how smart, capable programmers working in an ideal environment can reenact the same fatal mistakes programmers were cataloging decades ago. Like Greek drama, Dreaming confronts the ineluctability of failure head-on. Rosenberg’s ultimate thesis is nothing more or less than the classic words of Donald Knuth, with which he opens the book: Software is hard. Sophocles would be proud.

Other reviews I liked:

  • Amazon
  • Joel Spolsky: discusses the technical aspects more; doesn’t think Chandler was a very good idea to begin with.  Has some good points, here.
  • Adam Barr: discusses the individual parts of the book more.

FeedVis 2.0: custom visualization for your feeds

this is what feedvis looks like

My FeedVis project–the interactive tagcloud for a group of feeds–has been out for a week now, and I’ve been thrilled at the positive response I’ve gotten so far. One rather glaring problem with the program, though, was that you could only look at the top 50 edublogs.

Not anymore. After a few late nights, I’ve got a beta system for uploading and analyzing your own sets of feeds. You just upload your OPML, wait a few minutes, and you’re set: FeedVis gives you a custom page that you can bookmark and return to anytime you like; it’ll continue to update every time you visit. You can also browse visualizations of other people’s feeds.

It’s pretty untested, and I’m sure use will uncover some bugs.  But it’s got potential; I’m excited to see what people think.

FeedVis: a deeper tagcloud for edublogs

a screenshot of feedvis

Tagclouds have value, but, as I’ve written before, they’ve a number of shortfalls as well. I’ve just finished my attempt to remedy some of these problems: FeedVis. It’s an animated tagcloud that lets you compare word frequencies across different time periods and authors, then check out the posts that used the words. The demo is using the feeds for Scott McLeod’s Technorati-compiled list of top 50 edublogs, since that’s what got me started about feeds and tagclouds in the first place (although the program will work with any set of feeds). More details about how it works are on the demo page.
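The core comparison, how much more or less often a word shows up in one sample than in another, can be sketched in a few lines of Python. This is only an illustration of the idea, not FeedVis's code; the function names and the crude tokenizer are assumptions, and a real version would at least filter out stop words.

```python
import re
from collections import Counter


def word_counts(posts):
    """Count lowercase words across a list of post texts."""
    counts = Counter()
    for text in posts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def frequency_shift(earlier_posts, later_posts):
    """Change in each word's share of all words between two samples."""
    before, after = word_counts(earlier_posts), word_counts(later_posts)
    total_before = sum(before.values()) or 1  # avoid dividing by zero
    total_after = sum(after.values()) or 1
    return {
        word: after[word] / total_after - before[word] / total_before
        for word in set(before) | set(after)
    }


shift = frequency_shift(["wiki wiki blog"], ["blog twitter twitter twitter"])
print(max(shift, key=shift.get))  # twitter
```

Using each word's share of the sample rather than its raw count keeps a busy month from looking like a change in vocabulary.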

I think what I’m really most excited about is the way this uses animation to let you actually see the words changing from one sample to the next. Motion is such an important part of the way we see the world, and it’s been underemployed in information visualization, I think (although this is changing; Hans Rosling’s TED talks have gotten a lot of buzz, for instance).

The project has been really fun, and a great learning experience; it’s gotten me really pumped about infoVis for learning about online interaction. I think there is a lot of potential there for ed tech research. I’m also pretty excited about programming; I started learning in February (with PHP), and then started JavaScript a couple months ago. It’s been a really mind-expanding experience, and I’m looking forward to my next project, probably once I get done with grad school apps.

PrezDebatr 2.0! Beta!

Google is transforming the way we watch a political debate.  This Google Blog post demonstrates how viewers of the VP debate earlier this month made Google searches like “clean coal” and “define:maverick” spike as candidates spoke.  Without question, these viewers are experiencing something much richer than what would have been possible fifteen years ago.

But why stop there? Why not a service that analyzes this kind of real-time, viewer-supplied data, selects the most interesting bits, and then displays it? It would function both as a real-time fact-checker and a window into the audience’s reactions.

Lots of people already live-blog these things; it would be easy to get several thousand people to submit their questions and search results to a server, using a standardized interface.  The software then just aggregates, organizes, and presents the results.  Volunteers who try to game the system would be shut out with Digg-style, community-driven user ratings.  If Google would make its real-time query data available, that’d be added, too, significantly broadening the sample’s relevance.

Grad school: because your uncle at Lehman Bros. is not such a great connection now.

A nice bit of infoVis from the web comic Piled Higher and Deeper.  Kind of not the best news for someone who’s applying to doctoral programs this fall…um, can my app go in a special pile for people who’ve been planning this for years, regardless of what the economy would’ve done?


The trouble with tagclouds

Tag clouds, those darlings of early web 2.0, have been seeing something of a backlash lately. Zeldman was suggesting that tag clouds were the new mullets back in 2005; more recently, ReadWriteWeb wondered if tagclouds were dead altogether. The main complaint in both cases wasn’t that tag clouds were just no good, but that they’d become trendy and thus overused. Later criticism has argued that the increasingly common practice of using tag clouds for navigation is fundamentally flawed.

But the problems of tag clouds–and their close cousin, word clouds–go deeper, to their usefulness as a visualization method.  These aren’t problems with how the method is used or misused, but with the idea itself.

Moritz Stefaner points out (and presents his own solution for) several problems with the format:

  • tag clouds give a great picture of the “big head” of tags: the most frequently used tags that change little over time; they overlook, though, the “long tail”–where many of the interesting tags are located.
  • tag clouds don’t show change over time.  Chirag Mehta has created a tag cloud with a time slider, which helps with this.  But as Stefaner points out, animating tag clouds doesn’t work very well, as the changing size of the cloud moves the words around so they’re hard to follow.
  • Finally, tag clouds don’t show the relationships between tags (pretty much everyone who criticizes tag clouds mentions this one).

The IBM Many Eyes site has one of the best tag cloud (actually this does word clouds, too) tools I’ve seen, allowing users to get lots of data from each tag while keeping the interface clean and simple. They make a great point about an inherent limitation of the tool: the size and shape of the words themselves aren’t controlled for. So, long words seem more dominant than short ones, and words with lots of ascenders and descenders (the vertical strokes of letters like ‘b’ or ‘p’) tend to dominate as well. This can subtly alter the overall gist that tag clouds are supposed to deliver.

The academic community has noted shortcomings of the technique, as well. Hearst and Rosner (2008) observe that the alphabetical layout of the cloud may lead to a sort of “false clustering” effect, as users misinterpret words because of surrounding tags.  Renninger and Shumar (2007) found that tag cloud quadrants have different rates of recall, a fact which most tag cloud designs ignore.  In fact, their findings suggest that a simple list of tags, ordered by frequency, may deliver a more accurate overall impression than a tag cloud.  Several researchers have sought to improve shortcomings in tag cloud presentation with packing and sorting algorithms that manage whitespace and cluster relevant concepts (Kaser and Lemire, 2007; Seifert, Kump, Kienreich, Granitzer, and Granitzer, 2008).

Now, this isn’t to say that tag clouds have no value; in fact, I think they have great potential. It’s just that we need to know when tag clouds and word clouds are appropriate, know their shortcomings, and (this is the fun part) try to find ways to make them better. Most of the sources cited above have set about doing just that. In my next post, I’ll discuss a few of these “next-generation tag cloud” concepts; in particular, I’ll be examining methods of using word clouds to compare different versions of a text.

Zotero Report Customizer 2.0

As I’ve discussed in a previous post, I’m an enthusiastic user of the free reference manager Zotero; I’m impressed with how such a young, open-source product has managed to quickly outshine established, non-free alternatives like EndNote.

One difficulty I (and others) have had with Zotero, though, is in generating reports for a group of articles. Particularly, there’s no way to customize the categories you display in the report. This can be a real problem if you’re trying to share your sources with a co-author; at best, there’s a lot of unneeded metadata cluttering up the document (at worst, your email says you’ve been working on this for weeks, while your articles’ Date Added data tells a different tale…).

Now, I’m told this will be corrected in a later version of Zotero. However, I turned to PHP and a bit o’ regular expression magic to do it now. It turned out to be a good learning project, and I’ve been pleased to see that a few hundred other people (if Google Analytics is to be believed) have gotten some use out of it, too. The tool’s listed in the Zotero documentation, and–by far the most important of all–I got a free Zotero t-shirt out of the deal, which is now my favoritist garment ever.

I’ve also gotten quite a few feature requests from folks, including a request to help localize the script for German (you can find that German-language version here). Since my PHP skills have broadened in the last several months (I’m all the way to “novice” now!), I figured it was time to do an update. So, here is Zotero Report Customizer 2.0. New features include JavaScript form validation, a bunch of new categories, and the option to specify your own categories to delete if I don’t list ‘em. The script is also a ton easier to modify if you want to customize it to a different language, and can be set up to work in multiple languages at once. (I added a little German support as an example.)
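For anyone wondering what the "regular expression magic" amounts to, here is a minimal Python sketch of the general approach. It is not the Customizer's PHP: the `strip_categories` name is hypothetical, and the sketch assumes each field in the report is a table row with its label in a `<th>` and its value in a `<td>`, which may not match the markup your version of Zotero produces.

```python
import re


def strip_categories(report_html, unwanted):
    """Remove table rows whose <th> label appears in `unwanted`.

    Assumes rows look like <tr><th>Label</th><td>value</td></tr>.
    """
    if not unwanted:
        return report_html
    # Escape the labels so punctuation in them is matched literally.
    labels = "|".join(re.escape(label) for label in unwanted)
    row = re.compile(
        rf"<tr[^>]*>\s*<th[^>]*>\s*(?:{labels})\s*</th>.*?</tr>\s*",
        re.IGNORECASE | re.DOTALL,
    )
    return row.sub("", report_html)


sample = (
    "<table><tr><th>Type</th><td>Book</td></tr>"
    "<tr><th>Date Added</th><td>2009-01-01</td></tr></table>"
)
print(strip_categories(sample, ["Date Added"]))
# <table><tr><th>Type</th><td>Book</td></tr></table>
```

Keeping the unwanted labels in a plain list is also what makes a second language cheap to add: translate the list and the pattern takes care of itself.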

Have fun, and if you think of anything else you’d like in this, just let me know.