Category Archives: fun with data

A Web service for random GitHub usernames, via Google BigQuery, R, and CouchDB

In the course of building some much-needed testing infrastructure for total-impact, I found I needed a source of random GitHub usernames. A forum post directed me to the very cool GitHub Archive project, which pushes its extensive collection of GitHub data to Google BigQuery. BigQuery in turn lets you write SQL-style queries on ginormous datasets [...]

Twitter and the new scholarly ecosystem

This is a copy of a guest post I wrote for the LSE Impact of Social Sciences blog: In 1990, Tim Berners-Lee created the Web as a tool for scholarly communication at CERN. In the two decades since, his creation has gone on to transform practically every enterprise imaginable–except, somehow, scholarly communication.  Here, instead, we [...]

Has journal commenting failed?

It’s a great idea: take all the insights, suggestions, and criticisms on scholarly articles, the comments shared in journal clubs and scribbled in margins the world over, and make them accessible to everyone. Attach them to the article itself; make it a conversation, not an artifact. We have blog commenting, video commenting–why not article commenting? [...]

MEDLINE literature growth chart

We all know the volume of scientific literature is growing. I went looking for an infographic showing this, but wasn’t satisfied with what I found, so I made one, based on the publication dates of articles in MEDLINE. I got the data by searching PubMed with the query (“[year]”[Publication Date]), where [year] was each year from [...]

Scientometrics 2.0

I’m excited that I’ve had two papers accepted this week: “Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web,” with Brad Hemminger, and “How and why scholars cite on Twitter” (online soon) with Kaitlin Costello. What’s special about these two papers is that they are the start of a research project that [...]

FeedVis 2.0: custom visualization for your feeds

My FeedVis project–the interactive tagcloud for a group of feeds–has been out for a week now, and I’ve been thrilled at the positive response I’ve gotten so far. One rather glaring problem with the program, though, was that you could only look at the top 50 edublogs. Not anymore. After a few late nights, I’ve got [...]

FeedVis: a deeper tagcloud for edublogs

Tagclouds have value, but, as I’ve written before, they’ve a number of shortfalls as well. I’ve just finished my attempt to remedy some of these problems: FeedVis. It’s an animated tagcloud that lets you compare word frequencies across different time periods and authors, then check out the posts that used the words. The demo is [...]

PrezDebatr 2.0! Beta!

Google is transforming the way we watch a political debate.  This Google Blog post demonstrates how viewers of the VP debate earlier this month made Google searches like “clean coal” and “define:maverick” spike as candidates spoke.  Without question, these viewers are experiencing something much richer than what would have been possible fifteen years ago. But [...]

Grad school: because your uncle at Lehman Bros. is not such a great connection now.

A nice bit of infoVis from the web comic Piled Higher and Deeper.  Kind of not the best news for someone who’s applying to doctoral programs this fall…um, can my app go in a special pile for people who’ve been planning this for years, regardless of what the economy would’ve done?

The trouble with tagclouds

Tag clouds, those darlings of early web 2.0, have been seeing something of a backlash lately. Zeldman was suggesting that tag clouds were the new mullets back in 2005; more recently, ReadWriteWeb wondered if tagclouds were dead altogether. The main complaint in both cases wasn’t that tag clouds were just no good, but that they’d [...]