<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jason Priem &#187; fun with data</title>
	<atom:link href="http://jasonpriem.org/category/fun-with-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://jasonpriem.org</link>
	<description></description>
	<lastBuildDate>Wed, 07 Dec 2011 20:43:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>Twitter and the new scholarly ecosystem</title>
		<link>http://jasonpriem.org/2011/11/twitter-and-the-new-scholarly-ecosystem/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=twitter-and-the-new-scholarly-ecosystem</link>
		<comments>http://jasonpriem.org/2011/11/twitter-and-the-new-scholarly-ecosystem/#comments</comments>
		<pubDate>Wed, 30 Nov 2011 19:00:56 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[alt-metrics]]></category>
		<category><![CDATA[fun with data]]></category>
		<category><![CDATA[infovis]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[scholcom]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://jasonpriem.org/?p=657</guid>
		<description><![CDATA[This is a copy of a guest post I wrote for the LSE Impact of Social Sciences blog: In 1990, Tim Berners-Lee created the Web as a tool for scholarly communication at CERN. In the two decades since, his creation has gone on to transform practically every enterprise imaginable&#8211;except, somehow, scholarly communication.  Here, instead, we [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is a copy of a <a href="http://blogs.lse.ac.uk/impactofsocialsciences/2011/11/21/altmetrics-twitter/">guest post</a> I wrote for the LSE Impact of Social Sciences blog:</em></p>
<div><p>In 1990, Tim Berners-Lee created the Web as a tool for scholarly communication at CERN. In the two decades since, his creation has gone on to transform practically every enterprise imaginable&#8211;except, somehow, scholarly communication.  Here, instead, we lurch ponderously through the time-sanctified dance of dissemination, 17th-century style. The article reigns. Scholars continue to wad the vibrant, diverse results of their creativity and expertise&#8211;figures, datasets, programs, abstracts, annotations, claims, reviews, comments, collections, workflows, discussions, and arguments&#8211;into publishers’ slow molds to be cast into articles: static, leaden information ingots.</p>
<p>Growing numbers of scholars, though, are realizing that this approach is no longer the best we can do. We’re <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000204">defrosting our digital libraries</a>, moving over a million personal reference collections online to services like Zotero and Mendeley (and in the process making the open reference list a new kind of publication). Scholars are flocking to scholarly blogs to post ideas, <a href="http://gowers.wordpress.com/2009/02/01/a-combinatorial-approach-to-density-hales-jewett/">collaborate with colleagues</a>, and <a href="http://journal.webscience.org/308/">discuss literature</a>, often creating a sort of <a href="http://cameronneylon.net/blog/p-%E2%89%A0-np-and-the-future-of-peer-review/">peer-review after publication</a>. Emboldened by <a href="http://www.nsf.gov/bfa/dias/policy/dmp.jsp">national mandates</a> and <a href="http://en.wikipedia.org/wiki/Human_Genome_Project">notable successes</a>, we’re beginning to publish reusable datasets as first-class citizens in the scholarly conversation. We’re sharing our software as <a href="http://cameronneylon.net/blog/open-research-computation-an-ordinary-journal-with-extraordinary-aims/">publications</a> and <a href="https://github.com/">on the Web</a>. The journal was the first revolution in scholarly communication; we’re on the brink of a second, driven by the new diversity, speed, and accessibility of the Web.</p>
<p>The poster child for this Scholcomm Spring is Twitter. There’s been terrific interest in scholars using Twitter to <a href="https://docs.google.com/present/view?id=dg7vjb8t_114d8z6ffgg">discuss and cite literature</a>, for <a href="http://academhack.outsidethetext.com/home/2008/twitter-for-academia/">teaching</a>, to <a href="http://journal.webscience.org/314/">enrich conferences</a>, or less formally as a “<a href="http://chronicle.com/article/10-High-Fliers-on-Twitter/16488">global faculty lounge</a>.” We recently finished a large study to get better data on these uses.</p>
<p>Instead of asking for <a href="http://en.wikipedia.org/wiki/Non-response_bias">self-identified</a> scholars on Twitter, we started out with a list of around 9,000 scholars from five US and UK universities, then searched for their names on the Twitter API. After manually confirming all the matches, we downloaded all the tweets each scholar had made and coded the content of these. The graphic below has some details of our findings (click for <a href="http://jasonpriem.org/self-archived/5uni-poster.png">full-size image</a>), but here’s a summary:</p>
<ol>
<li>Twitter adoption is broad-based: scholars from different fields and career stages are taking to Twitter at about the same rate.</li>
<li>Scholars are using Twitter as a scholarly medium, making announcements, linking to articles, even engaging in discussions about methods and literature. But the majority of most scholars’ tweets are personal, underscoring Twitter as a space of <a href="http://nms.sagepub.com/content/13/1/114">context collapse</a>, where users manage multiple identities.</li>
<li>Only about 1 in 40 scholars has an actively-updated Twitter account. This may seem small, but keep in mind that Twitter’s only 5 years old; email was still a scholarly novelty <a href="http://eric.ed.gov/ERICWebPortal/search/detailmini.jsp?_nfpb=true&amp;_&amp;ERICExtSearch_SearchValue_0=EJ399699&amp;ERICExtSearch_SearchType_0=no&amp;accno=EJ399699">15 years after</a> its <a href="http://en.wikipedia.org/wiki/Email#The_rise_of_ARPANET_mail">creation</a>. Taking the <a href="http://scholarlykitchen.sspnet.org/2011/10/13/short-term-thinking-twitter-economics-and-the-change-process/">long view</a>, the current count of scholars using Twitter is probably less important than its continued growth, which we see clearly.</li>
</ol>
<p><a href="http://jasonpriem.org/self-archived/5uni-poster.png"><img class="alignnone" title="Scholars on Twitter infographic" src="http://jasonpriem.org/self-archived/5uni-poster.png" alt="" width="583" height="931" /></a></p>
<p>Results like these are encouraging for those of us who see social media and related environments as the natural next frontier for communicating scholarship. It seems that scholars, without waiting for approval from the mandarins of the publishing industry, are beginning to explore and colonize the Web’s wide-open spaces.</p>
<p>But perhaps the most exciting thing about this nascent scholarly Great Migration is that the new, online tools of scholarship begin to give public substance to the formerly ephemeral roots of scholarship: the discussions never transcribed, the annotations never shared, the introductions never acknowledged, the manuscripts saved and reread but never cited. These <a href="http://en.wikipedia.org/wiki/Dramaturgy_%28sociology%29#Back_stage">backstage</a> activities are now increasingly tagged, cataloged, and archived on blogs, Mendeley, Twitter, and elsewhere.  As more scholars move more of their workflows to the public Web, we are assembling a vast registry of intellectual transactions&#8211;a web of ideas and their uses whose timeliness, speed, and precision make the traditional citation network look primitive.</p>
<p>I’ve been involved in early efforts to understand and use these new data sources to inform alternative metrics of impact, or “<a href="http://altmetrics.org/manifesto">altmetrics</a>.” Altmetrics could be used in evaluating scholars or institutions, complementing unidimensional citation counts with a <a href="http://total-impact.org/report.php?id=MqAnvI">rich array of indicators</a> revealing diverse impacts on multiple populations. They could also inform new, real-time filters for scholars burdened by information overload: imagine a system that gathers and analyzes the bookmarks, pageviews, tweets, and blog posts from your online networks, using your interactions with them to learn and display each day’s most important articles or posts.</p>
<p>Even better, what if every scholar in the world had such a system? We might do away with journals entirely. The Web can disseminate and archive products for nearly free. The slow, back-room machinations of closed peer review could be replaced by an open, accountable, distributed system that simply listens in to expert communities’ natural reactions to new work&#8211;the same way Google efficiently ranks the Web by listening in to the crowdsourced “review” of the hyperlink network.</p>
<p>Of course, this particular vision may not pan out. And although the current signs point toward more growth, scholars might get tired of Twitter. But to hang our hopes on a particular vision or tool is to miss what’s truly revolutionary about this moment. The journal monoculture, long the only viable approach to scholarly communication, is beginning to yield at its fringes to a more diverse, vibrant, online ecosystem of scholarly expression. This new ecosystem promises to change not only the way we express scholarship, but the way we measure, assess, and consume it.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2011/11/twitter-and-the-new-scholarly-ecosystem/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Has journal commenting failed?</title>
		<link>http://jasonpriem.org/2011/01/has-journal-article-commenting-failed/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=has-journal-article-commenting-failed</link>
		<comments>http://jasonpriem.org/2011/01/has-journal-article-commenting-failed/#comments</comments>
		<pubDate>Fri, 07 Jan 2011 08:00:53 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[alt-metrics]]></category>
		<category><![CDATA[fun with data]]></category>
		<category><![CDATA[infovis]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=452</guid>
		<description><![CDATA[It’s a great idea: take all the insights, suggestions, and criticisms on scholarly articles, the comments shared in journal clubs and scribbled in margins the world over, and make them accessible to everyone. Attach them to the article itself; make it a conversation, not an artifact. We have blog commenting, video commenting&#8211;why not article commenting? [...]]]></description>
			<content:encoded><![CDATA[<p>It’s a great idea: take all the insights, suggestions, and criticisms on scholarly articles, the comments shared in journal clubs and scribbled in margins the world over, and make them accessible to everyone. Attach them to the article itself; make it a conversation, not an artifact. We have blog commenting, video commenting&#8211;why not article commenting?</p>
<p>That’s sounded good to a lot of publishers, and over the last five years, we’ve seen article commenting systems become pretty popular. But there’s a growing sense that article commenting isn’t working.</p>
<h2>The bad</h2>
<p><a href="http://jasonpriem.com/wp-content/uploads/2011/01/total_articles_and_articles_with_comments_by_qtr.png"><img class="size-full wp-image-453 alignright" title="total_articles_and_articles_with_comments_by_qtr" src="http://jasonpriem.com/wp-content/uploads/2011/01/total_articles_and_articles_with_comments_by_qtr.png" alt="" width="427" height="311" /></a><a href="http://dx.doi.org/10.1136/bmj.c3926">Gotzsche et al. (2010)</a> look at author replies to <a href="http://www.bmj.com">BMJ’s</a> “<a href="http://www.bmj.com/letters/">rapid response</a>” comments. We&#8217;d hope the chance to interact with authors would be a big plus for article commenting; however, they found that even when comments could “invalidate research or reduce&#8230; reliability,”  over half the time authors couldn’t be bothered to respond.</p>
<p>In another study, <a href="http://dx.doi.org/10.1016/j.annemergmed.2010.10.008">Schriger et al.</a> (in press; thanks <a href="http://blog.coturnix.org/">Bora</a>) examine the prevalence of commenting systems in top medical journals.  They report that the percentage of journals offering rapid review has dropped from 12% in 2005 to 8% in 2009, and that fully half the journals sampled had commenting systems lying idle, completely unused by anyone. The authors conclude, “postpublication critique of articles in online pages provided by the journal does not seem to be taking hold.”</p>
<p>Finally, I collected data on <a href="http://www.plos.org/">PLoS</a> comments as part of a larger investigation of <a href="http://altmetrics.org/manifesto/">alt-metrics</a>. As evident from the graphic, the number of articles with comments has held more or less steady as the total articles published has grown: again, not a pretty picture for those of us excited about article commenting.</p>
<h2>The good</h2>
<p>I’m not ready to give up on comments yet, though, because I think there’s a different way to see these findings. The question shouldn’t be “have comments failed,” but “are they succeeding somewhere, and why?”  After all, we’re still in the very early stages of this thing; change in scholarly communication so far has happened on a scale of centuries.</p>
<p><a href="http://jasonpriem.com/wp-content/uploads/2011/01/Articles_with_comments_by_journal_and_quarter.png"><img class="alignleft size-full wp-image-454" title="Articles_with_comments_by_journal_and_quarter" src="http://jasonpriem.com/wp-content/uploads/2011/01/Articles_with_comments_by_journal_and_quarter.png" alt="" width="506" height="410" /></a>Active, widespread commenting would be a radical change in how scholars communicate, and as with all fundamental shifts, we can assume most early efforts will be failures. In the 1900s, <a href="http://en.wikipedia.org/wiki/Brass_Era_car">way more automobile manufacturers went broke</a> building lousy cars than flourished making good ones. So in looking at comment ecosystems, we shouldn’t be stuck ogling the crowd of inevitable false starts&#8211;we should be trying to spot the nascent Model T.</p>
<p>And when we do see venues where comments are disproportionately successful, we should be trying to figure out what they’re doing right. While half the sample of the Schriger et al. study are stuck without a single commented article, <a href="http://www.bmj.com">BMJ</a>, <a href="http://www.cmaj.ca/">CMAJ</a>, and <a href="http://www.annals.org/">Ann. Intern. Med.</a> all have comments on 50-76%. How are they different? The BMJ articles sampled by Gotzsche et al. had a mean of 4.9 responses each, which is pretty respectable. Why are these here, but not elsewhere?</p>
<p>In the case of PLoS, we can see that even journals from the same publisher and on the same platform show widely different commenting rates. Is it the editors, the nature of the field, or something else that’s making PLoS Biology’s comment rate climb as PLoS Genetics’ holds steady and PLoS ONE’s drops?  This is a great opportunity for research that will help commenting evolve further.</p>
<h2>The future</h2>
<p>So I think that while we see cases where journal commenting is beginning to succeed, we should continue to put resources behind spreading that success. This said, I have to admit I’m doubtful that publisher-hosted commenting is the future.</p>
<p>Today we have two scholarly communication ecosystems: the formal, peer-reviewed one, and the shadow system encompassing everything from scribbled marginalia, to chats in the lab, to peer reviews themselves. Sooner or later, I believe the shadow ecosystem will migrate to the web; a detailed argument for why is a different post, but there are too many advantages. It’ll happen. The advance guard is already conversing, learning, and collaborating on Zotero, Mendeley, CiteULike, blogs, Twitter, and so on.</p>
<p>Publisher-hosted article commenting is the formal system’s bid to gain a foothold in the informal system as it moves online. And it’s a smart bid, because as the shadow system sheds its ephemerality, it’s going to become increasingly important to how we measure and do scholarship.</p>
<p>But the problem is that journal-based comments are as siloed as the articles they comment on; there’s limited exposure, and no community. Scholars will want to have their conversations with their people, in their ways, in their places.  Today, that mostly means Twitter and blogs (<a href="http://cameronneylon.net/blog/forward-linking-and-keeping-context-in-the-scholarly-literature/">as we saw in #arsenicLife</a>); in the future, it may also be scholar-specific services like <a href="http://thirdreviewer.com/">The Third Reviewer</a>, <a href="http://www.science3point0.com/coaspedia/index.php/Welcome">COASPedia</a>, or <a href="http://www.vivoweb.org/">VIVO</a>.</p>
<p>So while I support article commenting as it now exists, I think the challenge of the future won’t be moving the shadow communication system online&#8211;it’ll be aggregating it so it can be consumed, measured, and filtered efficiently and meaningfully. I think alt-metrics will play a part in that, but again, that’s another post :)</p>
<h3><em>References:</em></h3>
<p>Here&#8217;s the <a href="http://jasonpriem.com/share/plos_altmetrics/event_trends.txt">dataset</a> and <a href="http://jasonpriem.com/share/plos_altmetrics/plos_comments_frequency.R">R code</a> for the PLoS graphics; I hope to be releasing the full data next week.</p>
<p>Gotzsche, P. C., Delamothe, T., Godlee, F., &amp; Lundh, A. (2010). Adequacy of authors&#8217; replies to criticism raised in electronic letters to the editor: cohort study. BMJ, 341(aug10 2), c3926-c3926. doi:<a href="http://dx.doi.org/10.1136/bmj.c3926">10.1136/bmj.c3926</a></p>
<p>Schriger, D. L., Chehrazi, A. C., Merchant, R. M., &amp; Altman, D. G. (In press). Use of the Internet by Print Medical Journals in 2003 to 2009: A Longitudinal Observational Study. Annals of Emergency Medicine, In Press, Corrected Proof. doi:<a href="http://dx.doi.org/10.1016/j.annemergmed.2010.10.008">10.1016/j.annemergmed.2010.10.008</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2011/01/has-journal-article-commenting-failed/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>MEDLINE literature growth chart</title>
		<link>http://jasonpriem.org/2010/10/medline-literature-growth-chart/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=medline-literature-growth-chart</link>
		<comments>http://jasonpriem.org/2010/10/medline-literature-growth-chart/#comments</comments>
		<pubDate>Mon, 18 Oct 2010 21:35:43 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[alt-metrics]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[fun with data]]></category>
		<category><![CDATA[infovis]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=406</guid>
		<description><![CDATA[We all know the volume of scientific literature is growing.  I went looking for an infographic showing this, but wasn&#8217;t satisfied with what I found, so I made one, based on the publication dates of articles in MEDLINE. I got the data by searching PubMed with the query ("[year]"[Publication Date])where [year] was each year from [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jasonpriem.com/wp-content/uploads/2010/10/medline-articles-by-year-lg.png"><img class="size-full wp-image-426 alignleft" title="medline-articles-by-year-lg" src="http://jasonpriem.com/wp-content/uploads/2010/10/medline-articles-by-year-lg.png" alt="" width="402" height="379" /></a>We all know the volume of scientific literature is growing.  I went looking for an infographic showing this, but wasn&#8217;t satisfied with what I found, so I made one, based on the publication dates of articles in <a href="http://www.nlm.nih.gov/databases/databases_medline.html">MEDLINE</a>.</p>
<p>I got <a href="http://jasonpriem.com/share/num-medline-articles-published-by-year.txt">the data</a> by searching <a href="http://www.ncbi.nlm.nih.gov/pubmed">PubMed</a> with the query<br />
<code>("[year]"[Publication Date])</code> where [year] was each year from 1950-2009. Then I charted the results in <a href="http://www.r-project.org/">R</a>, and resized them in Photoshop.</p>
<p>The data, R code, and images  are all <a href="http://wiki.creativecommons.org/CC0">CC0</a> (public domain), and can be used wherever and for whatever you fancy.</p>
<p><a href="http://jasonpriem.com/wp-content/uploads/2010/10/medline-articles-by-year-sm.png">small version of graphic</a></p>
<p><a href="http://jasonpriem.com/share/num-medline-articles-published-by-year.txt">num-medline-articles-published-by-year.txt</a></p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># setup.</span>
pub <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">read.<span style="">table</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;path_to_data_file&quot;</span>, header<span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>cex<span style="color: #080;">=</span><span style="color: #ff0000;">2.2</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># controls the relative size of the text</span>
mainTitle <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">&quot;MEDLINE-indexed articles<span style="color: #000099; font-weight: bold;">\n</span>published per year&quot;</span>
&nbsp;
<span style="color: #228B22;"># make the plot.</span>
<span style="color: #228B22;"># see http://www.harding.edu/fmccown/r/ for a nice intro on plot options</span>
<span style="color: #0000FF; font-weight: bold;">plot</span><span style="color: #080;">&#40;</span>pub, main<span style="color: #080;">=</span>mainTitle, ylab<span style="color: #080;">=</span><span style="color: #ff0000;">''</span>, xlab<span style="color: #080;">=</span><span style="color: #ff0000;">''</span>, type<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;l&quot;</span>, axes<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>, 
     <span style="color: #0000FF; font-weight: bold;">col</span><span style="color: #080;">=</span><span style="color: #ff0000;">'red'</span>, lwd<span style="color: #080;">=</span><span style="color: #ff0000;">6</span>, ylim<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>,<span style="color: #ff0000;">1000000</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># label the axes</span>
<span style="color: #0000FF; font-weight: bold;">axis</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span>, at<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">seq</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1950</span>, <span style="color: #ff0000;">2010</span>, <span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span>, lab<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">seq</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1950</span>, <span style="color: #ff0000;">2010</span>, <span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
labs <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">''</span>,<span style="color: #ff0000;">''</span>,<span style="color: #ff0000;">'200k'</span>,<span style="color: #ff0000;">''</span>,<span style="color: #ff0000;">'400k'</span>,<span style="color: #ff0000;">''</span>,<span style="color: #ff0000;">'600k'</span>,<span style="color: #ff0000;">''</span>,<span style="color: #ff0000;">'800k'</span>,<span style="color: #ff0000;">''</span>,<span style="color: #ff0000;">'1M'</span><span style="color: #080;">&#41;</span><span style="color: #228B22;"># quick and dirty labels...</span>
<span style="color: #0000FF; font-weight: bold;">axis</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">2</span>, at<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">seq</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>, <span style="color: #ff0000;">1000000</span>, <span style="color: #ff0000;">100000</span><span style="color: #080;">&#41;</span>, lab<span style="color: #080;">=</span>labs, las<span style="color: #080;">=</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span>  <span style="color: #228B22;"># &quot;las&quot; makes labels display horizontally</span></pre></td></tr></table></div>

]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2010/10/medline-literature-growth-chart/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Scientometrics 2.0</title>
		<link>http://jasonpriem.org/2010/07/scientometrics-2-0/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=scientometrics-2-0</link>
		<comments>http://jasonpriem.org/2010/07/scientometrics-2-0/#comments</comments>
		<pubDate>Sun, 11 Jul 2010 02:21:54 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[alt-metrics]]></category>
		<category><![CDATA[fun with data]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=336</guid>
		<description><![CDATA[I&#8217;m excited that I&#8217;ve had two papers accepted this week: &#8220;Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web,&#8221; with Brad Hemminger, and &#8220;How and why scholars cite on Twitter&#8221; (online soon) with Kaitlin Costello. What&#8217;s special about these two papers is that they are the start of  a research project that [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m excited that I&#8217;ve had two papers accepted this week: &#8220;<a href="http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2874/2570">Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web</a>,&#8221; with Brad Hemminger, and &#8220;How and why scholars cite on Twitter&#8221;  (online soon) with Kaitlin Costello.</p>
<p>What&#8217;s special about these two papers is that they are the start of  a research project that I hope will become my dissertation, an idea I&#8217;m somewhat reluctantly calling &#8220;scientometrics 2.0.&#8221; (do we really need more 2.0s?) Scientometrics is</p>
<blockquote><p>&#8230;the science of measuring and analysing <a title="Science" href="http://en.wikipedia.org/wiki/Science">science</a>. In practice, scientometrics is often done using <a title="Bibliometrics" href="http://en.wikipedia.org/wiki/Bibliometrics">bibliometrics</a> which is a measurement of the impact of (scientific) publications. (<a href="http://en.wikipedia.org/wiki/Scientometrics">Wikipedia</a>)</p></blockquote>
<p>My idea is that we should be looking beyond this, and starting to mine Web 2.0 sources for signals of scholarly impact. There are a few big advantages to this approach:</p>
<ol>
<li>It&#8217;s much faster.  Once a scholarly article is published, it takes years for citations to that article to accumulate.  But it can take just days for, say, Diggs or tweets to show up: in our Twitter sample we found that nearly half the links to peer-reviewed articles appeared within a week of those articles&#8217; publication.  This speed could be harnessed to make real-time, personal <a href="http://www.youtube.com/watch?v=LabqeJEOQyI">filters</a> that inform scholars what&#8217;s groundbreaking across a broad set of fields. As the <a href="http://content.nejm.org/cgi/content/full/348/20/2030">velocity</a> and <a href="http://cameronneylon.net/blog/it%E2%80%99s-not-information-overload-nor-is-it-filter-failure-it%E2%80%99s-a-discovery-deficit/">volume</a> of science grow, this could be very valuable.</li>
<li>If I cite something, it probably had an impact in my work.  But what kind of impact?  What if I read it and talked about it, and it informed my general thinking&#8211;but not enough to cite?  Just looking at citations, we&#8217;re <a href="http://scholarlykitchen.sspnet.org/2009/06/29/is-the-impact-factor-from-a-bygone-era/">missing many other kinds of impact</a>.  Ten years ago, this was the best we could do.  But today, scholars are using online tools like <a href="http://www.citeulike.org/">CiteULike</a>, <a href="http://www.mendeley.com/">Mendeley</a>, and <a href="http://www.zotero.org/">Zotero</a> to manage their libraries; <a href="http://f1000.com/">Faculty of 1000</a> to review articles;  and <a href="http://twitter.com">Twitter</a>, <a href="http://friendfeed.com">FriendFeed</a>, and <a href="http://researchblogging.org/">ResearchBlogging.org</a> to discuss them.  Tools like these&#8211;and importantly, the open APIs many of them offer&#8211;allow us to lift the curtain and observe scholars in their native habitat.  Scientometrics 2.0 offers a chance for us to develop a richer, more nuanced picture of scholarly impact.</li>
<li>Finally, this approach allows us to break the centuries-old monopoly of the peer-reviewed article or monograph on scientific communication.  We can measure reactions not just to these articles, but also to blog posts, datasets, or videos.  If a certain blog post in your field is generating lots of buzz, there&#8217;s a good chance it&#8217;s worth your time.  Scientometrics 2.0 can support a sort of informal, &#8220;<a href="http://www.academicproductivity.com/2007/soft-peer-review-social-software-and-distributed-scientific-evaluation/">soft peer-review</a>&#8221; that works for free, on everything.</li>
</ol>
<p>At first, this approach will mostly be used for relatively &#8220;pure&#8221; academic study&#8211;learning more about how scholars communicate and how impact is transmitted.  Soon, however, young scholars will start making a case to tenure and promotion committees that their heavily tweeted or bookmarked article should count in their favor. Ultimately, I think we&#8217;ll see tools that leverage this information to help direct scholars to the most important and relevant work for them, kind of a <a href="http://www.postrank.com/">PostRank</a> for academics.</p>
<p>Of course, there are some obstacles to this.  The most important one for now is getting people to trust that these alternative sources really mean anything.  Who cares if an article is tweeted a lot?  Won&#8217;t people game this?  What about scholars who don&#8217;t use social media (a majority, for now)?  These questions have answers, but they need to be taken seriously (see the articles for more detailed discussions).</p>
<p>Ultimately, scientometrics 2.0 is going to have to be something we investigate very carefully, and in the proper context.  However, in that context I think it has the potential to be quite valuable, and I&#8217;m excited about working toward this in the next several years.</p>
<p>(Note: for a bunch of relevant citations, see the <a href="http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2874/2570">first article</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2010/07/scientometrics-2-0/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>FeedVis 2.0: custom visualization for your feeds</title>
		<link>http://jasonpriem.org/2008/12/feedvis-20-custom-visualization-for-your-feeds/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=feedvis-20-custom-visualization-for-your-feeds</link>
		<comments>http://jasonpriem.org/2008/12/feedvis-20-custom-visualization-for-your-feeds/#comments</comments>
		<pubDate>Wed, 03 Dec 2008 20:27:27 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[fun with data]]></category>
		<category><![CDATA[infovis]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=72</guid>
		<description><![CDATA[My FeedVis project&#8211;the interactive tagcloud for a group of feeds&#8211;has been out for a week now, and I&#8217;ve been thrilled at the positive response I&#8217;ve gotten so far.  One rather glaring problem with the program, though, was that you could only look at the top 50 edublogs. Not anymore.  After a few late nights, I&#8217;ve got [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_68" class="wp-caption alignleft" style="width: 310px"><a href="http://jasonpriem.com/feedvis"><img class="size-medium wp-image-68" title="feedvis2" src="http://jasonpriem.com/wp-content/uploads/2008/12/feedvis2.jpg" alt="this is what feedvis looks like" width="300" height="256" /></a><p class="wp-caption-text"> </p></div>
<p><a href="http://jasonpriem.com/feedvis">My FeedVis project</a>&#8211;the interactive tagcloud for a group of feeds&#8211;has been out for a week now, and I&#8217;ve been thrilled at <a href="http://cogdogblog.com/2008/11/25/feedvis/">the</a> <a href="http://blogoehlert.typepad.com/eclippings/2008/11/a-truckload-of-twitter-tools-and-some-peachy-keen-visualizations.html">positive</a> <a href="http://doug-johnson.squarespace.com/blue-skunk-blog/2008/12/1/on-ranking-awards-and-other-nonsense.html">response</a> I&#8217;ve gotten so far.  One rather glaring problem with the program, though, was that you could only look at the <a href="http://www.dangerouslyirrelevant.org/2008/06/top-50-p-12-edu.html">top 50 edublogs</a>.</p>
<p>Not anymore.  After a few late nights, I&#8217;ve got a beta system for uploading and analyzing your own sets of feeds.  You just upload your OPML file, wait a few minutes, and you&#8217;re set: FeedVis gives you a custom page that you can bookmark and return to anytime you like; it&#8217;ll continue to update every time you visit.  You can also browse visualizations of other people&#8217;s feeds.</p>
<p>It&#8217;s pretty untested, and I&#8217;m sure use will uncover some bugs.  But it&#8217;s got potential; I&#8217;m excited to see what people think.</p>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2008/12/feedvis-20-custom-visualization-for-your-feeds/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>FeedVis: a deeper tagcloud for edublogs</title>
		<link>http://jasonpriem.org/2008/11/feedvis-a-deeper-tagcloud-for-edublogs/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=feedvis-a-deeper-tagcloud-for-edublogs</link>
		<comments>http://jasonpriem.org/2008/11/feedvis-a-deeper-tagcloud-for-edublogs/#comments</comments>
		<pubDate>Tue, 25 Nov 2008 07:53:05 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[fun with data]]></category>
		<category><![CDATA[infovis]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=51</guid>
		<description><![CDATA[Tagclouds have value, but, as I&#8217;ve written before, they&#8217;ve a number of shortfalls as well.  I&#8217;ve just finished my attempt to remedy some of these problems: FeedVis.  It&#8217;s an animated tagcloud that lets you compare word frequencies across different time periods and authors, then check out the posts that used the words.  The demo is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://jasonpriem.com/feedvis"><img class="size-full wp-image-54 alignleft" title="feedvis" src="http://jasonpriem.com/wp-content/uploads/2008/11/feedvis.jpg" alt="a screenshot of FeedVis" width="318" height="266" /></a></p>
<p>Tagclouds have value, but, <a href="http://jasonpriem.com/2008/09/the-trouble-with-tagclouds/">as I&#8217;ve written before</a>, they&#8217;ve a number of shortfalls as well.  I&#8217;ve just finished my attempt to remedy some of these problems: <a href="http://jasonpriem.com/feedvis/">FeedVis</a>.  It&#8217;s an animated tagcloud that lets you compare word frequencies across different time periods and authors, then check out the posts that used the words.  The demo is using the feeds for Scott McLeod&#8217;s Technorati-compiled list of <a href="http://www.dangerouslyirrelevant.org/2008/06/top-50-p-12-edu.html">top 50 edublogs</a>, since that&#8217;s what got me started about feeds and tagclouds in the first place (although the program will work with any set of feeds).  More details about how it works are on the <a href="http://jasonpriem.com/feedvis/">demo page</a>.</p>
<p>I think what I&#8217;m really most excited about is the way this uses animation to let you actually see the words changing from one sample to the next.  Motion is such an important part of the way we see the world, and it&#8217;s been underemployed in information visualization, I think (although this is changing; <a href="http://www.ted.com/index.php/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html">Hans Rosling&#8217;s TED talks</a> have gotten a lot of buzz, for instance).</p>
<p>The project has been really fun, and a great learning experience; it&#8217;s gotten me really pumped about <a href="http://en.wikipedia.org/wiki/Infovis">infoVis</a> for learning about online interaction.  I think there is a lot of potential there for ed tech research.  I&#8217;m also pretty excited about programming; I started learning in February (with PHP), and then started JavaScript a couple months ago.  It&#8217;s been a really mind-expanding experience, and I&#8217;m looking forward to my next project, probably once I get done with grad school apps.</p>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2008/11/feedvis-a-deeper-tagcloud-for-edublogs/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>PrezDebatr 2.0!  Beta!</title>
		<link>http://jasonpriem.org/2008/10/prezdebatr-20-beta/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=prezdebatr-20-beta</link>
		<comments>http://jasonpriem.org/2008/10/prezdebatr-20-beta/#comments</comments>
		<pubDate>Mon, 13 Oct 2008 17:11:05 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[fun with data]]></category>
		<category><![CDATA[infovis]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=44</guid>
		<description><![CDATA[Google is transforming the way we watch a political debate.  This Google Blog post demonstrates how viewers of the VP debate earlier this month made Google searches like &#8220;clean coal&#8221; and &#8220;define:maverick&#8221; spike as candidates spoke.  Without question, these viewers are experiencing something much richer than what would have been possible fifteen years ago. But [...]]]></description>
			<content:encoded><![CDATA[<p>Google is transforming the way we watch a political debate.  <a href="http://googleblog.blogspot.com/2008/10/vp-debate-candidates-questions-and.html">This Google Blog post</a> demonstrates how viewers of the <a href="http://en.wikipedia.org/wiki/United_States_vice-presidential_debate,_2008">VP debate</a> earlier this month made Google searches like &#8220;<a href="http://www.google.com/search?hl=en&amp;q=clean+coal&amp;btnG=Search">clean coal</a>&#8221; and &#8220;<a href="http://www.google.com/search?hl=en&amp;q=define%3Amaverick&amp;btnG=Search">define:maverick</a>&#8221; spike as candidates spoke.  Without question, these viewers are experiencing something much richer than what would have been possible fifteen years ago.</p>
<p>But why stop there?  Why not a service that analyzes this kind of real-time, viewer-supplied data, selects the most interesting bits, and then displays them?  It would function both as a real-time fact-checker and a window into the audience&#8217;s reactions.</p>
<p>Lots of people already live-blog these things; it would be easy to get several thousand people to submit their questions and search results to a server, using a standardized interface.  The software then just aggregates, organizes, and presents the results.  Volunteers who try to game the system would be shut out with Digg-style, community-driven user ratings.  If Google would make its real-time query data available, that&#8217;d be added, too, significantly broadening the sample&#8217;s relevance.</p>
<p><span id="more-44"></span></p>
<p>The exciting part comes when this user-created input&#8212;both information about facts and people&#8217;s more general reaction to what they&#8217;re hearing&#8212;is presented to the debate audience and the debaters themselves in real time (via a big display in the venue, for instance).  For one thing, the audience and debaters would immediately know of factual errors or half-truths, and have easy access to cited sources.  This would work for relatively picky things, like the pronunciation of someone&#8217;s name, but also for more substantive problems in responses; imagine how much answers would improve if the debaters knew they might have to stand next to an 8-foot-high rendering of, &#8220;40 million Americans think you just dodged the question.&#8221;</p>
<p>But also, the tool could act in more positive ways.  The candidates would immediately know the reactions a national audience had to what they say&#8211;what the audience was interested in, confused about, or skeptical about.  If 80% of people want to hear more about the differences in the candidates&#8217; economic plans, they probably will.  You would have a truly participatory, interactive town-hall meeting of sixty million people.  Techniques like this are already beginning to surface in education, with tools like classroom <a href="http://en.wikipedia.org/wiki/Audience_response">clickers</a> and &#8220;<a href="http://connect.educause.edu/Library/ELI/7ThingsYouShouldKnowAbout/39391?time=1223916435">google jockeying</a>.&#8221;  Could they raise the level and relevancy of national politics?</p>
<p>Note: oops, accidentally published this sans links.  fixed now.</p>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2008/10/prezdebatr-20-beta/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Grad school: because your uncle at Lehman Bros. is not such a great connection now.</title>
		<link>http://jasonpriem.org/2008/10/grad-school-because-your-uncle-at-lehman-bros-is-not-such-a-great-connection-now/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=grad-school-because-your-uncle-at-lehman-bros-is-not-such-a-great-connection-now</link>
		<comments>http://jasonpriem.org/2008/10/grad-school-because-your-uncle-at-lehman-bros-is-not-such-a-great-connection-now/#comments</comments>
		<pubDate>Tue, 07 Oct 2008 16:50:24 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[fun with data]]></category>
		<category><![CDATA[infovis]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=38</guid>
		<description><![CDATA[A nice bit of infoVis from the web comic Piled Higher and Deeper.  Kind of not the best news for someone who&#8217;s applying to doctoral programs this fall&#8230;um, can my app go in a special pile for people who&#8217;ve been planning this for years, regardless of what the economy would&#8217;ve done?]]></description>
			<content:encoded><![CDATA[<p>A nice bit of <a href="http://en.wikipedia.org/wiki/Infovis">infoVis</a> from the web comic <a href="http://www.phdcomics.com/comics.php?f=1078">Piled Higher and Deeper</a>.  Kind of not the best news for someone who&#8217;s applying to doctoral programs this fall&#8230;um, can my app go in a special pile for people who&#8217;ve been planning this for years, regardless of what the economy would&#8217;ve done?</p>
<p><a href="http://www.phdcomics.com/comics/archive/phd100108s.gif"><br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2008/10/grad-school-because-your-uncle-at-lehman-bros-is-not-such-a-great-connection-now/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The trouble with tagclouds</title>
		<link>http://jasonpriem.org/2008/09/the-trouble-with-tagclouds/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-trouble-with-tagclouds</link>
		<comments>http://jasonpriem.org/2008/09/the-trouble-with-tagclouds/#comments</comments>
		<pubDate>Tue, 16 Sep 2008 02:33:43 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[fun with data]]></category>
		<category><![CDATA[infovis]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=33</guid>
		<description><![CDATA[Tag clouds, those darlings of early web 2.0, have been seeing something of a backlash lately. Zeldman was suggesting that tag clouds were the new mullets back in 2005; more recently, ReadWriteWeb wondered if tagclouds were dead altogether. The main complaint in both cases wasn&#8217;t that tag clouds were just no good, but that they&#8217;d [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-medium wp-image-35 alignleft" title="A tagcloud of this very post.  How meta." src="http://jasonpriem.com/wp-content/uploads/2008/09/tagcloud3-300x188.png" alt="" width="300" height="188" />Tag clouds, those darlings of early web 2.0, have been seeing something of a backlash lately. <a href="http://www.zeldman.com/">Zeldman</a> was <a href="http://www.zeldman.com/daily/0405d.shtml">suggesting</a> that tag clouds were the new <a href="http://en.wikipedia.org/wiki/Mullet_%28haircut%29">mullets</a> back in 2005; more recently, ReadWriteWeb wondered if tagclouds were <a href="http://www.readwriteweb.com/archives/tag_clouds_rip.php">dead altogether.</a> The main complaint in both cases wasn&#8217;t that tag clouds were just no good, but that they&#8217;d become trendy and thus overused.  Later criticism has argued that the increasingly common practice of using tag clouds for navigation is <a href="http://www.zeldman.com/daily/0505a.shtml">fundamentally flawed</a>.</p>
<p>But the problems of tag clouds&#8211;and their close cousin, <a href="http://www.joelamantia.com/blog/archives/tag_clouds/text_clouds_a_new_form_of_tag_cloud.html">word clouds</a>&#8211;go deeper, to their usefulness as a visualization method.  These aren&#8217;t problems with how the method is used or misused, but with the idea itself.</p>
<p><a href="http://well-formed-data.net/archives/42/tag-maps-update">Moritz Stefaner</a> points out (and presents his own solution for) several problems with the format:</p>
<ul>
<li>tag clouds give a great picture of the &#8220;big head&#8221; of tags: the most frequently used tags that change little over time; they overlook, though, the &#8220;long tail&#8221;&#8211;where many of the interesting tags are located.</li>
<li>tag clouds don&#8217;t show change over time.  Chirag Mehta has created a tag cloud with a time slider, which helps with this.  But as Stefaner points out, animating tag clouds doesn&#8217;t work very well, as the changing size of the cloud moves the words around so they&#8217;re hard to follow.</li>
<li>Finally, tag clouds don&#8217;t show the relationships between tags (pretty much everyone who criticizes tag clouds mentions this one).</li>
</ul>
<p>The IBM <a href="http://services.alphaworks.ibm.com/manyeyes/page/Tag_Cloud.html">Many Eyes</a> site has one of the best tag cloud tools I&#8217;ve seen (it does word clouds, too), allowing users to get lots of data from each tag while keeping the interface clean and simple.  They make a great point about an inherent limitation of the tool: the size and shape of the words themselves aren&#8217;t controlled for.  So, long words seem more dominant than short ones, and words with lots of ascenders and descenders (the vertical strokes of letters like &#8216;b&#8217; or &#8216;p&#8217;) tend to dominate as well.  This can subtly alter the overall gist that tag clouds are supposed to deliver.</p>
<p>The academic community has noted shortcomings of the technique, as well. <a href="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4438863">Hearst and Rosner (2008)</a> observe that the alphabetical layout of the cloud may lead to a sort of &#8220;false clustering&#8221; effect, as users misinterpret words because of surrounding tags.  <a href="http://portal.acm.org/citation.cfm?id=1240624.1240775">Renninger and Shumar (2007)</a> found that tag cloud quadrants have different rates of recall, a fact which most tag cloud designs ignore.  In fact, their findings suggest that a simple list of tags, ordered by frequency, may deliver a more accurate overall impression than a tag cloud.  Several researchers have sought to improve shortcomings in tag cloud presentation with packing and sorting algorithms that manage whitespace and cluster relevant concepts (<a href="http://arxiv.org/abs/cs.DS/0703109">Kaser and Lemire, 2007</a>; <a href="http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&amp;toc=comp/proceedings/iv/2008/3268/00/3268toc.xml&amp;DOI=10.1109/IV.2008.89">Seifert, Kump, Kienreich, Granitzer, and Granitzer, 2008</a>).</p>
<p>Now, this isn&#8217;t to say that tag clouds have no value; in fact, I think they have great potential.  It&#8217;s just that we need to know when tag clouds and word clouds are appropriate, know their shortcomings, and (this is the fun part) try to find ways to make them better. Most of the sources cited above have set about doing just that. In my next post, I&#8217;ll discuss a few of these &#8220;next-generation tag cloud&#8221; concepts; in particular, I&#8217;ll be examining methods of using word clouds to compare different versions of a text.</p>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2008/09/the-trouble-with-tagclouds/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>79% of oft-cited statistics are total garbage</title>
		<link>http://jasonpriem.org/2008/07/79-of-oft-cited-statistics-are-total-garbage/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=79-of-oft-cited-statistics-are-total-garbage</link>
		<comments>http://jasonpriem.org/2008/07/79-of-oft-cited-statistics-are-total-garbage/#comments</comments>
		<pubDate>Thu, 17 Jul 2008 03:43:03 +0000</pubDate>
		<dc:creator>jason</dc:creator>
				<category><![CDATA[fun with data]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://jasonpriem.com/?p=19</guid>
		<description><![CDATA[You know, we learn we remember 10% of what we read, 20% of what we hear, but 80% of what we actually experience.  Or, wait, maybe it&#8217;s 20%.  Or 30? Of course, as many people know, this delightful little statistic has no backing in any sort of serious research&#8212;nor, indeed, could it: &#8230;As Dwyer [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft alignnone size-full wp-image-23" style="float: left;" title="ghits-for-bogus-stat3" src="http://jasonpriem.com/wp-content/uploads/2008/07/ghits-for-bogus-stat3.png" alt="" width="498" height="318" /></p>
<p>You know, we learn <span style="font-family: Calibri;"><span style="font-family: 'Arial','sans-serif';">we remember 10% of what we read, 20% of what we hear, but 80% of what we actually experience.  Or, wait, maybe it&#8217;s 20%.  Or 30?</span></span></p>
<p>Of course, as many people know, this delightful little statistic has no backing in any sort of serious research&#8212;nor, indeed, could it:</p>
<blockquote><p>&#8230;As Dwyer points out, the reported percentages are impossible to interpret or verify without specifying at least the method of measurement, the age of the learners, the type of learning task, and the content being remembered (p. 10).  Despite the lack of credibility, this formulation is widely quoted, usually without attribution, and in recent years has become repeatedly conflated with Dale’s Cone, with the percentage statements superimposed on the cone, replacing or supplementing Dale’s original categories.</p>
<p class="MsoNormal">from <span style="font-family: Arial,Helvetica,New Font;"><span style="font-size: x-small;"><strong><span style="font-size: 10pt; font-family: Arial;"><a style="color: blue; text-decoration: underline;" href="http://www.indiana.edu/%7Emolpage/Cone%20of%20Experience_text.pdf"><span style="color: #000000;">Cone of Experience</span></a><em> (PDF),</em></span></strong><span style="font-size: 10pt; font-family: Arial;"> entry in A. Kovalchick &amp; K. Dawson, Eds., <em>Educational Technology: An  Encyclopedia. </em>Santa Barbara, CA: ABC-Clio, 2003. </span></span></span></p>
</blockquote>
<p class="MsoNormal"><a href="http://www.visualbeing.com/2005/07/08/forget-what-youve-heard-about-remembering/">Several</a> <a href="http://www.willatworklearning.com/2006/05/people_remember.html">bloggers</a> <a href="http://edutechy.com/?p=5">have</a> likewise been struck by the curious disconnect between the popularity of this statistic and its relation to reality.  Despite its readily apparent dodginess (We remember 90% of what we experience?  So I perfectly remember everything I did for nine out of the last ten years?), people love quoting this thing.</p>
<p>So quote they do.  And, since there&#8217;s no actual citation for this thing, the meme is free to mutate, which is actually kind of fascinating; the plot above shows the pattern in <a href="http://itre.cis.upenn.edu/~myl/languagelog/archives/000954.html">ghits</a> for different versions of this same &#8216;principle.&#8217;</p>
<p>But why?  Obviously, the meme lives because it has value to people; in this case,  it helps folks prove a point about better ways of teaching.  But that&#8217;s not really an answer; there&#8217;s no reverse version of this for people arguing the opposite side.  No, the real answer is this: the statistic lives because it demonstrates something that the speaker and the listener <em>both already agree on</em>.  Few people are going to call you on this statistic, because everyone knows that the gist is true in many situations; you probably will learn something better if you involve it in some kind of experience than if you just read about it and move on.</p>
<p>The New York Times did a great story some years ago on a related idea, called <a href="http://query.nytimes.com/gst/fullpage.html?res=9B06E1DB1E3BF935A35751C1A96E958260&amp;n=Top/News/Science/Topics/Science%20and%20Technology">Scientific Myths That Are Too Good to Die</a>.  It documented how well-known experiments could become sort of &#8220;academic urban myths.&#8221;  Take, for instance, the experiment that lent its name to the oft-cited &#8220;<a href="http://en.wikipedia.org/wiki/Hawthorne_effect">Hawthorne Effect</a>&#8221; (in which the participants&#8217; mere knowledge that they&#8217;re part of an experiment skews results):</p>
<blockquote><p>&#8220;The results of this experiment, or rather the human relations interpretation offered by the researchers who summarized the results, soon became gospel for introductory textbooks in both psychology and management science,&#8221; said Dr. Lee Ross, a psychology professor at Stanford University.</p>
<p>But only five workers took part in the study, Dr. Ross said, and two were replaced partway through for gross insubordination and low output.</p>
<p>A psychology professor at the University of Michigan, Dr. Richard Nisbett calls the Hawthorne effect &#8220;a glorified anecdote.&#8221;</p></blockquote>
<p>These &#8220;glorified anecdotes&#8221; (and glorified ballpark guesses, which is really what the percentage-retention statistic is) hang on, though, because, in Dr. Ross&#8217; words again, &#8220;sometimes a story deserves to be true.&#8221;  That is, the story or number itself may be wrong, but it may be a way to access a point that deserves our attention.</p>
<p>So, then, is a bad statistic in a good cause worthwhile?  What if my &#8220;90% retention&#8221; number gets that grumpy admin to allow my pet wiki project?  Is it worth it?  I say no, for reasons that lie outside the scope of this post (maybe next one?).  Any other opinions, though?</p>
]]></content:encoded>
			<wfw:commentRss>http://jasonpriem.org/2008/07/79-of-oft-cited-statistics-are-total-garbage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

