After seeing several posts about Wordle I finally decided to play. The easiest thing for me to play with was my set of del.icio.us tags. Even these are not entirely representative or, I should say, not accurately representative.
Because of my previous (faulty) workflow, hundreds of posts in Bloglines that I commented on never made it to del.icio.us, along with hundreds more that I didn’t comment on but still wanted/intended to bookmark. Unfortunately, it is not as simple as going back in and bookmarking them, as there were about 5000 items marked keep alive when I finally abandoned Bloglines.
So. My “comment” tag should be much larger, and if everything that I meant to tag had been tagged, then several tags would grow, others would shrink, and some new ones would appear. Hard to say which ones at this point though.
The first image is based on all tags and contains what Wordle considers “common words.” The second has the “common words” removed. Since Wordle treats “comment” as a common word, that is unacceptable to me. I have almost 600 items tagged with “comment” in del.icio.us and, as I said, it ought to be way more.
Mark’s del.icio.us tags with common words
Originally uploaded by broken thoughts
Mark’s del.icio.us tags without common words
Originally uploaded by broken thoughts
Hopefully the “comment” tag gives some idea of the lengths I go to to have discussions on blogs, to the extent the medium allows, anyway. Also, it may provide some hint as to why I did not play along with the 30-day comment challenge. While I do believe that it is good to step back and question why and how you do something, I thought 30 days of such was a bit of overkill. And based on some of the things I saw Greg, Meredith and others addressing, I was right.
After playing with Wordle a bit I realized I could dump the text of some of my papers into it. The first several times I just got Java errors, but it eventually worked.
The first one is from my paper for LIS590TR, “Mapping Thesauri for Interdisciplinary Work,” minus the bibliography.
Mapping Thesauri for Interdisciplinary Work
Originally uploaded by broken thoughts
I really like how “vocabulary” sits at the far left, sort of as a top term.
The next two are from my bibliography, “The Epilogue that Started It All; or, Integrating LIS (Harris and Hjørland).” I included two to demonstrate that Wordle seems to treat capitalized and uncapitalized occurrences of the same word as different words; e.g., look for “Information” and “information” symmetrically opposed to each other near the right side, running vertically.
The Epilogue that Started It All
Originally uploaded by broken thoughts
Compare to this picture where all words are lowercase:
The Epilogue that Started It All
Originally uploaded by broken thoughts
Unless I’m blind, “information” does not exist twice in this one. I ran this test multiple times with different fonts and layouts and could not find any duplicates when I used all the same case. Doesn’t seem the algorithm is too bright in this respect.
I am aware that in some cases words which appear in a text as capitalized and as uncapitalized “versions” are, in fact, two (or more) different words, but more frequently they will be the same. Oh well. Can’t complain since it’s free. Actually, I’m not really complaining anyway, but I would like to see one with the proper nouns capitalized and all other words in their lowercase instantiations, with the counts taking into account all occurrences.
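For the curious, here is a rough sketch in Python of the kind of counting I have in mind. To be clear, this is not how Wordle works (I have no idea what its algorithm does internally); the function name `merged_counts` and the proper-noun guess are my own assumptions. The idea: count every word without regard to case, then show the lowercase form if the word ever appears in lowercase, and otherwise keep its most common spelling on the guess that it is a proper noun.

```python
import re
from collections import Counter, defaultdict


def merged_counts(text):
    """Count words case-insensitively, keeping one display form per word.

    Hypothetical helper for illustration only; not Wordle's actual method.
    """
    # Map each lowercased word to a tally of the spellings actually seen.
    forms = defaultdict(Counter)
    # Runs of letters (Unicode-aware), allowing internal apostrophes/hyphens.
    for word in re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text):
        forms[word.lower()][word] += 1

    merged = {}
    for key, seen in forms.items():
        total = sum(seen.values())  # all occurrences, whatever the case
        # Assumption: a word that shows up in lowercase anywhere is an
        # ordinary word; one that never does is probably a proper noun.
        display = key if key in seen else seen.most_common(1)[0][0]
        merged[display] = total
    return merged


sample = ("Information science studies information. Hjørland wrote about "
          "information; Harris and Hjørland agree.")
for word, count in sorted(merged_counts(sample).items(), key=lambda kv: -kv[1]):
    print(count, word)
```

The obvious weakness is that a word which only ever appears at the start of a sentence will be mistaken for a proper noun, so it is a guess, not a solution.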
This post has gone on far too long and took way too much time to construct, but it did force me to relearn image inclusion in WordPress. Go. Play. Wordle.