Kevin of Kevin’s Worklog has a thoughtful post on Google’s page ranking algorithm and relevance. I just made a comment asking for a little clarification of his thoughts. [Awaiting moderation]
Go read it [Kevin's post, not necessarily my comment], and the Buzzle article that he links to. Seems Google’s patent application has shed some light on their normally fairly secretive algorithm.
What do you think? I already have several issues with Google’s algorithms in the context of relevance. I can’t say that this is helping me resolve those. Just increases them, actually. I can see where it will help with spamming of the search process, and how it can help on occasion with actual relevant search results, but I don’t see how it will generally help.
And to quote Kevin quoting someone quoting a Google rep:
once you had tons of data it was amazing the types of things you could do. The same algorithms that wouldn’t return good results with smaller sets worked much better when the data set was massive. It seems once you get past a certain point, you get a new perspective.
Sounds a lot like Total Information Awareness to me. Maybe it is true in an algorithmic sense, and possibly even useful in some situations, but most of those with this much data and computing power at hand scare me with their ‘new perspectives.’ "Don't be evil" my rear.
I’m hoping for a response from Kevin, because he’s clearly smarter about these things than me, or at least more educated, based on his blog, but what do you think?
And since I remember some serious issues I had with the concept of ‘relevance:’
“Pertinence” and “relevance” are two terms that have been used in the literature of information science to express a relationship between some document and: 1) some request for information; 2) some need for information; or 3) some individual who requests or needs information. Thus, it might be said that a particular document is relevant (or pertinent) to a particular request, to a particular information need, or to a particular individual who requests information on a particular subject. The relationship implied by these terms is one that is extremely important to the evaluation of information services. Unfortunately, the two terms have been used rather loosely in the literature and a considerable amount of controversy seems to exist on what the two terms actually mean and whether or not “relevance” is in fact relevant to the evaluation of information services.
From the introduction to "Pertinence and Relevance" by Lancaster and Gale, in Encyclopedia of Library and Information Science, 2003, (p. 2307 in print) accessed online 18 June 2005. [Emphasis mine.]
Although other terminology could conceivably be used, we propose to adopt the term relevance to indicate a relationship existing between a document and a request statement in the eyes of a particular judge. It would be wrong to assume that relevance represents a precise, invariant relationship; it does not. In fact, rather than saying that a document is relevant to a request, it would be better to say that the document has been judged relevant to the request by a particular individual or group of individuals.
From "Pertinence and Relevance" by Lancaster and Gale, in Encyclopedia of Library and Information Science, 2003, (p. 2310 in print) accessed online 18 June 2005.
Ah, yes. Relevance is a subjective evaluation made by the searcher, not something that any other person, and especially not a machine or algorithm, can decide for the searcher. Google’s rankings may be, and in fact are, relevant to Google’s algorithm, but they quite possibly, and often, are not in any way relevant to me or other searchers.
Which leads me back to my comment on Kevin’s post: if I were looking for his personal journal, I would now have some additional information that is ‘relevant’ in a sense, but I would still judge Google’s #1 result as not relevant to me, as I still don’t know where to find The Bruised Edge.
For the sake of experimentation I have looked a little more carefully, particularly at a link that did not look that promising at first, AND by doing a search on that page I have now found what I ‘was looking for.’ I wasn’t really looking for it, and the ‘secret’ is safe with me, Kevin, but it is the example Kevin used in his post. And honestly, it wasn’t that difficult, but if we are to believe the studies of search practices, searchers don’t usually work this hard at finding what they want. As another critique, this same page includes a link to the same post on Kevin’s Worklog, but it fails since that post isn’t on that blog. Hmmmm?
I simply fail to see how Google’s brand of relevance is very relevant in this case. Thoughts, anyone?