A little more openness is more than we can ever hope for, however
Many commentators give the impression that Google has some sort of magic AI that can read a document, determine its context and place it in a pile with other, similar examples. In fact, they achieve a lot of this by analysing metadata that web developers deliberately embed around the content, or in the XML feeds that drive it, for exactly these sorts of reasons. In this case, we know it's microformat data that they're using, but we don't know precisely how they're using it. We know that semantic web design, where a document embeds data describing its own content, is a Good Thing for many reasons (Google PageRank being one of them), but we don't know exactly how any of it helps.
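To make "metadata embedded around the content" concrete, here is a minimal sketch, in Python, of how a crawler *might* pull hCard microformat data (the `vcard`, `fn`, `street-address` and `locality` class names) out of a page. It is an illustration only: nobody outside Google knows how Googlebot actually uses microformats, and the `HCardExtractor` class, the `extract_hcards` function and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# A few hCard property class names this sketch looks for (illustrative, not exhaustive).
HCARD_PROPERTIES = {"fn", "org", "street-address", "locality", "region", "postal-code", "tel"}
# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class HCardExtractor(HTMLParser):
    """Hypothetical extractor: collects text from hCard properties inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []       # one dict per vcard element found
        self._stack = []      # (tag, property-or-None, is_vcard) for each open element
        self._current = None  # the vcard dict currently being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        is_vcard = "vcard" in classes
        if is_vcard:
            self._current = {}
            self.cards.append(self._current)
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._stack.append((tag, prop, is_vcard))

    def handle_endtag(self, tag):
        if not any(t == tag for t, _, _ in self._stack):
            return  # stray or void end tag: nothing to close
        # Pop up to and including the innermost matching open tag.
        while self._stack:
            t, _prop, is_vcard = self._stack.pop()
            if is_vcard:
                self._current = None
            if t == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return  # ignore whitespace and anything outside a vcard
        # Attach the text to the innermost open hCard property, if there is one.
        for _tag, prop, _is_vcard in reversed(self._stack):
            if prop:
                existing = self._current.get(prop, "")
                self._current[prop] = (existing + " " + text).strip()
                break


def extract_hcards(html: str) -> list[dict[str, str]]:
    parser = HCardExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.cards


SAMPLE = """
<div class="vcard">
  <span class="fn">Example Person</span>
  <div class="adr">
    <span class="street-address">1 Example Street</span>,
    <span class="locality">London</span>
  </div>
</div>
"""

if __name__ == "__main__":
    print(extract_hcards(SAMPLE))
```

The point is that the "magic" is largely the page author's doing: the class names carry the meaning, and the parser only has to read them.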
Google exists in a love-hate relationship with this process, wanting it to develop without explicitly saying, for instance, "our bot looks for divs with ids that contain the word 'address' when looking for address data on a page" or "our algorithms seek out definition lists and instances of the abbreviation tag". They don't explain the exact process, since it would subvert their advertising revenue: page ranking would become a matter of good design rather than dollars. It would also make it easier for bad guys to game the system by pushing their phishing websites up the page rankings to lure the unwary, as many of them already do by querying Google's own Trends system to find out automatically which topics are 'hot' and gearing their phishing pages accordingly.
And so we end up in a world where The Search Engine appears to work things out for itself by magic, while web developers frantically try to reverse engineer the process and work out what it is they're doing right to keep the Googlemonster happy ("Hey, we're running kinda low on virgins... has anyone tried feeding the dragon some fat schoolboys recently?").
Strangest of all, though, is the fact that, if you look at the Google cache for this page, you'll see that the Googlebot had already read, ranked, rated and stacked the contents of this article twenty-three minutes after Sylvie pressed 'Publish'.
Looks like Sylvie is doing a new thing at L'Inq: double-meaning expressions and phrases; doggy style... et al.
Google has the touch. The rest are backpackers carrying a heavy load.