So earlier I had noted how my mention of O’Reilly’s “Hacks” books played into a bigger question: in my writing, do I specifically want to link mentions of those books to O’Reilly’s pages? Or would I (and you, dear reader, the intended beneficiary of all these musings!) be better served by marking the references simply with an ISBN and leaving something “smarter” to decide what to do with those marks?
I happened to view the source of one of the O’Reilly pages at the end of my links, and saw some interesting things. Embedded in the HEAD data for that page is a META tag indicating the ISBN: <meta name="isbn" content="0596007795" />. By encoding this metadata, the publishers at oreilly.com are asserting that the URL http://www.oreilly.com/catalog/mindhks/ is about the book identified by ISBN 0596007795. (What we mean by “book” here is slippery, because the ISBN doesn’t identify a single physical copy of a book and it doesn’t identify all publication runs of the “same” book. “Edition” may be the best word for what an ISBN identifies. But the fact that human language is rarely so precise is fodder for many future posts.) In any case, with metadata like this, a tool like Snap Shots could consult its central knowledge base, determine that the link is about a specific book, and make available its standard suite of book-centric tools, whatever those would turn out to be. Cool. Since Snap Shots are provided by a company whose business is spidering the web, this is actually fairly plausible.
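To make that a little more concrete, here is a minimal sketch of how a tool might pull the ISBN out of a page like that. It uses Python's standard html.parser module; the names MetaCollector and isbn_for_page are my own inventions for illustration, and SAMPLE is a hand-written stand-in for the page's HEAD, not a copy of the real O’Reilly markup.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def isbn_for_page(html):
    """Return the ISBN a page claims to be about, or None if it makes no claim."""
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta.get("isbn")


# Hypothetical stand-in for the HEAD of the catalog page discussed above.
SAMPLE = """
<html><head>
<title>Mind Hacks</title>
<meta name="isbn" content="0596007795" />
<meta name="author" content="Tom Stafford, Matt Webb" />
</head><body></body></html>
"""

if __name__ == "__main__":
    print(isbn_for_page(SAMPLE))
```

Note that this only works because the tool and the publisher happen to agree on the name “isbn”. Nothing in the page itself says what that name means.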
Elsewhere in the HEAD of that document is this: <meta name="author" content="Tom Stafford, Matt Webb" />. Here’s the rub: to what does “author” refer? As humans, particularly after having read the page where this is embedded, we can infer that “author” here means “author(s) of the book Mind Hacks, for which this page is the official page.” But in the W3C examples of why you might want to use a META tag, several of them suggest that an “author” META tag is telling you who wrote the page in which the tag is found. If one were to extend the “book-centric tools” to try to provide “author” support as well, those tools would get tangled up in this scenario.


The HTML 4 spec foresaw this basic question and laid out a specification for defining a scheme attribute for a META tag and a profile attribute for the HEAD tag. As far as I know, neither of these concepts is in common use, but then, in general, packing data in the HEAD of a document is not a widely used practice. Probably the most prominent adoption I can think of is RSS feed autodiscovery: when browsers like Firefox (above right) and Safari (below right) show you a tool in the URL bar so that you can subscribe to the feed for the given site, they are most assuredly relying on community-standard conventions, with no concern for schemes and profiles.
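For the curious, the autodiscovery convention boils down to looking for LINK tags in the HEAD with rel="alternate" and a feed MIME type. Here is a rough sketch of that lookup in Python. The names FeedLinkFinder and find_feeds are made up for this post, the example.org URLs are placeholders, and real browsers surely handle more edge cases than this does.

```python
from html.parser import HTMLParser

# MIME types conventionally used to advertise feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect the href of every <link> that advertises a feed (illustrative)."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated keywords.
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html):
    """Return the feed URLs a page advertises, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page head; the URLs are placeholders.
SAMPLE_PAGE = """
<html><head>
<link rel="stylesheet" type="text/css" href="http://example.org/style.css" />
<link rel="alternate" type="application/rss+xml" title="RSS"
      href="http://example.org/feed.xml" />
</head><body></body></html>
"""

if __name__ == "__main__":
    print(find_feeds(SAMPLE_PAGE))
```

Notice that nothing here consults a profile. The meaning of rel="alternate" plus a feed type is simply something everyone agreed on.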
I did just recently come across a valiant attempt by Microformats guru Tantek Çelik to document some of those issues in the form of XHTML Metadata Profiles. I don’t think people are rushing to use these, since it looks like that page dates back to about 2003. Reading the XMDP description, I can’t shake the vision of the Microformats community as a network of Talmudic scholars scouring the received wisdom of specs like HTML 4 and attempting to codify another layer of understanding!
In any case, with the proper profile and scheme, the “aboutness” of the metadata in a page could be made explicit enough that tools could make a smarter decision (assuming, of course, that the page author makes a smart decision in the first place and specifies schemes and profiles carefully). I’ll have to look further around XMDP and see if I can find anyone who is actually doing this.
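Here is a sketch of what that smarter decision might look like, again in Python. Everything specific in it is an assumption on my part: the profile URI http://example.org/profiles/book-page is hypothetical (imagine it defines “author” as the author of the book a page describes), and the names ProfiledMeta and describe are invented for illustration. The markup follows the HTML 4 pattern of a profile attribute on HEAD and a scheme attribute on META, with the ISBN expressed as an “identifier” whose scheme is “ISBN”.

```python
from html.parser import HTMLParser

# Hypothetical profile URI, assumed to define "author" as the book's author.
BOOK_PROFILE = "http://example.org/profiles/book-page"


class ProfiledMeta(HTMLParser):
    """Record the HEAD profile list and each META's name, scheme and content."""

    def __init__(self):
        super().__init__()
        self.profiles = []
        self.meta = []  # list of (name, scheme, content) tuples

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "head":
            # HTML 4 allows several profile URIs separated by white space.
            self.profiles = (attr.get("profile") or "").split()
        elif tag == "meta" and attr.get("name"):
            self.meta.append(
                (attr["name"].lower(), attr.get("scheme"), attr.get("content"))
            )


def describe(html):
    """Interpret a page's META tags in light of its declared profile."""
    parser = ProfiledMeta()
    parser.feed(html)
    is_book_page = BOOK_PROFILE in parser.profiles
    facts = {}
    for name, scheme, content in parser.meta:
        if name == "identifier" and (scheme or "").upper() == "ISBN":
            facts["isbn"] = content
        elif name == "author":
            # Without the profile, fall back to the "who wrote this page" reading.
            facts["book_author" if is_book_page else "page_author"] = content
    return facts


# Hypothetical markup; not what the O'Reilly page actually contains.
SAMPLE = """
<html><head profile="http://example.org/profiles/book-page">
<meta name="identifier" scheme="ISBN" content="0596007795" />
<meta name="author" content="Tom Stafford, Matt Webb" />
</head><body></body></html>
"""

if __name__ == "__main__":
    print(describe(SAMPLE))
```

With the profile in place, the sketch reads “author” as the book’s authors. Strip the profile attribute and the same tag falls back to meaning whoever wrote the page, which is exactly the ambiguity I was grumbling about.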