More on What Links Mean

So my previous post got a bit into the idea of what we mean when we make links in text. This is something I’ve thought about a lot, although I’m sure there are people who’ve thought about it a lot more.
I do think this example I just encountered, in the Wikipedia page on Yeats’ The Second Coming, illustrates a dead end:


Every single word in the sentence is a different link! (Well, “Things fall apart” is a single link to a page which does not exist in Wikipedia yet.) I am not sure what value is added by linking words like “the” and “that” to their Wikipedia entries. (Upon closer inspection, it looks like those changes were just made in an edit yesterday, so perhaps that is not something that will last long.)
In the meantime, I’m still looking for the current ideas about how authors could mark up their text with specific identities. It seems that Google is counting on its ability to infer identities being good enough for a while to come, especially assuming that most content will be published without anyone taking the time to apply semantic markup. However, Snap offers Snap Shot Markup Language (PDF) to apply their Snap Shots explicitly, without requiring a link. This is OK, but it is lacking the one thing that I’d been thinking about already: it only takes its meaning from the element content. For example, if I wanted to refer to the company formerly known as Apple Computer, I could use this markup:

<span class="Snap_Shot_Stock">AAPL</span>

This is OK, but it makes for awkward syntax. If I wanted to use their name (Apple) or a pronoun or such, there is no way to indicate that. It seems simple enough to move the identifying value into an attribute (at least as an option).
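
To make the idea concrete, here is a rough Python sketch of how a consumer of such markup might resolve the identity: look for an attribute first, and fall back to the element content if there isn't one. The attribute name `ident` is purely a placeholder I made up for illustration (it is not part of Snap's markup language or of HTML 4), and the sketch ignores nested spans.

```python
from html.parser import HTMLParser


class IdentityExtractor(HTMLParser):
    """Collect (visible text, identity) pairs from Snap_Shot_Stock spans."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # finished (text, identity) tuples
        self._attr_id = None   # value of the hypothetical attribute, if present
        self._text = None      # text buffer while inside a matching span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "Snap_Shot_Stock":
            # "ident" is a made-up attribute name, used only for this sketch.
            self._attr_id = attrs.get("ident")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._text is not None:
            text = "".join(self._text).strip()
            # Prefer the attribute; fall back to the element content.
            self.pairs.append((text, self._attr_id or text))
            self._text = None
            self._attr_id = None


def extract_identities(html):
    parser = IdentityExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = (
    '<p><span class="Snap_Shot_Stock">AAPL</span> announced that '
    '<span class="Snap_Shot_Stock" ident="AAPL">they</span> ...</p>'
)
print(extract_identities(sample))
# [('AAPL', 'AAPL'), ('they', 'AAPL')]
```

With the attribute as an option, the existing content-only form keeps working, and the pronoun “they” can still be tied to the same identity.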

And here’s where it gets stickier. It seems like such a simple concept, but there are two hitches: (1) which attribute? and (2) what is the appropriate attribute value?
In the Microformats world, they would scour (and have scoured) the HTML 4 specifications for some already canonized markup which is appropriate, or close enough such that it could be endorsed for use in answer to (1). So far, though, I haven’t found any sign that they’ve nailed it. I respect the idea, since there’s a lot to be said for writing markup which can be tested for conformance to a well known specification. But we are approaching the tenth anniversary of the maturation of HTML 4 to a full W3C recommendation (18 December 1997). The revisions since then have been pretty small. Work on HTML5 is proceeding, and perhaps that’s the place to focus, but I’m still getting the lay of the land.
Anyway, to answer (1), you might be able to contort <META> tags to fit, but all examples I’ve ever seen for those suggest document-level values. That’s not the right way to specify the antecedent for a specific pronoun in the middle of a document. The only thing I can think of so far is good ol’ <a href="…">, but since all browsers these days treat the href as something you should be able to click on to “activate,” that presupposes either better browser support or a bunch of javascripty overhead that doesn’t degrade well for people who don’t have it installed yet.
For (2), how to represent an identity, one thing is sure: even if URN sounds like the answer, the strict control over assigning namespaces is surely fatal in the internet world. (Only 31 have been allocated to date and most of them are esoteric, like the namespace for the New Zealand Government (RFC 4350) or the one for the Aerospace and Defence Industries Association of Europe (RFC 4688).)
There is this W3C working draft for CURIE Syntax 1.0, a “syntax for expressing URIs in a generic, abbreviated syntax.” (Note to editor: clean up that duplication of “syntax” before moving from working draft to published!) I haven’t yet really digested that document, so I can’t say much more for now, but it appears to be coming at some of the same things I have in mind.
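
As far as I can tell from a first read, the core mechanism is simple: a CURIE is a `prefix:reference` pair, and you get the full URI by looking up the prefix and gluing the reference onto the end. Here is a minimal Python sketch of that expansion. The prefix table and the example.com URLs are invented for illustration, and I am skipping parts of the draft such as default prefixes.

```python
# Hypothetical prefix table; these mappings are made up for this sketch.
PREFIXES = {
    "stock": "http://example.com/stock/",
    "person": "http://example.com/people/",
}


def expand_curie(curie, prefixes):
    """Expand a 'prefix:reference' CURIE into a full URI string."""
    # A CURIE may be wrapped in square brackets to set it apart from a
    # plain URI; strip them if present.
    if curie.startswith("[") and curie.endswith("]"):
        curie = curie[1:-1]
    # Split on the first colon only, so the reference may contain colons.
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError("not a prefix:reference pair: %r" % curie)
    if prefix not in prefixes:
        raise KeyError("unknown prefix: %r" % prefix)
    # Expansion is plain concatenation of the mapped URI and the reference.
    return prefixes[prefix] + reference


print(expand_curie("stock:AAPL", PREFIXES))
# http://example.com/stock/AAPL
```

What appeals to me is that the short form is compact enough to type by hand into an attribute value, while still resolving to a globally unique identifier.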
…to be continued…