So, I’ve been thinking a lot lately about the Semantic Web (and largely about how it was predicted that it would take ten years to build, which means that it should be coming around any minute now…).
Anyway, a recent post by Tim Bray reminded me of one of the small things I’ve thought about in the past. He participated in a Round Table discussing XBRL (Extensible Business Reporting Language), “an XML dialect designed for encoding companies’ financial statements.” He writes:
“I imagine a future in which you can go to xbrl.company-name.com and be pretty sure of finding authoritative machine-readable financial data. And in this picture, Metcalfe’s Law applies in more than one way: not only does the value of the financial data increase as a strong function of how many companies are providing it, but the pressure to join in does too, on those companies who aren’t providing it.”
What if instead you assigned a TLD to each of the major exchanges, and then established a practice whereby part of being listed on an exchange was having a domain granted in that TLD? Then the listed companies would have a predictable place to publish their XBRL data (say, at http://aapl.nyse/xbrl).
I guess there’s no reason you couldn’t do the same thing with “aapl.nyse.com” or “aapl.symbols.nyse.com”, delegating those DNS zones to the listed company. For some reason, the new TLD seems kind of interesting to me. Maybe it’s because the construction of the domain name is more straightforward than “aapl.symbols.nyse.com”.
(Edit: whoops, Apple is listed on the NASDAQ, not the NYSE. This points out a problem in “guessability” to which Bray’s original solution isn’t victim, but it seems to me that “guessability” isn’t the most important reason for developing a “custom” around publishing this data. Anyone who is prepared to consume XBRL data is probably prepared to use (and might even prefer) a more methodical way of finding XBRL data sources.)
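To make the guessability problem concrete, here is a rough Python sketch of what a consumer would have to do under the naming schemes floated above. Everything in it is hypothetical: the exchange TLDs don’t exist, the /xbrl path is just the example from this post, and the function name and the list of exchanges are my own assumptions.

```python
# Hypothetical sketch: neither the exchange TLDs nor the /xbrl path
# are real conventions; they are the examples floated in this post.
EXCHANGES = ["nyse", "nasdaq"]  # assumed list of exchange labels


def candidate_urls(ticker, exchanges=EXCHANGES, path="xbrl"):
    """Return the guessable URLs for a ticker, three per exchange."""
    symbol = ticker.strip().lower()
    urls = []
    for exchange in exchanges:
        # Scheme 1: a dedicated TLD per exchange, e.g. aapl.nyse
        urls.append(f"http://{symbol}.{exchange}/{path}")
        # Scheme 2: a zone delegated under the exchange's own domain
        urls.append(f"http://{symbol}.{exchange}.com/{path}")
        # Scheme 3: a zone delegated under a "symbols" subdomain
        urls.append(f"http://{symbol}.symbols.{exchange}.com/{path}")
    return urls


for url in candidate_urls("AAPL"):
    print(url)
```

If you don’t already know which exchange a company is listed on, you end up probing every exchange in turn, which is exactly the mistake I made with Apple. With Bray’s xbrl.company-name.com there is only one place to look.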
But I’m definitely down with the general idea; you could think of similar simple statistical data which could be valuably published in a consistent structured format. One example that has come to my mind recently is basic profile information for universities, published at something like northwestern.edu/profile (as far as I know, no one has bothered to define semantics for university profiles yet).
What are some other kinds of data that would lend themselves to this model? It depends on a balance of factors: if there is too much effort involved in collecting the data, then it won’t be freely offered in a highly structured format (you won’t find Zagat giving away XML of their restaurant review database). But if the data is fairly objective and there is competitive pressure on the data owners to make sure their own data is accurately and visibly published, then it could happen.