On Saturday 17 February 2007 21:04, Jay R. Ashworth wrote:
On Fri, Feb 16, 2007 at 10:18:14AM +1100, Steve Bennett wrote:
On 2/16/07, Jay R. Ashworth jra@baylink.com wrote:
There was a fairly extensive discussion on this list a couple of weeks ago -- which I thought you had participated in -- about microformats (I think the buzzword was), which amounted to "custom-made attributes added to HTML tags which would be usable for semantic extraction, but ignored by browsers".
I'm sure I've never heard that term before, but fortunately [[Microformats]] got me up to speed. (Is there anything Wikipedia can't do :))
This topic, and the approach you and I foresee, is sort of an offshoot thereof...
I guess. What would the syntax look like to the user?
Well, one suggested solution was piping the
==section header tag|secthead==
but while I understand why that is most intuitive to people who *get* Wikipedia, I suspect it's a bit too breakable when confronted with people who don't -- and there are a lot more of them. So I like a template or parser function that takes an argument and expands to the appropriate hidden markup to support the pointer, myself.
Semantic MediaWiki is in parts very similar to what microformats try to achieve. It collects semantic data and offers it in machine-readable formats. The difference is that it can also work on the data within the wiki (e.g. you can search for things). One could adjust Semantic MediaWiki to support microformat-like applications. Now I don't know which microformat you have in mind -- Semantic MediaWiki is not the solution for everything, but it has a lot of existing infrastructure that was built for storing and processing such structured data. So maybe extending Semantic MediaWiki is easier than building another parser extension for microformats.
In general, microformats are application-specific mini-markups that were tailored to simplify markup for people who write XHTML. Microformats are intended to be easy to write (for people used to writing XHTML), but they are not easy to parse. Extracting microformat data from XHTML may require substantial effort (at least that's what I heard from microformat people at the W3C Technical Plenary last year). Now if you create a wiki markup (Semantic MediaWiki has one, but you can invent another one if you need to) just to re-embed the already extracted information into XHTML, this seems unnecessarily complicated. Since you already have the information, you can easily provide it in a separate block of data -- both XML- and RDF-based formats are available for many typical microformat applications.
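To give a feeling for the parsing effort, here is a minimal sketch in Python of pulling hCard-style properties out of XHTML. It is only an illustration under simplifying assumptions: the class name HCardExtractor and the sample markup are made up, only three properties are handled, the input must be well-formed XHTML, and nested cards and the many special cases of the real hCard rules are ignored.

```python
from html.parser import HTMLParser


class HCardExtractor(HTMLParser):
    """Hypothetical helper: collects the text of a few hCard properties."""

    PROPERTIES = {"fn", "org", "tel"}  # small subset, for illustration only

    def __init__(self):
        super().__init__()
        self.cards = []       # one dict per element with class "vcard"
        self._stack = []      # (is_card, property names) per open element
        self._current = None  # the card being filled, if any

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        is_card = "vcard" in classes
        if is_card:
            self._current = {}
            self.cards.append(self._current)
        # Properties only count while we are inside a card.
        props = classes & self.PROPERTIES if self._current is not None else set()
        self._stack.append((is_card, props))

    def handle_endtag(self, tag):
        if not self._stack:
            return
        is_card, _ = self._stack.pop()
        if is_card:
            self._current = None

    def handle_data(self, data):
        if self._current is None:
            return
        # Text belongs to every property element that is currently open.
        for _, props in self._stack:
            for name in props:
                self._current[name] = self._current.get(name, "") + data


sample = (
    '<div class="vcard"><span class="fn">Jane Doe</span>, '
    '<span class="org">Example Org</span></div>'
)
extractor = HCardExtractor()
extractor.feed(sample)
extractor.close()
print(extractor.cards)
```

Even this toy version has to track open elements and class lists by hand, which is the kind of bookkeeping that a separate XML or RDF export avoids.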
There are many ways to attach metadata to HTML. Flickr, for instance, exports RDF metadata by embedding it directly into XHTML pages in a somewhat customised way. The RDFa effort is about to standardise a clean solution for this. Semantic MediaWiki in turn puts the RDF on another URL, which is linked to the HTML document through the header (this is fully XHTML-conformant and it scales to larger amounts of data); tools like Firefox's Piggy Bank extension will find the data and can import it.
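For illustration, a tool could discover such linked data roughly as in the following Python sketch. It assumes the header link uses rel="alternate" with type="application/rdf+xml", which is the usual autodiscovery pattern; the class name RdfLinkFinder and the example.org URL are made up, and this is not meant to describe how Piggy Bank actually works.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Hypothetical helper: collects hrefs of header links that point to RDF."""

    def __init__(self):
        super().__init__()
        self.rdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        is_rdf = attributes.get("type") == "application/rdf+xml"
        if "alternate" in rel_values and is_rdf and attributes.get("href"):
            self.rdf_urls.append(attributes["href"])


# Made-up page header; the URL is a placeholder, not a real export address.
page = (
    "<html><head><title>Berlin</title>"
    '<link rel="stylesheet" type="text/css" href="/style.css" />'
    '<link rel="alternate" type="application/rdf+xml" '
    'href="http://example.org/export/Berlin.rdf" />'
    "</head><body><p>Berlin is a city.</p></body></html>"
)
finder = RdfLinkFinder()
finder.feed(page)
finder.close()
print(finder.rdf_urls)
```

The point is that the consumer only has to find one link and can then fetch plain RDF, instead of picking the data out of the page markup.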
Microformats are still a very good starting point: they have identified common applications and provide sets of important property definitions for each. This is certainly something to draw from. I just would not implement a wiki markup, an XHTML encoding, and custom handling for each such format.
Btw, which tool that supports microformats do you have in mind? Maybe this tool supports further input formats that have been invented for similar applications.
Cheers,
Markus