On Saturday 17 February 2007 21:04, Jay R. Ashworth wrote:
On Fri, Feb 16, 2007 at 10:18:14AM +1100, Steve Bennett wrote:
On 2/16/07, Jay R. Ashworth <jra(a)baylink.com> wrote:
There was a fairly extensive discussion on this list a couple of weeks ago
-- which I thought you had participated in -- about microformats (I
think that was the buzzword), which amounted to "custom-made attributes on
HTML tags which would be usable for semantic extraction, but ignored by
browsers".
I'm sure I've never heard that term before, but fortunately
[[Microformats]] got me up to speed. (Is there anything Wikipedia
can't do :))
This topic, and the approach you and I foresee, is sort of an offshoot
thereof...
I guess. What would the syntax look like to the user?
Well, one suggested solution was piping the
==section header tag|secthead==
but while I understand why that is most intuitive to people who *get*
Wikipedia, I suspect it's a bit too breakable when confronted with
people who don't--and there are a lot more of them. So I like a
template or parser function that takes an argument and expands to the
appropriate hidden markup to support the pointer, myself.
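
To make the template idea concrete, here is a minimal sketch in Python of
what such an expansion might produce. Everything in it is an assumption for
illustration: the function name section_pointer, the class name "secthead",
and the choice of a span with an id attribute are hypothetical, not existing
MediaWiki behaviour.

```python
from html import escape


def section_pointer(title: str, key: str, level: int = 2) -> str:
    """Expand a heading title plus a machine-readable key into HTML.

    Hypothetical stand-in for a template or parser function. The class
    name "secthead" and the use of the id attribute are assumptions.
    """
    safe_title = escape(title)            # escape &, <, > and quotes in the visible text
    safe_key = escape(key, quote=True)    # the key ends up inside an attribute value
    return (
        f'<h{level}><span class="secthead" id="{safe_key}">'
        f"{safe_title}</span></h{level}>"
    )


print(section_pointer("Section header tag", "secthead"))
```

The user only supplies the title and the key; the markup that browsers
ignore but extraction tools can pick up is generated for them.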
Semantic MediaWiki in parts is very similar to what microformats try to
achieve. It collects semantic data and offers it in machine-readable formats.
The difference is that it also can work on the data within the wiki (e.g. you
can search for things). One could adjust Semantic MediaWiki to support
microformat-like applications. Now I don't know which microformat you have in
mind -- Semantic MediaWiki is not the solution for everything, but it has a
lot of existing infrastructure that was built for storing and processing such
structured data. So maybe extending Semantic MediaWiki is easier than
building another parser extension for microformats.
In general, microformats are application-specific mini-markups that were
tailored to simplify markup for people who write XHTML. Microformats are
intended to be easy to write (for people used to writing XHTML), but they are
not easy to parse. Extracting microformat data from XHTML may require
substantial effort (at least that's what I heard from microformat people at
the W3C Technical Plenary last year). Now if you create a wiki markup
(Semantic MediaWiki has one, but you can invent another one if you need),
just to re-embed the already extracted information into XHTML, this seems to
be unnecessarily complicated. Since you already have the information, you can
easily provide it in a separate block of data -- both XML and RDF based
formats are available for many typical microformat applications.
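
To give a feeling for what extraction involves, here is a minimal sketch in
Python that pulls text out of elements carrying hCard-style class names such
as "fn" and "org". It assumes well-formed XHTML (every element is closed) and
ignores most of the rules a real microformat parser has to follow; the class
and function names are made up for this example.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text content of elements whose class matches a wanted name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.stack = []   # one entry per open element: list of (name, index) pairs
        self.found = {}   # property name -> list of collected text values

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        open_props = []
        for name in classes:
            if name in self.wanted:
                values = self.found.setdefault(name, [])
                values.append("")
                open_props.append((name, len(values) - 1))
        self.stack.append(open_props)

    def handle_endtag(self, tag):
        # Assumes well-formed input, so each end tag closes the innermost element.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a wanted class.
        for open_props in self.stack:
            for name, index in open_props:
                self.found[name][index] += data


def extract_properties(xhtml, wanted):
    parser = ClassTextExtractor(wanted)
    parser.feed(xhtml)
    parser.close()
    return parser.found


snippet = (
    '<div class="vcard"><span class="fn">Jane Doe</span>, '
    '<span class="org">Example Org</span></div>'
)
print(extract_properties(snippet, ["fn", "org"]))
```

Even this toy version has to track nesting to know which text belongs to
which property, whereas a separate XML or RDF block states the same facts
directly.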
There are many ways to attach metadata to HTML. Flickr, for instance, exports
RDF metadata by directly embedding it into XHTML pages in a sort of
customised way. The RDFa effort is about to standardise a clean solution for
this. Semantic MediaWiki in turn puts the RDF on another URL which is linked
to the HTML-document through the header (this is fully XHTML conformant and
it scales to larger amounts of data); tools like Firefox's Piggy Bank extension
will find the data and can import it.
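
As a rough illustration of how a tool could discover such linked data, here
is a minimal sketch in Python. It assumes the page header carries a link
element with rel="alternate" and type="application/rdf+xml"; the exact
attributes a given wiki emits may differ, and the example URL is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfLinkFinder(HTMLParser):
    """Collect the URLs of RDF documents advertised through <link> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        is_rdf = attributes.get("type") == "application/rdf+xml"
        href = attributes.get("href")
        if "alternate" in rel_values and is_rdf and href:
            # Resolve relative references against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_rdf_links(html_text, base_url):
    finder = RdfLinkFinder(base_url)
    finder.feed(html_text)
    finder.close()
    return finder.links


page = (
    "<html><head><title>Example</title>"
    '<link rel="alternate" type="application/rdf+xml" href="/rdf/Example" />'
    "</head><body></body></html>"
)
print(find_rdf_links(page, "http://wiki.example.org/wiki/Example"))
```

The consumer never has to parse the page body: it reads one element from the
header and fetches the RDF from the URL found there.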
Microformats are still a very good starting point: they have identified common
applications and provide sets of important property definitions for each.
This is certainly something to draw from. I just would not implement a wiki
markup, XHTML encoding, and custom handling for each such format.
Btw, which tool that supports microformats do you have in mind? Maybe it also
supports other input formats that have been invented for similar
applications.
Cheers,
Markus
--
Markus Krötzsch
Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
mak(a)aifb.uni-karlsruhe.de phone +49 (0)721 608 7362
www.aifb.uni-karlsruhe.de/WBS/ fax +49 (0)721 693 717