Re: [Toolserver-l] Experimental: Live template value search

22 Apr 2009


On Tue, Apr 21, 2009 at 21:15, Daniel Kinzler <daniel@brightbyte.de> wrote:
...
More or less - the parser parses the text and hands the part that is RDF
(Turtle) to the RDF extension for analysis. The extension analyzes the
statements and would save them to the database (this is not yet implemented).
There is a preprocessor that expands all templates recursively. After that, the
real "parser" (read: munger) is invoked to turn wiki text into HTML.
In the case of a "semantified" infobox, the substitution process would generate
RDF/Turtle statements using the template parameters. These would in turn be
handed to the RDF extension, which would write the resulting triples to the
database.
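
To make the substitution step concrete, here is a minimal sketch of how
template parameters could be turned into Turtle statements. It is not the RDF
extension's code: the function names, the example.org namespaces and the
one-property-per-parameter mapping are all assumptions made for illustration,
and page titles with characters that are not valid in a Turtle prefixed name
are not handled.

```python
# Hypothetical sketch: emit one Turtle triple per infobox parameter.
# Namespaces and the parameter-to-property mapping are illustrative assumptions.

def turtle_literal(value):
    """Escape a string as a Turtle double-quoted literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return '"' + escaped + '"'


def infobox_to_turtle(page_title, params):
    """Build a small Turtle document from a page title and template parameters."""
    subject = "wiki:" + page_title.replace(" ", "_")
    lines = [
        "@prefix wiki: <http://example.org/wiki/> .",
        "@prefix prop: <http://example.org/property/> .",
        "",
    ]
    # Sort parameters so the output is stable between runs.
    for name, value in sorted(params.items()):
        prop = "prop:" + name.strip().replace(" ", "_")
        lines.append(subject + " " + prop + " " + turtle_literal(value.strip()) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(infobox_to_turtle("Berlin", {"country": "Germany", "population": "3431700"}))
```

Every value is emitted as a plain string literal here; a real mapping would
need to decide per parameter whether the value is a literal or a resource.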
Thanks! My picture of the process is becoming clearer... :-)
To reiterate: Template definitions would be extended to generate
not just Wikitext aimed at the HTML generator, but also stuff that
is processed by the RDF generator but ignored by the HTML
generator (or at least by the browser).
Maybe it would sometimes be better for the RDF generator to have
access to the unexpanded templates?
Property values contain all kinds of stuff, and DBpedia experience
shows that one often needs specialized parsers to extract only
the desired info. One way to distinguish between desired and
undesired info is to have some metadata about the targets of
wikilinks. The RDF extension would have to be quite sophisticated...
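
As a rough illustration of such a specialized parser, the sketch below pulls
wikilink targets out of a raw property value and keeps only those whose target
has a wanted type. The regular expression covers only plain [[Target]] and
[[Target|label]] links, and the page_types lookup table is a hypothetical
stand-in for whatever metadata about link targets would actually be available.

```python
import re

# Matches [[Target]] and [[Target|label]]; nested markup is not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def extract_links(value):
    """Return (target, label) pairs for each wikilink in a raw property value."""
    links = []
    for match in WIKILINK.finditer(value):
        target = match.group(1).strip()
        # An absent or empty label falls back to the target itself.
        label = (match.group(2) or target).strip()
        links.append((target, label))
    return links


def filter_by_type(links, page_types, wanted):
    """Keep targets whose type in the (hypothetical) metadata table is `wanted`."""
    return [target for target, _ in links if page_types.get(target) == wanted]


if __name__ == "__main__":
    raw = "[[Berlin]], [[Germany|DE]] (since [[1990]])"
    types = {"Berlin": "city", "Germany": "country", "1990": "year"}
    print(filter_by_type(extract_links(raw), types, "city"))
```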
...
...
How are updates distributed? Do subscribers regularly poll
the server for recent changes? Or is there some kind of
store-and-forward / publish-subscribe?
There is the RSS/Atom feed (human-readable, not easy to parse), and an OAI-PMH
interface ("live update feed"). There's also the web API for polling data in a
machine-readable form, and there's the RC ("recent changes") channel on IRC
(human-readable, can't be parsed reliably). True XMPP-based pubsub is being
worked on, see http://brightbyte.de/page/RecentChanges_via_Jabber.
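
For the polling option, a minimal sketch against the MediaWiki web API might
look like the following. The endpoint URL, the User-Agent string and the
function names are assumptions; the query parameters (list=recentchanges,
rcprop, rclimit, rcdir, rcstart) are the ones the API documents for the
recent changes list. Only the URL building and response parsing are exercised
without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def build_rc_url(limit=10, newer_than=None):
    """Build a recent-changes query URL for the web API."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rclimit": str(limit),
        "format": "json",
    }
    if newer_than:
        # With rcdir=newer, rcstart is the oldest timestamp to list from.
        params["rcdir"] = "newer"
        params["rcstart"] = newer_than
    return API_URL + "?" + urlencode(params)


def parse_rc(payload):
    """Extract (timestamp, title, revid) tuples from a JSON response body."""
    data = json.loads(payload)
    changes = data.get("query", {}).get("recentchanges", [])
    return [(c["timestamp"], c["title"], c.get("revid")) for c in changes]


def poll_once(limit=10, newer_than=None):
    """Fetch one batch of recent changes (performs a network request)."""
    request = Request(build_rc_url(limit, newer_than),
                      headers={"User-Agent": "rc-poll-sketch/0.1 (example)"})
    with urlopen(request) as response:
        return parse_rc(response.read().decode("utf-8"))
```

A subscriber would call poll_once() on a timer and pass the newest timestamp
it has seen as newer_than on the next call, which is exactly the repeated
polling that a push-based feed would avoid.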
Looks great! But if I understand this correctly, tools that need the whole
article text would still have to pull it from the Wikipedia servers. Pushing
the whole text might improve scalability. A hierarchical structure (like a
content distribution network) would be cool...
An RDF extension could simply be one of these 'text updates' consumers.
Performance-wise, that would duplicate some of the effort of expanding the
templates etc., but distribution and separation would come "for free".
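
A toy, in-process sketch of that arrangement, with the RDF extractor as just
one subscriber among others. The UpdateHub class, the topic name and the
callbacks are hypothetical; a real setup would sit on top of something like
the XMPP pubsub mentioned above rather than direct function calls.

```python
from collections import defaultdict


class UpdateHub:
    """Minimal publish-subscribe hub that pushes full page text to consumers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback(title, text) for a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, title, text):
        """Deliver an update to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(title, text)
        return len(callbacks)


if __name__ == "__main__":
    hub = UpdateHub()
    # Hypothetical consumers: an RDF extractor and a search indexer.
    hub.subscribe("text-updates", lambda title, text: print("RDF extractor got", title))
    hub.subscribe("text-updates", lambda title, text: print("indexer got", title))
    hub.publish("text-updates", "Berlin", "{{Infobox city|country=Germany}}")
```

Each consumer receives the unexpanded text and does its own template
expansion, which is the duplicated effort mentioned above.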
Christopher
