At 2:28 PM -0800 3/21/06, Brion Vibber wrote:
> G@B wrote:
>> - Is there a Wikipedia API-Specification in order
>> to reach my goal?
>
> Not at this time.
>
>> - If no, which alternatives are there?
>
> If you're a nice person: open a web browser and point it at Wikipedia.

That's approximately what I do in my web pages. My text is rife with
links to WP (to the point that $WP is defined in the PHP code :-). In
fact, the ability to use WP as a source of explanatory footnotes is a
very big win for me.
However, I can imagine situations where this would not be an optimal
solution. For example, someone might wish to present the user with
tooltips, image-mapped diagrams for context and navigation, etc. So,
the web site would display derived information (probably linking to
WP, as well).
Deriving information in this way requires intimate familiarity with the
structure of the input data. So, clear definitions, consistency, and
stability are important requirements.
At 11:25 PM +0100 3/21/06, G@B wrote:
> - If no, which alternatives are there?
Here are some possibilities I've considered:
* screen scraping
UI design is hard enough without trying to keep things convenient
for use by programs. So, most developers (let alone contributors)
won't optimize for this use.
XHTML, if used, helps with some low-level syntax issues (XML
parsers work :-), but the structure may still be chaotic and
subject to unannounced changes.
Nonetheless, I've suggested that Semantic WP (SWP) tags be part
of the generated XHTML, to enable analysis (etc.) by browsers.
* XML/SOAP/...
Quite possible, assuming that WP will allow it and someone can do
(or support) the necessary standardization and implementation.
The SWP folks will be forced to do something like this, if nobody
gets there first. In any case, it won't be trivial to do right.
* RDBMS (e.g., MySQL)
Assuming that read access is available, a script can easily send
off queries and evaluate the replies. WP could, in fact, allow
this, but caution would be worthwhile, as this level of access
might create new openings for DDoS attacks, etc. OTOH, if access
were controlled, correct behavior could be enforced by fiat.
Otherwise, you fall back to Brion's suggestion of keeping a mirror.
The last time I checked, this was not a turnkey procedure, but the
situation may be different now. Is mirroring automated now?
* code-level (e.g., PHP) access
If you have access to the MW code base, you can grab any data you
like. However, this puts you in the role of maintaining a forked
version of MW. Of course, if your changes are deemed useful and
safe, you might get them into the MW code base. In fact, putting
XML access and/or SMW facilities into MW is an example of this.
* command-line access
If you have command-line access on a machine where MW is running,
and appropriate permissions, you can access data in a number of
ways. For example, you could look directly into the MySQL files
or go behind MW's back to read generated files, etc. (However, YMMV!)
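
To make the screen-scraping option a bit more concrete, here is a rough
Python sketch of running an XML parser over XHTML output. The page
fragment, the "/wiki/" link prefix, and the function name are all
assumptions made up for illustration; real WP markup is different and,
as noted above, subject to unannounced changes.

```python
import xml.etree.ElementTree as ET

# Standard XHTML namespace; elements in a namespaced document must be
# looked up with it.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page fragment, NOT real Wikipedia output.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example article</title></head>
  <body>
    <div id="content">
      <p>See <a href="/wiki/Perl" title="Perl">Perl</a> and
         <a href="/wiki/PHP" title="PHP">PHP</a>.</p>
      <p>An <a href="http://example.org/">external link</a>.</p>
    </div>
  </body>
</html>"""


def extract_wiki_links(xhtml_text):
    """Return (href, title) pairs for internal links in an XHTML page.

    Assumes internal links start with "/wiki/" (an illustrative guess).
    Raises xml.etree.ElementTree.ParseError if the page is not
    well-formed XML, which is exactly the fragility of this approach.
    """
    root = ET.fromstring(xhtml_text)
    links = []
    for anchor in root.iter("{%s}a" % XHTML_NS):
        href = anchor.get("href", "")
        if href.startswith("/wiki/"):
            links.append((href, anchor.get("title", "")))
    return links


if __name__ == "__main__":
    for href, title in extract_wiki_links(SAMPLE_PAGE):
        print(href, title)
```

The parser only handles the low-level syntax. Everything the function
knows about page structure is baked into that one prefix test, so any
layout change on the server side silently breaks it.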
In summary, there are a variety of options. My own approach is to use
mediated database access (e.g., via Perl's DBI module). This shields me
from implementation details and reduces portability issues. With the
exception of the DB structure, I can treat MW largely as a "black box".
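
As a rough illustration of what "mediated" access looks like, here is a
minimal sketch in Python, using its DB-API (roughly analogous to Perl's
DBI) rather than my actual Perl code. An in-memory SQLite database
stands in for MySQL so the example runs anywhere; the "page" table and
its columns are a simplified, hypothetical schema, not MW's real one.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory stand-in for a wiki database.

    The schema here is hypothetical and far simpler than a real one.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)"
    )
    conn.executemany(
        "INSERT INTO page (page_id, page_title) VALUES (?, ?)",
        [(1, "Perl"), (2, "PHP"), (3, "PostgreSQL")],
    )
    conn.commit()
    return conn


def titles_with_prefix(conn, prefix):
    """Return page titles starting with prefix, sorted by title.

    Only generic DB-API calls (cursor, execute, fetchall) are used, so
    the function does not care which database sits behind conn.
    """
    cur = conn.cursor()
    # Bound parameters avoid building SQL by string concatenation.
    # The "?" placeholder style is driver-specific; other drivers may
    # use a different marker.
    cur.execute(
        "SELECT page_title FROM page WHERE page_title LIKE ? "
        "ORDER BY page_title",
        (prefix + "%",),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = open_demo_db()
    print(titles_with_prefix(conn, "Pe"))
    conn.close()
```

Only the connection setup and the SQL text know anything about the
database. Swapping in a different driver should, in principle, leave
the query function nearly untouched, apart from the placeholder style.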
Although I'm not sure I'll need it, DBI-Link provides a way to access
arbitrary databases via PostgreSQL. So, if PostgreSQL can provide
facilities that MySQL (or whatever) does not, it can be used as a
"wrapper":
??? -> PostgreSQL -> PL/PerlU -> DBI-Link -> Perl DBI -> MySQL (etc)
I would be happy to hear of other possibilities, etc. TMTOWTDI!
-r
--
http://www.cfcl.com/rdm Rich Morin
http://www.cfcl.com/rdm/resume rdm(a)cfcl.com
http://www.cfcl.com/rdm/weblog +1 650-873-7841
Technical editing and writing, programming, and web development