Hi Flacus, hi Leo
1) we need a place to accounte things like for tool.
err... come again?
2) At the moment I can access the PHP script via
http://tools.wikimedia.de/~daniel/foo/WikiProxy.php?wiki=de&title=Haus
That works with de and en. But if I use fr, for example, it doesn't work.
Hm? "Haus" does not exist in the fr wiki, so you get a 404 (with no
visible text, for consistency with action=raw). Looking for "Berlin"
works, for example:
http://tools.wikimedia.de/~daniel/foo/WikiProxy.php?wiki=fr&title=Berlin
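For a script calling the proxy, the 404 case can be treated as "page does not exist" rather than as a failure. A minimal sketch, assuming the response body is UTF-8 text; the function name `fetch_article` and the `opener` parameter are made up for illustration and are not part of WikiProxy:

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://tools.wikimedia.de/~daniel/foo/WikiProxy.php"


def fetch_article(wiki, title, opener=urllib.request.urlopen):
    """Return the article text, or None if the proxy answers 404.

    `opener` is a hypothetical hook so the function can be tried
    without network access; by default it is urllib's urlopen.
    """
    query = urllib.parse.urlencode({"wiki": wiki, "title": title})
    try:
        with opener(BASE_URL + "?" + query) as response:
            # Assumption: the proxy returns the raw wikitext as UTF-8.
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # the page does not exist on that wiki
        raise  # anything else is a real error
```

With this, `fetch_article("fr", "Haus")` would give `None`, while `fetch_article("fr", "Berlin")` would give the page text.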
Ok
You wrote that only de/en will be cached.
But perhaps you can make the tool work for non-cached languages?
WikiProxy works for all wikis. If there is no cache table, it will
simply pass the text through.
Btw.: the values for the wiki parameter can be full domain names. Short
names like "de", "fr", etc. work for Wikipedias; for other wikis, use
the full domain name, like "pl.wikinews.org".
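As a rough illustration of that rule, a client could pick the value for the wiki parameter like this. The helper `wiki_param` and its arguments are hypothetical, and the sketch assumes that every non-Wikipedia project lives under `<lang>.<project>.org`:

```python
def wiki_param(lang, project="wikipedia"):
    """Value for the wiki parameter of WikiProxy.php.

    Short names are enough for Wikipedias; other projects need the
    full domain name. Assumes domains of the form <lang>.<project>.org.
    """
    if project == "wikipedia":
        return lang
    return f"{lang}.{project}.org"


print(wiki_param("de"))              # short name for a Wikipedia
print(wiki_param("pl", "wikinews"))  # full domain for another project
```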
3) What about the cache expiry time?
The cache does not expire, the text is kept indefinitely, separate for
each revision.
4) Perhaps we can have a small space from the
toolserver to tmp more request?
Uh, what?
*Do you only fetch articles remotely when they are not
available on the toolserver?
Yes. It first looks into the text table - if the text is not there (i.e.
it has the EXTERNAL flag), it looks into the cache. If it's not in the
cache, it pulls it via HTTP and puts it into the cache.
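The lookup order can be sketched roughly as follows. This is not the actual WikiProxy code: plain dicts stand in for the text table and the cache table, a `None` entry stands in for a row with the EXTERNAL flag, and `fetch_via_http` is a placeholder for the HTTP request.

```python
def get_text(rev_id, text_table, cache, fetch_via_http):
    """Return the text of one revision, trying the cheapest source first."""
    # 1. Local text table; None models a row flagged EXTERNAL.
    text = text_table.get(rev_id)
    if text is not None:
        return text
    # 2. Cache, kept separately for each revision and never expired.
    if rev_id in cache:
        return cache[rev_id]
    # 3. Fall back to HTTP and remember the result.
    text = fetch_via_http(rev_id)
    cache[rev_id] = text
    return text


text_table = {1: "local text", 2: None}
cache = {}
print(get_text(1, text_table, cache, lambda r: "remote text"))
print(get_text(2, text_table, cache, lambda r: "remote text"))
```

A second request for revision 2 would then be answered from the cache without another HTTP round trip.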
I ask because I will read all articles in the German
Wikipedia (at least
the first time I run my script), and that will cause a big performance
problem.
If you need to process a *lot* of articles, use an XML dump. Your
database will not be up to the minute anyway. If you need to track live
updates, consider using the Atom feed for the RC page - it's possible to
extract the diff from that, but it's a bit messy. I have code for that
somewhere, though.
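For the dump route, a streaming parser keeps memory use flat even for the full German Wikipedia. A minimal sketch, assuming the usual export layout of `page` elements containing `title` and `revision`/`text`; the function names are made up, and the XML namespace is stripped because it differs between export versions:

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each page in a MediaWiki XML dump.

    `source` is a file name or a binary file object. If a page holds
    several revisions, the text of the last one in the file is used.
    """
    title = None
    text = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title = text = None
            elem.clear()  # drop the finished page's subtree
```

Usage would be along the lines of `for title, text in iter_pages("dump.xml"): ...`, where the file name is a placeholder for wherever the decompressed dump lives.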
*Can you give me a PHP interface to the WikiProxy?
Something like an include
file "WikiProxy.inc" with a function that returns the article as a string? That
would be great, because I don't need any slow HTTP, TCP or whatever
connection.
Accessing the cache directly would require you to have read and write
access to the cache tables - this is messy administration-wise. As I
said, I also thought about bypassing the HTTP interface for the proxy...
I started to write a daemon mode for the proxy, so it can be contacted
using fifos or plain TCP - it works more or less, but there's no client
interface for this yet. I could start to write one, but don't hold your
breath... in any case, I'm not sure how much faster that would actually be.
Regards,
Daniel
--
Homepage: http://brightbyte.de