Hi Flacus, hi Leo
- We need a place to account things, like for the tool.
Err... come again?
- At the moment I can access the PHP script via
http://tools.wikimedia.de/~daniel/foo/WikiProxy.php?wiki=de&title=Haus
That works with de and en, but if I use fr, for example, it doesn't work.
Hm? "Haus" does not exist in the fr wiki, so you get a 404 (with no visible text, for consistency with action=raw). Looking for "Berlin" works, for example: http://tools.wikimedia.de/~daniel/foo/WikiProxy.php?wiki=fr&title=Berlin
- OK, you wrote that only de/en will be cached, but perhaps you can make the tool work for non-cached languages?
WikiProxy works for all wikis. If there is no cache table, it will simply pass the text through.
Btw.: the value of the wiki parameter can be a full domain name. Short names like "de" or "fr" work for Wikipedias; for other wikis, use the full domain name, e.g. "pl.wikinews.org".
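If you call the proxy from a script, something like the following minimal Python sketch should do. The function names `proxy_url` and `fetch_wikitext` are made up for illustration, and it assumes the proxy returns UTF-8 text. A missing page is mapped to None, since the proxy answers with an empty 404 in that case.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Proxy endpoint as used in this thread.
PROXY_URL = "http://tools.wikimedia.de/~daniel/foo/WikiProxy.php"


def proxy_url(wiki: str, title: str) -> str:
    """Build a WikiProxy request URL (hypothetical helper).

    `wiki` may be a short Wikipedia code ("de", "fr") or a full domain
    name ("pl.wikinews.org"); it is passed through unchanged.
    """
    return PROXY_URL + "?" + urlencode({"wiki": wiki, "title": title})


def fetch_wikitext(wiki: str, title: str) -> Optional[str]:
    """Return the raw page text, or None if the page does not exist.

    A missing page yields HTTP 404 with an empty body (like action=raw).
    Assumes the response body is UTF-8.
    """
    try:
        with urlopen(proxy_url(wiki, title)) as resp:
            return resp.read().decode("utf-8")
    except HTTPError as err:
        if err.code == 404:
            return None  # page does not exist on that wiki
        raise
```

Usage would be e.g. `fetch_wikitext("fr", "Berlin")`, which should return the text, while `fetch_wikitext("fr", "Haus")` should return None.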
- What about the cache expiry time?
The cache does not expire; the text is kept indefinitely, separately for each revision.
- Perhaps we can have a small space on the toolserver to tmp more requests?
Uh, what?
* Do you only fetch articles remotely when they are not available on the toolserver?
Yes. It first looks into the text table - if the text is not there (i.e. it has the EXTERNAL flag), it looks into the cache. If it's not in the cache, it pulls it via HTTP and puts it into the cache.
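The lookup order can be sketched roughly like this in Python. This is only an illustration, not the actual WikiProxy code: plain dicts stand in for the text table and the cache table (None marks an EXTERNAL entry), and `fetch_remote` is a hypothetical stand-in for the HTTP fetch.

```python
from typing import Callable, Dict, Optional


class RevisionTextLookup:
    """Sketch: local text table first, then the cache, then HTTP.

    Entries are keyed by revision id and never expire, since the text
    of a given revision never changes.
    """

    def __init__(self, local_text: Dict[int, Optional[str]],
                 fetch_remote: Callable[[int], str]) -> None:
        self.local_text = local_text      # stand-in for the text table
        self.cache: Dict[int, str] = {}   # stand-in for the cache table
        self.fetch_remote = fetch_remote  # stand-in for the HTTP fetch

    def get(self, rev_id: int) -> str:
        text = self.local_text.get(rev_id)
        if text is not None:
            return text                   # text is stored locally
        if rev_id in self.cache:
            return self.cache[rev_id]     # pulled earlier, reuse it
        text = self.fetch_remote(rev_id)  # not local, not cached: fetch
        self.cache[rev_id] = text         # keep it indefinitely
        return text
```

Only the first request for an external revision goes over the network; every later request for the same revision is served from the cache.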
I ask because I will read all articles in the German Wikipedia (at least the first time I run my script), and that will cause a big performance problem.
If you need to process a *lot* of articles, use an XML dump. Your database will not be up to the minute anyway. If you need to track live updates, consider using the Atom feed for the RC page - it's possible to extract the diff from that, but it's a bit messy. I have code for that somewhere, though.
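For the dump route, a streaming parser avoids loading the whole file into memory. A minimal Python sketch, assuming a "current revisions only" dump (one revision per page); the helper names `iter_pages` and `_local` are hypothetical. The namespace prefix is stripped because the export schema URI varies between dump versions.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple, Union


def _local(tag: str) -> str:
    # Drop the "{namespace}" prefix; the export namespace URI varies by version.
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, IO[bytes]]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> in a MediaWiki XML dump.

    Streams the file with iterparse and clears each <page> element after
    use, which keeps memory use low even for a full dewiki dump.
    """
    title, text = "", ""
    for _, elem in ET.iterparse(source, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # free the page's children
```

A compressed dump can be read without unpacking it first by passing `bz2.open(path, "rb")` as `source`.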
* Can you give me a PHP interface to the WikiProxy? Something like an include file "WikiProxy.inc" with a function that returns the article as a string? That would be great, because then I wouldn't need any slow HTTP, TCP or other connection.
Accessing the cache directly would require you to have read and write access to the cache tables - this is messy administration-wise. As I said, I also thought about bypassing the HTTP interface for the proxy... I started to write a daemon mode for the proxy, so it can be contacted using FIFOs or plain TCP - it works more or less, but there's no client interface for this yet. I could start to write one, but don't hold your breath... in any case, I'm not sure how much faster that would actually be.
Regards, Daniel