Re: [Toolserver-l] Troubles with reading Articles

23 Mar 2006

Hi Flacus, hi Leo
...

we need a place to accounte things like for tool.

err... come again?
...

At the moment I can access the PHP script via

http://tools.wikimedia.de/~daniel/foo/WikiProxy.php?wiki=de&title=Haus
That works with de and en. But if I use fr, for example, it doesn't work.
Hm? "Haus" does not exist in the fr wiki, so you get a 404 (with no
visible text, for consistency with action=raw). Looking for "Berlin"
works, for example:
http://tools.wikimedia.de/~daniel/foo/WikiProxy.php?wiki=fr&title=Berlin
...
OK, you wrote that only de/en will be cached.
But perhaps you can make the tool work for non-cached languages?
WikiProxy works for all wikis. If there is no cache table, it will
simply pass the text through.
Btw.: the values for the wiki parameter can be full domain names. Short
names like "de", "fr", etc. work for Wikipedias; for other wikis, use the
full domain name, like "pl.wikinews.org".
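
For illustration, here is a minimal client-side sketch of how the wiki and
title parameters described above could be used. It is written in Python
rather than PHP, the function names wikiproxy_url and fetch_article are
hypothetical, and it assumes only what is stated in this mail: the URL
shown above, short or full-domain values for wiki, and a 404 for a page
that does not exist.

```python
import urllib.error
import urllib.parse
import urllib.request

# Base URL as quoted in this mail; it may no longer be reachable.
WIKIPROXY_BASE = "http://tools.wikimedia.de/~daniel/foo/WikiProxy.php"


def wikiproxy_url(wiki, title):
    """Build a WikiProxy URL (hypothetical helper).

    wiki may be a short Wikipedia code such as "de" or a full domain
    name such as "pl.wikinews.org".
    """
    query = urllib.parse.urlencode({"wiki": wiki, "title": title})
    return WIKIPROXY_BASE + "?" + query


def fetch_article(wiki, title):
    """Return the page text, or None if the proxy answers with a 404."""
    try:
        with urllib.request.urlopen(wikiproxy_url(wiki, title)) as response:
            # Assumption: the text is served as UTF-8.
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # page does not exist in that wiki
        raise
```

Treating a 404 as "no such page" instead of an error matches the "Haus"
in fr case above.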
...

What about the cache expiry time?

The cache does not expire; the text is kept indefinitely, separately for
each revision.
...

Perhaps we can have a small space from the toolserver to tmp more request?

Uh, what?
...
*Do you fetch articles remotely only when they are not available on the toolserver?
Yes. It first looks into the text table - if the text is not there (i.e.
it has the EXTERNAL flag), it looks into the cache. If it's not in the
cache, it pulls it via HTTP and puts it into the cache.
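
The lookup order just described can be sketched roughly as follows. This
is not the actual WikiProxy code: the text table and the cache are
modelled as plain Python dicts keyed by revision id, the flag name
"external" stands in for the EXTERNAL flag, and fetch_remote is a
hypothetical callable that does the HTTP request.

```python
def get_text(rev_id, text_table, cache, fetch_remote):
    """Return the text of one revision, trying the cheapest source first.

    text_table maps rev_id -> (flags, text); both dicts are stand-ins
    for database tables (an assumption made for this sketch).
    """
    # 1. Local text table, unless the row is flagged as external.
    row = text_table.get(rev_id)
    if row is not None:
        flags, text = row
        if "external" not in flags:
            return text

    # 2. Cache: entries never expire and are stored per revision.
    if rev_id in cache:
        return cache[rev_id]

    # 3. Fall back to HTTP and remember the result.
    text = fetch_remote(rev_id)
    if text is not None:  # do not cache a missing page
        cache[rev_id] = text
    return text
```

Because a revision's text never changes, keeping it indefinitely per
revision is safe; only the question of which revision is current needs
fresh data.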
...
I ask this because I will read all articles in the German Wikipedia (at
least the first time I run my script), and that will cause a big
performance problem.
If you need to process a *lot* of articles, use an XML dump. Your
database will not be up to the minute anyway. If you need to track live
updates, consider using the Atom feed for the RC page - it's possible to
extract the diff from that, but it's a bit messy. I have code for that
somewhere, though.
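
To give an idea of the XML dump route: the sketch below streams pages out
of a dump without loading the whole file. It is in Python, the name
iter_pages is hypothetical, and it assumes the usual export layout of
page elements containing a title and a revision with a text element. The
XML namespace differs between export versions, so it is stripped.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) per page from a dump file name or file object."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # free the memory held by the finished page


if __name__ == "__main__":
    # Tiny made-up sample; a real dump is read the same way from disk.
    sample = (b"<mediawiki><page><title>Haus</title>"
              b"<revision><text>Ein Haus</text></revision></page></mediawiki>")
    for page_title, page_text in iter_pages(io.BytesIO(sample)):
        print(page_title, len(page_text))
```

If a page carries several revisions, this keeps only the last text
element seen, which is enough for dumps that contain current revisions
only.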
...
*Can you give me a PHP interface to the WikiProxy? Something like an include
file "WikiProxy.inc" with a function to return the article as a string? That
would be great, because I don't need any slow HTTP, TCP or whatever
connection.
Accessing the cache directly would require you to have read and write
access to the cache tables - this is messy administration-wise. As I
said, I also thought about bypassing the HTTP interface for the proxy...
I started to write a daemon mode for the proxy, so it can be contacted
using fifos or plain TCP - it works more or less, but there's no client
interface for this yet. I could start to write one, but don't hold your
breath... in any case, I'm not sure how much faster that would actually be.
Regards,
Daniel
-- 
Homepage: http://brightbyte.de
