I am very pleased to announce that my tools are now officially running on the toolserver [1].
This includes my wiki2xml converter [2], which can now take parameters via URL (as described on the page), allowing other scripts and bots to grab XML-converted Wikipedia pages directly, or users to get Wikipedia articles as an ODT file directly via a link.
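For illustration, here is roughly what a script-side fetch might look like. The parameter names below are assumptions made for the sake of the sketch; the real ones are documented on the w2x.php page itself.

    # Minimal sketch of grabbing an XML-converted article via URL.
    # The query parameter names here are illustrative assumptions;
    # see the w2x.php page for the documented interface.
    import urllib.parse
    import urllib.request

    BASE = "http://tools.wikimedia.de/~magnus/wiki2xml/w2x.php"

    def fetch_article_xml(title, lang="en"):
        params = urllib.parse.urlencode({
            "lang": lang,        # assumed parameter name
            "article": title,    # assumed parameter name
            "format": "xml",     # assumed parameter name
        })
        with urllib.request.urlopen(BASE + "?" + params) as resp:
            return resp.read().decode("utf-8")

    print(fetch_article_xml("Texas")[:200])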
Magnus
[1] http://tools.wikimedia.de/~magnus/index.html [2] http://tools.wikimedia.de/~magnus/wiki2xml/w2x.php
On 5/1/06, Magnus Manske magnus.manske@web.de wrote:
This includes my wiki2xml converter [2], which can now take parameters via URL (as described on the page), allowing other scripts and bots to grab XML-converted Wikipedia pages directly, or users to get Wikipedia articles as an ODT file directly via a link.
I realize it's got a comment that it's slow, but I just requested (by title) the page Texas. It's been running about 5 minutes now with no response. Is it failing under load?
Jeremy Dunck wrote:
On 5/1/06, Magnus Manske magnus.manske@web.de wrote:
This includes my wiki2xml converter [2], which can now take parameters via URL (as described on the page), allowing other scripts and bots to grab XML-converted Wikipedia pages directly, or users to get Wikipedia articles as an ODT file directly via a link.
I realize it's got a comment that it's slow, but I just requested (by title) the page Texas. It's been running about 5 minutes now with no response. Is it failing under load?
I was running some of the scripts yesterday, and they were *way* more responsive. It seems the text access is slowing them down today. After all, not only the page text but also the templates (and sub-templates, and so on) have to be loaded.
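To make the cost concrete, here is a rough sketch of why template loading multiplies the HTTP round trips. The helper below is hypothetical, and the {{...}} matching is deliberately naive (it ignores parser functions, <noinclude>, and template parameters):

    # Hypothetical sketch: each template reference costs another HTTP
    # fetch, and templates pull in further templates, so load time
    # grows with the template tree, not just the article size.
    import re
    import urllib.parse
    import urllib.request

    def fetch_raw(title, lang="en"):
        url = ("http://%s.wikipedia.org/w/index.php?title=%s&action=raw"
               % (lang, urllib.parse.quote(title)))
        return urllib.request.urlopen(url).read().decode("utf-8")

    def load_with_templates(title, seen=None, lang="en"):
        seen = set() if seen is None else seen
        if title in seen:          # don't re-fetch shared templates
            return seen
        seen.add(title)
        text = fetch_raw(title, lang)
        # Naive match: also catches parser functions like {{#if:...}},
        # which a real loader would have to filter out.
        for name in re.findall(r"\{\{([^|{}]+)", text):
            load_with_templates("Template:" + name.strip(), seen, lang)
        return seen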
Sadly, it does not seem to be possible to load text directly from the toolserver database for current or recent versions.
Magnus
Hello, On Monday, 2006-05-01, at 22:14 +0200, Magnus Manske wrote:
Sadly, it does not seem to be possible to load text directly from the toolserver database for current or recent versions.
the current text can be found (maybe not for all articles; ask Duesentrieb about that) in the cur table.
Example (on dewiki_p):
select cur_text from cur where cur_id=1001 limit 1;
That is the "Welt am Draht" article on de.
Sincerely, DaB.
Hi
the current text can be found (maybe not for all articles; ask Duesentrieb about that) in the cur table.
This is incorrect. The cur table has not been updated since the days of MediaWiki 1.4. Currently, the only reliable way to get full text is via HTTP, preferably from WikiProxy.
Full text is (conceptually) stored in the `text` table (referenced from the revision table), but in practice that table often just contains pointers to an external storage server, to which we don't have access.
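For anyone curious what that lookup looks like, a sketch under the MediaWiki 1.5+ schema; the connection details are placeholders, not real toolserver credentials:

    # Sketch: try to fetch current text straight from the replicated
    # database, and see why it dead-ends. Host/db names are placeholders.
    import MySQLdb

    db = MySQLdb.connect(host="sql-server", db="enwiki_p")  # placeholder
    cur = db.cursor()
    cur.execute("""
        SELECT old_text, old_flags
        FROM page
        JOIN revision ON rev_id = page_latest
        JOIN `text` ON old_id = rev_text_id
        WHERE page_namespace = 0 AND page_title = %s
    """, ("Texas",))
    text, flags = cur.fetchone()
    if isinstance(flags, bytes):
        flags = flags.decode()
    if "external" in flags.split(","):
        # old_text is then just a pointer like "DB://cluster1/12345",
        # naming a storage server the toolserver cannot reach --
        # hence HTTP/WikiProxy as the only reliable route.
        print("external storage pointer:", text)
    else:
        print(text[:200])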
I have been told about a few problems with WikiProxy and am currently trying to fix them. I will put a new version live soon.
NOTE: this also means that access tokens will soon become necessary if you want to use WikiProxy remotely. Local access from Zedler will work as before.
-- Daniel
On 5/1/06, Magnus Manske magnus.manske@web.de wrote:
Sadly, it does not seem to be possible to load text directly from the toolserver database for current or recent versions.
OK. Also, is there some place to look for what all this XML means?
I see you're preserving spaces and all that, which is nice, but I'm not sure how to go about getting meaning from the document.
Also: <articles loadtime="1017.704256773 sec" rendertime="6.0266571044922 sec" totaltime="1023.7309138775 sec">
Ouch. :)
Jeremy Dunck schrieb:
On 5/1/06, Magnus Manske magnus.manske@web.de wrote:
Sadly, it does not seem to be possible to load text directly from the toolserver database for current or recent versions.
OK. Also, is there some place to look for what all this XML means?
No. Maybe I should write a Meta page... (I have no experience whatsoever in writing DTDs or the like)
I see you're preserving spaces and all that, which is nice, but I'm not sure how to go about getting meaning from the document.
Also: <articles loadtime="1017.704256773 sec" rendertime="6.0266571044922 sec" totaltime="1023.7309138775 sec">
17 minutes to load the text, six seconds to parse it. Strange, I just ran [[en:Texas]] locally, and loadtime was 30 seconds (rendering 4.2 sec). Is there some kind of limiter running on the toolserver?
Ouch. :)
Indeed.
Magnus
Magnus Manske:
17 minutes to load the text, six seconds to parse it. Strange, I just ran [[en:Texas]] locally, and loadtime was 30 seconds (rendering 4.2 sec). Is there some kind of limiter running on the toolserver?
something is currently hammering wikiproxy, which is why the load is so high. that might explain the load time, if you use wikiproxy for it.
(tail -f /var/log/apache/access)
- river.
River Tarnell schrieb:
Magnus Manske:
17 minutes to load the text, six seconds to parse it. Strange, I just ran [[en:Texas]] locally, and loadtime was 30 seconds (rendering 4.2 sec). Is there some kind of limiter running on the toolserver?
something is currently hammering wikiproxy, which is why the load is so high. that might explain the load time, if you use wikiproxy for it.
Actually, at this very moment, it seems quite fast (e.g., the "missing images" tool).
I can switch my scripts to use either wikiproxy or the direct "&action=raw" request. The latter seems a little slower than wikiproxy in good times, though.
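For comparison, a small timing sketch; the WikiProxy URL below is an assumed form, not a documented endpoint, so adjust it to whatever the proxy actually expects:

    # Time both fetch paths for one article. The WikiProxy URL is an
    # assumption for illustration; only action=raw is known-good here.
    import time
    import urllib.parse
    import urllib.request

    def timed_fetch(url):
        start = time.time()
        data = urllib.request.urlopen(url).read()
        return time.time() - start, len(data)

    title = urllib.parse.quote("Texas")
    urls = {
        "action=raw": "http://en.wikipedia.org/w/index.php?title=%s&action=raw" % title,
        "wikiproxy": ("http://tools.wikimedia.de/~daniel/WikiSense/"
                      "WikiProxy.php?wiki=en.wikipedia.org&title=%s" % title),
    }
    for name, url in urls.items():
        secs, size = timed_fetch(url)
        print("%-10s %6.2f sec  %7d bytes" % (name, secs, size))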
Recommendations?
Magnus
For what it's worth, I've added most tools to my new list of toolserver tools, which allows grouping by tags (categories):
http://tools.wikimedia.de/~interiot/cgi-bin/tstoc
I'll be improving the script a little more in the near future to make it easier for users to discover useful tools, but it should already be usable.
-Interiot
On Mon, May 01, 2006 at 03:55:42PM +0200, Magnus Manske wrote:
I am very pleased to announce that my tools are now officially running on the toolserver [1].
This includes my wiki2xml converter [2], which can now take parameters via URL (as described on the page), allowing other scripts and bots to grab XML-converted Wikipedia pages directly, or users to get Wikipedia articles as an ODT file directly via a link.
Magnus
[1] http://tools.wikimedia.de/~magnus/index.html [2] http://tools.wikimedia.de/~magnus/wiki2xml/w2x.php