2011/4/6 Daniel Kinzler daniel@brightbyte.de
On 06.04.2011 09:15, Alex Brollo wrote:
I saved the HTML source of a typical Page: page from it.source; the resulting text file is ~28 kB. Then I saved the "core html" only, i.e. the content of <div class="pagetext">, and that file is 2.1 kB, so there's a more-than-tenfold ratio between "container" and "real content".
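(A minimal sketch of that extraction in Python, assuming BeautifulSoup is installed and that the content really sits in a single <div class="pagetext"> as described above; the filename is a placeholder:

# Minimal sketch: pull the "core html" out of a saved page.
# Assumes the content sits in exactly one <div class="pagetext">.
from bs4 import BeautifulSoup

with open("page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

core = soup.find("div", class_="pagetext")   # the "real content" div
if core is not None:
    # decode_contents() keeps the inner HTML without the wrapping div
    print(core.decode_contents())
)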
wow, really? that seems a lot...
Is there a trick to download the "core html" only?
there are two ways:
a) the old-style "render" action, like this: http://en.wikipedia.org/wiki/Foo?action=render
b) the api "parse" action, like this: http://en.wikipedia.org/w/api.php?action=parse&page=Foo&redirects=1&...
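For what it's worth, a minimal sketch of both calls in Python (standard library only); "Foo", the User-Agent string, and en.wikipedia.org are placeholders taken from the URLs above:

# Minimal sketch of the two fetch styles described above.
import json
import urllib.parse
import urllib.request

title = "Foo"                                  # placeholder title
ua = {"User-Agent": "core-html-test/0.1"}      # placeholder User-Agent

# a) render action: the rendered page body as bare HTML, no API envelope
render_url = ("http://en.wikipedia.org/wiki/"
              + urllib.parse.quote(title) + "?action=render")
with urllib.request.urlopen(urllib.request.Request(render_url, headers=ua)) as r:
    render_html = r.read().decode("utf-8")

# b) parse action: the same HTML inside a JSON envelope;
#    prop=text and format=json keep the envelope small
api_url = "http://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode({
    "action": "parse", "page": title, "redirects": 1,
    "prop": "text", "format": "json",
})
with urllib.request.urlopen(urllib.request.Request(api_url, headers=ua)) as r:
    parse_html = json.load(r)["parse"]["text"]["*"]

print(len(render_html), len(parse_html))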
To learn more about the web API, have a look at <http://www.mediawiki.org/wiki/API>
Thanks Daniel, API stuff is a little hard for me: the more I study, the less I edit. :-)
Just to have a try, I called the same page: the "render" action gives a file of ~3.4 kB, the "api" action a file of ~5.6 kB. Obviously I'm thinking of bot downloads. Are you suggesting that it would be a good idea to use an *unlogged* bot, to avoid page parsing and fetch the page code from some cache? I know that some thousands of calls are nothing for the wiki servers, but... I always try to get good performance, even from the most banal template.
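To make the idea concrete, a minimal sketch of the kind of gentle, logged-out bulk fetch being discussed; the title list, User-Agent, and delays are illustrative placeholder choices, and maxlag simply asks the servers to refuse the request when replication lag is high:

# Minimal sketch of a polite, logged-out bulk fetch via the parse action.
import json
import time
import urllib.parse
import urllib.request

titles = ["Foo", "Bar"]                           # placeholder title list
ua = {"User-Agent": "core-html-bot-sketch/0.1"}   # placeholder User-Agent

for title in titles:
    url = "http://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode({
        "action": "parse", "page": title, "prop": "text",
        "format": "json", "maxlag": 5,   # back off when the servers are lagged
    })
    with urllib.request.urlopen(urllib.request.Request(url, headers=ua)) as r:
        data = json.load(r)
    if "error" in data:                  # e.g. the maxlag refusal
        time.sleep(5)
        continue
    html = data["parse"]["text"]["*"]
    print(title, len(html))
    time.sleep(1)                        # crude throttle; illustrative value only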
Alex