2011/4/6 Daniel Kinzler <daniel(a)brightbyte.de>
On 06.04.2011 09:15, Alex Brollo wrote:
I saved the HTML source of a typical Page: page from it.source; the
resulting txt file is ~28 kB. Then I saved the "core html" only, i.e.
the content of <div class="pagetext">, and that file is 2.1 kB; so
there's a more than tenfold ratio between "container" and "real
content".
wow, really? that seems a lot...
Is there a trick to download the "core html" only?
there are two ways:
a) the old style "render" action, like this:
<http://en.wikipedia.org/wiki/Foo?action=render>
b) the api "parse" action, like this:
<http://en.wikipedia.org/w/api.php?action=parse&page=Foo&redirects=1…>
To learn more about the web API, have a look at
<http://www.mediawiki.org/wiki/API>
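[For the archives: a minimal sketch of option (b), using only the Python
standard library. The endpoint and parameters follow the api.php "parse"
example above; the helper names (parse_url, extract_html) and the use of
en.wikipedia.org are my own choices, not something from this thread.]

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: English Wikipedia; swap in any MediaWiki api.php endpoint.
API = "https://en.wikipedia.org/w/api.php"

def parse_url(page):
    """Build the api.php URL that returns only the parsed page body."""
    params = {
        "action": "parse",
        "page": page,
        "redirects": 1,
        "prop": "text",    # ask for the rendered HTML only
        "format": "json",
    }
    return API + "?" + urlencode(params)

def extract_html(response_json):
    """Pull the rendered HTML string out of the parse-action JSON."""
    # With format=json (formatversion 1) the HTML sits under parse.text["*"].
    return response_json["parse"]["text"]["*"]

# Usage (performs a real HTTP request):
# with urlopen(parse_url("Foo")) as r:
#     html = extract_html(json.load(r))
```

Asking for prop=text keeps the response close to the "core html" Alex
wants, rather than the full skin and container markup.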
Thanks Daniel, API stuff is a little hard for me: the more I study, the
less I edit. :-)
Just to have a try, I called the same page: the "render" action gives a file
of ~3.4 kB, the "api" action a file of ~5.6 kB. Obviously I'm thinking of bot
downloads. Are you suggesting that it would be a good idea to use an
*unlogged* bot to avoid page parsing, and to fetch the page code from some
cache? I know that a few thousand calls are nothing for the wiki servers,
but... I always try to get good performance, even from the most banal
template.
Alex