2011/4/6 Daniel Kinzler daniel@brightbyte.de
On 06.04.2011 09:15, Alex Brollo wrote:
I saved the HTML source of a typical Page: page from it.source; the resulting text file is ~28 kB. Then I saved the "core html" only, i.e. the content of <div class="pagetext">, and that file is 2.1 kB, so there's a more-than-tenfold ratio between "container" and "real content".
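(A minimal sketch of that extraction in Python, assuming BeautifulSoup is installed and that the content really sits in a single <div class="pagetext"> as described above; the filename is a placeholder:

# Minimal sketch: pull the "core html" out of a saved page.
# Assumes the content sits in exactly one <div class="pagetext">.
from bs4 import BeautifulSoup

with open("page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

core = soup.find("div", class_="pagetext")   # the "real content" div
if core is not None:
    # decode_contents() keeps the inner HTML without the wrapping div
    print(core.decode_contents())
)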
wow, really? that seems a lot...
Is there a trick to download the "core html" only?
there are two ways:
a) the old-style "render" action, like this: http://en.wikipedia.org/wiki/Foo?action=render
b) the api "parse" action, like this: http://en.wikipedia.org/w/api.php?action=parse&page=Foo&redirects=1&...
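For what it's worth, a minimal sketch of both calls in Python (standard library only); "Foo", the User-Agent string, and en.wikipedia.org are placeholders taken from the URLs above:

# Minimal sketch of the two fetch styles described above.
import json
import urllib.parse
import urllib.request

title = "Foo"                                  # placeholder title
ua = {"User-Agent": "core-html-test/0.1"}      # placeholder User-Agent

# a) render action: the rendered page body as bare HTML, no API envelope
render_url = ("http://en.wikipedia.org/wiki/"
              + urllib.parse.quote(title) + "?action=render")
with urllib.request.urlopen(urllib.request.Request(render_url, headers=ua)) as r:
    render_html = r.read().decode("utf-8")

# b) parse action: the same HTML inside a JSON envelope;
#    prop=text and format=json keep the envelope small
api_url = "http://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode({
    "action": "parse", "page": title, "redirects": 1,
    "prop": "text", "format": "json",
})
with urllib.request.urlopen(urllib.request.Request(api_url, headers=ua)) as r:
    parse_html = json.load(r)["parse"]["text"]["*"]

print(len(render_html), len(parse_html))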
To learn more about the web API, have a look at <http://www.mediawiki.org/wiki/API>
Thanks Daniel, API stuff is a little hard for me: the more I study, the less I edit. :-)
Just to have a try, I called the same page: the "render" action gives a file of ~3.4 kB, the "api" action a file of ~5.6 kB. Obviously I'm thinking of bot downloads. Are you suggesting that it would be a good idea to use an *unlogged* bot, to avoid page parsing and fetch the page code from some cache? I know that some thousands of calls are nothing for the wiki servers, but... I always try to get good performance, even from the most banal template.
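To make the idea concrete, a minimal sketch of the kind of gentle, logged-out bulk fetch being discussed; the title list, User-Agent, and delays are illustrative placeholder choices, and maxlag simply asks the servers to refuse the request when replication lag is high:

# Minimal sketch of a polite, logged-out bulk fetch via the parse action.
import json
import time
import urllib.parse
import urllib.request

titles = ["Foo", "Bar"]                           # placeholder title list
ua = {"User-Agent": "core-html-bot-sketch/0.1"}   # placeholder User-Agent

for title in titles:
    url = "http://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode({
        "action": "parse", "page": title, "prop": "text",
        "format": "json", "maxlag": 5,   # back off when the servers are lagged
    })
    with urllib.request.urlopen(urllib.request.Request(url, headers=ua)) as r:
        data = json.load(r)
    if "error" in data:                  # e.g. the maxlag refusal
        time.sleep(5)
        continue
    html = data["parse"]["text"]["*"]
    print(title, len(html))
    time.sleep(1)                        # crude throttle; illustrative value only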
Alex