Core html of a wikisource page - Wikitech-l

6 Apr 2011

I saved the HTML source of a typical Page: page from it.source; the
resulting text file is about 28 kB. Then I saved only the "core html", i.e.
the content of <div class="pagetext">, and this file is 2.1 kB. So there
is a more than tenfold ratio between the "container" and the "real
content".
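For anyone who wants to repeat the measurement, here is a minimal sketch in Python (standard library only) that pulls the raw inner HTML of the first <div class="pagetext"> out of a saved page, keeping nested <div>s intact. The class and function names (PageTextExtractor, core_html) are hypothetical, and it assumes the whole saved page is passed in as one string; it is not how MediaWiki itself builds the page.

```python
from html.parser import HTMLParser
from typing import Optional


class PageTextExtractor(HTMLParser):
    """Hypothetical helper: capture the raw inner HTML of the first <div class="pagetext">."""

    def __init__(self, source: str) -> None:
        super().__init__()
        self._source = source
        # getpos() reports (line, column), so remember where each line starts
        # in order to turn it into an absolute offset into the source string.
        self._line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self._line_starts.append(i + 1)
        self._depth = 0                    # <div>s opened inside the target div
        self._start: Optional[int] = None  # offset just after <div class="pagetext">
        self.inner_html: Optional[str] = None

    def _offset(self) -> int:
        line, col = self.getpos()
        return self._line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.inner_html is not None:
            return
        if self._start is not None:
            self._depth += 1  # a nested <div> inside the page text
        elif "pagetext" in (dict(attrs).get("class") or "").split():
            # getpos() points at '<', so skip past the whole opening tag.
            self._start = self._offset() + len(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "div" or self._start is None or self.inner_html is not None:
            return
        if self._depth > 0:
            self._depth -= 1
        else:
            # This </div> closes the target div: slice the raw source up to it.
            self.inner_html = self._source[self._start:self._offset()]


def core_html(page_source: str) -> Optional[str]:
    """Return the inner HTML of <div class="pagetext">, or None if it is absent."""
    parser = PageTextExtractor(page_source)
    parser.feed(page_source)
    parser.close()
    return parser.inner_html
```

Slicing the original string, instead of re-serializing parsed tokens, keeps entities such as &amp; exactly as they were saved. The ratio above is then roughly len(page.encode("utf-8")) / len(core_html(page).encode("utf-8")).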

Is there a trick to download only the "core html"? And, most importantly:
could this save a little server load/bandwidth? I humbly think that the "core
html" alone could be useful as a way to obtain "well-formed page
content", and that this could help produce derived formats of the
page (e.g. ePub).

Alex brollo