On Wed, Apr 6, 2011 at 3:15 AM, Alex Brollo <alex.brollo(a)gmail.com> wrote:
> I saved the HTML source of a typical Page: page from it.source; the
> resulting txt file weighs ~28 kB. Then I saved the "core html" only, i.e.
> the content of <div class="pagetext">, and that file is 2.1 kB; so
> there's a more than tenfold ratio between "container" and "real
> content".
> Is there a trick to download the "core html" only? And, most important:
> could this save a little bit of server load/bandwidth?
It could save a huge amount of bandwidth. This could be a big deal
for mobile devices, in particular, but it could also reduce TCP
round-trips and make things noticeably snappier for everyone. I
recently read about an interesting technique on Steve Souders' blog,
which he got by analyzing Google and Bing:
http://www.stevesouders.com/blog/2011/03/28/storager-case-study-bing-google/
The gist is that when the page loads, assuming script is enabled, you
store static pieces of the page in localStorage (available on the
large majority of browsers, including IE8) and set a cookie. Then if
the cookie is present on a subsequent request, the server doesn't send
the repetitive static parts, and instead sends a <script> that inserts
the desired contents synchronously from localStorage. I guess this
breaks horribly if you request a page with script enabled and then
request another page with script disabled, but otherwise the basic
idea seems powerful and reliable, if a bit tricky to get right.
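To make the flow concrete, here is a minimal sketch of the localStorage +
cookie trick. All names (onFirstLoad, renderResponse, pageChrome, hasChrome)
are made up for illustration, and the browser globals are mocked so the two
halves can be traced side by side; in a real deployment the client half runs
in the page and the server half lives in the wiki's renderer.

```javascript
// Mock of window.localStorage, so the sketch runs outside a browser.
const localStorage = {
  store: {},
  setItem(k, v) { this.store[k] = v; },
  getItem(k) { return this.store[k]; },
};
// Stand-in for document.cookie, already parsed into an object.
const cookies = {};

// Client side: on the first full page load, cache the static "chrome"
// (skin, header, footer) and set a cookie so the server knows we have it.
function onFirstLoad(staticChrome) {
  localStorage.setItem('pageChrome', staticChrome);
  cookies['hasChrome'] = '1';
}

// Server side: if the cookie is present, skip the repetitive static parts
// and send only the core content plus a <script> that restores the chrome
// synchronously from localStorage.
function renderResponse(coreHtml, staticChrome, requestCookies) {
  if (requestCookies['hasChrome'] === '1') {
    return coreHtml +
      '<script>document.write(localStorage.getItem("pageChrome"))</script>';
  }
  return staticChrome + coreHtml; // no cookie: send the full page
}
```

On the second and later requests the response carries only the small core
plus a one-line script, which is where the bandwidth saving comes from; the
script-disabled case mentioned above is exactly when the cookie is set but
the restoring script never runs.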
Of course, it would at least double the amount of space pages take in
Squid cache, since Squid would have to cache both the full and the
stripped variant of every page, so for us it might not be a great idea.
Still interesting, and worth keeping in mind.