On Wed, Apr 6, 2011 at 3:15 AM, Alex Brollo alex.brollo@gmail.com wrote:
I saved the HTML source of a typical Page: page from it.source; the resulting text file is ~28 KB. Then I saved only the "core html", i.e. the content of <div class="pagetext">, and that file is 2.1 KB; so there's a more-than-tenfold ratio between "container" and "real content".
Is there a trick to download the "core html" only? And, most importantly: could this save a bit of server load/bandwidth?
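For what it's worth, MediaWiki's api.php can already return just the parsed page body (action=parse with prop=text), which skips the skin markup entirely. A hedged sketch of building such a request — the host and page title here are only examples:

```javascript
// Sketch: request only the parsed page body via the MediaWiki API,
// avoiding the surrounding skin/"container" markup.
// The host and page title passed in below are illustrative examples.
function coreHtmlUrl(host, pageTitle) {
  const params = new URLSearchParams({
    action: 'parse',   // parse a page and return its rendered content
    page: pageTitle,
    prop: 'text',      // only the HTML of the page body
    format: 'json',
  });
  return 'https://' + host + '/w/api.php?' + params.toString();
}

console.log(coreHtmlUrl('it.wikisource.org', 'Pagina_principale'));
```

Fetching that URL returns a JSON object whose text field is roughly the <div class="pagetext"> content you extracted by hand.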
It could save a huge amount of bandwidth. This could be a big deal for mobile devices, in particular, but it could also reduce TCP round-trips and make things noticeably snappier for everyone. I recently read about an interesting technique on Steve Souders' blog, which he got by analyzing Google and Bing:
http://www.stevesouders.com/blog/2011/03/28/storager-case-study-bing-google/
The gist is that when the page loads, assuming JavaScript is enabled, you store static pieces of the page in localStorage (available in the large majority of browsers, including IE8) and set a cookie. If that cookie is present on a subsequent request, the server omits the repetitive static parts and instead sends a <script> that inserts the cached content synchronously from localStorage. I guess this breaks horribly if you request a page with script enabled and then request another page with script disabled, but otherwise the basic idea seems powerful and reliable, if a bit tricky to get right.
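A minimal sketch of that flow, with in-memory stand-ins for localStorage and document.cookie so it runs outside a browser (the key and cookie names are illustrative, not from any real implementation):

```javascript
// In-memory stand-ins so the sketch is self-contained; in a real page
// these would be window.localStorage and document.cookie.
const storage = {};
const cookies = {};

// First visit: the server sent the full page, including the static
// "chrome" markup. Client-side script caches it and sets a cookie.
function cacheStaticChunk(html) {
  storage['static-chrome'] = html;   // persist the repeated markup
  cookies['has-chrome'] = '1';       // tell the server we have a copy
}

// Later visits: the server saw the cookie and sent only a stub
// <script>, which calls this to write the cached markup back into
// the page synchronously.
function restoreStaticChunk() {
  return storage['static-chrome'] || null;
}

cacheStaticChunk('<div id="siteheader">…</div>');
console.log(cookies['has-chrome']);   // '1' — sent on the next request
console.log(restoreStaticChunk());    // the cached chrome markup
```

The script-disabled failure mode mentioned above is visible here: if the cookie gets set but the restoring <script> never runs, the static parts are simply missing from the rendered page.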
Of course, it would at least double the amount of space pages take in Squid cache, so for us it might not be a great idea. Still interesting, and worth keeping in mind.