Some data about the size of Recent Changes and possible optimizations.
RC (Polish, anonymous) for 250 articles is 198 720 bytes.
Some possible improvements:
* not generating space after <li> 249 bytes saved
* not generating newlines 356 bytes saved
* using <i> not <em> 319 bytes saved
* using <b> not <strong> 1219 bytes saved
* not generating </li> 1249 bytes saved
* not using class='internal' (by making this default case) 4555 bytes saved
* not generating title="..." 7329 bytes saved
* using relative links for links to http://pl.wikipedia.org/wiki/... 7713 bytes saved
* not generating PHPSESSID=... in links 9674 bytes saved
So:
* trivial markup changes - 2143 bytes, or 1.1% saved
* ... + making internal links default - 6698 bytes, or 3.4% saved (this is trivial and doesn't affect anything)
* ... + relative links - 14411 bytes, or 7.3% saved (still doesn't affect anything, but implementing generation of relative links may be a bit tricky as we allow many different paths to the same thing)
* doing all these changes except not generating </li> - 31414 bytes, or 15.8% saved (it may have some minor effect on functionality)
* doing all these changes - 33047 bytes, or 16.6% saved (won't be xhtml-compatible, but perfect html)
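The cumulative figures can be re-derived from the individual measurements. A minimal Python sketch, assuming the per-item byte counts and the 198 720-byte page size given in this message; the dictionary keys are hypothetical labels chosen for illustration, and the final "all changes" total is quoted from the message rather than recomputed here.

```python
# Per-change savings in bytes, copied from the measurements in this message.
# The key names are illustrative labels, not identifiers from any codebase.
PAGE_SIZE = 198_720

SAVINGS = {
    "li_space": 249,
    "newlines": 356,
    "i_for_em": 319,
    "b_for_strong": 1219,
    "default_internal_class": 4555,
    "no_title_attr": 7329,
    "relative_links": 7713,
    "no_phpsessid": 9674,
}


def total(keys):
    """Return (bytes saved, percent of page size) for the given changes."""
    saved = sum(SAVINGS[k] for k in keys)
    return saved, round(100 * saved / PAGE_SIZE, 1)


trivial = ["li_space", "newlines", "i_for_em", "b_for_strong"]
with_internal = trivial + ["default_internal_class"]
with_relative = with_internal + ["relative_links"]
all_but_li = with_relative + ["no_title_attr", "no_phpsessid"]

for label, keys in [
    ("trivial markup changes", trivial),
    ("+ internal links default", with_internal),
    ("+ relative links", with_relative),
    ("everything except dropping </li>", all_but_li),
]:
    saved, pct = total(keys)
    print(f"{label}: {saved} bytes, {pct}% saved")
```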
Tomasz-
- not using class='internal' (by making this default case) 4555 bytes saved
I support these changes except for this one, unless it is changed throughout the entire code base. It is also questionable whether the "history" links are really necessary.
Regards,
Erik
On Sun, Jul 20, 2003 at 01:47:00AM +0200, Erik Moeller wrote:
Tomasz-
- not using class='internal' (by making this default case) 4555 bytes saved
I support these changes except for this one, unless it is changed throughout the entire code base.
Yeah, hard to do it to RC only.
It is also questionable whether the "history" links are really necessary.
Well, that would mean some functionality lost. Better do those improvements that don't affect functionality first.
Tomasz Wegrzanowski wrote:
- using <i> not <em> 319 bytes saved
- using <b> not <strong> 1219 bytes saved
<em> and <strong> should be used, not <i> and <b>. The reasons are complicated and have to do with accessibility and text browsers. I don't really understand those reasons myself, but from my experience it seems to be consensus to use <em> and <strong>. I think <i> and <b> are even deprecated in the newest version of HTML, or something...
On Sun, Jul 20, 2003 at 02:18:30AM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
- using <i> not <em> 319 bytes saved
- using <b> not <strong> 1219 bytes saved
<em> and <strong> should be used, not <i> and <b>. The reasons are complicated and have to do with accessibility and text browsers. I don't really understand those reasons myself, but from my experience it seems to be consensus to use <em> and <strong>. I think <i> and <b> are even deprecated in the newest version of HTML, or something...
Wasting bytes where it's critical to make RC as small as possible for "complicated reasons" doesn't really convince me.
And no, they're not deprecated in HTML 4.01.
On 20 Jul 2003 02:39, Tomasz Wegrzanowski wrote:
On Sun, Jul 20, 2003 at 02:18:30AM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
- using <i> not <em> 319 bytes saved
- using <b> not <strong> 1219 bytes saved
<em> and <strong> should be used, not <i> and <b>. The reasons are complicated and have to do with accessibility and text browsers. I don't really understand those reasons myself, but from my experience it seems to be consensus to use <em> and <strong>. I think <i> and <b> are even deprecated in the newest version of HTML, or something...
Wasting bytes where it's critical to make RC as small as possible for "complicated reasons" doesn't really convince me.
OK, I'll have a stab at this: the original concept of the <I> and <B> tags was to make the text they encapsulated italicised and emboldened, respectively. This is obviously a Bad Thing, as it implicitly promotes the (false) concept that HTML is a print mark-up language, as opposed to a text mark-up language, for display, printing, being read aloud, being encoded into light pulses, etc. This isn't really 'complicated' per se, it just requires people to understand the underlying philosophy of HTML.
And a far more useful byte-stripper would be to install mod_gzip (or is this being used already?).
And no, they're not deprecated in HTML 4.01.
To quote: "[These] elements specify font information. Although they are not [...] deprecated, their use is discouraged in favour of style sheets."
So, they are "something" similar to deprecated.
Yours, -- James D. Forrester mailto:jon@eh.org | mailto:csvla@dcs.warwick.ac.uk mailto:jamesdforrester@hotmail.com | mailto:james@jdforrester.org
On Sun, Jul 20, 2003 at 03:53:45AM +0100, James D. Forrester wrote:
On 20 Jul 2003 02:39, Tomasz Wegrzanowski wrote:
On Sun, Jul 20, 2003 at 02:18:30AM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
- using <i> not <em> 319 bytes saved
- using <b> not <strong> 1219 bytes saved
<em> and <strong> should be used, not <i> and <b>. The reasons are complicated and have to do with accessibility and text browsers. I don't really understand those reasons myself, but from my experience it seems to be consensus to use <em> and <strong>. I think <i> and <b> are even deprecated in the newest version of HTML, or something...
Wasting bytes where it's critical to make RC as small as possible for "complicated reasons" doesn't really convince me.
OK, I'll have a stab at this: the original concept of the <I> and <B> tags was to make the text they encapsulated italicised and emboldened, respectively.
What we want to say is exactly "this should be in italics", and "this should be in bold font", not "this should be emphasized", "this should be strongly emphasized". As what we want is exactly *font information*, we should be using font markup.
Just think for a moment - when read aloud it should be read the same way as the rest of the text, not with any emphasis added.
Anyway code style doesn't really matter here - we should use whatever results in smaller Recent Changes.
This is obviously a Bad Thing, as it implicitly promotes the (false) concept that HTML is a print mark-up language, as opposed to a text mark-up language, for display, printing, being read aloud, being encoded into light pulses, etc. This isn't really 'complicated' per se, it just requires people to understand the underlying philosophy of HTML.
Underlying philosophy is "primitive text display markup language later extended to support other media types, but not very good at it"
And a far more useful byte-stripper would be to install mod_gzip (or is this being used already?).
Completely unrelated.
And no, they're not deprecated in HTML 4.01.
To quote: "[These] elements specify font information. Although they are not [...] deprecated, their use is discouraged in favour of style sheets."
So, they are "something" similar to deprecated.
This is far from that. There's really no reason to use <span class='italics'> instead of <i>.
On 20 Jul 2003 04:27, Tomasz Wegrzanowski wrote:
On Sun, Jul 20, 2003 at 03:53:45AM +0100, James D. Forrester wrote:
[Snip]
What we want is exactly *font information*, we should be using font mark-up.
No, 'we' want text mark-up, '''you''' seem to want font mark-up. Font mark-up is inherently evil and wrong in the context of a web-based system. Feel free to fork a print-medium version of the Wikipedia; this isn't what we are about, and never has been - Wikipedia isn't a print medium, as is expostulated in a large variety of discussions and pages, notably on meta.
Just think for a moment - when read aloud it should be read the same way as the rest of the text, not with any emphasis added.
Sorry, but this is Just Plain Wrong; if you read italicised or emboldened text as plain-spoken text, you're Doing It Wrong (tm).
Anyway code style doesn't really matter here - we should use whatever results in smaller Recent Changes.
No. 'Philosophical' this point may be, but abject utilitarianism isn't a suitable method for standards design or adherence. What you propose would render the Wikipedia unusable for some people; surely this runs entirely contrary to the concept of open use that the Wikipedia aims for?
[Snip]
This isn't really 'complicated' per se, it just requires people to understand the underlying philosophy of HTML.
Underlying philosophy is "primitive text display mark-up language later extended to support other media types, but not very good at it"
No. NOT 'display'. Honest.
And a far more useful byte-stripper would be to install mod_gzip (or is this being used already?).
Completely unrelated.
Err, no. This can dramatically reduce the number of bytes transmitted, which is what any and all suchwise effort is geared towards. It's certainly not unrelated.
[Snip]
Yours, -- James D. Forrester mailto:jon@eh.org | mailto:csvla@dcs.warwick.ac.uk mailto:jamesdforrester@hotmail.com | mailto:james@jdforrester.org
"James D. Forrester" james@jdmf.demon.co.uk writes:
No. 'Philosophical' this point may be, but abject utilitarianism isn't a suitable method for standards design or adherence. What you propose would render the Wikipedia unusable for some people; surely this runs entirely contrary to the concept of open use that the Wikipedia aims for?
In principle, I agree with you
... but ...
In practice, non-text based HTML renderers treat <i> and <b> in exactly the same way as <em> and <strong>. So although what we're doing is bad and wrong, doing it the right way won't actually make it more accessible (and will slow load times ... ironically making it *less* accessible.)
On Sun, Jul 20, 2003 at 06:05:00AM +0100, James D. Forrester wrote:
Anyway code style doesn't really matter here - we should use whatever results in smaller Recent Changes.
No. 'Philosophical' this point may be, but abject utilitarianism isn't a suitable method for standards design or adherence. What you propose would render the Wikipedia unusable for some people; surely this runs entirely contrary to the concept of open use that the Wikipedia aims for?
So show one case where using <i>/<b> actually breaks something; otherwise no discussion is needed.
Tomasz Wegrzanowski wrote:
On Sun, Jul 20, 2003 at 02:18:30AM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
- using <i> not <em> 319 bytes saved
- using <b> not <strong> 1219 bytes saved
<em> and <strong> should be used, not <i> and <b>. The reasons are complicated and have to do with accessibility and text browsers. I don't really understand those reasons myself, but from my experience it seems to be consensus to use <em> and <strong>. I think <i> and <b> are even deprecated in the newest version of HTML, or something...
Wasting bytes where it's critical to make RC as small as possible for "complicated reasons" doesn't really convince me.
Could you please explain to me why you think it's critical to make RC as small as possible? Our biggest current problem is database performance, not bandwidth usage.
If you really want to make it as small as possible, then you should use gzip; once that is installed, using <em> vs. <i> won't make a difference.
Timwi
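The claim that compression makes the choice of tag names nearly irrelevant can be checked with a small experiment. A minimal Python sketch on synthetic markup: the row layout is made up for illustration and is not the real Recent Changes output, so the exact numbers will differ on a real page.

```python
import gzip


def render_rows(em_open, em_close, strong_open, strong_close, n=250):
    """Build n synthetic list items using the given tag pairs."""
    rows = []
    for i in range(n):
        rows.append(
            f"<li>{strong_open}Article {i}{strong_close} "
            f"{em_open}minor edit{em_close}</li>"
        )
    return "\n".join(rows).encode("utf-8")


semantic = render_rows("<em>", "</em>", "<strong>", "</strong>")
presentational = render_rows("<i>", "</i>", "<b>", "</b>")

# Difference in size before and after gzip compression.
raw_diff = len(semantic) - len(presentational)
gz_diff = len(gzip.compress(semantic)) - len(gzip.compress(presentational))

print(f"uncompressed difference: {raw_diff} bytes")
print(f"gzip-compressed difference: {gz_diff} bytes")
```

Because the same tags repeat on every row, the compressor encodes them as back-references, so the longer tag names cost far less after compression than before.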
On Sun, Jul 20, 2003 at 02:36:38PM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
On Sun, Jul 20, 2003 at 02:18:30AM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
- using <i> not <em> 319 bytes saved
- using <b> not <strong> 1219 bytes saved
<em> and <strong> should be used, not <i> and <b>. The reasons are complicated and have to do with accessibility and text browsers. I don't really understand those reasons myself, but from my experience it seems to be consensus to use <em> and <strong>. I think <i> and <b> are even deprecated in the newest version of HTML, or something...
Wasting bytes where it's critical to make RC as small as possible for "complicated reasons" doesn't really convince me.
Could you please explain to me why you think it's critical to make RC as small as possible? Our biggest current problem is database performance, not bandwidth usage.
Because most people are using 56kbit modems and recent changes size is huge. It's not about server bandwidth at all.
If you really want to make it as small as possible, then you should use gzip; once that is installed, using <em> vs. <i> won't make a difference.
Not very many browsers support it.
Tomasz Wegrzanowski wrote:
Because most people are using 56kbit modems and recent changes size is huge. It's not about server bandwidth at all.
I have a 56k. The slow speed I get from RC *is* due to Wikipedia, the database I presume. While RC is chugging s-l-o-w-l-y I can get Google results up in no time :)
One enhancement to RC I suggested a LONG time ago was a link at the foot of the page saying:
"Show next 100 | 250 | 500 changes"
With this, compulsive RC addicts could check a whole 24 hours of edits in small chunks. This might be better for the database too?
tarquin-
One enhancement to RC I suggested a LONG time ago was a link at the foot of the page saying:
"Show next 100 | 250 | 500 changes"
Definitely. This is already in my TODO list. I would go so far as to say that the current linkbar could be replaced with such a linkbar, and that the default number would only be set in the user preferences. Furthermore, it might be useful to store the timestamp of the last check in the user table, and to only show the changes since the last check by default -- that could substantially decrease database usage.
Regards,
Erik
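The two ideas in this message, paging through changes in chunks and showing only changes since the user's last check, can be sketched together. This is a minimal in-memory Python illustration; `Change`, `changes_page`, and their parameters are hypothetical names, and a real implementation would presumably push the filter and limit into the database query instead of sorting in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    title: str
    timestamp: datetime


def changes_page(changes, last_check=None, offset=0, limit=100):
    """Return one page of changes, newest first.

    last_check: if given, only changes strictly newer than it are kept.
    offset, limit: position and size of the page ("show next N changes").
    """
    recent = sorted(changes, key=lambda c: c.timestamp, reverse=True)
    if last_check is not None:
        recent = [c for c in recent if c.timestamp > last_check]
    return recent[offset:offset + limit]


# Ten sample changes, one minute apart.
base = datetime(2003, 7, 20, 12, 0)
log = [Change(f"Article {i}", base + timedelta(minutes=i)) for i in range(10)]

first_page = changes_page(log, limit=3)           # the three newest changes
next_page = changes_page(log, offset=3, limit=3)  # "show next 3 changes"
since = changes_page(log, last_check=base + timedelta(minutes=7))
```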
Tomasz Wegrzanowski wrote:
On Sun, Jul 20, 2003 at 02:36:38PM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
Wasting bytes where it's critical to make RC as small as possible for "complicated reasons" doesn't really convince me.
Could you please explain to me why you think it's critical to make RC as small as possible? Our biggest current problem is database performance, not bandwidth usage.
Because most people are using 56kbit modems and recent changes size is huge. It's not about server bandwidth at all.
A 56kbit modem can transmit 56 Kilobits per second! With TCP's ~3.8% overhead (let's exaggerate to 4% for simplicity), it means you can transmit 53.76 Kbit/sec = 6.72 KB/sec. The Recent Changes list as I see it now (last 50 changes) is 40.8 KB ... this would therefore, under perfect server performance, take almost exactly 6 seconds.
According to your own calculations, you can save up to 16.6% - that takes it down to very close to 5 seconds.
I'm not convinced this unnoticeable improvement is worth the hassle.
If you really want to make it as small as possible, then you should use gzip; once that is installed, using <em> vs. <i> won't make a difference.
Not very many browsers support it.
That doesn't matter. The server can detect when the browser accepts gzip (the Accept-Encoding HTTP header), so those that don't can still be served the pages non-gzipped. Besides, enough modern browsers support it.
Timwi
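The negotiation described here can be sketched as follows. This is a simplified Python illustration, not the server's actual code: `accepts_gzip` and `encode_body` are hypothetical helpers, and the header parsing deliberately ignores the `*` wildcard and the legacy `x-gzip` token.

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value lists gzip with a non-zero q-value."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def encode_body(body, accept_encoding=""):
    """Return (payload, headers); compress only when the client allows it."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers


page = b"<li><a href='/wiki/Example'>Example</a></li>\n" * 250
compressed, gz_headers = encode_body(page, "gzip, deflate")
plain, plain_headers = encode_body(page, "")
```

Clients that send no `Accept-Encoding` header, or one without gzip, receive the uncompressed bytes unchanged, which is why limited browser support does not block enabling compression.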
On Sun, Jul 20, 2003 at 06:35:28PM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
On Sun, Jul 20, 2003 at 02:36:38PM +0200, Timwi wrote:
Tomasz Wegrzanowski wrote:
Wasting bytes where it's critical to make RC as small as possible for "complicated reasons" doesn't really convince me.
Could you please explain to me why you think it's critical to make RC as small as possible? Our biggest current problem is database performance, not bandwidth usage.
Because most people are using 56kbit modems and recent changes size is huge. It's not about server bandwidth at all.
A 56kbit modem can transmit 56 Kilobits per second! With TCP's ~3.8% overhead (let's exaggerate to 4% for simplicity), it means you can transmit 53.76 Kbit/sec = 6.72 KB/sec. The Recent Changes list as I see it now (last 50 changes) is 40.8 KB ... this would therefore, under perfect server performance, take almost exactly 6 seconds.
According to your own calculations, you can save up to 16.6% - that takes it down to very close to 5 seconds.
I'm not convinced this unnoticeable improvement is worth the hassle.
50 changes is next to useless. 100 is the minimum that's of any use, and 250 or 500 is needed to really know what's going on.
250 changes is about 200 kB, or 29 s; with 16.6% less to transfer, it's 24 s. Sounds significant enough to me.
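The download times quoted in this exchange follow from one formula: payload bits divided by the usable line rate. A minimal Python sketch, assuming the 56 kbit/s nominal rate and 4% protocol overhead used above, and reading "40.8 KB" as 40 800 bytes; it ignores latency, server time, and any modem-level compression.

```python
def transfer_seconds(size_bytes, line_rate_bps=56_000, overhead=0.04):
    """Ideal download time in seconds for a payload of size_bytes."""
    usable_bps = line_rate_bps * (1 - overhead)  # 53 760 bit/s = 6.72 KB/s
    return size_bytes * 8 / usable_bps


def with_savings(size_bytes, fraction_saved):
    """Payload size after removing the given fraction of bytes."""
    return size_bytes * (1 - fraction_saved)


small = 40_800    # 50 changes
large = 198_720   # 250 changes

for label, size in [("50 changes", small), ("250 changes", large)]:
    before = transfer_seconds(size)
    after = transfer_seconds(with_savings(size, 0.166))
    print(f"{label}: {before:.1f} s -> {after:.1f} s")
```

The relative saving is the same 16.6% in both cases; only the absolute number of seconds grows with the page size, which is the point of disagreement between the two posters.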
On Sun, 20 Jul 2003, Timwi wrote:
Tomasz Wegrzanowski wrote:
- using <i> not <em> 319 bytes saved
- using <b> not <strong> 1219 bytes saved
<em> and <strong> should be used, not <i> and <b>. The reasons are complicated and have to do with accessibility and text browsers. I don't really understand those reasons myself, but from my experience it seems to be consensus to use <em> and <strong>. I think <i> and <b> are even deprecated in the newest version of HTML, or something...
As far as I know, the reason is that <em> and <strong> specify the nature of the text, while <i> and <b> specify how it should be shown. In theory, the browser manufacturer or user could choose how he wants emphasized and strongly emphasized text to be shown.
I myself am a proponent of using <em> and <strong> rather than <i> and <b>, but in Wikipedia's case I think that the change would not be bad, might actually be good. The reason is that a user will write ''text'' to put a text in italics, not to give it light emphasis. Therefore, many of our ''text''s should not be <em>, but <cite>. Because this cannot be machine-handled anyway, we might as well stop pretending that we do so.
Andre Engels
Andre Engels wrote:
I myself am a proponent of using <em> and <strong> rather than <i> and <b>, but in Wikipedia's case I think that the change would not be bad, might actually be good. The reason is that a user will write ''text'' to put a text in italics
We're talking about the Recent Changes page though. ;-)
Timwi
On Tue, 22 Jul 2003, Timwi wrote:
Andre Engels wrote:
I myself am a proponent of using <em> and <strong> rather than <i> and <b>, but in Wikipedia's case I think that the change would not be bad, might actually be good. The reason is that a user will write ''text'' to put a text in italics
We're talking about the Recent Changes page though. ;-)
Oops, you're completely right of course. Still, I would say that in that case neither <em> nor <cite> captures the meaning very well. Do we want to keep it in italics at all? (*bends down quickly to avoid the blows*)
Andre Engels
On Tue, Jul 22, 2003 at 11:56:36AM +0200, Andre Engels wrote:
On Sun, 20 Jul 2003, Timwi wrote:
Tomasz Wegrzanowski wrote:
- using <i> not <em> 319 bytes saved
- using <b> not <strong> 1219 bytes saved
<em> and <strong> should be used, not <i> and <b>. The reasons are complicated and have to do with accessibility and text browsers. I don't really understand those reasons myself, but from my experience it seems to be consensus to use <em> and <strong>. I think <i> and <b> are even deprecated in the newest version of HTML, or something...
As far as I know, the reason is that <em> and <strong> specify the nature of the text, while <i> and <b> specify how it should be shown. In theory, the browser manufacturer or user could choose how he wants emphasized and strongly emphasized text to be shown.
I myself am a proponent of using <em> and <strong> rather than <i> and <b>, but in Wikipedia's case I think that the change would not be bad, might actually be good. The reason is that a user will write ''text'' to put a text in italics, not to give it light emphasis. Therefore, many of our ''text''s should not be <em>, but <cite>. Because this cannot be machine-handled anyway, we might as well stop pretending that we do so.
In Recent Changes, italics are used for appearance only, not for emphasis, so you're right here.
wikitech-l@lists.wikimedia.org