This might be a good idea anyway, since I don't see the need for a stats page that recalculates the number of articles on each and every request (esp. since the main reason for reworking the php wikiware was to maximize efficiency).
The site stats are calculated each time a page is /saved/, and just directly fetched when queried (except for page read counts, which naturally have to be updated on each page read). This isn't a performance issue at all.
The same with the "good" article count--it's only updated when a page is saved, when either (1) a new page is saved that qualifies as "countable", (2) a formerly uncountable page is edited to become countable, or (3) a formerly countable page is edited to become uncountable. So the {{NUMBEROFPAGES}} query on the front page is not a calculation, just a lookup (and a faster one than each of the links).
The criterion for "countable" is flexible; right now, a page is countable if it (1) does not belong to any namespace, and (2) contains a comma.
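To make the save-time bookkeeping concrete, here is a minimal sketch in Python (not the actual PHP code). The names `is_countable`, `SiteStats` and `on_save` are invented for illustration, and "does not belong to any namespace" is assumed to mean namespace 0.

```python
from typing import Optional


def is_countable(namespace: int, text: str) -> bool:
    """Current criterion as described: main namespace (assumed 0) and a comma."""
    return namespace == 0 and "," in text


class SiteStats:
    """Hypothetical holder for the stored 'good' article count."""

    def __init__(self) -> None:
        self.good_articles = 0

    def on_save(self, namespace: int, old_text: Optional[str], new_text: str) -> None:
        # old_text is None for a brand-new page.
        was = old_text is not None and is_countable(namespace, old_text)
        now = is_countable(namespace, new_text)
        # +1 for cases (1) and (2), -1 for case (3), 0 otherwise.
        self.good_articles += int(now) - int(was)
```

Reading the count is then just a lookup of `good_articles`; nothing is recomputed at query time.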
The details of how namespaces are handled are really more of a techie issue, but since you brought it up here, I'll detail it here. The full details, of course, can be found in the code.
"Namespace" is a separate field of the database from "Title"; in fact it's an integer (the exact text of each one depends on the language). Regular encyclopedia articles have a namespace of 0, "Talk" (or "Diskussion") pages have a namespace of 1, etc. Things like the search function simply add (namespace=0) to the query, and never bother looking at the title (which may contain colons).
The actual names of the namespaces come into play when interpreting links. For example, when the software sees a link to [[User:X]], it grabs whatever appears before the first colon and looks to see if it is a known namespace or Interwiki. If it is, then the code looks up the article with a query along the lines of (namespace=2 and title='X'), and so on. If it's not a recognized prefix, then it uses a query like (namespace=0 and title='2001:_A_Space_Odyssey'). "Image:" is magic on other levels as well, but that's more detail for later.
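The link handling above can be sketched roughly as follows. This is Python rather than the real PHP, the `NAMESPACES` table and `resolve_link` are hypothetical names, and Interwiki prefixes and the "Image:" special cases are left out. The space-to-underscore step is inferred from the '2001:_A_Space_Odyssey' example.

```python
# Hypothetical prefix table; the numbers for Talk and User follow the text above.
NAMESPACES = {"Talk": 1, "User": 2}


def normalize_title(title: str) -> str:
    """Assumed title form: trimmed, with spaces stored as underscores."""
    return title.strip().replace(" ", "_")


def resolve_link(target: str) -> tuple:
    """Return (namespace, title) for the text inside [[...]]."""
    prefix, sep, rest = target.partition(":")  # split at the first colon only
    key = prefix.strip()
    if sep and key in NAMESPACES:
        return (NAMESPACES[key], normalize_title(rest))
    # Unrecognized prefix: the colon is simply part of an ordinary title.
    return (0, normalize_title(target))
```

So `resolve_link("User:X")` gives `(2, "X")`, while `resolve_link("2001: A Space Odyssey")` gives `(0, "2001:_A_Space_Odyssey")`.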
I've been running two programs recently: the stressbot and a new program, 'postit', that creates random articles.
Each article contains a block of random alphabet soup, just to stress the system. The random characters are chosen from a range of characters, with a weighting on special chars such as '&', '[', ']', '<', '>' etc. to stress the parser.
I've just tweaked the program to use characters in the range 32-254, rather than the previous ASCII-only random characters. I've noticed that this seems to speed article posting up a fair bit. The only reason I can think of is that the parser is taking significant CPU time for these pages, and the non-ASCII chars are causing some regexps to terminate early, saving time.
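For anyone who wants to reproduce this, a rough sketch of such a generator in Python is below. It is not the postit source; the function name, the 20% weighting for special characters, and the parameter names are all assumptions made for illustration.

```python
import random

SPECIAL = "&[]<>"  # markup characters given extra weight to stress the parser


def alphabet_soup(length, low=32, high=254, special_weight=0.2, rng=None):
    """Return `length` random chars with code points in low..high (inclusive).

    special_weight is an assumed probability of picking a markup character
    instead of a uniformly random one.
    """
    rng = rng or random.Random()
    chars = []
    for _ in range(length):
        if rng.random() < special_weight:
            chars.append(rng.choice(SPECIAL))
        else:
            chars.append(chr(rng.randint(low, high)))
    return "".join(chars)
```

Passing `high=126` gives the earlier ASCII-only behaviour, so the two runs can be compared with the same seed and article length.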
I'll do a couple more tests to try to confirm this.
Neil
2001: A Space Odyssey
I had not thought of this example, and I just now saw it and grasped for the first time why having colons in titles is a good thing.
I now side with Lee -- we should allow colons in titles, software-wise, but have a "simple convention", as he puts it, to avoid them whenever possible. But there are cases -- movie titles, book titles, and the like -- where they are indispensable.
If a few articles have titles that conflict with some future desired namespace, we can easily fix those few at that time.
--Jimbo
wikipedia-l@lists.wikimedia.org