On my favorite page,
there is a column for "depth", which is "a rough indicator of a
Wikipedia's quality, showing how frequently its articles are
updated". Tomorrow that column has been there for two full years,
with slight modifications of its formula.
I wrote a separate page about this,
(Note that this is completely unrelated to
There has been a lengthy discussion on the good and evil of trying
to estimate the quality of Wikipedia. But I think "depth" is the
only measurement that we can track over such a long time.
What other estimates of Wikipedia quality do we have, that can be
applied across language versions?
Erik Zachte's Wikipedia Statistics (last updated in May 2008)
presents a number of values that could be used to calculate a
quality estimate: number of articles, number of articles longer
than 0.5 kbytes or 2 kbytes (excluding some markup), mean edits
per article, mean bytes per article, number of edits (total), size
of database in bytes or words, number of internal or interwiki or
image or external links, number of redirects.
The editing depth is essentially the number of edits divided by
the number of articles (with two more factors in the formula).
This means edit wars and repeated use of the save button (instead
of preview) will give a higher depth. But if an article is made
perfect before it is saved, it gives a low depth. Thus, "depth"
measures the amount of editing activity within Wikipedia, as
opposed to the real quality of the resulting article.
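As a rough sketch (my own illustration with made-up numbers, not the exact
published formula), the core of the figure could be computed like this:

<?php
// Sketch only: the heart of "depth" is the mean number of edits per article.
// The real formula multiplies this by two further factors (roughly, the
// share of non-article pages and a stub correction), omitted here.
function roughDepth( $edits, $articles ) {
    return $articles > 0 ? $edits / $articles : 0;
}
// Made-up example numbers, not real statistics:
echo round( roughDepth( 4200000, 300000 ) ), "\n";  // prints 14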
Measuring editing activity can be interesting in itself, but it might also
be interesting to estimate the amount of interconnectivity between
articles, where orphan articles or articles with just one link to
them are discounted as a kind of stub. How can such a measurement be
defined? If possible, by just combining the values we already have.
Earlier (2005-2006), the Swedish language Wikipedia created many
(mostly very short) articles, giving it a high ranking position in
the list of Wikipedias (by article count). But since these stubs
were created once and never touched again, this gave it a rather
low "depth" of 14 (in November 2007). During 2008, a number of
subprojects have gone back and made minor edits to many old
articles, so the "depth" has climbed to 23. This is not high, but
no longer among the very lowest. That increase of 64 percent is, however,
overshadowed by the Turkish Wikipedia's increase of 125 percent
(from 39 to 88).
Also, the French Wikipedia has increased its depth from 58 to 113,
while the German Wikipedia only moved from depth 68 to 80.
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
On Sun, Nov 30, 2008 at 1:11 AM, Brian Salter-Duke wrote:
> On Sun, 30 Nov 2008 00:50:08 +0100, Platonides <Platonides(a)gmail.com> wrote:
>> See https://bugzilla.wikimedia.org/show_bug.cgi?id=16491
>> Other parameters, like urlContents or signed wouldn't be used but at
>> least they can be disabled.
> I am afraid this is all beyond my expertise. Are you saying that there
> is no way Jmol can ever be used on WMF projects?
It can be used once those parameters are disabled and the extension gets a
proper review (TM).
Gerard Meijssen wrote:
> Do not forget the als.wikipedia.org. It stands for Alsatian, but the als
> code is the Tosk language. The "gsw" code is the code that should have been used.
Adding it to my list, thanks!
> The nrm.wikipedia is also using a wrong code. nrm is Narom, a language from
> Malaysia. Nourmande is not recognised as a language, consequently there is
> no code available for it. I propose to use qaa for this.
I'd recommend roa-x-norman (generic Romance code with an extension tag)
rather than a private-use identifier.
Private-use identifiers are meant more for things like internal coding
within an application, such as where you'd want to indicate that a
document is not in a human language, or a special mixed setting
that's specific to your organization's internal usage, such as "not yet
inspected for coding" or something.
Per spec: "These identifiers may only be used locally, and may not be
used in interchange without a private agreement."
The purpose of using language codes on the web as we do is explicitly
for interchange with browser software, end users (as a navigation marker
in the URL), and content reusers.
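To make "interchange" concrete (a toy example of my own, not anything in our
actual code), the tag ends up in output that browsers and reusers read
directly, which is why a meaningless private-use code helps nobody:

<?php
// Toy illustration only: the language tag is exposed to every client.
header( 'Content-Language: roa-x-norman' );
echo '<html lang="roa-x-norman">';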
-- brion
For quick background, it's pretty painful to rename a database in our
system, and we currently have a lot of bits in our configuration that assume
an automatic relationship between the database name and the domain name, so
this has delayed the renaming of some language subdomains for a while.
It's not impossible to have them be different, just fairly awkward. :)
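To illustrate the kind of coupling I mean (a caricature, not our actual
configuration), imagine the database name being derived mechanically from the
host name:

<?php
// Caricature only: if configuration derives the DB name from the domain like
// this, renaming the subdomain silently implies renaming the database too.
$parts = explode( '.', $_SERVER['SERVER_NAME'] );  // e.g. "nds", "wikipedia", "org"
$wgDBname = str_replace( '-', '_', $parts[0] ) . 'wiki';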
I'd like to get these done soon, but before we get started, I want to
make sure the queue is complete and ready to go. I've currently got four
language code renames that I see being requested...
== Aromanian ==
roa-rup.wikipedia.org -> rup.wikipedia.org
roa-rup.wiktionary.org -> rup.wiktionary.org
https://bugzilla.wikimedia.org/show_bug.cgi?id=15988
ISO 639-2 code 'rup' was added in September 2005, so it can supersede the
compound of the generic 'roa' code with the 'rup' subtag.
This seems pretty uncontroversial. Existing domains and interwikis would be
redirected.
== Low German ==
nds.wikipedia.org -> nds-de.wikipedia.org
nds.wikibooks.org -> nds-de.wikibooks.org
nds.wikiquote.org -> nds-de.wikiquote.org
nds.wiktionary.org -> nds-de.wiktionary.org
https://bugzilla.wikimedia.org/show_bug.cgi?id=8540
Reasoning: Disambiguation of country variants to create a portal site
(nds-nl.wikipedia.org exists as well).
The original request is almost 2 years old and didn't seem to have any
clear consensus; is this still desired?
Creating a portal site could cause difficulties with URL compatibility,
and I don't really recommend making this change without clear consensus
from the community there.
Note that nds.wikipedia.org includes a link on the front page to
nds-nl.wikipedia.org.
== Moldovan ==
mo.wikipedia.org -> mo-cyrl.wikipedia.org
mo.wiktionary.org -> mo-cyrl.wiktionary.org
The official Moldovan language is the same as Romanian, using Latin script
and the same orthography as ro.wikipedia.org. Latin script was
officially adopted in 1989, replacing Soviet-era Cyrillic script; use of
Cyrillic script is still "official" in an unrecognized,
lightly-populated breakaway region but if people there use it, they
don't seem to edit Wikipedia...
The 'mo' language code has been officially deprecated from ISO 639-1/2
as of November 3, 2008; "Moldovan" in general use is just Romanian, and
is covered by ro.wikipedia.org.
mo.wikipedia.org has not actually been edited since December 2006.
mo.wiktionary.org seems to have... 4 definition pages, which only
contain translations (no definitions!). Being inactive, these projects
could be closed in addition to, or instead of, the rename.
Use of tag 'mo-cyrl' would follow existing IANA-registered language
subtags such as 'bs-Cyrl' and 'bs-Latn' for Cyrillic and Latin script
variants.
Most likely, for compatibility we would redirect the existing 'mo' URLs
to the new 'mo-cyrl' ones, but they would now be visibly marked by the
subtag as being "yes we know, it's Cyrillic here". If we're going to
lock the site as well, adding a sitenotice pointing to the Romanian wiki
is probably wise.
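Roughly what such a compatibility redirect amounts to, as a sketch only (the
real thing would live in the web server / Squid layer rather than in PHP):

<?php
// Sketch only: permanently redirect old 'mo' URLs to the 'mo-cyrl' host,
// preserving the requested path and query string.
header( 'Location: http://mo-cyrl.wikipedia.org' . $_SERVER['REQUEST_URI'], true, 301 );
exit;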
== Belorusian "old orthography" ==
be-x-old.wikipedia.org -> be-tarask.wikipedia.org
https://bugzilla.wikimedia.org/show_bug.cgi?id=9823
Some time ago we swapped around the Belorusian Wikipedia, moving the
previous version, which was primarily using a non-official orthography,
from 'be' to 'be-x-old', and re-establishing be.wikipedia.org using the
official state orthography.
There was later a request to rename 'be-x-old' (using a non-standard
code) to 'be-tarask', an IANA-registered subtag which is rather more
descriptive. IMHO this change should not be terribly controversial -- if
we're not closing it, we may as well give it its officially registered subtag.
Old domain and interwikis would be redirected.
-- brion vibber (brion @ wikimedia.org)
I have an extension which parses the contents of a page to store the content
of certain embedded tags in the database, and I want the parsing to take
place after the pre-processing (comment removal, template expansion, etc.).
I also need the code to be compatible with MW 1.6, as I am currently unable to
upgrade to PHP 5 (hopefully soon...)
Here is the code I was using until recently (where $Text is the unmodified
article text):
// Create new Parser object to deal with some transformations that are
// required before saving.
$Parser = new Parser();
// Use the Parser object to strip out html comments, nowiki and pre tags
// and whatever other bits shouldn't make it through when rendering (so
// they don't affect saving).
$ParserOptions = new ParserOptions();
$StripState =& $Parser->mStripState;
$Parser->mOptions = $ParserOptions;
$TidyText = $Parser->strip($Text, $StripState, true);
// Then replace any variables, parser functions etc. so that 'hidden' tags
// (e.g. tags that are created by code, such as using the ExpandAfter
// extension) are expanded properly for saving.
$Parser->mFunctionHooks = $wgParser->mFunctionHooks;
$Parser->mTitle =& $wgParser->mTitle;
$TidyText = $Parser->replaceVariables($TidyText);
However, I was recently testing this on MW 1.12, and it gives the following error:
Fatal error: Call to a member function matchStartToEnd() on a non-object
in Parser.php on line 2771
I fixed this by inserting the following two lines just before the second
$TidyText = ...
$Parser->mVariables =& $wgParser->mVariables;
$Parser->mOutput =& $wgParser->mOutput;
Now, it is clear to me that this is the wrong way of going about this - I
shouldn't have to mess with the internals of the parser object just to
pre-process the text, as it will clearly break whenever the parser
object is updated!
Can someone tell me the correct forward-compatible way to pre-process
article text in this manner?
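One possibility I am wondering about (completely untested, and I'm not sure
how far back it exists - perhaps only in 1.12 and later) is the parser's
public preprocess() entry point:

<?php
// Untested sketch: let the global parser expand templates and variables and
// strip comments through its public interface, instead of poking at
// mStripState and friends on a hand-built Parser object.
global $wgParser, $wgTitle, $wgUser;
$options = ParserOptions::newFromUser( $wgUser );
$TidyText = $wgParser->preprocess( $Text, $wgTitle, $options );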
- Mark Clements (HappyDog).
I am trying to set up a local box to run as a local Wikipedia server. It
would run Ubuntu 8.04 with PHP 5 and MySQL 5. I would like to replicate all
of en.wikipedia with the history and discussion pages. This would be part of
an academic research project on the quality of Wikipedia as an information source.
I need your input on the hardware configuration please.
- RAM: 16 GB or 32 GB?
- CPU: two quad-core CPUs (2 GHz Xeon), one quad-core, or one dual-core?
- Storage: I intend to have 4-5 TB of space.
How much space do you think I need for temporary tables? Is 300 GB enough, or
should I get more? The full database might be around 600 GB uncompressed.
Only one user is going to run queries on this server, and some queries might
span several entire tables.
Thanks in advance for all your suggestions.
You are receiving this email because your project has been selected to
take part in a new effort by the PHP QA Team to make sure that your
project still works with soon-to-be-released PHP versions. With this we
hope to ensure that you are aware of things that might break, and that
we don't introduce any strange regressions.
With this effort we hope to build a better relationship between the
PHP Team and the major projects.
If you do not want to receive these heads-up emails, please reply to
me personally and I will remove you from the list; but we hope that
you want to actively help us make PHP a better and more stable tool.
The fifth & final (for the second time now ;-)) release candidate of
PHP 5.2.7 was just released and can be downloaded from
http://downloads.php.net/ilia/. Please try this release candidate against
your code and let us know if you find any regressions. Since the last
release a few memory-related issues were addressed; hopefully we are
finally at the end of the release cycle. The goal is to have 5.2.7 out by
the end of next week, so timely testing would be extremely helpful.
In case you think that other projects should also receive these kinds
of emails, please let me know privately, and I will add them to the
list of projects to contact.
5.2 Release Master