Thanks a lot Gerard and Jesse!
Apologies for jumping to conclusions about MediaWiki not supporting Indian
languages. Will explore what you guys said. Thanks!
Quoting wikitech-l-request(a)lists.wikimedia.org:
Send Wikitech-l mailing list submissions to
wikitech-l(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
or, via email, send a message with subject or body 'help' to
wikitech-l-request(a)lists.wikimedia.org
You can reach the person managing the list at
wikitech-l-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wikitech-l digest..."
Today's Topics:
1. Re: Indian Language support in Mediawiki?
(Jesse Martin (Pathoschild))
2. PHP 5.2.5RC1 testing (Ilia Alshanetsky)
3. Re: Indian Language support in Mediawiki? (GerardM)
4. Re: Looking for a system administrator familiar with the
Squid setup (John Q)
5. New version WikiXRay Python parser (Felipe Ortega)
6. Incremental history dumps (Lars Aronsson)
7. Re: Incremental history dumps (Gregory Maxwell)
8. RFC: Incremental history dumps (Platonides)
9. Re: [MediaWiki-CVS] SVN: [26830] trunk/phase3 (Simetrical)
----------------------------------------------------------------------
Message: 1
Date: Fri, 19 Oct 2007 09:54:02 -0400
From: "Jesse Martin (Pathoschild)" <pathoschild(a)gmail.com>
Subject: Re: [Wikitech-l] Indian Language support in Mediawiki?
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<1913a8240710190654t2ff4beib02fa4343caf2279(a)mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hello,
The Hindi and Kannada Wikipedias use JavaScript to do that; for
example, try typing in the box at
<http://hi.wikipedia.org/wiki/test?action=edit>.
The interface has also been translated into Hindi, Tamil, and Kannada.
You can set the interface language in the file LocalSettings.php
(globally) or through the wiki page "Special:Preferences" (per user).
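For illustration, a minimal LocalSettings.php fragment for the global setting
looks like this (the 'hi' code is only an example; use your wiki's own
language code):

    # LocalSettings.php -- default (site-wide) interface language.
    # 'hi' is the code for Hindi; individual users can still override
    # this in Special:Preferences.
    $wgLanguageCode = 'hi';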
Yours cordially,
Jesse Martin (Pathoschild)
------------------------------
Message: 2
Date: Thu, 18 Oct 2007 19:34:14 -0400
From: Ilia Alshanetsky <ilia(a)prohost.org>
Subject: [Wikitech-l] PHP 5.2.5RC1 testing
To: php-qa(a)lists.php.net
Cc: marc(a)phpmyadmin.net, serendipity(a)supergarv.de, php(a)fudforum.org,
contact-us(a)lists.geeklog.net, wikitech-l(a)lists.wikimedia.org,
bharat(a)menalto.com, jasper(a)album.co.nz, php-testing(a)phorum.org,
dev(a)sugarcrm.com, m(a)wordpress.org, Greg Beaver
<greg(a)chiaraquartet.net>, pear-qa(a)lists.php.net, matteo(a)beccati.com
Message-ID: <4D932D75-38FB-455A-864A-89F35D113917(a)prohost.org>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Hello!
You are receiving this email because your project has been selected
to take part in a new effort by the PHP QA Team to make sure that
your project still works with PHP versions to-be-released. With this
we hope to make sure that you are aware of things that might
break, and that we don't introduce any strange regressions.
With this effort we hope to build a better relationship between the
PHP Team and the major projects.
If you do not want to receive these heads-up emails, please reply to
me personally and I will remove you from the list; but we hope that
you will actively help us make PHP a better and more stable tool.
The first release candidate of PHP 5.2.5 was just released and can be
downloaded from
http://downloads.php.net/ilia/. Please try this
release candidate against your code and let us know about any
regressions you find. The goal is to have 5.2.5 out within
three weeks' time, so timely testing would be extremely helpful.
In case you think that other projects should also receive these kinds
of emails, please let me know privately, and I will add them to the
list of projects to contact.
Best Regards,
Ilia Alshanetsky
5.2 Release Master
------------------------------
Message: 3
Date: Fri, 19 Oct 2007 16:58:30 +0200
From: GerardM <gerard.meijssen(a)gmail.com>
Subject: Re: [Wikitech-l] Indian Language support in Mediawiki?
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<41a006820710190758i27879322kf366c08c269dc298(a)mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hoi,
When you look at the localisation statistics, you understand what the
problem is. Much work is needed to complete the localisation for MediaWiki.
The fact that the Hindi or other Wikipedias have been localised to a larger
extent makes no difference to the localisation of MediaWiki itself:
http://www.mediawiki.org/wiki/Localisation_statistics
Thanks,
GerardM
On 10/19/07, Jesse Martin (Pathoschild) <pathoschild(a)gmail.com> wrote:
Hello,
The Hindi and Kannada Wikipedias use JavaScript to do that; for
example, try typing in the box at
<http://hi.wikipedia.org/wiki/test?action=edit>.
The interface has also been translated into Hindi, Tamil, and Kannada.
You can set the interface language in the file LocalSettings.php
(globally) or through the wiki page "Special:Preferences" (per user).
Yours cordially,
Jesse Martin (Pathoschild)
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
------------------------------
Message: 4
Date: Fri, 19 Oct 2007 09:49:09 -0700
From: John Q <johnq(a)wikia.com>
Subject: Re: [Wikitech-l] Looking for a system administrator familiar
with the Squid setup
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <4718E005.1020405(a)wikia.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Travis,
We just went through a big evolution and found a few things that helped
us, so how about we take a look. Also, Emil made that patch that
prevents Google Analytics from busting the cache as much... that's in
the Wikimedia SVN. Can you ask Jack to come up to San Mateo? We'll
sit down with him, and we'll also get Artur to connect with you online.
Thanks,
John Q.
-------- Original Message --------
Subject: [Wikitech-l] Looking for a system administrator familiar with
the Squid setup
Date: Thu, 18 Oct 2007 08:43:56 -0400
From: Travis Derouin <travis(a)wikihow.com>
Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Hey,
We've been running into some performance problems lately, and I'm stumped.
I'm not sure if we need more hardware or not.
I'd like to find a system administrator familiar with the Squid,
Apache/MediaWiki, MySQL setup to take a look at our system and identify any
potential problems that we might have. We'd be comfortable with either a
one-time fee, or an hourly rate. We have a 6 server setup right now, with 1
Squid, 1 DB, 3 Apaches and 1 spare. If you or someone you know is
interested, send an e-mail directly to me: travis(a)wikihow.com with your
details and experience.
Sorry for the job-posting-like message, but I'm out of ideas and need
some help.
Thanks!
Travis
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
------------------------------
Message: 5
Date: Fri, 19 Oct 2007 20:39:07 +0200 (CEST)
From: Felipe Ortega <glimmer_phoenix(a)yahoo.es>
Subject: [Wikitech-l] New version WikiXRay Python parser
To: wiki-research-l(a)lists.wikimedia.org,
wikitech-l(a)lists.wikimedia.org
Message-ID: <31191.9735.qm(a)web27503.mail.ukl.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1
Hi.
A new version of the Python parser in WikiXRay, along with improved
documentation, can be found here:
http://meta.wikimedia.org/wiki/WikiXRay_Python_parser
Basically, I've developed two flavors: the standard one, for people who want
an alternative to other tools for processing Wikipedia's dumps (including the
text table), and a research-oriented one, which ignores the text itself and
instead extracts useful info on the fly.
Both flavors use extended inserts (you can tune the size and number of rows),
and the --monitor mode calls a DB access module to avoid timeout errors.
Further improvements (--skipnamespaces and --inject, the latter should be
very easy) are on the way.
Best,
Felipe.
------------------------------
Message: 6
Date: Fri, 19 Oct 2007 21:10:36 +0200 (CEST)
From: Lars Aronsson <lars(a)aronsson.se>
Subject: [Wikitech-l] Incremental history dumps
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <Pine.LNX.4.64.0710192057030.30952(a)localhost.localdomain>
Content-Type: TEXT/PLAIN; charset=US-ASCII
In the recent weeks I have been following the database dumps of
some languages of Wikipedia. I download and analyze a dump, do
various improvements, and then wait for the next dump to become
available for a new analysis. There are 2 or 3 weeks between each
dump. There appear to be two parallel dump processes continuously
running,
http://download.wikimedia.org/backup-index.html
What takes most time in each dump is the large file with complete
version history, pages-meta-history.xml.bz2 and
pages-meta-history.xml.7z
This is the largest file in compressed format, but since it
contains every version of every article it is also very highly
compressed, and expands to become enormous. I guess that very few
people find use for this file. In addition, only a very small
portion of its contents is changed between two dumps. So we spend
a lot of time and effort (and delay of other things) in order to
create very little for very few users.
I think that this dump should be made incremental. Every week,
only that week's additional versions need to be dumped. This can
then be added to the dump of the previous week, the week before
that, etc., which hasn't really changed. This way, the dump
process could be made much faster, and the two parallel dump
processes would complete the cycle in less time, so new dumps of
the same project could be made available more frequently.
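As a rough sketch of the idea (assuming the standard MediaWiki revision
table; the cut-off value is made up), the weekly increment would essentially
be the revisions selected by something like:

    <?php
    // Hypothetical sketch only: select just the revisions added since the
    // previous dump run, identified here by its snapshot timestamp.
    $lastDumpTimestamp = '20071012000000'; // illustrative cut-off (YYYYMMDDHHMMSS)
    $sql = "SELECT rev_id, rev_page, rev_timestamp
            FROM revision
            WHERE rev_timestamp > '$lastDumpTimestamp'
            ORDER BY rev_id";
    // Everything older than the cut-off is simply carried over from the
    // previous week's dump file instead of being re-extracted.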
Or is it already done this way, behind the scenes, only that it
isn't visible from the outside?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik -
http://aronsson.se
------------------------------
Message: 7
Date: Fri, 19 Oct 2007 16:12:06 -0400
From: "Gregory Maxwell" <gmaxwell(a)gmail.com>
Subject: Re: [Wikitech-l] Incremental history dumps
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<e692861c0710191312g6079c2d6md5cb326a69f84d47(a)mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
It already works that way on the backend, pretty much.
We can't make the old increments available forever because of things
we are obligated to discontinue distributing, so incremental dumps for
users would not be so useful.
On 10/19/07, Lars Aronsson <lars(a)aronsson.se> wrote:
In the recent weeks I have been following the database dumps of
some languages of Wikipedia. I download and analyze a dump, do
various improvements, and then wait for the next dump to become
available for a new analysis. There are 2 or 3 weeks between each
dump. There appear to be two parallel dump processes continuously
running,
http://download.wikimedia.org/backup-index.html
What takes most time in each dump is the large file with complete
version history, pages-meta-history.xml.bz2 and
pages-meta-history.xml.7z
This is the largest file in compressed format, but since it
contains every version of every article it is also very highly
compressed, and expands to become enormous. I guess that very few
people find use for this file. In addition, only a very small
portion of its contents is changed between two dumps. So we spend
a lot of time and effort (and delay of other things) in order to
create very little for very few users.
I think that this dump should be made incremental. Every week,
only that week's additional versions need to be dumped. This can
then be added to the dump of the previous week, the week before
that, etc., which hasn't really changed. This way, the dump
process could be made much faster, and the two parallel dump
processes would complete the cycle in less time, so new dumps of
the same project could be made available more frequently.
Or is it already done this way, behind the scenes, only that it
isn't visible from the outside?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik -
http://aronsson.se
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
------------------------------
Message: 8
Date: Fri, 19 Oct 2007 22:16:32 +0200
From: Platonides <Platonides(a)gmail.com>
Subject: [Wikitech-l] RFC: Incremental history dumps
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <ffb3b0$7p9$1(a)ger.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Lars Aronsson wrote:
> Or is it already done this way, behind the scenes, only that it
> isn't visible from the outside?
No.
AFAIK it is done as follows:
Precondition: The last full dump (if not present, treat as empty).
1- Take a snapshot of the wiki status (page table?) and create
stub-meta-history.
2- Read stub-meta-history and fill in the page content from the last dump's
page contents. If a page's content is not in the previous dump, get it from
external storage in a blocking way.
Result: a bzip2-compressed full history dump.
The bzip2 dump is then uncompressed and re-compressed as 7z.
If there's an error in a call to external storage, the process can't
be resumed and the dump fails.
I have recently been thinking about this, and I think it could be done as follows:
Precondition: The last full dump (if not present, treat as empty) and
its greatest revid.
1a- Take a snapshot of the wiki status (page table?) and create
stub-meta-history.
1b- While reading the revisions, if a revid is greater than the last dump's
greatest revid (LDGR), add it to one of N files (a file per M revisions).
2- Run N processes grabbing these page contents. Store them in a
new-format dump (the external storage equivalent), one per revid list
file. If one fails, just rerun it.
3- Read stub-meta-history and fill in the page content from the last dump's
page contents. If a page text is not in the previous dump, grab it from the
list file when revid > LDGR; otherwise, get it from external storage, saving
it to a different file.
Revisions present neither in the last dump nor in the incremental dumps will
only occur for restored pages, and they can still block the process, but since
there are far fewer of them, it is much less likely that they fail.
4- Save the new LDGR along with the new bzipped dump.
By making the M+1 incremental dumps available, together with the smaller
stub-meta-history, the latest dump can be recreated from the previous one
(= a smaller download).
Wikimedia would still provide the full dumps, but you would only need them
the first time.
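For illustration, a rough sketch of the decision logic in step 3 (all
function and variable names here are made up; only the revid/LDGR branching
follows the proposal above):

    <?php
    // Hypothetical sketch of step 3. $previousDump and $incremental stand in
    // for the previous full dump and the N revid list files from step 2.
    function resolveText( $revId, $ldgr, array $previousDump, array $incremental ) {
        if ( $revId <= $ldgr && isset( $previousDump[$revId] ) ) {
            // Unchanged since the last dump: reuse its text.
            return $previousDump[$revId];
        }
        if ( $revId > $ldgr ) {
            // New since the last dump: comes from the list files of step 2.
            return $incremental[$revId];
        }
        // Old revision missing from the previous dump (e.g. a restored page):
        // the only case that still has to hit external storage (and gets
        // saved to a separate file).
        return fetchFromExternalStorage( $revId ); // hypothetical helper
    }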
Comments?
------------------------------
Message: 9
Date: Fri, 19 Oct 2007 16:53:15 -0400
From: Simetrical <Simetrical+wikilist(a)gmail.com>
Subject: Re: [Wikitech-l] [MediaWiki-CVS] SVN: [26830] trunk/phase3
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
Message-ID:
<7c2a12e20710191353n7b6d2bc4wd2524e19898a2f2c(a)mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On 10/19/07, Thomas Dalton <thomas.dalton(a)gmail.com> wrote:
> > The bigger problem is your join condition. If a row matches
> > pl_namespace=rc_namespace and pl_title=rc_title, then it's joined to
> > *every row* of the page and redirect tables, because there are no
> > restrictions on them! The converse is true as well.
> This is the part I don't understand. What do you mean by there being
> no restrictions? Why does the pl_from=$id part not restrict it
> appropriately?
You have to keep in mind that when you're doing simple joins, the
result set is equal to the Cartesian product (every possible
combination of rows from each table), filtered according to the
WHERE/ON conditions (which are equivalent). The pl_from=$id restricts
it so that all the rows in the result set will have one particular
pagelinks row, no other. The remaining conditions must put enough
restrictions on the recentchanges, page, *and* redirect tables to keep
the number of returned rows small enough to be reasonable.
The problem is that your conditions state that either the
recentchanges table must obey certain conditions, relating to the one
pagelinks row already selected, *or* the page and redirect and
recentchanges tables must obey certain (different) conditions. If a
recentchanges row obeys the first set of conditions (i.e., it
corresponds to the pagelinks row), there are no restrictions on what
page or redirect rows can be associated with it, and therefore *every*
page row and *every* redirect row is associated with it, and so is
*every* combination thereof.
This will not actually appear in the result set, because the GROUP BY
will condense rows with identical recentchanges rows. I'm not sure
exactly how GROUP BY works here as opposed to DISTINCT, say, given
that there are no grouping operators or anything: I hardly qualify as
an SQL expert. But I could tell from the EXPLAIN that the query was
seriously inefficient, and I noticed the deficiency in the join
condition that was prompting a Cartesian join of the last two tables
(after Xgc in #mysql prompted me to take a closer look at the query).
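To make the shape of the problem concrete, here is a deliberately simplified,
schematic query with the same structure (it is not the actual query from
r26830):

    <?php
    // Schematic only, not the reverted query. $id is an illustrative page id.
    $id = 1234;
    $sql = "SELECT rc.*
            FROM pagelinks pl, recentchanges rc, page p, redirect rd
            WHERE pl.pl_from = $id
              AND ( ( rc.rc_namespace = pl.pl_namespace        -- branch 1:
                      AND rc.rc_title = pl.pl_title )          -- only rc tied down
                    OR ( p.page_id = rd.rd_from                -- branch 2:
                         AND rd.rd_namespace = rc.rc_namespace -- rc, p and rd tied
                         AND rd.rd_title = rc.rc_title ) )
            GROUP BY rc.rc_id";
    // When branch 1 matches a recentchanges row, nothing constrains p or rd,
    // so that row is paired with every (page, redirect) combination; the
    // GROUP BY collapses the duplicates in the output but not the work.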
> I put it through various tests before committing it, and it seemed to
> give the correct results (obviously, none of my tests revealed the
> error with the cutoff - that's a problem with testing on a test
> install, not a real world database). So is the query correct, just
> inefficient, or were my tests insufficient to catch the mistakes?
It may be correct. I'm not sure, because on my PC (which is my test
server) it sent mysqld to 90%+ CPU usage for somewhere well over a
minute while copying to tmp table, so I got bored and killed the
thread. Whether it would have returned the correct results half an
hour from now is a somewhat academic question. :)
Generally speaking, it's handy to have a relatively realistic local
database. At the suggestion of Yurik, I use the Simple English
Wikipedia because it's not gigantic and it's not gibberish to me.
It's still not really ideal, because for instance the user table is
practically nonexistent, recentchanges is unrealistically small, etc.,
so I could use the toolserver if I still weren't sure. That still
wouldn't be quite ideal, since it has a different version of MySQL
installed and so on, but it would be a pretty good approximation.
By the way, did you test your patch while logged in? It seems to
cause a fatal error before it even tries to execute the query.
Generally speaking, it's a bad idea to mix implicit join syntax (foo,
bar) with explicit join syntax (foo JOIN bar), like foo, bar, baz LEFT
OUTER JOIN quuz: it doesn't do what you expect.
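As an illustration (with made-up tables, not the query in question): in MySQL
5.0 and later the explicit JOIN binds more tightly than the comma, so an ON
clause in such a mixed query may not be able to see the comma-joined tables
at all.

    <?php
    // Illustrative tables/columns, not the real query. In MySQL >= 5.0.12 the
    // ON clause below is parsed against (bar LEFT OUTER JOIN quuz) only, so
    // the reference to foo.id fails ("Unknown column 'foo.id' in 'on clause'").
    $mixed    = "SELECT * FROM foo, bar
                 LEFT OUTER JOIN quuz ON quuz.id = foo.id";
    // Writing every join explicitly keeps the grouping unambiguous:
    $explicit = "SELECT * FROM foo
                 JOIN bar ON bar.foo_id = foo.id
                 LEFT OUTER JOIN quuz ON quuz.id = foo.id";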
Due to all these issues, I've reverted this in r26848.
------------------------------
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
End of Wikitech-l Digest, Vol 51, Issue 38
******************************************