It looks to me like there are a large number (as many as 1 million) of
redirects missing from the redirect.sql file.
My script extracts redirects from the redirect.sql file and resolves the page
IDs using the page.sql file. Most of these pages can be resolved (about 1
million). However, when I scan the page.sql file for pages that are flagged as
redirects but were never resolved to any relation in the redirect.sql file,
there are about 1 million more.
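To make the cross-check concrete, this is roughly what the comparison amounts
to (just a sketch; it assumes both files have been imported into a local MySQL
database, and the database name and credentials are placeholders):

<?php
// Sketch: find pages that page.sql flags as redirects but that have no
// corresponding row in redirect.sql. The column names (page_id,
// page_is_redirect, rd_from, ...) are the standard MediaWiki schema.
$db  = new PDO( 'mysql:host=localhost;dbname=wikidump', 'user', 'password' );
$sql = "SELECT page_id, page_namespace, page_title
          FROM page
     LEFT JOIN redirect ON rd_from = page_id
         WHERE page_is_redirect = 1
           AND rd_from IS NULL";

$missing = 0;
foreach ( $db->query( $sql ) as $row ) {
    $missing++;
    echo $row['page_title'], "\n";   // e.g. Alstrom's_syndrome
}
echo "$missing redirect pages have no entry in redirect.sql\n";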
Here are some examples (the ones on the left are missing from redirect.sql),
which were derived from the 20070908 dump, but I believe the problem is not
limited to that date:
Alstrom's syndrome -> Alstrom syndrome
Tito's Handmade Vodka -> Tito's Vodka
Titov_Drvar -> Drvar
Another experiment which seems to confirm this is that I can extract
2.4 million redirects from the
pages-articles.xml file, which is approximately the number of redirects I get
from redirect.sql plus the number that seem to be missing according to page.sql.
Am I misunderstanding something?
A related question: why does the redirect.sql file store the destination link
as a title string and not as a page ID? The categorylinks.sql file does this
as well. Is this just for readability? It takes more effort to construct
linked databases this way.
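(For what it's worth, turning those string targets back into page IDs seems to
need an extra join against the page table. A sketch, again assuming a locally
imported dump:)

<?php
// Sketch: resolve redirect targets, which are stored as namespace + title
// text, back to page IDs by joining redirect against page.
$db  = new PDO( 'mysql:host=localhost;dbname=wikidump', 'user', 'password' );
$sql = "SELECT r.rd_from, p.page_id AS resolved_target
          FROM redirect r
          JOIN page p ON p.page_namespace = r.rd_namespace
                     AND p.page_title     = r.rd_title";
$resolved = $db->query( $sql )->fetchAll( PDO::FETCH_ASSOC );
echo count( $resolved ), " redirects resolved to page IDs\n";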
I hope I have posted this in the right place.
thanks!!
John
Hi @all,
is there a hook which fires every time an article is viewed?
I want to use it to check the wikitext of every article for a special syntax,
every time I click a link, even before the page is displayed...
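Roughly what I have in mind, in case it helps (just a sketch: I am guessing
that ArticleViewHeader is the right hook, and the <myspecialtag> pattern is
only a placeholder):

<?php
// Registration, e.g. in the extension's setup file:
$wgHooks['ArticleViewHeader'][] = 'checkSpecialSyntax';

// Should run whenever an article is viewed, before the page is rendered.
function checkSpecialSyntax( &$article ) {
    $text = $article->getContent();                  // raw wikitext
    if ( preg_match( '/<myspecialtag>/', $text ) ) {
        // ... react to the special syntax here ...
    }
    return true;    // let normal page rendering continue
}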
Any idea?
greets...magggus
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r26900).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
Thanks a lot Gerard and Jesse!
Apologies for jumping to conclusions about MediaWiki not supporting Indian
languages. Will explore what you guys said. Thanks!
Quoting wikitech-l-request(a)lists.wikimedia.org:
> Send Wikitech-l mailing list submissions to
> wikitech-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.wikimedia.org/mailman/listinfo/wikitech-l
> or, via email, send a message with subject or body 'help' to
> wikitech-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wikitech-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikitech-l digest..."
>
>
> Today's Topics:
>
> 1. Re: Indian Language support in Mediawiki?
> (Jesse Martin (Pathoschild))
> 2. PHP 5.2.5RC1 testing (Ilia Alshanetsky)
> 3. Re: Indian Language support in Mediawiki? (GerardM)
> 4. Re: Looking for a system administrator familiar with the
> Squid setup (John Q)
> 5. New version WikiXRay Python parser (Felipe Ortega)
> 6. Incremental history dumps (Lars Aronsson)
> 7. Re: Incremental history dumps (Gregory Maxwell)
> 8. RFC: Incremental history dumps (Platonides)
> 9. Re: [MediaWiki-CVS] SVN: [26830] trunk/phase3 (Simetrical)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 19 Oct 2007 09:54:02 -0400
> From: "Jesse Martin (Pathoschild)" <pathoschild(a)gmail.com>
> Subject: Re: [Wikitech-l] Indian Language support in Mediawiki?
> To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
> Message-ID:
> <1913a8240710190654t2ff4beib02fa4343caf2279(a)mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hello,
>
> The Hindi and Kannada Wikipedias use JavaScript to do that; for
> example, try typing in the box at
> <http://hi.wikipedia.org/wiki/test?action=edit>.
>
> The interface has also been translated into Hindi, Tamil, and Kannada.
> You can set the interface language in the file LocalSettings.php
> (globally) or through the wiki page "Special:Preferences" (per user).
>
> Yours cordially,
> Jesse Martin (Pathoschild)
>
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 18 Oct 2007 19:34:14 -0400
> From: Ilia Alshanetsky <ilia(a)prohost.org>
> Subject: [Wikitech-l] PHP 5.2.5RC1 testing
> To: php-qa(a)lists.php.net
> Cc: marc(a)phpmyadmin.net, serendipity(a)supergarv.de, php(a)fudforum.org,
> contact-us(a)lists.geeklog.net, wikitech-l(a)lists.wikimedia.org,
> bharat(a)menalto.com, jasper(a)album.co.nz, php-testing(a)phorum.org,
> dev(a)sugarcrm.com, m(a)wordpress.org, Greg Beaver
> <greg(a)chiaraquartet.net>, pear-qa(a)lists.php.net, matteo(a)beccati.com
> Message-ID: <4D932D75-38FB-455A-864A-89F35D113917(a)prohost.org>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> Hello!
>
> You are receiving this email because your project has been selected
> to take part in a new effort by the PHP QA Team to make sure that
> your project still works with PHP versions to-be-released. With this
> we hope to make sure that you are either aware of things that might
> break, or to make sure we don't introduce any strange regressions.
> With this effort we hope to build a better relationship between the
> PHP Team and the major projects.
>
> If you do not want to receive these heads-up emails, please reply to
> me personally and I will remove you from the list; but, we hope that
> you want to actively help us make PHP a better and more stable tool.
>
> The first release candidate of PHP 5.2.5 was just released and can be
> downloaded from http://downloads.php.net/ilia/. Please try this
> release candidate against your code and let us know should you find any
> regressions. The goal is to have 5.2.5 out within
> three weeks time, so timely testing would be extremely helpful.
>
> In case you think that other projects should also receive these kinds
> of emails, please let me know privately, and I will add them to the
> list of projects to contact.
>
> Best Regards,
>
> Ilia Alshanetsky
> 5.2 Release Master
>
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 19 Oct 2007 16:58:30 +0200
> From: GerardM <gerard.meijssen(a)gmail.com>
> Subject: Re: [Wikitech-l] Indian Language support in Mediawiki?
> To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
> Message-ID:
> <41a006820710190758i27879322kf366c08c269dc298(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hoi,
> When you look at the localisation statistics, you understand what the
> problem is. Much work is needed to complete the localisation for MediaWiki.
> The fact that the Hindi or other Wikipedias have been localised to a larger
> extent makes no difference for the localisation of MediaWiki.
>
> http://www.mediawiki.org/wiki/Localisation_statistics
>
> Thanks,
> GerardM
>
> On 10/19/07, Jesse Martin (Pathoschild) <pathoschild(a)gmail.com> wrote:
> >
> > Hello,
> >
> > The Hindi and Kannada Wikipedias use JavaScript to do that; for
> > example, try typing in the box at
> > <http://hi.wikipedia.org/wiki/test?action=edit>.
> >
> > The interface has also been translated into Hindi, Tamil, and Kannada.
> > You can set the interface language in the file LocalSettings.php
> > (globally) or through the wiki page "Special:Preferences" (per user).
> >
> > Yours cordially,
> > Jesse Martin (Pathoschild)
> >
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l(a)lists.wikimedia.org
> > http://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 19 Oct 2007 09:49:09 -0700
> From: John Q <johnq(a)wikia.com>
> Subject: Re: [Wikitech-l] Looking for a system administrator familiar
> with the Squid setup
> To: wikitech-l(a)lists.wikimedia.org
> Message-ID: <4718E005.1020405(a)wikia.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Travis,
>
> We just went through a big evolution and found a few things that helped
> us, so how about we take a look. Also, Emil made that patch that
> prevents google analytics from busting the cache as much... that's in
> the wikimedia svn. Can you ask Jack to come up to San Mateo and we'll
> sit down with him and we'll also get Artur to connect with you on-line.
>
> Thanks,
> John Q.
>
>
> -------- Original Message --------
> Subject: [Wikitech-l] Looking for a system administrator familiar with
> the Squid setup
> Date: Thu, 18 Oct 2007 08:43:56 -0400
> From: Travis Derouin <travis(a)wikihow.com>
> Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>
> Hey,
>
> We've been running into some performance problems lately, and I'm stumped.
> I'm not sure if we need more hardware or not.
>
> I'd like to find a system administrator familiar with the Squid,
> Apache/Mediawiki, MySQL setup to take a look at our system, and identify any
> potential problems that we might have. We'd be comfortable with either a
> one-time fee, or an hourly rate. We have a 6 server setup right now, with 1
> Squid, 1 DB, 3 Apaches and 1 spare. If you or someone you know is
> interested, send an e-mail directly to me: travis(a)wikihow.com with your
> details and experience.
>
> Sorry for the job-type like posting, but I'm out of ideas and need some
> help.
>
> Thanks!
> Travis
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> http://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 19 Oct 2007 20:39:07 +0200 (CEST)
> From: Felipe Ortega <glimmer_phoenix(a)yahoo.es>
> Subject: [Wikitech-l] New version WikiXRay Python parser
> To: wiki-research-l(a)lists.wikimedia.org,
> wikitech-l(a)lists.wikimedia.org
> Message-ID: <31191.9735.qm(a)web27503.mail.ukl.yahoo.com>
> Content-Type: text/plain; charset=iso-8859-1
>
> Hi.
>
> A new version of the Python parser in WikiXRay, along with improved
> documentation, can be found here:
>
> http://meta.wikimedia.org/wiki/WikiXRay_Python_parser
>
> Basically, I've developed two flavors: the standard for those people who want
> an alternative to other tools for processing Wikipedia's dumps (including the
> text table). The other version is for research purposes; it ignores the text
> itself and instead extracts useful info on the fly.
>
> Both flavors use extended inserts (you can tune the size and num. of rows)
> and the --monitor mode calls a db access module to avoid timeout errors.
>
> Further improvements (--skipnamespaces and --inject, this one should be very
> easy) are on the way.
>
> Best,
>
> Felipe.
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Fri, 19 Oct 2007 21:10:36 +0200 (CEST)
> From: Lars Aronsson <lars(a)aronsson.se>
> Subject: [Wikitech-l] Incremental history dumps
> To: wikitech-l(a)lists.wikimedia.org
> Message-ID: <Pine.LNX.4.64.0710192057030.30952(a)localhost.localdomain>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
>
> In the recent weeks I have been following the database dumps of
> some languages of Wikipedia. I download and analyze a dump, do
> various improvements, and then wait for the next dump to become
> available for a new analysis. There are 2 or 3 weeks between each
> dump. There appear to be two parallel dump processes continuously
> running, http://download.wikimedia.org/backup-index.html
>
> What takes most time in each dump is the large file with complete
> version history, pages-meta-history.xml.bz2 and
> pages-meta-history.xml.7z
>
> This is the largest file in compressed format, but since it
> contains every version of every article it is also very highly
> compressed, and expands to become enormous. I guess that very few
> people find use for this file. In addition, only a very small
> portion of its contents is changed between two dumps. So we spend
> a lot of time and effort (and delay of other things) in order to
> create very little for very few users.
>
> I think that this dump should be made incremental. Every week,
> only that week's additional versions need to be dumped. This can
> then be added to the dump of the previous week, the week before
> that, etc., which hasn't really changed. This way, the dump
> process could be made much faster, and the two parallel dump
> processes would complete the cycle in less time, so new dumps of
> the same project could be made available more frequently.
>
> Or is it already done this way, behind the scenes, only that it
> isn't visible from the outside?
>
>
> --
> Lars Aronsson (lars(a)aronsson.se)
> Aronsson Datateknik - http://aronsson.se
>
>
>
> ------------------------------
>
> Message: 7
> Date: Fri, 19 Oct 2007 16:12:06 -0400
> From: "Gregory Maxwell" <gmaxwell(a)gmail.com>
> Subject: Re: [Wikitech-l] Incremental history dumps
> To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
> Message-ID:
> <e692861c0710191312g6079c2d6md5cb326a69f84d47(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> It already works that way on the backend, pretty much.
>
> We can't make the old increments available forever because of things
> we are obligated to discontinue distributing, so incrementals to the
> users would not be so useful.
>
>
> On 10/19/07, Lars Aronsson <lars(a)aronsson.se> wrote:
> >
> > In the recent weeks I have been following the database dumps of
> > some languages of Wikipedia. I download and analyze a dump, do
> > various improvements, and then wait for the next dump to become
> > available for a new analysis. There are 2 or 3 weeks between each
> > dump. There appear to be two parallel dump processes continuously
> > running, http://download.wikimedia.org/backup-index.html
> >
> > What takes most time in each dump is the large file with complete
> > version history, pages-meta-history.xml.bz2 and
> > pages-meta-history.xml.7z
> >
> > This is the largest file in compressed format, but since it
> > contains every version of every article it is also very highly
> > compressed, and expands to become enormous. I guess that very few
> > people find use for this file. In addition, only a very small
> > portion of its contents is changed between two dumps. So we spend
> > a lot of time and effort (and delay of other things) in order to
> > create very little for very few users.
> >
> > I think that this dump should be made incremental. Every week,
> > only that week's additional versions need to be dumped. This can
> > then be added to the dump of the previous week, the week before
> > that, etc., which hasn't really changed. This way, the dump
> > process could be made much faster, and the two parallel dump
> > processes would complete the cycle in less time, so new dumps of
> > the same project could be made available more frequently.
> >
> > Or is it already done this way, behind the scenes, only that it
> > isn't visible from the outside?
> >
> >
> > --
> > Lars Aronsson (lars(a)aronsson.se)
> > Aronsson Datateknik - http://aronsson.se
> >
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l(a)lists.wikimedia.org
> > http://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
>
>
> ------------------------------
>
> Message: 8
> Date: Fri, 19 Oct 2007 22:16:32 +0200
> From: Platonides <Platonides(a)gmail.com>
> Subject: [Wikitech-l] RFC: Incremental history dumps
> To: wikitech-l(a)lists.wikimedia.org
> Message-ID: <ffb3b0$7p9$1(a)ger.gmane.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Lars Aronsson wrote:
> > Or is it already done this way, behind the scenes, only that it
> > isn't visible from the outside?
>
> No.
>
> AFAIK it is done as follows:
>
> Precondition: The last full dump (if not present, treat as empty).
> 1- Take a snapshot of the wiki status (page table?) and create
> stub-meta-history
> 2- Read stub-meta-history and fill the page content with the last dump
> page contents. If a page's content is not in the previous dump, get it from
> the external storage in a blocking way.
>
> Result: A bzipped2 full history dump.
> The bzip2 dump is then uncompressed and 7zipped.
>
> If there's an error on a call to the external storage, the process can't
> be resumed and the dump fails.
>
>
> I had recently been thinking about it, and think it could be done like this:
> Precondition: The last full dump (if not present, treat as empty) and
> its greatest revid.
> 1a- Take a snapshot of the wiki status (page table?) and create
> stub-meta-history
> 1b- While reading the revisions, if revid is greater than the
> last dump's greatest revid (LDGR), add it to N files (a file per M revisions).
> 2-Run N processes grabbing these page contents. Store them on a
> new-format dump (the external storage equivalent), one per revid list
> file. If one fails, just rerun it.
>
> 3- Read stub-meta-history and fill the page content with the last dump
> page contents. If a page text is not in the previous dump, grab it from the
> list file if revid > LDGR; otherwise, get it from the external storage,
> saving it in a different file.
>
> Revisions present in neither the last dump nor the incremental dumps will
> occur for restored pages, and can still block the process, but since there
> are far fewer of them, a failure is much less likely.
>
> 4-Save the new dump LDGR with the new bzipped dump.
>
> By making the M+1 incremental dumps available, together with the smaller
> stub-meta-history, the latest dump can be recreated from the previous one
> (= smaller download size).
>
> Wikimedia would still provide the full dumps, but you would only need one
> the first time.
>
> Comments?
>
>
>
>
> ------------------------------
>
> Message: 9
> Date: Fri, 19 Oct 2007 16:53:15 -0400
> From: Simetrical <Simetrical+wikilist(a)gmail.com>
> Subject: Re: [Wikitech-l] [MediaWiki-CVS] SVN: [26830] trunk/phase3
> To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
> Message-ID:
> <7c2a12e20710191353n7b6d2bc4wd2524e19898a2f2c(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On 10/19/07, Thomas Dalton <thomas.dalton(a)gmail.com> wrote:
> > > The bigger problem is your join condition. If a row matches
> > > pl_namespace=rc_namespace and pl_title=rc_title, then it's joined to
> > > *every row* of the page and redirect tables, because there are no
> > > restrictions on them! The converse is true as well.
> >
> > This is the part I don't understand. What do you mean by there being
> > no restrictions? Why does the pl_from=$id part not restrict it
> > appropriately?
>
> You have to keep in mind that when you're doing simple joins, the
> result set is equal to the Cartesian product (every possible
> combination of rows from each table), filtered according to the
> WHERE/ON conditions (which are equivalent). The pl_from=$id restricts
> it so that all the rows in the result set will have one particular
> pagelinks row, no other. The remaining conditions must put enough
> restrictions on the recentchanges, page, *and* redirect tables to keep
> the number of returned rows small enough to be reasonable.
>
> The problem is that your conditions state that either the
> recentchanges table must obey certain conditions, relating to the one
> pagelinks row already selected, *or* the page and redirect and
> recentchanges tables must obey certain (different) conditions. If a
> recentchanges row obeys the first set of conditions (i.e., it
> corresponds to the pagelinks row), there are no restrictions on what
> page or redirect rows can be associated with it, and therefore *every*
> page row and *every* redirect row is associated with it, and so is
> *every* combination thereof.
>
> This will not actually appear in the result set, because the GROUP BY
> will condense rows with identical recentchanges rows. I'm not sure
> exactly how GROUP BY works here as opposed to DISTINCT, say, given
> that there are no grouping operators or anything: I hardly qualify as
> an SQL expert. But I could tell from the EXPLAIN that the query was
> seriously inefficient, and I noticed the deficiency in the join
> condition that was prompting a Cartesian join of the last two tables
> (after Xgc in #mysql prompted me to take a closer look at the query).
>
> > I put it through various tests before committing it, and it seemed to
> > give the correct results (obviously, none of my tests revealed the
> > error with the cutoff - that's a problem with testing on a test
> > install, not a real world database). So is the query correct, just
> > inefficient, or were my tests insufficient to catch the mistakes?
>
> It may be correct. I'm not sure, because on my PC (which is my test
> server) it sent mysqld to 90%+ CPU usage for somewhere well over a
> minute while copying to tmp table, so I got bored and killed the
> thread. Whether it would have returned the correct results half an
> hour from now is a somewhat academic question. :)
>
> Generally speaking, it's handy to have a relatively realistic local
> database. At the suggestion of Yurik, I use the Simple English
> Wikipedia because it's not gigantic and it's not gibberish to me.
> It's still not really ideal, because for instance the user table is
> practically nonexistent, recentchanges is unrealistically small, etc.,
> so I could use the toolserver if I still wasn't sure. That still
> wouldn't be quite ideal, since it has a different version of MySQL
> installed and so on, but it would be a pretty good approximation.
>
> By the way, did you test your patch while logged in? It seems to
> cause a fatal error before it even tries to execute the query.
> Generally speaking, it's a bad idea to mix implicit join syntax (foo,
> bar) with explicit join syntax (foo JOIN bar), like foo, bar, baz LEFT
> OUTER JOIN quuz: it doesn't do what you expect.
>
> Due to all these issues, I've reverted this, r26848.
>
>
>
> ------------------------------
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> http://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
> End of Wikitech-l Digest, Vol 51, Issue 38
> ******************************************
>
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r26873).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
On 10/20/07, William Pietri <william(a)scissor.com> wrote:
> RLS wrote:
> > Can anyone shed some light on why the en.wp recent changes RSS feed is
> > only updating once every 20 minutes or longer? It used to be as often
> > as every 2 or 3 minutes.
> >
> > I use Lupin's Anti-Vandal Tool to keep an eye on the recent change
> > when I feel like vandal patrolling. It relies on the RSS recent
> > changes feed (since it includes diffs as part of the feed), but it's
> > practically useless if it only gets two or three updates an hour.
>
>
> Hmmm... I have some code in Wikiticker that would make it relatively
> easy for me to build a basic RSS feed for you:
>
> http://dev.scissor.com/wikiticker/
>
> But getting the diffs for every change would be enough hits on the
> server that I always assumed it was, well, gauche. So I never tried.
>
>
> Looking at the feed, though, for me it was updating every 5-20
> seconds.[1] So perhaps it was a transient problem?
Heh, I found the problem. It's caused by the server-side caching of
the RSS feed for Special:Recentchanges. If you have Lupin's
Anti-Vandal Tool running in one window, and it gets "stuck", visiting
http://en.wikipedia.org/w/index.php?title=Special:Recentchanges&feed=rss
and adding &action=purge will immediately cause the tool to update
correctly on its next "pull".
Any possibility of disabling caching on the RSS feed? Is it even
possible to disable server-side caching for one particular page or
page format?
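On a wiki one controls, I believe the feed cache could be shortened or
effectively switched off in LocalSettings.php. A sketch (I am assuming
$wgFeedCacheTimeout is the setting that governs this particular cache, and I
don't know whether anything like it can be tuned per page on Wikimedia's
servers):

// LocalSettings.php
$wgFeedCacheTimeout = 0;   // seconds; my assumption is that a very low value
                           // makes the feed regenerate on every request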
I'd rather not lose the benefit of server-side caching in general by
disabling it in my preferences (not to mention that everyone who uses
Lupin's AVT would have to do the same to guarantee that AVT would work
correctly for them), and I can't edit the script to add the
&action=purge as it's in someone else's userspace.
--en:Darkwind
(cc: wikitech-l)
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r26861).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
Hoi,
I just found that the localisation of the Novial language has happened on
the Novial Wikipedia. As this language is only "enabled" on this project,
the work done to localise it is limited to this project.
Would someone be so kind as to allow Novial to be a MediaWiki-supported
language, and would someone be so kind as to export the localisation data and
move it into BetaWiki?
Thanks,
GerardM
In recent weeks I have been following the database dumps of
some languages of Wikipedia. I download and analyze a dump, do
various improvements, and then wait for the next dump to become
available for a new analysis. There are 2 or 3 weeks between each
dump. There appear to be two parallel dump processes continuously
running, http://download.wikimedia.org/backup-index.html
What takes most time in each dump is the large file with complete
version history, pages-meta-history.xml.bz2 and
pages-meta-history.xml.7z
This is the largest file in compressed format, but since it
contains every version of every article it is also very highly
compressed, and expands to become enormous. I guess that very few
people find use for this file. In addition, only a very small
portion of its contents is changed between two dumps. So we spend
a lot of time and effort (and delay of other things) in order to
create very little for very few users.
I think that this dump should be made incremental. Every week,
only that week's additional versions need to be dumped. This can
then be added to the dump of the previous week, the week before
that, etc., which hasn't really changed. This way, the dump
process could be made much faster, and the two parallel dump
processes would complete the cycle in less time, so new dumps of
the same project could be made available more frequently.
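To illustrate the idea: a weekly increment would only need the revisions added
since the previous dump's cutoff, roughly like this (a sketch using MediaWiki's
database wrapper; $lastDumpMaxRevId is a made-up name for a value that would be
recorded when the previous dump finished):

<?php
// Sketch: select only the revisions created after the previous dump's
// highest revision id, so each weekly increment stays small.
$dbr = wfGetDB( DB_SLAVE );
$res = $dbr->select(
    'revision',
    array( 'rev_id', 'rev_page', 'rev_timestamp' ),
    array( 'rev_id > ' . intval( $lastDumpMaxRevId ) ),
    __METHOD__
);
while ( $row = $res->fetchObject() ) {
    // write this revision into the incremental dump file
}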
Or is it already done this way behind the scenes, just not visible from the
outside?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Hi @all...
the extension I am currently writing tries to call a function using the
ArticleSaveComplete hook:
function ArticleReturn( &$text ) {
    global $wgOut, $wgUser, $wgRequest, $wgTitle, $wgParser;
    // The first hook parameter is used as the article object here;
    // getContent() returns its wikitext.
    $articleText = $text->getContent();
    $referer = $_SERVER['HTTP_REFERER'];
    // If the saved article contains the section syntax, send the browser
    // back to the referring page; otherwise continue normally.
    if ( preg_match( '/<section begin=/', $articleText ) ) {
        header( "Location: $referer" );
    } else {
        return true;
    }
}
Unfortunately it doesn't work.
However, if I use the ArticleSave hook in the same way, it does work;
but then, of course, the changes aren't saved.
Can anybody tell me what the problem is?
greets, demagggus