I have imported a daily dump of 121,000 pages from a wiki running 1.13.2. The importDump run took 80 hours, and when I ran rebuildAll, the resulting job queue held about 250,000 jobs, which took 10.5 hours to process. After that, I downloaded a later daily dump (taken 6 days after the original) and imported it. That took about 30 minutes.
Now I want to update the internal data structures so my personal copy of the wiki is complete. I decided not to use rebuildAll again, since the daily dump has no images. Also, the wiki is PostgreSQL-based, so rebuildtextindex doesn't work (it is only for MySQL). So I thought I would use refreshLinks. However, that is running almost as slowly as the original importDump. After processing 2200 pages and extrapolating, I estimate it will take about 40 hours to complete.
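For anyone wanting to check the arithmetic behind these figures, here is a minimal sketch in Python. It uses only the numbers quoted above. The elapsed time for the 2200-page sample is not given in the message, so the sketch backs it out under the assumption that the 40-hour estimate is a straight linear extrapolation over all 121,000 pages; the function and variable names are illustrative only.

```python
# Figures quoted in the message.
PAGES_TOTAL = 121_000      # pages in the daily dump
IMPORT_HOURS = 80.0        # duration of the full importDump run
JOBS_TOTAL = 250_000       # job queue size after rebuildAll
JOB_HOURS = 10.5           # time to drain the job queue
PAGES_SAMPLED = 2_200      # pages refreshLinks had processed so far
ESTIMATE_HOURS = 40.0      # the poster's estimate for refreshLinks


def per_second(count, hours):
    """Average throughput in items per second."""
    return count / (hours * 3600.0)


def extrapolate_hours(done, elapsed_hours, total):
    """Linear extrapolation: assume all items run at the observed rate."""
    return elapsed_hours * total / done


import_rate = per_second(PAGES_TOTAL, IMPORT_HOURS)
job_rate = per_second(JOBS_TOTAL, JOB_HOURS)

# Assumption: the 40-hour estimate covers all 121,000 pages, so the
# sample's elapsed time can be recovered by inverting the extrapolation.
sample_hours = ESTIMATE_HOURS * PAGES_SAMPLED / PAGES_TOTAL
refresh_rate = per_second(PAGES_SAMPLED, sample_hours)

print(f"importDump:   {import_rate:.2f} pages/s")
print(f"job queue:    {job_rate:.2f} jobs/s")
print(f"refreshLinks: {refresh_rate:.2f} pages/s "
      f"(sample took about {sample_hours * 60:.1f} min)")
```

Under that assumption, the import averaged roughly 0.42 pages per second, the job queue roughly 6.6 jobs per second, and the refreshLinks sample would have taken about 44 minutes, which puts refreshLinks at about twice the import's page rate.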
I have several questions:
+ Are there any settings that might make both importDump and refreshLinks do their work more quickly?
+ Is it necessary to run refreshLinks after an incremental importDump?
+ Are there other maintenance scripts I should run?
Thanks,
Dan Nessett
Platonides wrote:
importDump should already be doing everything rebuildAll does, so you don't need to run it again.
Thanks, Platonides. I have a follow-up question.
Suppose I download and import a daily dump into a local test wiki, then modify some of the pages on the local copy. Finally, I download a later daily dump and import it.
+ For pages I changed on my local copy and that are also marked as changed in the new daily dump, will importDump overwrite them or will it generate an error?
+ For pages I changed on my local copy that are not marked as changed in the new daily dump, will importDump leave them alone?
Thanks.
Dan Nessett
--- On Tue, 1/26/10, Platonides <Platonides@gmail.com> wrote:
From: Platonides <Platonides@gmail.com>
Subject: Re: [Mediawiki-l] updating wiki internal data after importDump
To: mediawiki-l@lists.wikimedia.org
Date: Tuesday, January 26, 2010, 2:37 PM
> importDump should already be doing everything rebuildAll does, so you don't need to run it again.
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l