After running importDump.php to completion, then running initStats.php to update the statistics for the newly imported database, I get the following output:
[root@gadugi /]# cd /wikidump/en
[root@gadugi en]# php maintenance/initStats.php
Refresh Site Statistics

Counting total edits...4798436
Counting number of articles...1980988
Counting total pages...4797798
Counting number of users...1
Counting number of admins...1
Counting number of images...1092505
Counting total page views...67977

Updating site statistics...done.
[root@gadugi en]#
If I subsequently invoke rebuildall.php against this database, it reports double the number of pages, runs up to about 277000 articles, then the php process goes to sleep and never wakes up. rebuildall.php reports the following wrong article count:
[root@gadugi en]# php maintenance/rebuildall.php
** Rebuilding fulltext search index (if you abort this will break searching; run this script again to fix):
Rebuilding index fields for 9648325 pages...
1500
Jeff
On 30/03/07, Jeffrey V. Merkey jmerkey@wolfmountaingroup.com wrote:
php process goes to sleep and never wakes up. rebuildall.php reports the following wrong article count:
Be careful, there are two totals at work here:
* "Article" count, which is all pages in a content namespace (main namespace or $wgContentNamespaces) which aren't redirects, and which contain at least one internal link - this is what shows up as "good pages", "content pages" etc.
* "Page" count, which is all pages in all namespaces, including redirects.
rebuildTextIndex.php is referring to all pages, since that's what it reindexes; initStats.php counts *both* and stores them in different columns in site_stats.
The totals might still be *wrong*, of course, but the definitions often confuse people.
Rob Church
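To make the two definitions above concrete, here is a minimal Python sketch. It is not MediaWiki's actual counting code: the `Page` record, the `CONTENT_NAMESPACES` set (standing in for the main namespace plus $wgContentNamespaces), and the "contains `[[`" test for an internal link are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical, simplified stand-in for a row in the page table.
    namespace: int
    is_redirect: bool
    text: str


# Assumption: only the main namespace (0) counts as a content namespace.
CONTENT_NAMESPACES = {0}


def count_pages(pages):
    """'Page' count: every page in every namespace, redirects included."""
    return len(pages)


def count_articles(pages, content_namespaces=CONTENT_NAMESPACES):
    """'Article' count: content-namespace pages that are not redirects
    and contain at least one internal link (approximated here by '[[')."""
    return sum(
        1
        for p in pages
        if p.namespace in content_namespaces
        and not p.is_redirect
        and "[[" in p.text
    )


sample = [
    Page(0, False, "See [[Other]] for details."),  # counts as an article
    Page(0, True, "#REDIRECT [[Other]]"),          # redirect: page only
    Page(0, False, "No links here."),              # no internal link: page only
    Page(1, False, "Talk about [[Other]]."),       # non-content namespace: page only
]

print("pages:", count_pages(sample))
print("articles:", count_articles(sample))
```

With this sample, only the first entry satisfies all three article conditions, while all four entries count as pages, which is why the two totals can differ widely on a real dump.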
This should be documented to avoid confusing people in the future. I will add this description to the Data Dumps page. I restarted rebuildall.php after it crashed, and it appears the lockup was caused by a low-memory condition: it was still running this morning (though quite SLOOOOWLY...). I am re-running it on a system with more free memory. This other problem appears to be PHP related. I wanted to check because incorrect counters could cause aberrant program behavior.
Jeff
On 30/03/07, Jeff V. Merkey jmerkey@wolfmountaingroup.com wrote:
This should be documented to avoid confusing people in the future.
I feel obliged to point out that this distinction is plastered all over the mailing list and bug tracker, and I suspect has a page on MediaWiki.org, too...but I'll check on that last point.
Rob Church
It's OK, Rob.
A lot of "how to set up MediaWiki with Wikipedia dumps" was not documented thoroughly until I undertook the task, but it's OK. Hopefully, when I get done with it, there will be enough material on meta to pull together a decent Wikibook on how to do all this.
:-)
Jeff
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l